思考要在空白頁: 04/2014

2014/04/05

Coroutine in Tornado Web Framework

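A coroutine lets us arrange the order of execution ourselves, a bit like a jump. It allows execution to leave a function temporarily while keeping the state of its local variables, and to come back later and continue from the point where it left off. I first ran into the concept in Python. Coroutines are a language feature, and the Wikipedia article lists many languages that support them.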
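To make "leave the function, keep the local variables, come back later" concrete, here is a minimal sketch using a plain Python generator. The name `running_total` is made up for this illustration and has nothing to do with Tornado.

```python
def running_total():
    """Generator-based coroutine: the local `total` survives every pause."""
    total = 0
    while True:
        # Execution pauses here and hands `total` to the caller.
        # It resumes when the caller sends the next value in.
        value = yield total
        total += value


co = running_total()
next(co)              # run up to the first yield
first = co.send(10)   # resume, add 10, pause again
second = co.send(5)   # resume from the same spot; `total` is remembered
print(first, second)  # 10 15
```

Each `send()` jumps back into the function at the `yield`, and `total` still holds whatever it held when the function was last left.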

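So why would we want a program to jump around? Consider a single-threaded program that reaches a blocking function. Wouldn't it be nice to let the CPU do something else in the meantime, and jump back to the original spot once the I/O has responded? Wait, isn't that just event-driven programming? It is the same idea, but the two are expressed somewhat differently in code.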
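The idea of doing other work on a single thread while one task waits can be sketched with a toy round-robin scheduler over generators. This is only an illustration under simplifying assumptions: `task` and `run_all` are hypothetical names, and a bare `yield` stands in for "I would block on I/O here". A real event loop resumes a task when its I/O is actually ready rather than simply taking turns.

```python
from collections import deque


def task(name, steps, log):
    for i in range(steps):
        log.append("%s step %d" % (name, i))
        # Stand-in for a blocking call: give up the CPU instead of waiting.
        yield


def run_all(tasks):
    """Take turns running each task on one thread until all are finished."""
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)      # resume the task until its next yield
        except StopIteration:
            continue           # the task has finished; drop it
        queue.append(current)  # not finished yet; schedule it again


log = []
run_all([task("A", 2, log), task("B", 2, log)])
print(log)  # ['A step 0', 'B step 0', 'A step 1', 'B step 1']
```

The two tasks interleave even though there is only one thread, because each one voluntarily steps aside at its `yield`.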

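As an example, we send an HTTP request to fetch Yahoo weather information, then use an XML parser to pull the temperature out of the response data. Here it is written in the ordinary asynchronous (callback) style: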

from tornado.ioloop import IOLoop
from tornado.web import Application, RequestHandler, asynchronous
from tornado.httpclient import AsyncHTTPClient
from xml.dom import minidom

class MainHandler(RequestHandler):
    url = "http://weather.yahooapis.com/forecastrss?w=2306179&u=c"

    @asynchronous  # keep the connection open until self.finish() is called
    def get(self):
        http_client = AsyncHTTPClient()
        http_client.fetch(self.url, callback=self._on_fetch)  # returns at once

    def _on_fetch(self, response):
        degree = self._parse_xml(response.body)
        self.finish("Taipei: %d" % degree)

    def _parse_xml(self, xml):
        xml_doc = minidom.parseString(xml)
        weather_list = xml_doc.getElementsByTagName('yweather:condition')
        degree = float(weather_list[0].attributes['temp'].value)
        return degree

if __name__ == "__main__":
    application = Application([
        (r"/", MainHandler),
    ])
    application.listen(8888)
    IOLoop.instance().start()

Line 12: the callback is specified at the same time the request is sent.
Line 14: once the server's response has been received, _on_fetch() is executed.

The same thing in coroutine style:

from tornado.ioloop import IOLoop
from tornado.web import Application, RequestHandler
from tornado.httpclient import AsyncHTTPClient
import tornado.gen as gen
from xml.dom import minidom

class MainHandler(RequestHandler):
    url = "http://weather.yahooapis.com/forecastrss?w=2306179&u=c"

    @gen.coroutine
    def get(self):
        http_client = AsyncHTTPClient()
        response = yield http_client.fetch(self.url)  # pause until the reply arrives
        degree = self._parse_xml(response.body)
        self.finish("Taipei: %d" % degree)

    def _parse_xml(self, xml):
        xml_doc = minidom.parseString(xml)
        weather_list = xml_doc.getElementsByTagName('yweather:condition')
        degree = float(weather_list[0].attributes['temp'].value)
        return degree

if __name__ == "__main__":
    application = Application([
        (r"/", MainHandler),
    ])
    application.listen(8888)
    IOLoop.instance().start()

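Line 13: once the statement after yield has been executed, the function returns immediately. When the Tornado IOLoop later receives the server's response, execution jumps back to line 13, the result of fetch() is assigned to response, and the function carries on from there.
Underneath it is still asynchronous I/O, but this way of writing it feels as intuitive as writing synchronous I/O.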
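How a function can "return immediately" at a yield and be resumed later is easier to see with a toy driver. The sketch below is not how Tornado implements gen.coroutine. It is a stripped-down illustration in plain Python, and `fake_fetch`, `run`, `pending` and `handler` are all made-up names. It assumes every yielded value is a function that accepts a callback.

```python
pending = []  # a toy "event loop": callbacks waiting for their I/O to finish


def fake_fetch(url):
    """Hypothetical async call: returns a function that registers a callback."""
    def register(callback):
        pending.append(lambda: callback("body of " + url))
    return register


def run(gen_func):
    """Drive a generator, resuming it whenever a yielded operation completes."""
    gen = gen_func()

    def step(value=None):
        try:
            operation = gen.send(value)  # run until the next yield
        except StopIteration:
            return                       # the generator has finished
        operation(step)                  # ask to be resumed with the result

    step()


results = []


def handler():
    response = yield fake_fetch("http://example.com/")
    results.append(response)


run(handler)
before = list(results)  # still empty: handler is paused at its yield
while pending:
    pending.pop(0)()    # the "event loop" delivers the response
print(before, results)
```

`run()` comes back as soon as `handler` reaches its yield. Only when the queued callback fires does `step` send the response into the generator, which is the moment execution "jumps back" to the yield line.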

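A single callback may not show where the intuitiveness comes from. But if database/memcach... access and every other I/O-related operation is done asynchronously, you end up having to issue another callback from inside a callback: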

class MainHandler(RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        # req1/req2 stand for any asynchronous call that accepts a callback
        req1(argument1, callback=self._res1)

    def _res1(self, response1):
        # ...do something with response1
        req2(argument2, callback=self._res2)

    def _res2(self, response2):
        # ...do something with response2
        self.finish("result...")

Switching to the coroutine style:

class MainHandler(RequestHandler):
    @tornado.gen.coroutine
    def get(self):
        response1 = yield req1(argument1)
        # ...do something with response1
        response2 = yield req2(argument2)
        # ...do something with response2
        self.finish("result...")

Isn't that a lot more intuitive!