Coroutines

New in version 2.0.

Scrapy has partial support for the coroutine syntax.

Warning

asyncio support in Scrapy is experimental. Future Scrapy versions may introduce related API and behavior changes without a deprecation period or warning.

Supported callables

The following callables may be defined as coroutines using async def, and hence use coroutine syntax (e.g. await, async for, async with):

  • Request callbacks.
  • The process_item() method of item pipelines.
  • The process_request(), process_response(), and process_exception() methods of downloader middlewares.
  • Signal handlers that support deferreds.

The following are known caveats of the current implementation that we aim to address in future versions of Scrapy:

  • The callback output is not processed until the whole callback finishes.

As a side effect, if the callback raises an exception, none of its output is processed.

If you need to output multiple items or requests and you are using Python 3.5, return an iterable (e.g. a list) instead.
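This pattern can be sketched without Scrapy: the hypothetical parse coroutine below collects its results in a list and returns them all at once, instead of yielding them one by one. The fetch_field helper is a made-up stand-in for any asynchronous call.

```python
import asyncio

async def fetch_field(value):
    # Stand-in for an asynchronous call (e.g. a database or HTTP request).
    await asyncio.sleep(0)
    return value * 2

async def parse(values):
    # Instead of yielding results one by one (an async generator),
    # collect them in a plain list and return it in one go.
    items = []
    for value in values:
        items.append({'field': await fetch_field(value)})
    return items

results = asyncio.run(parse([1, 2, 3]))
print(results)  # [{'field': 2}, {'field': 4}, {'field': 6}]
```

The same shape applies inside a Scrapy callback: build the list and return it, and Scrapy processes each item or request in it.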

Usage

There are several use cases for coroutines in Scrapy. Code that would return Deferreds when written for previous Scrapy versions, such as downloader middlewares and signal handlers, can be rewritten to be shorter and cleaner:

    class DbPipeline:
        def _update_item(self, data, item):
            item['field'] = data
            return item

        def process_item(self, item, spider):
            dfd = db.get_some_data(item['id'])
            dfd.addCallback(self._update_item, item)
            return dfd

becomes:

    class DbPipeline:
        async def process_item(self, item, spider):
            item['field'] = await db.get_some_data(item['id'])
            return item

Coroutines may be used to call asynchronous code. This includes other coroutines, functions that return Deferreds and functions that return awaitable objects such as Future. This means you can use many useful Python libraries providing such code:

    class MySpider(Spider):
        # ...
        async def parse_with_deferred(self, response):
            additional_response = await treq.get('https://additional.url')
            additional_data = await treq.content(additional_response)
            # ... use response and additional_data to yield items and requests

        async def parse_with_asyncio(self, response):
            async with aiohttp.ClientSession() as session:
                async with session.get('https://additional.url') as additional_response:
                    additional_data = await additional_response.text()
                    # ... use response and additional_data to yield items and requests
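The third kind of awaitable mentioned above, a Future, can be shown with a minimal, Scrapy-free sketch; the get_data name is made up for illustration:

```python
import asyncio

def get_data(loop):
    # An ordinary function returning an awaitable Future, not a coroutine.
    fut = loop.create_future()
    loop.call_soon(fut.set_result, 'payload')
    return fut

async def parse():
    # await works on Futures exactly as it does on other coroutines.
    return await get_data(asyncio.get_running_loop())

result = asyncio.run(parse())
print(result)  # payload
```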

Note

Many libraries that use coroutines, such as aio-libs, require the asyncio loop, and to use them you need to enable asyncio support in Scrapy.
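In Scrapy 2.0 and later, enabling asyncio support amounts to selecting Twisted's asyncio reactor in your project settings; a minimal settings.py fragment:

```python
# settings.py: install Twisted's asyncio reactor so that
# asyncio-based libraries (aiohttp, aio-libs, ...) can run.
TWISTED_REACTOR = 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'
```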

Common use cases for asynchronous code include:

  • requesting data from websites, databases and other services (in callbacks, pipelines and middlewares);
  • storing data in databases (in pipelines and middlewares);
  • delaying the spider initialization until some external event (in the spider_opened handler);
  • calling asynchronous Scrapy methods like ExecutionEngine.download (see the screenshot pipeline example).
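The delayed-initialization case above can be sketched with plain asyncio, without Scrapy: a hypothetical spider_opened handler waits on an event that some external system sets before the spider proceeds.

```python
import asyncio

async def spider_opened(ready):
    # Hypothetical handler: block spider start-up until an external
    # event (here modeled by an asyncio.Event) has fired.
    await ready.wait()
    return 'initialized'

async def main():
    ready = asyncio.Event()
    # Simulate the external system signalling readiness shortly after start.
    asyncio.get_running_loop().call_later(0.01, ready.set)
    return await spider_opened(ready)

status = asyncio.run(main())
print(status)  # initialized
```

In a real project, ready.set() would be triggered by whatever external condition the spider is waiting for, such as a database connection being established.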