Starting a Spider from Redis

The scrapy_redis.spiders module provides two classes, RedisSpider and RedisCrawlSpider, that let a spider read its start_urls from Redis.

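The spider reads a start URL from Redis and crawls it. If the crawl yields further requests, the spider keeps working until all of them have completed. Only then does it read the next URL from start_urls in Redis, and the cycle repeats.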
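The cycle described above can be sketched without Scrapy or Redis at all. In this toy model a deque stands in for the Redis list, and `crawl_from_queue`, `get_links`, and `LINKS` are hypothetical names invented for illustration. It shows only the ordering of the work: the real spider stays idle and waits for new URLs instead of returning when the list is empty.

```python
from collections import deque


def crawl_from_queue(start_urls, get_links):
    """Drain start_urls, finishing all follow-up requests before taking the next one.

    start_urls: deque standing in for the Redis list (an assumption of this sketch).
    get_links: hypothetical callable mapping a URL to the URLs found on that page.
    """
    seen = set()
    order = []
    while start_urls:
        # Take the next start URL only when no requests are pending.
        pending = deque([start_urls.popleft()])
        while pending:
            url = pending.popleft()
            if url in seen:
                continue  # skip URLs that were already crawled
            seen.add(url)
            order.append(url)
            # Requests produced by this page are handled before the next start URL.
            pending.extend(get_links(url))
    return order


# Toy link graph used in place of real HTTP responses.
LINKS = {"a": ["a1", "a2"], "a1": ["a2"], "b": ["b1"]}
result = crawl_from_queue(deque(["a", "b"]), lambda u: LINKS.get(u, []))
print(result)
```

Here everything reachable from "a" is crawled before "b" is taken from the queue.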

RedisSpider

Example: example/spiders/myspider_redis.py

  • run the spider:
  1. scrapy runspider example/spiders/myspider_redis.py
  • push urls to redis:
  1. redis-cli lpush myspider:start_urls http://baidu.com

RedisCrawlSpider

Example: example/spiders/mycrawler_redis.py

  • run the spider:
  1. scrapy runspider example/spiders/mycrawler_redis.py
  • push urls to redis:
  1. redis-cli lpush mycrawler:start_urls http://baidu.com