Gain
GitHub: https://github.com/gaojiuli/gain/
gain is designed to make it easy for everyone to write Python web crawlers. It is built on asyncio, uvloop, and aiohttp.
Requirements
- Python 3.5+
Installation
pip install gain
Usage
Write spider.py:

    from gain import Css, Item, Parser, Spider


    class Post(Item):
        # Each field is extracted with a CSS selector.
        title = Css('.entry-title')
        content = Css('.entry-content')

        async def save(self):
            # Append the extracted title to a local text file.
            with open('scrapinghub.txt', 'a+') as f:
                f.write(self.results['title'] + '\n')


    class MySpider(Spider):
        start_url = 'https://blog.scrapinghub.com/'
        # The first parser only follows pagination links;
        # the second also extracts a Post item from each matching page.
        parsers = [Parser(r'https://blog.scrapinghub.com/page/\d+/'),
                   Parser(r'https://blog.scrapinghub.com/\d{4}/\d{2}/\d{2}/[a-z0-9\-]+/', Post)]


    MySpider.run()

Run python spider.py
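
To see which URLs the two Parser patterns are meant to cover, they can be tried directly with Python's standard re module. This is only a rough sketch: the classify helper and the sample URLs are made up for illustration, and using re.fullmatch is an assumption that may not reflect how gain applies the patterns internally.

```python
import re

# Patterns copied from the parsers list in spider.py, written as raw strings.
PAGE_RULE = r'https://blog.scrapinghub.com/page/\d+/'
POST_RULE = r'https://blog.scrapinghub.com/\d{4}/\d{2}/\d{2}/[a-z0-9\-]+/'


def classify(url):
    """Hypothetical helper: report which rule a URL fully matches, if any."""
    if re.fullmatch(PAGE_RULE, url):
        return 'page'
    if re.fullmatch(POST_RULE, url):
        return 'post'
    return None


# Made-up URLs, used only to exercise the patterns.
samples = [
    'https://blog.scrapinghub.com/page/2/',
    'https://blog.scrapinghub.com/2016/10/27/an-example-post/',
    'https://blog.scrapinghub.com/about/',
]
for url in samples:
    print(url, '->', classify(url))
```

Under this reading, pagination URLs fall under the first pattern, dated post URLs fall under the second (the one paired with Post), and anything else is ignored.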

Examples
Examples are in the /example/ directory.
