Any recommendations for a JavaScript scraping solution? - V2EX

    dcsuibian  Oct 19, 2021  4281 views

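    I'm currently cloning another website for practice. Mocking the data by hand is hard, so I'd like to scrape some real data instead. (Purely for practice, not for commercial use.)

    I used Scrapy before and it worked well, but I'm already fairly comfortable with JS and TS, and I'm not very fond of Python.

    So I'd like to ask: is there a JavaScript alternative, such as a comparable framework? TypeScript support would be a plus.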

    10 replies    2021-10-19 16:00:01 +08:00
    veike
        1
    veike  
       Oct 19, 2021 via Android
    puppeteer ?
    gavingeng
        2
    gavingeng  
       Oct 19, 2021
    Microsoft's Playwright. The team behind it is the original Puppeteer team, which moved to Microsoft in 2019.
    unclemcz
        3
    unclemcz  
       Oct 19, 2021
    crawler
    rust
        4
    rust  
       Oct 19, 2021
    Use the CDP (Chrome DevTools Protocol) directly.
    mxT52CRuqR6o5
        5
    mxT52CRuqR6o5  
       Oct 19, 2021
    (axios / got / any other HTTP request library) + cheerio
    puppeteer / playwright
    iiqiu
        6
    iiqiu  
       Oct 19, 2021
    puppeteer
    ntdll
        7
    ntdll  
       Oct 19, 2021   4
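    Not sure whether I should say this, but if you use Cloudflare Workers to scrape other sites that sit behind Cloudflare, the requests go straight through the WAF. It looks like Cloudflare whitelists its own IPs. The free tier's 100,000 invocations per day is also very generous.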
    zhuzongxing
        8
    zhuzongxing  
       Oct 19, 2021
    I use a fairly crude approach: axios plus cheerio.
    xiangyuecn
        9
    xiangyuecn  
       Oct 19, 2021
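    I just use XMLHttpRequest directly, hand-written, and it's blazing fast... Mostly because I don't know any other tools, and nothing else is as easy to write as JS.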
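For readers who want to try the hand-rolled route described in this reply, here is a minimal dependency-free sketch. It is not the poster's code: it swaps XMLHttpRequest for the global `fetch` (assumed available, as in Node 18+ or a browser), and the helper names `fetchPage`, `extractTitle` and `extractLinks` are hypothetical. The regular expressions are a deliberate simplification that only handles quoted `href` values and does not decode HTML entities, so a real parser is safer for anything beyond simple static pages.

```typescript
// Download a page as text. Assumes a runtime with a global fetch (e.g. Node 18+).
export async function fetchPage(url: string): Promise<string> {
  const res = await fetch(url);
  if (!res.ok) {
    throw new Error(`HTTP ${res.status} for ${url}`);
  }
  return res.text();
}

// Return the trimmed text of the first <title> element, or null if there is none.
export function extractTitle(html: string): string | null {
  const m = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return m ? m[1].trim() : null;
}

// Collect quoted href values from <a> tags, resolve them against baseUrl,
// and return them de-duplicated in document order.
export function extractLinks(html: string, baseUrl: string): string[] {
  const pattern = /<a\b[^>]*?\shref\s*=\s*(?:"([^"]*)"|'([^']*)')/gi;
  const seen = new Set<string>();
  const result: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(html)) !== null) {
    const raw = m[1] ?? m[2];
    if (!raw) continue; // skip empty href values
    let absolute: string;
    try {
      absolute = new URL(raw, baseUrl).href;
    } catch {
      continue; // skip hrefs that cannot be resolved to a URL
    }
    if (!seen.has(absolute)) {
      seen.add(absolute);
      result.push(absolute);
    }
  }
  return result;
}
```

A typical call would be `extractLinks(await fetchPage(url), url)`. The parsing helpers are kept free of network access so they can be checked against a fixed HTML string.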
    dcsuibian
        10
    dcsuibian  
    OP
       Oct 19, 2021   1
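    Thanks for all the replies.
    I did some research of my own. My current plan is to rely on axios and cheerio, and possibly move to playwright later.
    I've used axios many times already, and cheerio handles the DOM.
    puppeteer and playwright broadened my horizons. I'm very interested, but I don't need them yet (I'm only scraping static pages for now). If I need one later I'd lean toward playwright, mainly for its cross-platform support and because it comes from Microsoft (TypeScript).
    I've heard that node-crawler seems to be no longer maintained.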
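The axios plus cheerio combination that several replies and the OP settle on could look roughly like the sketch below for static pages. It assumes the `axios` and `cheerio` npm packages are installed. The `ScrapedLink` type and the `parseLinks` and `scrapeLinks` helpers are hypothetical names chosen for illustration, and the `a[href]` selector is only an example of what one might extract.

```typescript
import axios from "axios";
import * as cheerio from "cheerio";

// One scraped link: its visible text and its absolute URL.
export interface ScrapedLink {
  text: string;
  url: string;
}

// Parse an HTML string with cheerio and collect every <a href> as an absolute URL.
// No network access here, so it can be exercised on a fixed HTML string.
export function parseLinks(html: string, baseUrl: string): ScrapedLink[] {
  const $ = cheerio.load(html);
  const links: ScrapedLink[] = [];
  $("a[href]").each((_, el) => {
    const href = $(el).attr("href");
    if (!href) return;
    try {
      links.push({
        text: $(el).text().trim(),
        url: new URL(href, baseUrl).href,
      });
    } catch {
      // Skip hrefs that cannot be resolved to a URL.
    }
  });
  return links;
}

// Download a static page with axios and hand the HTML to parseLinks.
export async function scrapeLinks(pageUrl: string): Promise<ScrapedLink[]> {
  const res = await axios.get<string>(pageUrl, { responseType: "text" });
  return parseLinks(res.data, pageUrl);
}
```

Calling `scrapeLinks(url)` returns a promise of the links found on that page. Because this only sees the HTML the server sends, pages that build their content with client-side JavaScript would need a browser-driving tool such as puppeteer or playwright, as the OP notes.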