Crawler fails with an error after running for 4-5 hours; I can't work out the cause, please take a look - V2EX

    wsds  Jun 13, 2018  8730 views

    The error output is long, but the cause appears to be this: socket.gaierror: [Errno -3] Temporary failure in name resolution

    It runs on Alibaba Cloud.

    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 137, in _new_conn
        (self.host, self.port), self.timeout, **extra_kw)
      File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 67, in create_connection
        for res in socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM):
      File "/usr/lib/python3.5/socket.py", line 732, in getaddrinfo
        for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
    socket.gaierror: [Errno -3] Temporary failure in name resolution

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 560, in urlopen
        body=body, headers=headers)
      File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 354, in _make_request
        conn.request(method, url, **httplib_request_kw)
      File "/usr/lib/python3.5/http/client.py", line 1106, in request
        self._send_request(method, url, body, headers)
      File "/usr/lib/python3.5/http/client.py", line 1151, in _send_request
        self.endheaders(body)
      File "/usr/lib/python3.5/http/client.py", line 1102, in endheaders
        self._send_output(message_body)
      File "/usr/lib/python3.5/http/client.py", line 934, in _send_output
        self.send(msg)
      File "/usr/lib/python3.5/http/client.py", line 877, in send
        self.connect()
      File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 162, in connect
        conn = self._new_conn()
      File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 146, in _new_conn
        self, "Failed to establish a new connection: %s" % e)
    requests.packages.urllib3.exceptions.NewConnectionError: <requests.packages.urllib3.connection.HTTPConnection object at 0x7feaccda2668>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/requests/adapters.py", line 376, in send
        timeout=timeout
      File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 610, in urlopen
        _stacktrace=sys.exc_info()[2])
      File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 273, in increment
        raise MaxRetryError(_pool, url, error or ResponseError(cause))
    requests.packages.urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='http://www.xiangshu.com/', port=80): Max retries exceeded with url: http://www.xiangshu.com/3603751.html (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7feaccda2668>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "getimg.py", line 102, in <module>
        GetImg().getdata()
      File "getimg.py", line 76, in getdata
        base_url + j['href'], headers=self.headers)
      File "/usr/lib/python3/dist-packages/requests/sessions.py", line 480, in get
        return self.request('GET', url, **kwargs)
      File "/usr/lib/python3/dist-packages/requests/sessions.py", line 468, in request
        resp = self.send(prep, **send_kwargs)
      File "/usr/lib/python3/dist-packages/requests/sessions.py", line 576, in send
        r = adapter.send(request, **kwargs)
      File "/usr/lib/python3/dist-packages/requests/adapters.py", line 437, in send
        raise ConnectionError(e, request=request)
    requests.exceptions.ConnectionError: HTTPConnectionPool(host='http://www.xiangshu.com/', port=80): Max retries exceeded with url: http://www.xiangshu.com/3603751.html (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7feaccda2668>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))
    21 replies    2018-06-14 17:07:44 +08:00
    #1  golmic  Jun 13, 2018 via Android
    Does every request fail, or only some of them? It looks like you triggered the site's anti-crawling measures.
    #2  wsds (OP)  Jun 13, 2018
    @golmic It hits this error after crawling for a few hours, pretty much every time.
    #3  wsds (OP)  Jun 13, 2018
    @golmic It had fetched fewer than 10,000 images by then.
    #4  lululau  Jun 13, 2018
    Looks like DNS resolution is intermittently flaking out.
    #5  xxxy  Jun 13, 2018
    DNS servers enforce rate limits too.
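    #6  golmic  Jun 13, 2018
    @lululau #4 A resolution failure would be reported as a DNS error, wouldn't it?

    If the errors are frequent, deal with the anti-crawling measures; if they are only occasional, just retry.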
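The "just retry" advice above could be sketched roughly as follows. This is a minimal sketch, not the OP's code: `fetch_with_retry`, `retries`, and `base_delay` are made-up names, and `fetch` stands for whatever function performs the request. It relies on `socket.gaierror` and `requests.exceptions.ConnectionError` both being subclasses of `OSError`.

```python
import random
import time


def fetch_with_retry(fetch, url, retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on OSError with exponential backoff.

    socket.gaierror and requests.exceptions.ConnectionError are both
    OSError subclasses, so a failed name resolution is retried here.
    """
    for attempt in range(retries):
        try:
            return fetch(url)
        except OSError:
            if attempt == retries - 1:
                raise  # out of attempts: let the caller see the error
            # Wait 1s, 2s, 4s, ... plus a little jitter before the next try.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.5))
```

With requests this might be called as `fetch_with_retry(lambda u: requests.get(u, headers=headers, timeout=10), url)`; the bounded retry count keeps a permanently dead host from stalling the crawl forever.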
    #7  Cooky  Jun 13, 2018
    Switch to a better DNS server?
    #8  lerry  Jun 13, 2018
    Install dnsmasq locally and set it as the system's default DNS server; that can improve DNS lookups.
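dnsmasq is a system-level fix. A rough in-process analogue, which is a different technique from the one suggested above, is to cache `socket.getaddrinfo` results inside the crawler and fall back to the last known answer when a lookup fails. The sketch below assumes that is acceptable for a crawler hitting a handful of hosts; `make_cached_resolver`, `ttl`, and `clock` are illustrative names.

```python
import socket
import time


def make_cached_resolver(resolve=socket.getaddrinfo, ttl=300.0, clock=time.monotonic):
    """Wrap a getaddrinfo-style function with a small TTL cache."""
    cache = {}  # lookup arguments -> (expiry time, result)

    def cached(host, port, *args, **kwargs):
        key = (host, port, args, tuple(sorted(kwargs.items())))
        now = clock()
        hit = cache.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # fresh cached answer, no DNS query
        try:
            result = resolve(host, port, *args, **kwargs)
        except socket.gaierror:
            if hit is not None:
                return hit[1]  # lookup failed: reuse the stale answer
            raise
        cache[key] = (now + ttl, result)
        return result

    return cached


# Hypothetical installation, once at program start (not executed here):
#   socket.getaddrinfo = make_cached_resolver()
```

Because urllib3 calls `socket.getaddrinfo` through the module attribute (as the traceback shows), replacing that attribute should affect requests as well. Serving stale answers is a trade-off: it hides short resolver outages but also hides real address changes until a lookup succeeds again.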
    #9  baday  Jun 13, 2018
    Try setting the Connection request header to close.
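For reference, the header suggestion above might look like the sketch below. `with_connection_close` is a hypothetical helper; it only builds the header dictionary. `Connection: close` asks for the connection to be closed after each response instead of being kept alive, and the thread does not establish whether that helps with a name-resolution failure.

```python
def with_connection_close(headers=None):
    """Return a copy of headers with keep-alive turned off."""
    merged = dict(headers or {})
    merged["Connection"] = "close"
    return merged


# Hypothetical usage with requests (not executed here):
#   requests.get(url, headers=with_connection_close(self.headers), timeout=10)
```

Copying the dictionary rather than mutating it leaves the crawler's shared default headers untouched.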
    #10  wsds (OP)  Jun 13, 2018
    @lululau I searched online, and that does seem to be what is going on.
    #11  wsds (OP)  Jun 13, 2018
    @Cooky What counts as a better one?
    #12  wsds (OP)  Jun 13, 2018
    @lerry This is running on Alibaba Cloud.
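    #13  ihancheng  Jun 13, 2018 via Android
    I won't even bother ranting about Alibaba Cloud. I'm learning to write Python crawlers: on Tencent Cloud everything works, but on Alibaba Cloud it throws exceptions that I just cannot get rid of... Maybe the problem is on my side, but none of the fixes I found online worked.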
    #14  owenliang  Jun 13, 2018 via Android
    Exceptions can be caught.
    #15  wsds (OP)  Jun 13, 2018
    @owenliang This one was already caught and re-raised; didn't you see all the "another exception occurred" messages?
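Each "another exception occurred" in the traceback marks one link in Python's exception chain. One way to check programmatically that the chain bottoms out in a DNS failure is to follow those links; the sketch below uses made-up helper names (`root_cause`, `is_dns_failure`) and only standard exception attributes.

```python
import socket


def root_cause(exc):
    """Follow __cause__/__context__ links to the innermost exception."""
    seen = set()
    while id(exc) not in seen:
        seen.add(id(exc))  # guard against a cyclic chain
        inner = exc.__cause__ or exc.__context__
        if inner is None:
            break
        exc = inner
    return exc


def is_dns_failure(exc):
    """True if the chain ends in socket.gaierror (name resolution)."""
    return isinstance(root_cause(exc), socket.gaierror)
```

A caller could catch `requests.exceptions.ConnectionError` around the request and use `is_dns_failure(err)` to decide whether to wait and retry or to treat the URL as genuinely broken.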
    #16  Cooky  Jun 13, 2018
    @wsds Can't you install dnsmasq on Alibaba Cloud?
    #17  hicdn  Jun 13, 2018
    It's a DNS resolution problem. If you only crawl a few fixed domains, add them to the hosts file.
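    #18  dapengzhao  Jun 13, 2018
    My crawler also hits this error after running for a while. My fix: as long as the IP hasn't been banned, catch the exception, sleep for a while, then break out of the current iteration inside a while True loop and start it again.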
    #19  gamecreating  Jun 13, 2018
    Just catch the exception and handle it...

    A crawler can never guarantee that every connection and every fetch succeeds.
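    #20  JCZ2MkKb5S8ZX9pq  Jun 13, 2018
    Write your own request function that wraps requests, and build in the usual things: exception handling, retries, a random User-Agent, automatic proxies, and so on. Do it once and you're set.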
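A wrapper along the lines described above might start out like the sketch below. It is only an outline under assumptions: `Fetcher` and its parameters are hypothetical, `session` is expected to behave like a `requests.Session` (a `get(url, headers=..., timeout=...)` method), and proxy rotation is left out.

```python
import random
import time


class Fetcher:
    """Thin wrapper around a requests-style session (hypothetical helper)."""

    def __init__(self, session, user_agents, retries=3, delay=5.0, sleep=time.sleep):
        self.session = session
        self.user_agents = list(user_agents)
        self.retries = retries
        self.delay = delay
        self.sleep = sleep

    def get(self, url, **kwargs):
        base_headers = dict(kwargs.pop("headers", None) or {})
        kwargs.setdefault("timeout", 10)  # avoid waiting forever on a dead peer
        for attempt in range(1, self.retries + 1):
            headers = dict(base_headers)
            # Pick a User-Agent per attempt unless the caller supplied one.
            headers.setdefault("User-Agent", random.choice(self.user_agents))
            try:
                return self.session.get(url, headers=headers, **kwargs)
            except OSError:  # includes requests' ConnectionError and gaierror
                if attempt == self.retries:
                    raise
                self.sleep(self.delay)
```

It would be constructed once, for example `Fetcher(requests.Session(), ["ua string 1", "ua string 2"])`, and every request in the crawler would then go through `fetcher.get(...)` so the retry policy lives in a single place.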
    #21  beforeuwait  Jun 14, 2018
    Failed to establish a new connection
    For this kind of problem, write a try/except: on error, sleep for 20 seconds, then send the request again.