Asking for help: scraping web pages with Selenium + Chrome - V2EX
    saximi · 2017-09-30 23:38:13 +08:00 · 5679 views
    from selenium import webdriver

    url = "http://buy.ccb.com/searchproducts/pv_0_0_0_0_000.jhtml?query=*&selectCatId=12001001&catId=12001001&isBH=false&area="
    driver = webdriver.PhantomJS(executable_path=r'D:\phantomjs\bin\phantomjs.exe')  # statement 1
    driver.get(url)
    # Click the "下一页" (Next page) link in the pagination bar
    driver.find_element_by_xpath('//div[@class="main"]/div[@class="right"]/div[@class="page"]/a[@onclick][contains(text(),"下一页")]').click()

    The code above clicks the "下一页" (Next page) button in the page-number control at the bottom of the page at url. On page 1, for example, the Next-page button is the <a Onclick="queryListByPage('2')">下一页</a> element inside <div class="page">, which sits inside <div class="right">, which sits inside <div class="main">.

    With the PhantomJS browser the code runs correctly. But if I switch to Chrome, i.e. change statement 1 to driver = webdriver.Chrome(), it fails with the following error:

    Traceback (most recent call last):
      File "d:\Python3\t1.py", line 37, in <module>
        pg=driver.find_element_by_xpath('//div[@class="main"]/div[@class="right"]/div[@class="page"]/a[@onclick][contains(text(),"下一页")]').click()
      File "D:\Python3\lib\site-packages\selenium\webdriver\remote\webelement.py", line 78, in click
        self._execute(Command.CLICK_ELEMENT)
      File "D:\Python3\lib\site-packages\selenium\webdriver\remote\webelement.py", line 499, in _execute
        return self._parent.execute(command, params)
      File "D:\Python3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 297, in execute
        self.error_handler.check_response(response)
      File "D:\Python3\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 194, in check_response
        raise exception_class(message, screen, stacktrace)
    selenium.common.exceptions.WebDriverException: Message: unknown error: Element <a Onclick="queryListByPage('2')">...</a> is not clickable at point (1056, 589). Other element would receive the click: <dt>...</dt>
      (Session info: chrome=60.0.3112.113)
      (Driver info: chromedriver=2.32.498550 (9dec58e66c31bcc53a9ce3c7226f0c1c5810906a),platform=Windows NT 6.1.7601 SP1 x86)

    Why does switching to Chrome make this element unclickable? Any pointers would be appreciated, thanks!
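    The most informative part of that traceback is the final message: Chrome reports the point it tried to click, (1056, 589), and the element that would actually receive the click, <dt>...</dt>, meaning something is covering the link at that point. When debugging many pages it can help to pull those two details out of the exception text. Below is a minimal sketch of a hypothetical helper, parse_click_interception, written against the message format shown in the traceback; it assumes other chromedriver versions word the message the same way (newer ones typically raise ElementClickInterceptedException with similar "is not clickable at point ... Other element would receive the click" text, but check your version).

```python
import re
from typing import NamedTuple, Optional


class ClickInterception(NamedTuple):
    """What Chrome reports when a click would land on a different element."""

    x: int        # click point as reported by chromedriver
    y: int
    blocker: str  # markup of the element that would receive the click


# Matches "... is not clickable at point (X, Y). Other element would receive
# the click: <tag ...>...</tag>"; \4 makes the closing tag match the opener.
_CLICK_INTERCEPTED = re.compile(
    r"is not clickable at point \((\d+),\s*(\d+)\)\.\s*"
    r"Other element would receive the click:\s*(<(\w+)[^>]*>.*?</\4>)",
    re.DOTALL,
)


def parse_click_interception(message: str) -> Optional[ClickInterception]:
    """Extract the click point and blocking element, or None if absent."""
    match = _CLICK_INTERCEPTED.search(message)
    if match is None:
        return None
    return ClickInterception(int(match.group(1)), int(match.group(2)), match.group(3))
```

    Usage idea: wrap the click in try/except WebDriverException as exc and call parse_click_interception(str(exc)); if the blocker is something like a floating menu or header, the fix is to move the page (or the element) so the click point is clear.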
    12 replies · last reply 2017-10-11 19:59:33 +08:00
    #1 O14 · 2017-10-01 10:51:43 +08:00 via Android
    #2 kerberos · 2017-10-02 16:28:46 +08:00
    You need to specify executable_path='' for Chrome as well.
    #3 woshichuanqilz · 2017-10-02 18:27:07 +08:00
    Post your Chrome version of the code so I can take a look.
    #4 woshichuanqilz · 2017-10-02 18:59:12 +08:00
    # -*- coding: utf-8 -*-
    import time

    from selenium import webdriver

    url = "http://buy.ccb.com/searchproducts/pv_0_0_0_0_000.jhtml?query=*&selectCatId=12001001&catId=12001001&isBH=false&area="

    driver = webdriver.Chrome()
    driver.get(url)
    driver.maximize_window()

    # Scroll to the bottom and read the page height; repeat until it stops growing
    scroll_js = "window.scrollTo(0, document.body.scrollHeight); return document.body.scrollHeight;"
    lenOfPage = driver.execute_script(scroll_js)
    match = False
    while not match:
        lastCount = lenOfPage
        time.sleep(3)  # wait for any lazily loaded content
        lenOfPage = driver.execute_script(scroll_js)
        if lastCount == lenOfPage:
            match = True

    # Click the <a> that is the 8th child of div.page (the Next-page link here)
    driver.find_element_by_css_selector('body > div.main > div.right > div.page > a:nth-child(8)').click()


    I tried code like this and it doesn't work either.
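    The while loop above is the usual "scroll until the page stops growing" pattern for lazily loaded pages. Here is a minimal sketch of the same idea as a reusable function, with one addition: a cap on the number of rounds, so a page that keeps loading content cannot loop forever. scroll_until_stable and its parameters are hypothetical names, and the browser call is passed in as a function so the loop itself does not depend on Selenium.

```python
import time
from typing import Callable


def scroll_until_stable(
    scroll_and_measure: Callable[[], int],
    pause: float = 3.0,
    max_rounds: int = 20,
) -> int:
    """Scroll to the bottom until the page height stops changing.

    scroll_and_measure() must scroll to the bottom of the page and return the
    current document height. Returns the last height seen. max_rounds bounds
    the loop for pages that keep loading content indefinitely.
    """
    height = scroll_and_measure()
    for _ in range(max_rounds):
        time.sleep(pause)  # give lazily loaded content time to render
        new_height = scroll_and_measure()
        if new_height == height:
            break  # nothing new appeared: assume we reached the real bottom
        height = new_height
    return height
```

    With Selenium this would be called as scroll_until_stable(lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight); return document.body.scrollHeight;")). Note that this only scrolls to the bottom; it does not by itself guarantee that a particular link is uncovered at the moment it is clicked.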
    #5 mlyy · 2017-10-04 11:30:34 +08:00 via iPad
    #6 shn7798 · 2017-10-07 13:51:22 +08:00
    > selenium.common.exceptions.WebDriverException: Message: unknown error: Element <a Onclick="queryListByPage('2')">...</a> is not clickable at point (1056, 589). Other element would receive the click: <dt>...</dt>

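    Judging from the error message, the elements overlap and the link is covered by another element (the <dt>). Perhaps Chrome renders the page more faithfully...
    It could also be that the window is too small and the elements are squeezed together; try resizing the window.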
    #7 saximi (OP) · 2017-10-09 23:16:00 +08:00
    @shn7798 How do I resize the window? I tried driver.maximize_window(), but it raised an error saying the window is already maximized.
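    One way around maximize_window() is to request an explicit size with Selenium's get_window_size() and set_window_size(width, height). The helper below, ensure_min_window_size, is a hypothetical sketch that only grows the window when it is smaller than a target; the 1920x1080 default is an arbitrary assumption. On a real display the window manager may still cap the window at the screen resolution, so running Chrome headless with the --window-size=1920,1080 switch is a common alternative.

```python
from typing import Any, Dict


def ensure_min_window_size(
    driver: Any, min_width: int = 1920, min_height: int = 1080
) -> Dict[str, int]:
    """Grow the browser window to at least min_width x min_height.

    Only calls set_window_size() when the current window is smaller, so it
    never shrinks a window that is already large enough. Returns the size
    that was requested (the OS may still cap it at the screen size).
    """
    size = driver.get_window_size()  # e.g. {"width": 1024, "height": 768}
    width = max(size["width"], min_width)
    height = max(size["height"], min_height)
    if width != size["width"] or height != size["height"]:
        driver.set_window_size(width, height)
    return {"width": width, "height": height}
```

    A larger window mainly helps when the overlap comes from a cramped responsive layout; if a fixed or floating element is covering the link, resizing alone may not be enough.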
    #8 saximi (OP) · 2017-10-09 23:21:48 +08:00
    @woshichuanqilz Sorry, the URL in the code I posted was wrong. The correct one is http://buy.ccb.com/searchproducts/pv_0_0_0_0_1.jhtml?query=*&selectCatId=12001001&catId=12001001&isBH=false&area=

    But this URL still can't be scraped with Chrome(); it still says the element can't be clicked.
    #9 saximi (OP) · 2017-10-09 23:48:11 +08:00
    @woshichuanqilz After changing the URL to the correct one, http:// buy.ccb.com/searchproducts/pv_0_0_0_0_1.jhtml?query=*&selectCatId=12001001&catId=12001001&isBH=false&area=
    your code runs fine, but mine still says the element can't be clicked. What's going on?
    #10 saximi (OP) · 2017-10-10 19:35:12 +08:00
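    @mlyy Thanks a lot, one of the methods mentioned in that post solved it. The cause: in Chrome, when a page is long enough that you have to scroll to see all of it, an element that is not visible before scrolling cannot be clicked. I'd call this a developer-unfriendly aspect of Chrome.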
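    A common workaround along these lines is to scroll the target into view with JavaScript before clicking, so the link is not hidden under another element at the click point. Below is a minimal sketch, assuming a Selenium driver and an element already located with one of the find_element_* methods; scroll_and_click and js_click are hypothetical helper names. block: 'center' moves the element away from the viewport edges, where fixed headers and footers tend to sit. js_click calls the DOM click() method directly and skips Chrome's check for overlapping elements, so treat it as a last resort: it will click things a real user could not.

```python
from typing import Any

# Scroll so the element sits in the middle of the viewport.
SCROLL_INTO_VIEW_JS = "arguments[0].scrollIntoView({block: 'center'});"
# Call the DOM click() method directly; bypasses Chrome's overlap check.
JS_CLICK = "arguments[0].click();"


def scroll_and_click(driver: Any, element: Any) -> None:
    """Scroll element into view with JavaScript, then click it normally."""
    driver.execute_script(SCROLL_INTO_VIEW_JS, element)
    element.click()


def js_click(driver: Any, element: Any) -> None:
    """Last resort: fire the element's click handler from JavaScript."""
    driver.execute_script(JS_CLICK, element)
```

    For the page in this thread that would look like link = driver.find_element_by_xpath('//div[@class="page"]/a[contains(text(),"下一页")]') followed by scroll_and_click(driver, link).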
    #11 woshichuanqilz · 2017-10-11 08:08:19 +08:00
    @saximi My code does scroll the page, so why didn't it work either?
    #12 saximi (OP) · 2017-10-11 19:59:33 +08:00
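    @woshichuanqilz I tried your code and it runs. It failed because the URL I gave in the original post was wrong; could you try again with this correct URL? (Remove the space before "buy".)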
    http:// buy.ccb.com/searchproducts/pv_0_0_0_0_1.jhtml?query=*&selectCatId=12001001&catId=12001001&isBH=false&area=