Scraping newbie asks: how do I get all the read counts on the first page of the Eastmoney Guba stock forum? - V2EX
yellowtail

Scraping newbie asks: how do I get all the read counts on the first page of the Eastmoney Guba stock forum?

    yellowtail   Oct 10, 2019   2596 views

    http://guba.eastmoney.com/list,600519.html

    What should I start learning?

    13 replies    2019-10-10 17:03:53 +08:00
    soho176
        1
    soho176  
       Oct 10, 2019
    Python regex, or the simpler route: scrape it directly with LocoySpider (火车头).
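A minimal sketch of the regex route, run against a small hypothetical fragment rather than the live page. The `l1 a1` class comes from other replies in this thread; the sample markup and the name `READ_COUNT_RE` are assumptions made for illustration, and the real page's HTML may differ.

```python
import re

# Hypothetical fragment; the real page's markup may differ.
SAMPLE_HTML = """
<div id="articlelistnew">
  <div class="articleh"><span class="l1 a1">1024</span><span class="l2 a2">3</span></div>
  <div class="articleh"><span class="l1 a1">57</span><span class="l2 a2">0</span></div>
</div>
"""

# Capture the text inside every <span class="l1 a1">...</span>.
READ_COUNT_RE = re.compile(r'<span class="l1 a1">([^<]*)</span>')

# findall returns a list with one captured string per match.
read_counts = READ_COUNT_RE.findall(SAMPLE_HTML)
print(read_counts)  # ['1024', '57']
```

A pattern like this is tied to the exact attribute order and quoting in the markup, which is one reason the other replies lean toward XPath or an HTML parser.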
    di1012
        2
    di1012  
       Oct 10, 2019
    Regex matching, XPath
    biu7
        3
    biu7  
       Oct 10, 2019


    XPath, regex
    None123
        4
    None123  
       Oct 10, 2019
    Fetch the page with requests
    Parse it with xpath / re
    silencefent
        5
    silencefent  
       Oct 10, 2019
    //div[@id='articlelistnew']//div/span[@class="l1 a1"]
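A rough sketch of how an expression like this selects every matching element at once, not just one. It uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath and needs well-formed input, so it runs on a hypothetical fragment. Real pages are rarely well-formed XML, so a lenient HTML parser would be needed there. The sample markup is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment; the real page's markup may differ.
SAMPLE = """
<html><body>
<div id="articlelistnew">
  <div class="articleh"><span class="l1 a1">1024</span><span class="l3 a3">title A</span></div>
  <div class="articleh"><span class="l1 a1">57</span><span class="l3 a3">title B</span></div>
</div>
</body></html>
"""

root = ET.fromstring(SAMPLE)

# ElementTree paths are relative to the element, hence the leading ".".
spans = root.findall(".//div[@id='articlelistnew']//div/span[@class='l1 a1']")

# findall returns a list: one element per row in the sample.
xpath_counts = [span.text for span in spans]
print(xpath_counts)  # ['1024', '57']
```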
    yellowtail
        6
    yellowtail  
    OP
       Oct 10, 2019
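    I still don't quite get it. I've used selenium before: I located "one" element with findbyname, typed in the username and password, logged in, and refreshed, to farm points on a forum. But with this page I can't see how to locate "one" element... Could any of you experts write a simple example? Mainly I want to learn how to approach this kind of problem.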
    yellowtail
        7
    yellowtail  
    OP
       Oct 10, 2019
    @silencefent Can this pull out all the target elements on the first page?
    None123
        8
    None123  
       Oct 10, 2019
    @yellowtail

    driver.find_element_by_xpath()
    lspvic
        9
    lspvic  
       Oct 10, 2019 via Android   1
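    For scraping, check whether the site has a mobile version of the page. It is usually much simpler, easier to parse, and faster to fetch; some sites even expose an API you can use directly.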
    yellowtail
        10
    yellowtail  
    OP
       Oct 10, 2019
    @None123 For the total read count I'd have to add them up one by one myself.. Is what I get back this way an array?
    None123
        11
    None123  
       Oct 10, 2019
    @yellowtail What do you mean?
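On the "is it an array" question: selecting many elements gives back a list, and the texts are strings, so they need converting before they can be summed. A small sketch under assumptions: the input list is made up, the helper `to_int` is hypothetical, and the handling of a "万" (x 10,000) suffix and of a non-numeric header cell are guesses about what the page might contain.

```python
def to_int(text):
    """Convert a scraped read-count string to an int, or None if it is not a number."""
    text = text.strip()
    if text.endswith("万"):  # assumed shorthand for x 10,000
        try:
            return round(float(text[:-1]) * 10_000)
        except ValueError:
            return None
    return int(text) if text.isdecimal() else None


# Hypothetical scraped texts, including a non-numeric header cell ("阅读" = "Reads").
texts = ["阅读", "1024", "57", "1.5万"]

# Keep only the values that parsed as numbers, then add them up.
counts = [n for n in map(to_int, texts) if n is not None]
total = sum(counts)
print(counts, total)  # [1024, 57, 15000] 16081
```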
    houzhimeng
        12
    houzhimeng  
       Oct 10, 2019   1
    from bs4 import BeautifulSoup
    import requests

    # Fetch the first page of the forum list.
    url = "http://guba.eastmoney.com/list,600519.html"
    r = requests.get(url).content
    soup = BeautifulSoup(r, "lxml")
    # find_all returns a list of every matching span, not just the first one.
    yuedu = soup.find_all('span', {'class': 'l1 a1'})
    for i in yuedu:
        print(i.get_text())
    yellowtail
        13
    yellowtail  
    OP
       Oct 10, 2019
    @houzhimeng Thanks