Different POST values should return different pages, but the results are identical — a scraping question - V2EX
woshichuanqilz    Jul 1, 2020    2074 views
This topic was created 2125 days ago; the information mentioned may have changed since.

I want to scrape data from this site: http://ouhe.aiball365.com/league-center/detail?leagueId=31

Every time I click through to a page, I can see a POST request in the background to "http://backend.aiball365.com/web/leagueSummaryWeb".

I copied out the headers and data to simulate the request myself. The POST data is: {"channel":"web","os":"browser","leagueId":"31","season":"2019-2020","round":2}

The round value varies with the match round; since every page is a new round, you can think of it as one round value per page.

Here is the code I wrote:

```
import requests
import json

url = "http://backend.aiball365.com/web/leagueSummaryWeb"
headers = {
    'Accept': 'application/json',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8,en-US;q=0.7,ja;q=0.6',
    'Content-Length': '69',
    'Content-Type': 'application/json;charset=',
    'Host': 'backend.aiball365.com',
    'Origin': 'http://ouhe.aiball365.com',
    'Proxy-Connection': 'keep-alive',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36',
}

for i in range(1, 3):
    data = {"channel": "web", "os": "browser", "leagueId": "31", "season": "2019-2020", "round": i}
    response = requests.get(url, headers=headers, data=json.dumps(data))
    with open('{}.txt'.format(i), 'w+', encoding='utf-8') as the_file:
        the_file.write(response.text)
```

This code should fetch pages 1 and 2, but what I actually get is page 32, and the content is identical both times. What is going on?

    11 replies    2020-07-02 08:50:19 +08:00
ClericPy    #1    Jul 1, 2020
Either move the for loop inside the with open block, or change 'w+' to 'a'.

Don't just assume that w+ means append.

If you print it, you'll even see the content changing.
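For reference, the difference between 'w+' and 'a' can be seen without any network at all: 'w+' truncates the file on every open, while 'a' appends. A minimal sketch:

```python
import os
import tempfile

# 'w+' truncates the file each time it is opened,
# while 'a' appends to whatever is already there.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

for i in range(3):
    with open(path, "w+", encoding="utf-8") as f:  # truncates on open
        f.write("round {}\n".format(i))
with open(path, encoding="utf-8") as f:
    print(f.read())  # only "round 2" survives the repeated truncation

for i in range(3):
    with open(path, "a", encoding="utf-8") as f:  # appends on open
        f.write("round {}\n".format(i))
with open(path, encoding="utf-8") as f:
    print(f.read())  # "round 2" followed by rounds 0, 1, 2
```

(Note this does not explain the OP's symptom, since the OP writes each round to a different filename — as ClericPy concedes in #3.)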
woshichuanqilz (OP)    #2    Jul 1, 2020
@ClericPy
```
data = {"channel": "web", "os": "browser", "leagueId": "31", "season": "2019-2020", "round": 1}
response = requests.get(url, headers=headers, data=json.dumps(data))
with open('{}.txt'.format(1), 'w+', encoding='utf-8') as the_file:
    the_file.write(response.text)

data = {"channel": "web", "os": "browser", "leagueId": "31", "season": "2019-2020", "round": 2}
response = requests.get(url, headers=headers, data=json.dumps(data))
with open('{}.txt'.format(2), 'w+', encoding='utf-8') as the_file:
    the_file.write(response.text)
```
Doing it this way, I still get the same result both times.
ClericPy    #3    Jul 1, 2020
@woshichuanqilz Sorry, I misread — you did nothing wrong.

Is this request the one fired when you click Premier League round x under schedule & results? That's not what I capture... I can't even find your request; I only see a leagueMatchRoundWeb one.
ClericPy    #4    Jul 1, 2020
Swap your URL for
url = "http://backend.aiball365.com/web/leagueMatchRoundWeb"
and try again.

Also, when submitting JSON, you can simply pass json=dict_data in the parameters; requests will convert it to JSON for you.
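The json= shortcut can be checked offline by preparing both variants of the request and comparing what would actually be sent (nothing here touches the network; only the requests library itself is assumed):

```python
import json
import requests

url = "http://backend.aiball365.com/web/leagueSummaryWeb"
data = {"channel": "web", "os": "browser", "leagueId": "31",
        "season": "2019-2020", "round": 1}

# Manually serialized body vs. the json= shortcut; neither request is
# sent — we only prepare them to inspect what would go over the wire.
manual = requests.Request("POST", url, data=json.dumps(data)).prepare()
auto = requests.Request("POST", url, json=data).prepare()

def as_text(body):
    # requests may hold the body as str or utf-8 bytes depending on version
    return body.decode("utf-8") if isinstance(body, bytes) else body

# Same payload either way...
print(json.loads(as_text(manual.body)) == json.loads(as_text(auto.body)))
# ...but json= also sets the Content-Type header for you.
print(auto.headers["Content-Type"])
```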
woshichuanqilz (OP)    #5    Jul 1, 2020 via Android
@ClericPy I just used F12 directly. How did you see that link? What capture tool are you using? Weren't you looking at the developer tools too?
woshichuanqilz (OP)    #6    Jul 1, 2020 via Android
@ClericPy I just click a page on the site, say 31, and then the POST link I mentioned shows up in the Network panel of the developer tools.
ClericPy    #7    Jul 1, 2020
@woshichuanqilz I'm also looking at Network in Chrome's developer tools. It's your request I can't see — maybe a different system? I'm on Win10 with the latest Chrome.
woshichuanqilz (OP)    #8    Jul 1, 2020 via Android
@ClericPy Could you post your code? Thanks.
ClericPy    #9    Jul 2, 2020
https://paste.ubuntu.com/p/j23fZqPnqV/
It's just your code with the URL changed... there's no other difference.
summerwar    #10    Jul 2, 2020 via iPhone
Is it POST or GET after all? You said it's a POST request up front, but the code calls requests.get directly.
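This mix-up is easy to see offline: requests will happily attach a body to a GET request, but the method is still GET, and many servers simply ignore (or reject) a body on GET. A minimal sketch, again only preparing requests rather than sending them:

```python
import json
import requests

url = "http://backend.aiball365.com/web/leagueSummaryWeb"
payload = json.dumps({"round": 1})

# Same body attached either way; only the HTTP method differs,
# and the server may ignore a body that arrives on a GET.
get_req = requests.Request("GET", url, data=payload).prepare()
post_req = requests.Request("POST", url, data=payload).prepare()

print(get_req.method, post_req.method)  # GET POST
print(get_req.body == post_req.body)    # True — the body itself is identical
```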
krixaar    #11    Jul 2, 2020
Same as above — I only see a leagueMatchRoundWeb URL too; after swapping the URL everything works fine.
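Pulling the thread's fixes together — POST instead of GET, the leagueMatchRoundWeb endpoint, and json= for serialization — a hedged sketch, untested against the live site, with the headers trimmed to the ones that plausibly matter:

```python
import requests

# Fixes from the thread: POST (not GET), the leagueMatchRoundWeb URL,
# and json= so requests serializes the payload and sets Content-Type.
URL = "http://backend.aiball365.com/web/leagueMatchRoundWeb"
HEADERS = {
    "Accept": "application/json",
    "Origin": "http://ouhe.aiball365.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
}

def build_round_request(round_no):
    """Build (but do not send) the POST request for one round."""
    data = {"channel": "web", "os": "browser", "leagueId": "31",
            "season": "2019-2020", "round": round_no}
    return requests.Request("POST", URL, headers=HEADERS, json=data).prepare()

# Sending requires network access, e.g.:
# with requests.Session() as session:
#     for i in range(1, 3):
#         response = session.send(build_round_request(i))
#         with open('{}.txt'.format(i), 'w', encoding='utf-8') as the_file:
#             the_file.write(response.text)
```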