I downloaded a Python crawler program from the internet. How do I run it? - V2EX
grey5659

I downloaded a Python crawler program from the internet. How do I run it?

  •   grey5659 Jul 1, 2016 7193 views

    It is a Douban Books crawler program.

    Supplement 1    Jul 2, 2016
    I switched to a Linux environment and it runs now. After running $ python doubanSpider.py it just keeps downloading. What does this output mean?
    /usr/local/lib/python2.7/dist-packages/bs4/__init__.py:166: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

    To get rid of this warning, change this:

    BeautifulSoup([your markup])

    to this:

    BeautifulSoup([your markup], "html.parser")

    markup_type=markup_type))
    Downloading Information From Page 1
    Downloading Information From Page 2
    Downloading Information From Page 3
    Downloading Information From Page 4
    Downloading Information From Page 5
    Downloading Information From Page 6
    WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
    Downloading Information From Page 7
    Downloading Information From Page 8
    Downloading Information From Page 9
    Downloading Information From Page 10
    Downloading Information From Page 11
    Downloading Information From Page 12
    Downloading Information From Page 13
    Downloading Information From Page 14
    Downloading Information From Page 15
    Downloading Information From Page 16
    Downloading Information From Page 17
    Downloading Information From Page 18
    Downloading Information From Page 19
    Downloading Information From Page 20
    Downloading Information From Page 21
    Downloading Information From Page 22
    Downloading Information From Page 23
    Downloading Information From Page 24
    15 replies    2016-07-03 20:15:38 +08:00
    upczww
        1
    upczww  
       Jul 1, 2016 via Smartisan T1
    How can anyone help you without seeing the code?
    grey5659
        2
    grey5659  
    OP
       Jul 1, 2016
    AnonymousID
        3
    AnonymousID  
       Jul 1, 2016 via Android
    Don't you just run that .py file directly?
    grey5659
        4
    grey5659  
    OP
       Jul 1, 2016
    @AnonymousID Surely it can't be that simple?
    AnonymousID
        5
    AnonymousID  
       Jul 1, 2016 via Android
    @grey5659 Of course it is. Isn't that the only file you can exe?
    AnonymousID
        6
    AnonymousID  
       Jul 1, 2016 via Android
    My post above dropped part of a word: I meant "execute" (执行).
    niboy
        7
    niboy  
       Jul 1, 2016
    First install Python: download the installer from python.org. Then either double-click the Python file or run python ***.py from a command prompt.
    grey5659
        8
    grey5659  
    OP
       Jul 1, 2016
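    @niboy I installed it. Double-clicking only flashes a window that closes immediately. I renamed the file to doubanSpider.pyw, opened it in IDLE, and Run Module reports:
    Traceback (most recent call last):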
    File "C:\Users\lenovo\Desktop\DouBanSpider-master\doubanSpider.pyw", line 7, in <module>
    import requests
    ImportError: No module named requests
    niboy
        9
    niboy  
       Jul 1, 2016
    @grey5659
    You are missing the requests dependency. http://blog.csdn.net/alpha5/article/details/24964009

    You will have to sort out the remaining dependencies yourself, for example the ones imported below:
    import numpy as np
    from bs4 import BeautifulSoup
    from openpyxl import Workbook
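
    The ImportError above means an imported module is not installed. A minimal sketch for checking all four imports quoted in this thread at once follows. It assumes Python 3.4 or later (the thread itself uses Python 2.7), and the mapping from import names to pip package names, as well as the helper name missing_packages, are assumptions of this sketch rather than anything taken from doubanSpider.py.

```python
import importlib.util

# Import names quoted in the thread, mapped to the package names pip expects.
# This mapping is an assumption of the sketch, not something the thread states.
REQUIRED = {
    "requests": "requests",
    "numpy": "numpy",
    "bs4": "beautifulsoup4",
    "openpyxl": "openpyxl",
}


def missing_packages(required):
    """Return the pip package names whose import name cannot be found."""
    return [
        package
        for module, package in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages; try: pip install " + " ".join(missing))
    else:
        print("All imports quoted in the thread are available.")
```

    Running this before the crawler lists every missing package in one pass, instead of surfacing one ImportError per run.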
    upczww
        10
    upczww  
       Jul 1, 2016
    It's just one file; run it directly.
    ksupertu
        11
    ksupertu  
       Jul 1, 2016
    Install Python 2.7, then run pip install requests in cmd. To save yourself trouble, run it in an Ubuntu virtual machine; otherwise the assorted bugs on Windows will drive you crazy.
    luyuncheng
        12
    luyuncheng  
       Jul 1, 2016
    Shouldn't you learn the basics of Python first?
    grey5659
        13
    grey5659  
    OP
       Jul 2, 2016
    @luyuncheng I only want to use it as a tool.
    grey5659
        14
    grey5659  
    OP
       Jul 2, 2016
    @niboy @ksupertu
    I switched to a Linux environment and it runs now. After running $ python doubanSpider.py it just keeps downloading. What does this output mean?
    ksupertu
        15
    ksupertu  
       Jul 3, 2016 via Android
    Nothing serious. The BeautifulSoup library is only emitting a warning because no HTML parser was specified explicitly. The crawler is already working.
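
    The change the warning asks for can be sketched as below. This is only an illustration: the function name first_heading and the sample markup are hypothetical, and the actual BeautifulSoup call inside doubanSpider.py is not shown in this thread.

```python
def first_heading(markup):
    """Return the text of the first <h1> in markup, or None if there is none."""
    # bs4 is third-party (pip install beautifulsoup4); it is imported here so
    # that a missing install only fails when parsing is actually attempted.
    from bs4 import BeautifulSoup

    # Naming "html.parser" explicitly is the change the warning asks for.
    soup = BeautifulSoup(markup, "html.parser")
    heading = soup.h1
    return heading.get_text() if heading is not None else None
```

    For example, first_heading("<html><body><h1>Douban Books</h1></body></html>") parses the markup with the named parser, so the UserWarning about an unspecified parser should no longer appear.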