The main problem is here:

    for txt in list_:
        txts = txt.get_text()  # print(txts) here still shows the complete text
    download_run(title1, title2, title3, title4, txts)  # print(txts) here shows only the last paragraph

The code is as follows:

    import os

    import requests
    from bs4 import BeautifulSoup

    def download(href_urls):
        for url in href_urls:
            mod_titles = []
            ses = requests.session()
            html = ses.get(url, headers=header(), verify=False)
            soup = BeautifulSoup(html.content, 'html.parser')
            title_list = soup.find(class_='g-ctnBar').find_all('a')
            title1 = title_list[2].get_text()
            title2 = title_list[3].get_text()
            title3 = title_list[4].get_text()
            title4 = title_list[5].get_text()
            list_ = soup.find_all('div', class_='detail-mod J_floor')[:-3]
            for txt in list_:
                txts = txt.get_text()
            download_run(title1, title2, title3, title4, txts)

    def download_run(title1, title2, title3, title4, txts):
        path = 'C:/Users/Desktop/run/%s/%s/%s' % (title1, title2, title3)
        if not os.path.exists(path):
            os.makedirs(path)
        with open('C:/Users/Desktop/run/%s/%s/%s/%s.txt' % (title1, title2, title3, title4), 'w') as f:
            f.write(txts)

1 chaneyccy OP
The layout was a bit messy, so here is an update:

    import os

    import requests
    from bs4 import BeautifulSoup

    def download(href_urls):
        for url in href_urls:
            mod_titles = []
            ses = requests.session()
            html = ses.get(url, headers=header(), verify=False)
            soup = BeautifulSoup(html.content, 'html.parser')
            title_list = soup.find(class_='g-ctnBar').find_all('a')
            title1 = title_list[2].get_text()
            title2 = title_list[3].get_text()
            title3 = title_list[4].get_text()
            title4 = title_list[5].get_text()
            list_ = soup.find_all('div', class_='detail-mod J_floor')[:-3]
            for txt in list_:
                txts = txt.get_text()
            download_run(title1, title2, title3, title4, txts)

    def download_run(title1, title2, title3, title4, txts):
        path = 'C:/Users/Desktop/run/%s/%s/%s' % (title1, title2, title3)
        if not os.path.exists(path):
            os.makedirs(path)
        with open('C:/Users/Desktop/run/%s/%s/%s/%s.txt' % (title1, title2, title3, title4), 'w') as f:
            f.write(txts)

2 JCZ2MkKb5S8ZX9pq 2020-01-14 15:24:21 +08:00 via iPhone
``` your code ``` Wrapping the code like this preserves the layout. It does not work in replies. See the Markdown syntax for details.
3 JCZ2MkKb5S8ZX9pq 2020-01-14 15:24:47 +08:00 via iPhone
Formatting, that is.
4 chaneyccy OP
@JCZ2MkKb5S8ZX9pq OK. I am not in the habit of writing in Markdown, so I will go and look into it.
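5 cxyfreedom 2020-01-14 15:31:05 +08:00
You iterate over the list and write to the file on every pass. By the time the loop finishes, txts only holds part of the data, so of course the file ends up containing just the last part.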
6 Vegetable 2020-01-14 15:31:27 +08:00
7 Vegetable 2020-01-14 15:31:58 +08:00
@cxyfreedom #5 I read the code. The cause is that every call writes with open(file, 'w').
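
The effect described in #7 can be reproduced in isolation. This is only a minimal sketch: it assumes the write call runs inside the loop, and the paragraph strings, the `write_overwrite` helper, and the temporary file are hypothetical stand-ins for the scraped text and the real path.

```python
import os
import tempfile

# Hypothetical paragraphs standing in for the scraped text.
paragraphs = ["first paragraph", "second paragraph", "third paragraph"]

def write_overwrite(path, text):
    # Mode 'w' truncates the file each time it is opened,
    # so earlier content is discarded on every call.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "demo.txt")
    for p in paragraphs:
        write_overwrite(target, p)
    with open(target, encoding="utf-8") as f:
        overwrite_result = f.read()

print(overwrite_result)  # only the text from the final call remains
```
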
8 Wuuuu 2020-01-14 15:32:56 +08:00
I think it should be txts = txts + "\n" + txt.get_text()?
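
The accumulation idea in #8 could look roughly like the sketch below. It swaps the repeated `txts = txts + ...` for `"\n".join(...)`, which avoids having to initialise `txts` before the loop and avoids a leading newline. The paragraph strings, the `join_paragraphs` helper, and the temporary file are assumptions for illustration; in the posted code the parts would come from `txt.get_text()`.

```python
import os
import tempfile

# Hypothetical paragraphs; in the thread these would be txt.get_text() results.
paragraphs = ["first paragraph", "second paragraph", "third paragraph"]

def join_paragraphs(parts):
    # Collect every part into one string, separated by newlines.
    return "\n".join(parts)

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "demo.txt")
    txts = join_paragraphs(paragraphs)
    # A single write with 'w' is enough once the text is complete.
    with open(target, "w", encoding="utf-8") as f:
        f.write(txts)
    with open(target, encoding="utf-8") as f:
        joined_result = f.read()

print(joined_result)
```
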
9 Wuuuu 2020-01-14 15:35:13 +08:00
@Vegetable Python depends on indentation... As posted, I can't tell whether download_run(title1, title2, title3, title4, txts) is called inside the "for txt in list_:" loop or after it. Most likely it is the second layout, with the call after the loop.
10 Wuuuu 2020-01-14 15:35:53 +08:00

    for txt in list_:
    \t txts = txt.get_text()
    \t download_run(title1, title2, title3, title4, txts)

    for txt in list_:
    \t txts = txt.get_text()
    download_run(title1, title2, title3, title4, txts)

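
The difference between the two layouts in #10 can be checked without any scraping. In this sketch the list `calls` records what would be passed to `download_run`; the function names and the sample strings are hypothetical.

```python
# Hypothetical stand-ins for the text of each scraped element.
paragraphs = ["p1", "p2", "p3"]

def run_inside_loop(parts):
    calls = []
    for txt in parts:
        txts = txt
        calls.append(txts)  # stands in for download_run(..., txts), once per part
    return calls

def run_after_loop(parts):
    calls = []
    for txt in parts:
        txts = txt
    # Runs once, after the loop, so txts holds only the last part.
    # (With an empty list this line would raise NameError.)
    calls.append(txts)
    return calls

inside = run_inside_loop(paragraphs)
after = run_after_loop(paragraphs)
print(inside)
print(after)
```

With the call inside the loop, every paragraph reaches the writer, and the open mode then decides what survives in the file. With the call after the loop, only the last paragraph is ever passed on, whatever the open mode is.
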
11 sDG9xz87SqqCC3mN 2020-01-14 15:37:00 +08:00
Change w to a.
12 zhejiangblue 2020-01-14 15:37:09 +08:00
Use a as the open mode. w recreates the file.
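14 cxyfreedom 2020-01-14 15:40:50 +08:00
@Vegetable If the call is under the for loop, then that is the problem. You can't use w to append. With this code's logic, though, it would be better to write everything in one go.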
15 vincexu 2020-01-14 15:49:38 +08:00
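16 JCZ2MkKb5S8ZX9pq 2020-01-14 17:23:51 +08:00 via iPhone
If it is an a-versus-w problem, I suggest concatenating the text and then writing it with w. With a, running the script again would write the same content a second time.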
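
The trade-off raised in #16 can be sketched as follows. The two helpers `save_append` and `save_joined`, the sample strings, and the temporary files are hypothetical; the loop of two iterations simply simulates running the script twice.

```python
import os
import tempfile

# Hypothetical paragraphs standing in for the scraped text.
paragraphs = ["first", "second"]

def save_append(path, parts):
    # Mode 'a' keeps existing content and adds to the end of the file.
    for p in parts:
        with open(path, "a", encoding="utf-8") as f:
            f.write(p + "\n")

def save_joined(path, parts):
    # Build the full text first, then replace the file in one write.
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(parts) + "\n")

with tempfile.TemporaryDirectory() as tmp:
    appended = os.path.join(tmp, "appended.txt")
    joined = os.path.join(tmp, "joined.txt")
    for _ in range(2):  # simulate two runs of the script
        save_append(appended, paragraphs)
        save_joined(joined, paragraphs)
    with open(appended, encoding="utf-8") as f:
        appended_lines = f.read().splitlines()
    with open(joined, encoding="utf-8") as f:
        joined_lines = f.read().splitlines()

print(appended_lines)  # content is duplicated by the second run
print(joined_lines)    # content is the same after any number of runs
```
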
17 cherbim 2020-01-14 17:39:08 +08:00
You open the file with w, which overwrites it every time. Use a to append.
18 shunfa52000 2020-01-15 12:03:16 +08:00
Last time my program did not call f.close at the end, and it raised an error once the number of loop iterations got large.
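
On the point in #18: the posted download_run opens the file in a with block, which closes the file when the block exits, so no explicit f.close() is needed there. A minimal sketch of that behaviour, using hypothetical file names in a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    handles = []
    for i in range(3):
        path = os.path.join(tmp, f"{i}.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("x")
        # The with block has exited here, so f is already closed.
        handles.append(f)
    all_closed = all(h.closed for h in handles)

print(all_closed)
```
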