
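As shown below, I want to get the text under the a tag so that aaabbbccc ends up as a single value in the list, rather than ["aaa","bbb","ccc"]. How should I handle this?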
from lxml import etree

html_str = '''
<span class="til">
<a href="http://www.xxxx.com">
 "aaa"
 <br>
 "bbb"
 "ccc"
 <br>
</a>
</span>
'''
html = etree.HTML(html_str)
content = html.xpath('//a/text()')
print(content)
"""
output:
['\n "aaa"\n ', '\n "bbb"\n "ccc"\n ', '\n ']
"""

1 ch2 2021-03-24 16:14:32 +08:00 Switch to BeautifulSoup and read node.text.
2 QuinceyWu 2021-03-24 16:28:28 +08:00
# content is the list returned by html.xpath('//a/text()')
price = [x.strip() for x in content if x.strip() != '']
str1 = price[1].replace(" ", "").replace("\n", '').replace('"', "")
str2 = price[0].replace('"', '')
print(str2 + str1)
3 meiyoumingzi6 2021-03-24 16:32:24 +08:00 You already have the list, so why not just join it?
4 mekingname 2021-03-24 16:35:27 +08:00 content = ''.join(x.strip() for x in html.xpath('//a/text()'))
5 polarpy 2021-03-24 16:41:29 +08:00 Take the extracted values and replace the newlines and spaces.
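A minimal sketch of that suggestion, assuming the list is exactly the output printed in the original post and that the literal double quotes should be dropped too. The names `content` and `cleaned` are only illustrative, and a single regex substitution is one of several ways to do the replacement.

```python
import re

# The list from the original post's output, i.e. the result of html.xpath('//a/text()')
content = ['\n "aaa"\n ', '\n "bbb"\n "ccc"\n ', '\n ']

# Join the pieces, then remove every run of whitespace and double quotes in one pass
cleaned = re.sub(r'[\s"]+', '', ''.join(content))

print([cleaned])
```

Wrapping `cleaned` in a list gives the single-element list the question asks for.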
6 mrleohe 2021-03-24 16:48:05 +08:00 ''.join([i.strip() for i in ''.join(html.xpath('//a/text()')).split('"')])
7 CLCLCLCLCL 2021-03-25 12:04:46 +08:00
html = etree.HTML(html_str)
content = html.xpath('string(//a)')
Just use string() directly.
8 2bin OP @CLCLCLCLCL I tried it, and it seems to extract only the first a tag. When there are several a tags, I don't know how to get the later ones.
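The behaviour described here matches XPath 1.0: string() applied to a node-set returns the string value of the first node only. A rough sketch of one workaround, evaluating string(.) relative to each a element in turn. The second a element and the `clean` helper are made up for illustration and are not from the original post.

```python
from lxml import etree

# Hypothetical sample: the original markup plus a second, invented <a> element
html_str = '''
<span class="til">
  <a href="http://www.xxxx.com">
    "aaa"
    <br>
    "bbb"
    "ccc"
    <br>
  </a>
  <a href="http://www.xxxx.com">
    "ddd"
    <br>
    "eee"
  </a>
</span>
'''

html = etree.HTML(html_str)


def clean(text):
    # Illustrative helper: drop all whitespace and the literal double quotes
    return "".join(text.split()).replace('"', "")


# string(//a) only sees the first <a> in document order
first_only = clean(html.xpath("string(//a)"))

# string(.) is evaluated once per <a>, giving one value per link
per_link = [clean(a.xpath("string(.)")) for a in html.xpath("//a")]

print(first_only)
print(per_link)
```

Whether to strip all whitespace or only collapse it depends on the real data; the helper here removes it entirely because the question asks for aaabbbccc.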
9 zyb201314 2021-03-26 00:31:45 +08:00 via Android
# Like this?
html = etree.HTML(html_str)
lst = []
for a in html.xpath('//span//a'):
    content = a.xpath('.//text()')
    # Join the text nodes, drop all whitespace, then remove the literal quotes
    text = ''.join("".join(content).split()).replace('"', "")
    lst.append(text)
print(lst)
10 CLCLCLCLCL 2021-03-26 11:07:34 +08:00 @2bin Yes, just loop over the a tags. It depends on which approach you want to use.
11 dongxiao 2021-03-26 15:36:17 +08:00 html.xpath("string(//a)")
12 2bin OP |