
Given this table:
<table class="tabledataformat" cellspacing="0" > <tr> <td style="vertical-align:top;">Copper, Cu </td> <td class="dataCell" style="vertical-align:top;"><= 0.03 %<span class="dataCondition"></span></td> <td class="dataCell" style="vertical-align:top;"><= 0.03 %<span class="dataCondition"></span></td> <td class="dataComment" style="vertical-align:top;"></td> </tr> </table>
response.xpath('//table[@class="tabledataformat"]/tr').extract() only returns:
<tr> <td style="vertical-align:top;">Copper, Cu </td> <td class="dataCell" style="vertical-align:top;"></td> <td class="dataCell" style="vertical-align:top;"></td> <td class="dataComment" style="vertical-align:top;"></td> </tr>
The <= 0.03 % text and the <span class="dataCondition"></span> elements have disappeared. Why?
1 imn1 Mar 4, 2017 Because writing <= like that does not conform to the XML standard.
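To illustrate imn1's point, here is a minimal sketch using only Python's standard library. A strict XML parser rejects the cell as posted because of the bare <, and accepts it once that character is escaped as &lt;. The helper name escape_stray_lt and its regular expression are assumptions made for this illustration and are not part of Scrapy or lxml. The regex is a heuristic: it would also rewrite a < followed by a space or digit inside a script block.

```python
import re
import xml.etree.ElementTree as ET

# One data cell from the question, exactly as it appears in the page.
RAW_CELL = '<td class="dataCell"><= 0.03 %<span class="dataCondition"></span></td>'


def parses_as_xml(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def escape_stray_lt(markup):
    """Hypothetical helper: escape every '<' that cannot start a tag.

    A '<' is left alone only when it is followed by a letter, '/', '!' or '?'.
    """
    return re.sub(r"<(?![A-Za-z/!?])", "&lt;", markup)


if __name__ == "__main__":
    print(parses_as_xml(RAW_CELL))                 # bare '<' is not well-formed
    fixed = escape_stray_lt(RAW_CELL)
    print(fixed)
    print(ET.fromstring(fixed).text)               # text of the repaired cell
```

In a spider, one option would be to run the response body through such a helper and build a new selector from the cleaned string, for example with parsel.Selector(text=...), before applying the XPath.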
2 leavic Mar 4, 2017 This data may be filled in by an asynchronous JavaScript request, i.e. AJAX content, which Scrapy cannot see.
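One quick way to test the AJAX hypothesis is to look for the value in the unparsed page source. If it is there, the page is not the problem and the parser is. The function name appears_in_source is an assumption for this sketch; in a Scrapy callback the raw text would typically come from response.text.

```python
import html


def appears_in_source(raw_html, needle):
    """Return True if needle occurs in the raw source, literally or entity-escaped."""
    escaped = html.escape(needle, quote=False)  # "<= 0.03 %" becomes "&lt;= 0.03 %"
    return needle in raw_html or escaped in raw_html


if __name__ == "__main__":
    sample = '<td class="dataCell"><= 0.03 %<span class="dataCondition"></span></td>'
    # The value is present in the source, so it was not injected by JavaScript.
    print(appears_in_source(sample, "<= 0.03 %"))
```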
3 dsg001 Mar 4, 2017 ''' <tr> <td style="vertical-align:top;">Copper, Cu </td> <td class="dataCell" style="vertical-align:top;"><= 0.03 %<span class="dataCondition"></span></td> <td class="dataCell" style="vertical-align:top;"><= 0.03 %<span class="dataCondition"></span></td> <td class="dataComment" style="vertical-align:top;"></td> </tr> ''' I tested this with lxml and the values are output, so Scrapy should handle it too. Check the page's HTML source.
4 crazypig14 Mar 7, 2017 Crawl the page with Scrapy and parse it with BeautifulSoup; I find that more convenient.
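As a rough sketch of the lenient-parser route, the example below uses Python's built-in html.parser, which BeautifulSoup can use as its backend via BeautifulSoup(markup, "html.parser"). It is written against the standard library only so that it runs without extra packages. The class name CellTextParser and the function extract_cells are assumptions for this illustration. This tokenizer treats a < that does not start a tag as plain text, so the cell values survive.

```python
from html.parser import HTMLParser

# The table row from the question.
ROW = (
    '<tr> <td style="vertical-align:top;">Copper, Cu </td> '
    '<td class="dataCell" style="vertical-align:top;"><= 0.03 %'
    '<span class="dataCondition"></span></td> '
    '<td class="dataCell" style="vertical-align:top;"><= 0.03 %'
    '<span class="dataCondition"></span></td> '
    '<td class="dataComment" style="vertical-align:top;"></td> </tr>'
)


class CellTextParser(HTMLParser):
    """Collect the text of every <td> cell, one string per cell."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # A stray '<' arrives here as ordinary text, in its own chunk.
        if self._in_cell:
            self.cells[-1] += data


def extract_cells(markup):
    """Return the stripped text of each <td> in the markup."""
    parser = CellTextParser()
    parser.feed(markup)
    parser.close()
    return [cell.strip() for cell in parser.cells]


if __name__ == "__main__":
    print(extract_cells(ROW))
```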