Error when using Chinese characters in an XPath expression in Scrapy


This topic was created 3249 days ago; the information in it may have changed or become outdated.

Problem description

```Python
links = sel.xpath('//i[contains(@title,"置顶")]/following-sibling::a/@href').extract()
```

Error: ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
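The message comes from lxml, the XML library underneath Scrapy's selectors. It is raised when the expression is a byte string that contains non-ASCII bytes. Under Python 2, a literal such as `'置顶'` in a UTF-8 source file is exactly that: a `str` of UTF-8 bytes rather than Unicode text. The rule can be approximated in plain Python 3 to see which inputs pass. This is only a sketch: `is_xml_compatible` is a hypothetical name, not an lxml function, and its checks merely approximate lxml's internal ones.

```python
def is_xml_compatible(value):
    """Hypothetical approximation of lxml's string check (not lxml's API).

    Text (unicode) strings are accepted; byte strings only if they are pure
    ASCII. NULL bytes and C0 control characters other than tab, LF and CR
    are rejected in both cases.
    """
    if isinstance(value, bytes):
        try:
            value = value.decode("ascii")
        except UnicodeDecodeError:
            # Non-ASCII bytes, e.g. a UTF-8-encoded Chinese Python 2 str literal
            return False
    allowed_controls = {"\t", "\n", "\r"}
    return not any(ord(ch) < 0x20 and ch not in allowed_controls for ch in value)


expr_text = '//i[contains(@title,"置顶")]/following-sibling::a/@href'
# Roughly what a Python 2 str literal holds when the source file is UTF-8:
expr_bytes = expr_text.encode("utf-8")

print(is_xml_compatible(expr_text))     # True
print(is_xml_compatible(expr_bytes))    # False
print(is_xml_compatible(b"//a/@href"))  # True
```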


4 replies

revotu

Jun 28, 2017

See the article: [Fixing errors when using Chinese in XPath in Scrapy][1]

## Solutions ##
Method 1: convert the whole XPath expression to Unicode
```Python
links = sel.xpath(u'//i[contains(@title,"置顶")]/following-sibling::a/@href').extract()
```
Method 2: build the XPath expression from a `title` variable that is already Unicode
```Python
title = u"置顶"
links = sel.xpath('//i[contains(@title,"%s")]/following-sibling::a/@href' %(title)).extract()
```
Method 3: use XPath variable syntax directly (a `$` followed by the variable name), i.e. `$title`, and pass `title` as a keyword argument
```Python
links = sel.xpath('//i[contains(@title,$title)]/following-sibling::a/@href', title="置顶").extract()
```
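Methods 1 and 2 both come down to handing `sel.xpath()` text instead of bytes. As a sketch, assuming the expression or the title may arrive as UTF-8 bytes (the Python 2 `str` case), a small helper can normalise it first. `to_text` and `build_title_xpath` are hypothetical names for illustration, not part of Scrapy.

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return value as text (unicode).

    Byte strings are decoded (assumed UTF-8 by default); text strings are
    returned unchanged.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def build_title_xpath(title):
    """Method 2 wrapped in a function: interpolate an already-text title."""
    title = to_text(title)
    return '//i[contains(@title,"%s")]/following-sibling::a/@href' % title


raw = '//i[contains(@title,"置顶")]/following-sibling::a/@href'.encode("utf-8")
print(to_text(raw) == '//i[contains(@title,"置顶")]/following-sibling::a/@href')  # True
print(build_title_xpath("置顶".encode("utf-8")) == build_title_xpath("置顶"))    # True

# With a Scrapy selector `sel`, the result would be passed as usual:
# links = sel.xpath(build_title_xpath(u"置顶")).extract()
```

One design note: Method 2 pastes the title inside double quotes, so a title that itself contains `"` would end the XPath string literal early. Method 3's `$title` variable sidesteps that, because the value is passed separately instead of being spliced into the expression.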

[1]: http://www.revotu.com/solve-unicode-erros-using-xpath-in-scrapy.html

bsns

Jun 28, 2017

I usually just add the `u` prefix.

mingyun

Jun 28, 2017

@revotu nice

NaVient

Jun 29, 2017

For a standalone crawler project, please use Python 3.