How should a Python 2 crawler handle string encoding?

import urllib2
s = urllib2.urlopen('http://www.zhihu.com').read()
s.decode('utf-8','ignore')
The output is garbled both in Windows cmd and on Ubuntu, e.g. \u89c1 and so on.

14 replies 2015-10-15 22:15:50 +08:00

lixia625

Oct 10, 2015 via Android

Use Python 3.
Hahaha...

Tink

Oct 10, 2015 via iPhone

That's not garbled text. Just convert it one more time.

jessynt

Oct 10, 2015

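#coding:utf-8
1. Add this line at the very top of the file.
2. Check whether the file itself is saved as UTF-8.
3. Check the encoding of the web page.
Also: I'd recommend using Requests for crawling, or, as the poster above said, using Python 3.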
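
Step 3 above (checking the page's encoding) could look roughly like the sketch below. The helper name `charset_from_content_type` and the UTF-8 fallback are assumptions made for illustration, not something from this thread. It only parses a `Content-Type` header value and makes no network request.

```python
def charset_from_content_type(header_value, default='utf-8'):
    """Return the charset named in a Content-Type header value, or default."""
    # Parameters follow the media type, separated by ';'
    for part in header_value.split(';')[1:]:
        name, sep, value = part.strip().partition('=')
        if sep and name.strip().lower() == 'charset':
            # Strip optional quotes around the value
            return value.strip().strip('"\'').lower() or default
    return default


raw = b'\xe8\xa7\x81'  # sample bytes standing in for a response body
encoding = charset_from_content_type('text/html; charset=UTF-8')
text = raw.decode(encoding, 'ignore')  # decoded once, now unicode text
```

With urllib2 the header value would probably come from something like `response.info().getheader('Content-Type')`. Pages can also declare their charset in a `<meta>` tag, which this sketch does not look at.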

binux

Oct 10, 2015

You need to understand what encoding is first. Encoding doesn't only come into play at decode; it is also involved when you output.

binux

Oct 10, 2015

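1. Everyone telling you to add coding:utf-8 or to change the file encoding is copying code off the internet without understanding why. Those two settings only affect how non-ASCII text (such as Chinese) in the source code is decoded. If the source contains no such text, or you never use strings from the source, the setting does nothing.
2. We say the "three encodings must agree": the input encoding, the in-program (in-memory) encoding, and the output encoding must each match the encoding of the input, the program, and the output environment respectively.
3. Finally, what you have is not garbled text at all. You are simply outputting the string's "expression" (its representation) rather than its literal text.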
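
The three points above can be sketched as a small pipeline. This is only an illustration under assumptions: the input is assumed to be UTF-8, and GBK stands in for a cp936 console. The `unicode_escape` codec is used to reproduce the escaped form on both Python 2 and Python 3.

```python
# 1. Input: bytes arrive in the source's encoding (UTF-8 assumed here).
raw = b'\xe8\xa7\x81'

# 2. In memory: decode once, then work with unicode text only.
text = raw.decode('utf-8')

# 3. Output: encode for wherever the text is going. GBK is an example
#    for a cp936 console; a UTF-8 terminal would want 'utf-8'.
for_gbk_console = text.encode('gbk')
for_utf8_terminal = text.encode('utf-8')

# The "\u89c1" from the question is the escaped representation of this
# same one-character string, not corrupted data.
escaped = text.encode('unicode_escape')
```

Writing `for_gbk_console` to a UTF-8 terminal, or the reverse, is the kind of mismatch that produces real garbling.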

tonic

Oct 10, 2015

@binux Great answer, 宽神! This isn't the first time you've explained this...

rungo

Oct 10, 2015

print s.decode('utf-8','ignore')

glasslion

Oct 10, 2015

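btw, the garbled output in Windows cmd may well not be Python's problem at all. Windows cmd has no code page with usable UTF-8 support (code page 65001 exists, but Python 2 does not recognize it as an encoding).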

n6DD1A640

Oct 10, 2015

+1024, use Python 3

Delbert

Oct 11, 2015 via Android

The code page of Windows cmd is cp936, so just encode the output as gbk. I'm not sure what encoding the Ubuntu terminal uses.
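
A minimal sketch of that suggestion follows. The helper name `encode_for_console` and its UTF-8 fallback are hypothetical, chosen for illustration. It uses the encoding reported by `sys.stdout` unless one is passed in, and replaces characters the target encoding cannot represent instead of raising.

```python
import sys


def encode_for_console(text, encoding=None, fallback='utf-8'):
    """Encode unicode text for a console, replacing unencodable characters."""
    # sys.stdout.encoding can be None when output is piped or redirected
    target = encoding or getattr(sys.stdout, 'encoding', None) or fallback
    return text.encode(target, 'replace')


gbk_bytes = encode_for_console(u'\u89c1', 'gbk')      # for a cp936 console
ascii_bytes = encode_for_console(u'\u89c1', 'ascii')  # unencodable, becomes '?'
```

On Ubuntu the terminal is usually UTF-8, in which case `sys.stdout.encoding` should say so and no explicit argument is needed.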

zog

Oct 14, 2015

import urllib2
s = urllib2.urlopen('http://www.zhihu.com').read()
print type(s)  # <type 'str'>
a = type(s.decode('utf-8','ignore'))  # <type 'unicode'>

print s
See whether that is still garbled. It isn't, right? The interactive environment calls __repr__(). Try comparing print s.__repr__() with print s.__str__().
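
The difference between the representation and the text itself can be seen without a network request. This is a hedged sketch: in Python 2 the repr of the decoded unicode string would be `u'\u89c1'`, while Python 3 would show the character itself, so the example uses the undecoded byte string, whose repr is escaped on both versions.

```python
raw = b'\xe8\xa7\x81'       # undecoded bytes, like the result of read()
text = raw.decode('utf-8')  # a one-character unicode string

# repr() is what the interactive prompt echoes: a Python expression
# with the non-ASCII bytes written as escapes.
as_expression = repr(raw)

# Encoding the text for the terminal yields the bytes that display as
# the actual character on a UTF-8 terminal.
as_output = text.encode('utf-8')
```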

zog

Oct 14, 2015

I suggest reading up on how the interactive interpreter displays values, the repr function, the str function, what Unicode is, and what UTF-8 is.

a358003542

Oct 15, 2015

What the OP is really asking about is terminal display. For that, see mainly the following two pages:

https://docs.python.org/3.4/howto/unicode.html

http://stackoverflow.com/questions/10569438/how-to-print-unicode-character-in-python

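I don't care much about exactly how things print in the terminal, so I know about this problem but have never actually solved it. If the OP is interested, take a look at the wikipedia project; its util module contains this function: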

# from http://stackoverflow.com/questions/3627793/best-output-type-and-encoding-practices-for-repr-functions
import sys

def stdout_encode(u, default='UTF8'):
    # Fall back to the default when stdout reports no encoding (e.g. piped)
    encoding = sys.stdout.encoding or default
    if sys.version_info > (3, 0):
        return u.encode(encoding).decode(encoding)
    return u.encode(encoding)

a358003542

Oct 15, 2015

```python
def test():
    pass
```

This site still doesn't support code display? It's hardly a difficult problem.