Python encode and decode

What UTF-8 encoding means

UTF-8 is one of the most commonly used encodings. UTF stands for “Unicode Transformation Format”, and the ‘8’ means that 8-bit numbers are used in the encoding.

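ASCII, standardized in 1968, defined numeric codes from 0 to 127 for letters, digits, and punctuation, but it could not represent the letters of other languages. By the 1980s machines were 8-bit, so a byte could hold values from 0 to 255. Later came 16-bit characters, which give 2^16 = 65,536 distinct values. UTF-8 appeared after that.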
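To make the "8-bit numbers" point concrete, here is a minimal sketch that lists the bytes UTF-8 produces for a few characters. The helper name utf8_bytes and the sample characters are illustrative assumptions, not part of the original post; the code is written to run on both Python 2.7 and Python 3.

```python
from __future__ import print_function

def utf8_bytes(ch):
    """Return the UTF-8 encoding of a text string as a list of ints (0-255)."""
    return list(bytearray(ch.encode("utf-8")))

# Sample characters (illustrative): 'A', 'é' (U+00E9), and '杨' (U+6768).
samples = [u"A", u"\u00e9", u"\u6768"]

for ch in samples:
    units = utf8_bytes(ch)
    # Every unit fits in 8 bits; the number of units varies per character.
    print(len(units), units)
```

Each value in the lists is a single 8-bit unit; ASCII characters need one unit, while other characters need two or more.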

You can use type or isinstance to check the type of a variable. In Python 2, a plain string literal is a byte string (str):

>>> s = '杨'
>>> type(s)
<type 'str'>
>>> isinstance(s, str)
True
>>> isinstance(s, unicode)
False

Prefixing the literal with u makes it a unicode object instead:

>>> a = u'杨'
>>> type(a)
<type 'unicode'>
>>> isinstance(a, str)
False
>>> isinstance(a, unicode)
True
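The same check can be written so that it does not depend on the interpreter version. The sketch below is one possible approach, not the only one: the names text_type, binary_type, and kind are hypothetical helpers introduced here, and the code is meant to run on both Python 2.7 and Python 3.

```python
from __future__ import print_function

text_type = type(u"")    # unicode on Python 2, str on Python 3
binary_type = type(b"")  # str on Python 2, bytes on Python 3

def kind(value):
    """Classify a value as 'text', 'bytes', or 'other' (hypothetical helper)."""
    if isinstance(value, text_type):
        return "text"
    if isinstance(value, binary_type):
        return "bytes"
    return "other"

a = u"\u6768"           # the character 杨 as text
s = a.encode("utf-8")   # the same character as UTF-8 encoded bytes

print(kind(a), kind(s), kind(42))
```

Deriving the two types from u"" and b"" avoids naming unicode directly, which does not exist on Python 3.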

Python encode: Unicode escapes

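>>> data = u"交换机"
>>> print data
交换机
>>> print data.encode("unicode_escape")
\u4ea4\u6362\u673a
>>> print data.encode("raw_unicode_escape")
\u4ea4\u6362\u673a

For this string both codecs give the same result; they differ only in how they treat backslashes and characters below U+0100.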
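To see where the two codecs differ when encoding, the following sketch encodes a string that contains a literal backslash. The variable names and the sample string are illustrative assumptions; the code is written to run on both Python 2.7 and Python 3.

```python
from __future__ import print_function

text = u"\u4ea4\u6362\u673a"        # 交换机
with_backslash = u"C:\\dir\u4ea4"   # C:\dir交, one literal backslash

# For plain Chinese text the two codecs produce identical bytes.
same = text.encode("unicode_escape") == text.encode("raw_unicode_escape")

# unicode_escape escapes the existing backslash by doubling it;
# raw_unicode_escape leaves the backslash untouched.
escaped = with_backslash.encode("unicode_escape")
raw = with_backslash.encode("raw_unicode_escape")

print(same)
print(len(escaped), len(raw))
```

The unicode_escape result is one byte longer because of the doubled backslash.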

Python decode: Unicode escapes

>>> data = "\u4ea4\u6362\u673a"
>>> type(data)
<type 'str'>
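>>> print data.decode('unicode_escape')
交换机

When the string itself contains other backslashes that should be left as they are, use raw_unicode_escape, which only interprets the \uXXXX escapes:

>>> print data.decode('raw_unicode_escape')
交换机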
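The difference between the two codecs when decoding shows up once the input holds an escape other than \uXXXX. This is a minimal sketch under that assumption; the variable names and the sample input are illustrative, and the code is written to run on both Python 2.7 and Python 3.

```python
from __future__ import print_function

# Byte strings holding literal backslash sequences, not real characters.
data = b"\\u4ea4\\u6362\\u673a"   # the ASCII text \u4ea4\u6362\u673a
mixed = b"\\u4ea4\\n"             # \u4ea4 followed by backslash and 'n'

# Both codecs turn \uXXXX sequences into the characters they name.
decoded = data.decode("unicode_escape")
decoded_raw = data.decode("raw_unicode_escape")

# unicode_escape also interprets \n as a newline character;
# raw_unicode_escape keeps the backslash and the 'n' as two characters.
mixed_full = mixed.decode("unicode_escape")
mixed_raw = mixed.decode("raw_unicode_escape")

print(decoded == decoded_raw)
print(len(mixed_full), len(mixed_raw))
```

So raw_unicode_escape is the safer choice when backslash sequences such as \n in the input must survive decoding unchanged.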

Reference:

https://docs.python.org/2/howto/unicode.html

