jpuyy's Blog

Think before you speak, read before you think.

Python encode and decode

Sep 12, 2014

—

What UTF-8 encoding means

UTF-8 is one of the most commonly used encodings. UTF stands for “Unicode Transformation Format”, and the ‘8’ means that 8-bit numbers are used in the encoding.

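ASCII was standardized back in 1968. It assigned the codes 0 to 127 to letters, digits, and symbols, but it could not represent the letters used in other countries. By the 1980s most computers were 8-bit, so a byte could hold values from 0 to 255. Unicode then started out with 16-bit characters, which gives 2^16 = 65,536 distinct values. UTF-8 came after that.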
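
As a small illustration of the "8-bit numbers" point, the sketch below encodes two characters to UTF-8 and counts the resulting bytes. It assumes Python 3 (the sessions in this post are Python 2), and the helper name utf8_length is made up for this example.

```python
# Python 3 sketch: how many 8-bit units UTF-8 needs per character.
def utf8_length(ch):
    """Return the number of bytes UTF-8 uses for the text ch (hypothetical helper)."""
    return len(ch.encode("utf-8"))

ascii_bytes = "A".encode("utf-8")   # ASCII characters stay a single byte
cjk_bytes = "杨".encode("utf-8")    # U+6768 takes three bytes

print(ascii_bytes, utf8_length("A"))
print(cjk_bytes, utf8_length("杨"))
```

Characters in the ASCII range keep their one-byte form, which is why plain ASCII text is already valid UTF-8.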

You can use type or isinstance to check what type a variable has:

>>> s = '杨'
>>> type(s)
<type 'str'>
>>> isinstance(s, str)
True
>>> isinstance(s, unicode)
False

Prefixing the literal with u creates a unicode object instead:

>>> a = u'杨'
>>> type(a)
<type 'unicode'>
>>> isinstance(a, str)
False
>>> isinstance(a, unicode)
True
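
For comparison, here is a rough Python 3 counterpart of the two sessions above. It assumes Python 3, where str is always Unicode text and bytes plays the role of the Python 2 byte string; the variable names are only illustrative.

```python
# Python 3 sketch: str is Unicode text, bytes holds encoded data.
text = "杨"                    # comparable to u'杨' in Python 2
raw = text.encode("utf-8")     # comparable to the plain '杨' byte string,
                               # assuming the Python 2 source or terminal used UTF-8

print(type(text), isinstance(text, str))
print(type(raw), isinstance(raw, bytes))
print(raw.decode("utf-8") == text)
```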

Encoding to Unicode escapes in Python

>>> s = "上海"
>>> print s
上海
>>> data = "\u4ea4\u6362\u673a"
>>> print data.encode("unicode_escape")
\\u4ea4\\u6362\\u673a
>>> print data.encode("raw_unicode_escape")
\u4ea4\u6362\u673a
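
The difference between the two codecs can also be sketched in Python 3 (an assumption; the session above is Python 2). Here text.encode returns bytes, and the names escaped, doubled, and kept are only illustrative.

```python
# Python 3 sketch: encoding text to \uXXXX escapes.
text = "交换机"
escaped = text.encode("unicode_escape")        # ASCII bytes spelling \u4ea4\u6362\u673a
print(escaped.decode("ascii"))

# The two codecs differ when the text already contains a backslash.
literal = r"\u4ea4"                            # six ASCII characters, not one CJK character
doubled = literal.encode("unicode_escape")     # the backslash itself gets escaped
kept = literal.encode("raw_unicode_escape")    # the backslash is left alone
print(doubled.decode("ascii"))
print(kept.decode("ascii"))
```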

Decoding Unicode escapes in Python

>>> data = "\u4ea4\u6362\u673a"
>>> type(data)
<type 'str'>
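>>> print data.decode('unicode_escape')
交换机

When the string itself contains other backslashes that should stay as they are, use raw_unicode_escape, which only interprets \uXXXX sequences:

>>> print data.decode('raw_unicode_escape')
交换机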
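
A minimal Python 3 sketch of the same decoding step, assuming the escaped data arrives as bytes. The mixed value is a made-up input that shows where the two codecs disagree.

```python
# Python 3 sketch: decoding \uXXXX escapes back to text.
data = b"\\u4ea4\\u6362\\u673a"                  # bytes holding literal \uXXXX sequences
print(data.decode("unicode_escape"))

mixed = b"a\\nb\\u4ea4"                          # also contains a literal backslash-n
interpreted = mixed.decode("unicode_escape")     # \n becomes a real newline
preserved = mixed.decode("raw_unicode_escape")   # only \uXXXX is interpreted
print(repr(interpreted))
print(repr(preserved))
```

If the escapes are held in a str rather than bytes, one option is to call encode("ascii") first and then decode as shown, provided the text contains only ASCII characters.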

Reference:

https://docs.python.org/2/howto/unicode.html
