Python Unicode Encode Decode Issue


Let's take a simple variable -

var = u' \u2013 2'

Let's try decoding it -

var.decode('utf-8')

I get the following error -

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 7: ordinal not in range(128)

Let's try encoding it -

var.encode('utf-8')

I get the following error -

'ascii' codec can't decode byte 0xe2 in position 8: ordinal not in range(128)

One solution is to do -

sys.setdefaultencoding('utf-8')

What are others doing to handle this?


Let's try decoding [a Unicode string]

You decode bytes to Unicode. You encode Unicode to bytes.

You cannot decode a Unicode string.

If you try, Python 2 tries to help by implicitly converting the Unicode string to something it can decode: a byte string. Because the conversion is implicit, it uses the default encoding, which is ASCII in Python 2 (see sys.getdefaultencoding()). ASCII can't encode U+2013, so you get a UnicodeEncodeError even though you called decode().

(With hindsight, this attempt at “do what I mean” behaviour was a mistake. Python 3 no longer allows it.)
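Here is a minimal sketch of that hidden step, written for Python 3 so it runs today. The variable names has_decode and implicit_step_error are just illustrative. It spells out the ASCII encode that Python 2 performed behind your back, and shows that Python 3 text strings have no decode() method at all:

```python
var = u' \u2013 2'  # a text (Unicode) string containing an en dash

# Python 3: text strings simply have no decode() method.
has_decode = hasattr(var, 'decode')

# Python 2's implicit step, written out explicitly:
# encode the text with the ASCII codec before decoding it.
try:
    var.encode('ascii')
    implicit_step_error = None
except UnicodeEncodeError as exc:
    implicit_step_error = exc

print(has_decode)
print(type(implicit_step_error).__name__)
```

The failure comes from the encode step, which is why a call to decode() reports an encode error.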

I get 'ascii' codec can't decode byte 0xe2 in position 8: ordinal not in range(128)

You're doing something else there you haven't shown us, then, because encoding works fine:

>>> u' \u2013 2'.encode('utf-8')
' \xe2\x80\x93 2'
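One plausible cause, and this is only a guess since the question doesn't show the real code, is that var was already a UTF-8 byte string by the time encode() was called. In Python 2, calling encode() on a byte string first decodes it implicitly with ASCII, and that decode fails on byte 0xe2. A rough Python 3 sketch of the same failure, with the implicit decode written out (names are illustrative):

```python
text = u' \u2013 2'

# Encoding text to bytes works, and decoding with the same codec round-trips.
data = text.encode('utf-8')
round_tripped = data.decode('utf-8')

# What Python 2 did implicitly when encode() was called on a byte string:
# decode the bytes with ASCII first. The en dash's first byte is 0xe2.
try:
    data.decode('ascii')
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

print(repr(data))
print(type(decode_error).__name__)
```

If that is what happened, the fix is to decode the bytes once, with the right codec, where they enter your program, and not to call encode() on them again.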

One solution is to do sys.setdefaultencoding('utf-8')

This was never a proper solution to anything, which is why Python 2 deletes setdefaultencoding from the sys module at startup: you have to reload(sys) before you can call it at all.
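As for what to do instead, a common approach (offered here as a sketch, not the only way) is to keep text as Unicode inside the program and name the encoding explicitly wherever bytes enter or leave it. The example below uses io.open, which takes an encoding argument in both Python 2 and 3; the temporary file path is hypothetical and only there to make the example self-contained:

```python
import io
import os
import tempfile

text = u' \u2013 2'

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'example.txt')

# Encode explicitly on the way out...
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# ...the file now holds UTF-8 bytes...
with io.open(path, 'rb') as f:
    raw = f.read()

# ...and decode explicitly on the way back in.
with io.open(path, 'r', encoding='utf-8') as f:
    read_back = f.read()

print(repr(raw))
print(read_back == text)
```

With the codec stated at each boundary, nothing depends on the process-wide default encoding.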