How to find and replace special UTF-8 characters in Python?

I'm a Python beginner, and I have a UTF-8 problem.

I have a UTF-8 string and would like to replace all German umlauts with ASCII replacements (in German, the u-umlaut 'ü' may be rewritten as 'ue').

The u-umlaut has Unicode code point 252, so I tried this:

>>> str = unichr(252) + 'ber'
>>> print repr(str)
u'\xfcber'
>>> print repr(str).replace(unichr(252), 'ue')
u'\xfcber'

I expected the last string to be u'ueber'.

What I ultimately want to do is replace all u-umlauts in a file with 'ue':

import sys
import codecs
f = codecs.open(sys.argv[1],encoding='utf-8')
for line in f:
    print repr(line).replace(unichr(252), 'ue')

Thanks for your help! (I'm using Python 2.3.)


repr(str) returns a quoted version of str: a string that, when printed, is something you could type back into Python to get the original string. So the result literally contains the characters \xfcber (a backslash, then x, f, c, and so on) rather than über, and there is no ü in it for replace to find.

You can just use str.replace(unichr(252), 'ue') to replace the ü with ue.

If you need a quoted version of the result (though I don't believe you should need it), you can wrap the entire expression in repr:

repr(str.replace(unichr(252), 'ue'))
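For the file case from the question, the same idea (call replace on the decoded text, not on its repr) might look roughly like the sketch below. The names replace_umlauts, convert_file and UMLAUT_MAP are made up for illustration, and every mapping other than 'ü' to 'ue' is an assumption based on the usual German convention. It writes to a second file instead of printing, so the output encoding is explicit.

```python
import codecs

# Hypothetical mapping for illustration. Only the u-umlaut -> 'ue' case
# comes from the question; the other entries are assumed.
UMLAUT_MAP = {
    u'\xe4': u'ae',  # a-umlaut
    u'\xf6': u'oe',  # o-umlaut
    u'\xfc': u'ue',  # u-umlaut
    u'\xc4': u'Ae',  # capital A-umlaut
    u'\xd6': u'Oe',  # capital O-umlaut
    u'\xdc': u'Ue',  # capital U-umlaut
    u'\xdf': u'ss',  # sharp s
}


def replace_umlauts(text):
    """Return text with each mapped character replaced by its ASCII spelling."""
    for char, ascii_form in UMLAUT_MAP.items():
        text = text.replace(char, ascii_form)
    return text


def convert_file(src_path, dst_path):
    """Decode src_path as UTF-8, replace umlauts line by line, write dst_path as UTF-8."""
    src = codecs.open(src_path, 'r', encoding='utf-8')
    try:
        dst = codecs.open(dst_path, 'w', encoding='utf-8')
        try:
            for line in src:
                # line is already a decoded unicode string, so replace
                # sees the real characters, not their escaped form
                dst.write(replace_umlauts(line))
        finally:
            dst.close()
    finally:
        src.close()
```

With this, convert_file(sys.argv[1], sys.argv[2]) would be the script body. The nested try/finally is used instead of a with statement so the sketch stays close to what an old Python 2 interpreter accepts.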