I'm a Python beginner, and I have a UTF-8 problem.
I have a UTF-8 string and I would like to replace all German umlauts with ASCII replacements (in German, u-umlaut 'ü' may be rewritten as 'ue').
u-umlaut has Unicode code point 252, so I tried this:
>>> str = unichr(252) + 'ber'
>>> print repr(str)
u'\xfcber'
>>> print repr(str).replace(unichr(252), 'ue')
u'\xfcber'
I expected the last string to be u'ueber'.
What I ultimately want to do is replace all u-umlauts in a file with 'ue':
import sys
import codecs
f = codecs.open(sys.argv[1],encoding='utf-8')
for line in f:
    print repr(line).replace(unichr(252), 'ue')
Thanks for your help! (I'm using Python 2.3.)
repr(str) returns a quoted version of str that, when printed out, is something you could type back in as Python to get the string back. So it's a string that literally contains the characters \xfcber, instead of a string that contains über. Calling replace on that quoted string finds no ü to replace, which is why your output is unchanged.
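To see the difference concretely, here is a minimal sketch. The name `word` is my own choice (it also avoids shadowing the built-in `str`), and the `quote` shim is an assumption added only so the snippet behaves the same on Python 3, where plain `repr` no longer escapes ü; on Python 2 it is simply `repr`.

```python
# Sketch: the quoted form holds the escape sequence, not the character itself.

try:
    quote = ascii   # Python 3: escapes non-ASCII characters like Python 2's repr
except NameError:
    quote = repr    # Python 2: repr already escapes them

word = u'\xfcber'   # same value as unichr(252) + 'ber' on Python 2

quoted = quote(word)

# The quoted text consists of a backslash, 'x', 'f', 'c', and so on.
has_umlaut = u'\xfc' in quoted   # False: there is no u-umlaut character to replace
has_escape = '\\xfc' in quoted   # True: the four characters \ x f c are there instead

# replace() therefore finds nothing and returns the text unchanged.
unchanged = quoted.replace(u'\xfc', u'ue') == quoted   # True
```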
You can just use str.replace(unichr(252), 'ue') to replace the ü with ue.
If you need a quoted version of the result, though I don't believe you should, you can wrap the entire expression in repr:
repr(str.replace(unichr(252), 'ue'))
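Applied to the file loop from the question, the same idea could look like the sketch below. The function name `replace_u_umlaut` is hypothetical; only `codecs.open` and `replace` come from the code above, and `u'\xfc'` is the same character as `unichr(252)`.

```python
import codecs

def replace_u_umlaut(path):
    """Return the lines of a UTF-8 file with every u-umlaut replaced by 'ue'."""
    f = codecs.open(path, encoding='utf-8')
    try:
        # Call replace on the decoded line itself, not on repr(line).
        return [line.replace(u'\xfc', u'ue') for line in f]
    finally:
        f.close()
```

Each returned line is a unicode string that still carries its trailing newline, so you would call it as replace_u_umlaut(sys.argv[1]) and write the lines out without adding another one.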