I'm a Python beginner, and I have a UTF-8 problem.
I have a UTF-8 string and I would like to replace all German umlauts with ASCII replacements (in German, u-umlaut 'ü' may be rewritten as 'ue').
u-umlaut has Unicode code point 252, so I tried this:
>>> str = unichr(252) + 'ber'
>>> print repr(str)
u'\xfcber'
>>> print repr(str).replace(unichr(252), 'ue')
u'\xfcber'
I expected the last string to be u'ueber'.
What I ultimately want to do is replace all u-umlauts in a file with 'ue':
import sys
import codecs
f = codecs.open(sys.argv[1], encoding='utf-8')
for line in f:
    print repr(line).replace(unichr(252), 'ue')
Thanks for your help! (I'm using Python 2.3.)
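repr(str) returns a quoted version of str that, when printed out, will be something you could type back in as Python to get the string back. So it's a string that literally contains the characters \xfcber (a backslash followed by x, f, c and so on), instead of a string that contains über. The replace() call therefore finds no u-umlaut to replace.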
You can just use str.replace(unichr(252), 'ue') to replace the ü with ue.
If you need to get a quoted version of the result of that, though I don't believe you should need it, you can wrap the entire expression in repr().
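To tie this back to the file case in the question, here is a minimal sketch that calls replace() on the decoded text itself rather than on its repr(). It assumes a newer Python than 2.3 (2.6 or later, or Python 3), since it uses io.open; on 2.3 you would keep codecs.open as in the question. The names UMLAUT_REPLACEMENTS, replace_umlauts and replace_umlauts_in_file are made up for illustration, and the replacements for ä, ö and the capital letters are an assumption that goes beyond the 'ü' to 'ue' rule mentioned in the question.

```python
import io

# Hypothetical mapping, assumed for illustration. Only the
# u-umlaut -> 'ue' pair comes from the question itself.
UMLAUT_REPLACEMENTS = [
    (u'\xe4', u'ae'),  # a-umlaut
    (u'\xf6', u'oe'),  # o-umlaut
    (u'\xfc', u'ue'),  # u-umlaut (code point 252)
    (u'\xc4', u'Ae'),  # capital A-umlaut
    (u'\xd6', u'Oe'),  # capital O-umlaut
    (u'\xdc', u'Ue'),  # capital U-umlaut
]


def replace_umlauts(text):
    """Return text with each umlaut replaced by its ASCII spelling."""
    # Work on the string itself, not on repr(text): the repr is a
    # different string and may hold escape sequences instead of umlauts.
    for umlaut, ascii_form in UMLAUT_REPLACEMENTS:
        text = text.replace(umlaut, ascii_form)
    return text


def replace_umlauts_in_file(path):
    """Yield the lines of a UTF-8 encoded file with umlauts replaced."""
    # io.open decodes each line to a unicode string while reading.
    with io.open(path, encoding='utf-8') as f:
        for line in f:
            yield replace_umlauts(line)
```

You could then loop over replace_umlauts_in_file(sys.argv[1]) and write each line out. Each line still ends with its newline, so strip it or suppress the extra newline when printing.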