Python UnicodeEncodeError: ordinal not in range(128) with Euro Sign


I have to read an XML file in Python and grab various things from it, and I ran into a frustrating UnicodeEncodeError that I couldn't figure out, even after googling.

Here are snippets of my code:

#!/usr/bin/python
# coding: utf-8
from xml.dom.minidom import parseString
with open('data.txt', 'w') as fout:
    # do a lot of stuff
    nameObj = data.getElementsByTagName('name')[0]
    name = nameObj.childNodes[0].nodeValue
    # ... do more stuff
    fout.write(','.join((name, other_stuff)))  # other_stuff: placeholder for a bunch of other stuff

This spectacularly crashes when a name entry I am parsing contains a Euro sign. Here is the error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 60: ordinal not in range(128)
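The error can be reproduced without any XML or file I/O. A minimal sketch (it should run on both Python 2.7 and Python 3): the euro sign is code point U+20AC (8364), well outside ASCII's 0-127 range, so the ascii codec refuses it, while UTF-8 can encode it as three bytes.

```python
# Reproduce the UnicodeEncodeError from the question in isolation.
euro = u'\u20ac'

print(ord(euro))  # the code point, far above the ASCII maximum of 127

try:
    euro.encode('ascii')
except UnicodeEncodeError as exc:
    # Same kind of failure as in the question: "ordinal not in range(128)"
    print(exc)

# UTF-8 can represent every code point; the euro sign needs three bytes.
utf8_bytes = euro.encode('utf-8')
print(repr(utf8_bytes))
```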

I understand why the euro sign breaks it (because it's at 128, right?), but I thought adding # coding: utf-8 would fix that. I also tried adding .encode('utf-8'), so that the name line instead looks like

name = nameObj.childNodes[0].nodeValue.encode('utf-8')

But that doesn't work either. What am I doing wrong? (I am using Python 2.7.3 if anyone wants to know)
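Part of the confusion is what # coding: utf-8 actually does. Per PEP 263, it only tells the interpreter how to decode the bytes of the source file itself into string literals; it has no effect on the encoding used when writing to a file opened with open(). The sketch below (written for Python 3; it should also run on 2.7) compiles the same source bytes under two different cookies to show that only the interpretation of the literal changes. The helper name literal_from_source is hypothetical, chosen for illustration.

```python
# A PEP 263 coding cookie only affects how source bytes are decoded
# into literals; it does not change any file I/O encoding.
EURO_UTF8 = b'\xe2\x82\xac'  # the euro sign encoded as UTF-8


def literal_from_source(cookie):
    """Compile a tiny module given as raw bytes and return its literal."""
    source = b'# coding: ' + cookie + b"\nvalue = u'" + EURO_UTF8 + b"'\n"
    namespace = {}
    exec(compile(source, '<demo>', 'exec'), namespace)
    return namespace['value']


as_utf8 = literal_from_source(b'utf-8')      # decoded as one character
as_latin1 = literal_from_source(b'latin-1')  # same bytes read as three characters
print(repr(as_utf8), len(as_utf8))
print(repr(as_latin1), len(as_latin1))
```

Neither result involves open(): whatever file object the text is later written to still applies its own encoding rules.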

EDIT: Python crashes on the fout.write() line. It goes through fine when the name field is like:

<name>United States, USD</name>

But will crap out on name fields like:

<name>France, € </name>
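minidom returns element text as unicode objects, so the euro sign survives parsing intact; the failure only appears when that text is written out. A minimal sketch, assuming the XML arrives as UTF-8 encoded bytes (the question does not show how data is loaded); the helper first_name and the two sample documents are hypothetical, mirroring the question's snippet and its two name elements.

```python
from xml.dom.minidom import parseString

# Hypothetical inputs mirroring the two <name> elements above, as UTF-8 bytes.
ok_xml = b'<name>United States, USD</name>'
euro_xml = b'<name>France, \xe2\x82\xac </name>'


def first_name(xml_bytes):
    """Return the text of the first <name> element, as in the question."""
    doc = parseString(xml_bytes)
    name_obj = doc.getElementsByTagName('name')[0]
    return name_obj.childNodes[0].nodeValue


for raw in (ok_xml, euro_xml):
    name = first_name(raw)
    try:
        # Roughly what Python 2 does implicitly when unicode meets a plain file
        name.encode('ascii')
        print('ASCII-safe: %r' % name)
    except UnicodeEncodeError:
        print('needs a real encoding such as UTF-8: %r' % name)
```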


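When you open a file in Python 2 with the built-in open function, the file object works with byte strings. Writing a unicode object to it makes Python implicitly encode it with the default ASCII codec, which fails on characters such as €. To write in another encoding, you can use codecs: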

import codecs
# Unicode strings written to fout are encoded as UTF-8 instead of ASCII
fout = codecs.open('data.txt', 'w', 'utf-8')
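Putting the answer together with the question's code: a minimal sketch that writes the joined fields through a UTF-8 file object and reads the raw bytes back to confirm the euro sign round-trips. It swaps in io.open, which also takes an encoding argument on Python 2.6+ (and is the built-in open on Python 3); codecs.open(path, 'w', 'utf-8') should behave the same way here. The field values and the temporary file path are hypothetical placeholders.

```python
import io
import os
import tempfile

# Hypothetical field values; in the question these come from the XML.
name = u'France, \u20ac '
other_field = u'EUR'

path = os.path.join(tempfile.mkdtemp(), 'data.txt')

# io.open with encoding='utf-8' encodes unicode text as UTF-8 on write,
# so no implicit ASCII conversion is attempted.
with io.open(path, 'w', encoding='utf-8') as fout:
    fout.write(u','.join((name, other_field)))

# Read the raw bytes back to confirm the euro sign was stored as UTF-8.
with io.open(path, 'rb') as fin:
    raw = fin.read()

print(repr(raw))
```

Note that the text passed to write() must be unicode (hence the u',' separator); with io.open on Python 2, writing a plain byte string raises a TypeError rather than silently guessing an encoding.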