[pygtk] Problem in fetching Unicode from URL and displaying it in PyGTK widget

Bertrand Kintanar b3rxkintanar at gmail.com
Fri Jul 17 19:44:31 WST 2009

On 7/17/09 6:52 PM, John Finlay wrote:
> I misunderstood what you wanted. I thought you just wanted to save the 
> html file contents into the DB and were having a problem with the text 
> encoding between the html and the DB but it sounds like you want to do 
> something different.
> John
OK, let me put it this way. Let's say I have a string variable whose
value I get by reading from an HTML file.

data = '&#xE1;'

I want to be able to convert the above string to

data = u'\xE1'

so that I can just

print data.encode('utf-8')

and get its correct value, which is

á

If I do data.replace('&#', '\\'), it puts two backslashes into the
string instead of only one. And if I write just one backslash, Python
raises an error, since the backslash is an escape character. Why does
Python treat the backslash as an escape character, but when it is used
in the string replace method, it doesn't escape the other backslash?

If I do the above command and print data, I get

'\\xE1;' instead of just '\xE1'
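For reference, a minimal sketch of what the replace call produces,
assuming Python 3 syntax (the thread itself uses Python 2). The
variable name replaced is only for illustration. The doubled backslash
is how repr() displays a single backslash character; the string itself
holds just one, followed by the literal characters x, E, 1 and ;.

```python
data = '&#xE1;'

# '\\' in source code is one backslash character, so this swaps the
# two characters '&#' for a single backslash.
replaced = data.replace('&#', '\\')

# repr() shows the backslash escaped, which is why it looks doubled.
print(repr(replaced))

# Printing the string itself shows the single backslash.
print(replaced)

# Five characters: backslash, x, E, 1, semicolon.
print(len(replaced))
```

The result is still not the character U+00E1, because escape sequences
such as \xE1 are only interpreted in source-code literals, not in
strings built at run time.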

So, is there a specific way of converting this?
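A minimal sketch of one possible conversion, assuming Python 3 (under
the Python 2 used in this thread, unichr would take the place of chr).
The helper name decode_hex_refs is hypothetical, and it only handles
hexadecimal references of the form &#x...; with an optional trailing
semicolon.

```python
import re

# Matches hexadecimal character references such as '&#xE1;'.
# The trailing semicolon is treated as optional.
HEX_REF = re.compile(r'&#[xX]([0-9A-Fa-f]+);?')


def decode_hex_refs(text):
    """Replace each hexadecimal reference with the character it names."""
    # int(..., 16) parses the hex digits; chr() turns the code point
    # into a one-character string.
    return HEX_REF.sub(lambda m: chr(int(m.group(1), 16)), text)


data = decode_hex_refs('&#xE1;')

# Encoding to UTF-8 gives the two bytes 0xC3 0xA1.
print(data.encode('utf-8'))
```

Python 3.4 and later also provide html.unescape in the standard
library, which handles references like '&#xE1;' along with decimal and
named ones.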

