Unicode Decode Errors #1

Open
matou opened this issue Jan 31, 2011 · 3 comments

Comments

@matou
Owner

matou commented Jan 31, 2011

I still get UnicodeDecodeErrors when filenames contain non-ASCII characters.

@cmichi

cmichi commented Jan 31, 2011

I had these problems too. I think changing the code to this may help you:

    print "these files are the same: "
    for path in db:
        path = unicode(path[0]).encode('utf-8')

But actually it is strange that we have to encode the path back to UTF-8 (after storing it in UTF-8). I think there has to be a better solution than converting everything back and forth (collations?).

@matou
Owner Author

matou commented Feb 9, 2011

According to [0], text from a SQLite database is returned as unicode by default in Python. I thought this would be enough, but apparently it isn't. Maybe the "utf-8" part is missing. I don't know enough about encodings to judge that. I will test your suggestion (and read more of [0]) as soon as I find time.

Thx for your help!

[0] http://docs.python.org/library/sqlite3.html#introduction
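
For reference, a minimal sketch of the round trip described in [0], written for Python 3 where TEXT values come back as `str`. The table layout and the sample path are assumptions for illustration, not the project's actual schema. If this works, the decode errors may come from somewhere else, for example from mixing byte strings read from the filesystem with unicode text.

```python
import sqlite3

# In-memory database with an assumed two-column layout (size, path).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE files (size INTEGER, path TEXT)")

# Hypothetical path containing non-ASCII characters.
path = "Grüße/ñandú.txt"
db.execute("INSERT INTO files VALUES(?, ?)", (42, path))

# TEXT columns are returned as str; no manual encode/decode is needed.
(stored,) = db.execute("SELECT path FROM files").fetchone()
print(type(stored).__name__, stored == path)
```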

@matou
Owner Author

matou commented Feb 9, 2011

Using Python 3 solved some problems. The current one appears when writing to the database (on an OpenBSD machine):

Traceback (most recent call last):
  File "/home/matou/duplicatefiles.py", line 104, in <module>
    db.execute("INSERT INTO files VALUES(?, ?)", (size, f))
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf1' in position 110: surrogates not allowed
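
A possible explanation, sketched under assumptions: on POSIX systems Python 3 decodes filename bytes with the `surrogateescape` error handler, so a byte that is not valid UTF-8 (here 0xF1) becomes the lone surrogate `'\udcf1'`, which a strict UTF-8 encode rejects. One workaround is to turn the name back into its original bytes and store those in a BLOB column. The sample filename, the schema, and the helper names `to_db_value` and `from_db_value` are hypothetical.

```python
import sqlite3

# Hypothetical filename as raw bytes: 0xF1 is Latin-1 "ñ", invalid as UTF-8.
raw_name = b"ma\xf1ana.txt"

# Decoding with surrogateescape maps the bad byte to the lone surrogate U+DCF1.
name = raw_name.decode("utf-8", "surrogateescape")


def to_db_value(filename):
    """Return the original filename bytes, suitable for a BLOB column."""
    return filename.encode("utf-8", "surrogateescape")


def from_db_value(blob):
    """Turn stored bytes back into the str form Python uses for filenames."""
    return blob.decode("utf-8", "surrogateescape")


# A strict encode fails on the lone surrogate, as in the traceback above.
try:
    name.encode("utf-8")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True

# Assumed schema: the path column holds raw bytes instead of text.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE files (size INTEGER, path BLOB)")
db.execute("INSERT INTO files VALUES(?, ?)", (123, to_db_value(name)))

(stored,) = db.execute("SELECT path FROM files").fetchone()
restored = from_db_value(stored)
```

On POSIX, `os.fsencode()` and `os.fsdecode()` perform the same conversion using the filesystem encoding, so they could replace the two helpers. The trade-off is that the column no longer holds readable text for names that are not valid UTF-8.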
