Incorrect encoding detection #41
Comments
Hi there, it looks like you need to specify the encoding in a config file. Just add the
where I’m not 100% sure what’s happening from your description. If you attach the first few lines of the file, I can try it out for myself. Perhaps this is a system default? This is all handled by Python’s csv module. The config allows you to supply an explicit encoding to use. Please let me know if this helps. Regards
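For context, here is a minimal sketch of what supplying an explicit encoding means at the Python level. It assumes the file is opened with the built-in `open()` and parsed with `csv.reader`; the `read_rows` helper is hypothetical and is not csv-reconcile's own code.

```python
import csv


def read_rows(path, encoding="utf-8", delimiter="\t"):
    """Read a delimited text file using an explicitly chosen encoding.

    Hypothetical helper for illustration only.
    """
    # newline="" lets the csv module handle line endings itself.
    with open(path, newline="", encoding=encoding) as handle:
        return list(csv.reader(handle, delimiter=delimiter))
```

Passing `encoding=` explicitly means the result no longer depends on whatever default the platform would otherwise pick.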
FWIW, Google turned up this description of how to determine the file encoding that might be worth trying.
As of the current version (0.3.0), csv-reconcile doesn't try to guess the encoding of a CSV file. There is a separate Python library for that called chardet. It may be worth trying to guess the encoding of a file when no user-specified encoding is given. There is also the csv.Sniffer class, which helps detect the correct delimiter without relying on user parameters for every deviation from the defaults.
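A rough sketch of how such guessing could look. To stay dependency-free it swaps chardet for a simple trial-decoding loop over a short candidate list, so it is not a statistical detector; `guess_encoding` and `sniff_delimiter` are hypothetical names, and the candidate list is an assumption.

```python
import csv


def guess_encoding(raw, candidates=("utf-8", "cp1250")):
    """Return the first candidate encoding that decodes `raw` without error.

    Stand-in for a real detector such as chardet: it only proves that the
    bytes are *valid* in an encoding, not that the encoding is the right one.
    """
    for encoding in candidates:
        try:
            raw.decode(encoding)
            return encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings fit the data")


def sniff_delimiter(sample, delimiters="\t,;"):
    """Ask csv.Sniffer which of the allowed delimiters the sample uses."""
    return csv.Sniffer().sniff(sample, delimiters=delimiters).delimiter
```

Order matters in the candidate list: strict encodings such as UTF-8 should come first, because single-byte code pages accept almost any byte sequence.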
@b2m - Thanks for the tips. I’ll have a look. I don’t believe the last release did either unless something changed in Python’s csv module.
Exactly, the comment was meant as suggestions for improving the usability of csv-reconcile, to avoid most of the CSV encoding/reading problems =)
Oh, thank you! I wasn't aware of this config option. I guess this closes this issue.
The file is not large, so I am attaching it whole in its original form (i.e., before I re-encoded it). Perhaps the right question here is: what default encoding does csv-reconcile expect for the .tsv file? For me, it was apparently cp-1250, but this must be wrong for the vast majority of files, so could it be platform-dependent?
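One way to check the platform-dependence hypothesis, sketched under the assumption that the file is read with Python's built-in `open()` and no `encoding=` argument. In that case (and outside Python's UTF-8 mode) the locale's preferred encoding is used, which differs between systems. The helper names below are hypothetical.

```python
import locale


def default_text_encoding():
    """Encoding open() falls back to when no encoding= is given (outside UTF-8 mode)."""
    return locale.getpreferredencoding(False)


def misread(text, actual="utf-8", assumed="cp1250"):
    """Show what `text` looks like when bytes in `actual` are decoded as `assumed`."""
    return text.encode(actual).decode(assumed)


if __name__ == "__main__":
    print("default encoding on this machine:", default_text_encoding())
    print("UTF-8 'é' read as cp1250:", misread("é"))
```

On a machine whose locale maps to cp1250, a UTF-8 file read this way would show each accented character as two unrelated characters, for example `é` as `Ă©`.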
Thanks. I’ll try to work in @b2m’s tips to auto-discover the encoding, but for now you should be able to use the override. Would you please let me know if you’re up and running using the override so I can close this issue? Also, thanks for the file. I’ll use it to test the suggested features.
@jmacura FWIW, I did check that the tsv in the file you posted is using
@gitonthescene Great, thank you for this improvement! Besides that, I can confirm that appending a line
FYSA: Windows uses the I had solved it before I saw the above issue/solution, by opening the file in the text editor, and saving the reps.tsv file as 'UTF-8 with BOM'. However, I expect changing the configuration file as suggested in the issue is a more robust and lasting solution.
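As a side note on the 'UTF-8 with BOM' workaround: Python ships a `utf-8-sig` codec that reads such files and drops the byte order mark, while plain `utf-8` keeps it as a stray `\ufeff` character at the start of the first field. A small sketch, with a hypothetical helper name:

```python
import codecs


def decode_utf8_with_optional_bom(raw):
    """Decode UTF-8 bytes, silently removing a leading BOM if one is present."""
    return raw.decode("utf-8-sig")


if __name__ == "__main__":
    with_bom = codecs.BOM_UTF8 + "id\tname\n".encode("utf-8")
    print(repr(with_bom.decode("utf-8")))                   # BOM survives as \ufeff
    print(repr(decode_utf8_with_optional_bom(with_bom)))    # BOM removed
```

Whether a given tool accepts `utf-8-sig` as a configured encoding is an assumption to verify against that tool's documentation.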
FWIW, this has gone out in the latest release.
Hello,
First, let me thank you for this great reconciliation tool!
I've been trying to use csv-reconcile with the csv-reconcile-geo plugin, and I am not sure where the error comes from, so feel free to direct me elsewhere if the problem does not occur on your side.
The problem was that the "budovy_wdqs.tsc" file I was providing used UTF-8 encoding, while the program apparently expects it to be in cp-1250 for some reason. When I re-saved the .tsv in cp-1250, the bug went away.