Fix issues with character encoding in non-UTF-8 environments #36
Comments
The failure occurs at line 200 of |
We addressed this issue in release 2.0.2 by passing an explicit UTF-8 encoding to every call that opens a text file, for both reading and writing. In addition in |
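For readers hitting the same problem, here is a minimal sketch of what "explicit UTF-8 encoding" means in practice. The helper names `write_text` and `read_text` and the file name are hypothetical and are not taken from the MOB-suite code base; the only point is that `open()` receives `encoding="utf-8"` instead of falling back to the locale default.

```python
import os
import tempfile


def write_text(path, text):
    # Explicit encoding: the result no longer depends on LANG / LC_ALL.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


def read_text(path):
    # Read with the same explicit encoding that was used for writing.
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "report.txt")  # hypothetical file name
        write_text(path, "rep_type\tIncI1-I\u03b3\n")
        # ascii() keeps the demo printable even on an ASCII-only terminal.
        print(ascii(read_text(path)))
```

Without the `encoding` argument, `open()` in text mode uses the locale's preferred encoding, which is how a non-UTF-8 `LANG` ends up selecting the `ascii` codec.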
Because the host range files host_range_literature_plasmidDB.csv and host_range_ncbirefseq_plasmidDB.csv contain non-ASCII characters, any stray print statement involving data from these files can raise an error. I have stripped the non-ASCII characters from both files, but to be safe we still need to update the code to handle non-ASCII characters at run time for all input data. |
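One possible way to make stray print statements safe at run time is sketched below. `safe_print` is a hypothetical helper, not an existing MOB-suite function: it escapes any character the output stream cannot represent instead of raising `UnicodeEncodeError`.

```python
import io
import sys


def safe_print(text, stream=None):
    """Write text to stream, escaping characters its encoding cannot represent."""
    stream = stream or sys.stdout
    encoding = getattr(stream, "encoding", None) or "ascii"
    # Round-trip through the stream's encoding; unsupported characters
    # become backslash escapes instead of raising UnicodeEncodeError.
    printable = text.encode(encoding, errors="backslashreplace").decode(encoding)
    stream.write(printable + "\n")


if __name__ == "__main__":
    # Simulate an ASCII-only terminal with an in-memory stream.
    buffer = io.BytesIO()
    ascii_stream = io.TextIOWrapper(buffer, encoding="ascii", newline="\n")
    safe_print("IncI1-I\u03b3", stream=ascii_stream)
    ascii_stream.flush()
    print(buffer.getvalue())
```

On Python 3.7 and later, calling `sys.stdout.reconfigure(errors="backslashreplace")` once at start-up should give a similar effect for every `print` call without a wrapper.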
I was not aware of the potential issues with printing non-ASCII characters on systems with non-Unicode encodings ... I tried to reproduce this behaviour in a Linux container with the POSIX locale, but so far have not seen any issues during the tests. The caveat is that these host range tests did not target hits with non-ASCII characters (e.g. IncI1-Iγ). I would like to create a couple of test functions that check the behaviour of the host range database in non-ASCII environments. Issue: Partial solution: I've used Also in
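A test along these lines could be sketched as follows. This is an assumption about how such a test might look, not the test that was added to the project: it forces an ASCII-only stdout in a child interpreter through the standard `PYTHONIOENCODING` environment variable rather than relying on the container's locale, and `run_print` is a hypothetical helper name.

```python
import os
import subprocess
import sys


def run_print(io_encoding):
    """Print a non-ASCII replicon name in a child interpreter with the given stdio encoding."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    # The character is written as an escape so the command line itself stays ASCII.
    code = "print('IncI1-I\\u03b3')"
    return subprocess.run([sys.executable, "-c", code], env=env,
                          capture_output=True, text=True)


if __name__ == "__main__":
    strict = run_print("ascii")                    # strict ASCII stdout
    escaped = run_print("ascii:backslashreplace")  # ASCII stdout with escaping
    print("strict exit code:", strict.returncode)
    print("escaped stdout:", escaped.stdout.strip())
```

Forcing the encoding this way may be more reliable than setting a POSIX locale, because recent Python 3 versions can coerce the C/POSIX locale to a UTF-8 one, which might explain why the failure is hard to reproduce in a container.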
I've patched both files and submitted commit |
Closing this for now, but will re-open if it pops up again. |
If LANG on the system is not UTF-8 and the output contains special characters, mob_recon fails to read the mob_typer files with the following error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 824: ordinal not in range(128). This does not occur when LC_ALL=UTF-8 is explicitly set for the environment. All MOB-suite functions that read data from disk should check the character encoding and handle these cases.
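The error can be illustrated in isolation with a few lines of Python. The sample bytes below are made up for the demonstration (they are not taken from a mob_typer report), and `try_decode` is a hypothetical helper; byte 0xc3 is simply the first byte of the UTF-8 encoding of characters such as é.

```python
# Hypothetical sample: "sample" followed by U+00E9, encoded as UTF-8.
SAMPLE = "sample\u00e9".encode("utf-8")  # b'sample\xc3\xa9'


def try_decode(raw, encoding):
    """Return (text, None) on success or (None, error message) on failure."""
    try:
        return raw.decode(encoding), None
    except UnicodeDecodeError as err:
        return None, str(err)


if __name__ == "__main__":
    # Decoding as ASCII fails on the first byte above 0x7f.
    _, error = try_decode(SAMPLE, "ascii")
    print(error)
    # Decoding the same bytes as UTF-8 succeeds.
    text, _ = try_decode(SAMPLE, "utf-8")
    print(ascii(text))
```

This is the decoding step that `open()` performs implicitly when no `encoding` is given and the locale resolves to ASCII.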