Funcotator - SimpleXsv parser needs improved error handling for different encodings #4006

Closed
jonn-smith opened this issue Dec 21, 2017 · 1 comment

jonn-smith commented Dec 21, 2017

Currently, `SimpleKeyXsvFuncotationFactory` does not give a useful error message when the encoding of a file is inconsistent. It should be improved to throw an error that clearly identifies the problem.

The root cause is how `Files.lines()` handles encodings. When using `PathLineIterator`, the encoding problem is not detected until `it.next()` is called and reaches a line with inconsistent encoding. The problem then surfaces as a `java.nio.charset.MalformedInputException`.
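To illustrate the behavior described above, here is a minimal sketch of catching the failure and turning it into a readable message. It assumes the file is expected to be UTF-8. `StrictLineReader` and `readAllLines` are hypothetical names and not part of GATK, and `IllegalArgumentException` stands in for whatever user-facing exception the tool would actually throw. When a `Files.lines()` stream is iterated, the `MalformedInputException` arrives wrapped in an `UncheckedIOException`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical helper for illustration only; not the GATK implementation. */
final class StrictLineReader {

    private StrictLineReader() {}

    /**
     * Reads every line of the file as UTF-8. A decoding failure is rethrown
     * with a message that names the file and the likely cause.
     */
    static List<String> readAllLines(final Path path) throws IOException {
        final List<String> lines = new ArrayList<>();
        // Opening the stream succeeds even for a badly encoded file,
        // because no bytes are decoded until the stream is consumed.
        try (Stream<String> stream = Files.lines(path, StandardCharsets.UTF_8)) {
            final Iterator<String> it = stream.iterator();
            while (it.hasNext()) {
                lines.add(it.next());
            }
        } catch (final UncheckedIOException e) {
            // The stream wraps the checked MalformedInputException.
            if (e.getCause() instanceof MalformedInputException) {
                throw new IllegalArgumentException(
                        "Could not decode " + path + " as UTF-8. "
                                + "The file may be saved in a different or mixed encoding.",
                        e);
            }
            throw e;
        }
        return lines;
    }
}
```

Because decoding is buffered, the failure can be reported before any earlier, well-formed lines have been returned, so the point at which the exception appears does not reliably indicate which line is bad.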

This page has some information on a fix:

https://stackoverflow.com/questions/26064689/files-lines-to-skip-broken-lines-in-java8
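One commonly used alternative to failing is to decode leniently by configuring a `CharsetDecoder` yourself. The sketch below is an illustration under that assumption and may not match what the linked page recommends. `LenientLineReader` and `readAllLinesLeniently` are hypothetical names. Malformed bytes are replaced with U+FFFD instead of raising an exception.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper for illustration only; not the GATK implementation. */
final class LenientLineReader {

    private LenientLineReader() {}

    /** Reads every line as UTF-8, substituting U+FFFD for bytes that cannot be decoded. */
    static List<String> readAllLinesLeniently(final Path path) throws IOException {
        // A decoder from newDecoder() reports errors by default;
        // REPLACE makes it substitute the replacement character instead.
        final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(path), decoder))) {
            return reader.lines().collect(Collectors.toList());
        }
    }
}
```

Silent replacement hides corrupted input, so for annotation data sources a clear error that names the file is arguably the safer default. Lenient decoding is more suited to diagnostics, such as locating the offending lines.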

@jonn-smith
Collaborator Author

This (or something similar) is also a problem in XsvLocatableTableCodec.

jonn-smith added a commit that referenced this issue Aug 16, 2018
Added a new catch block in `PathLineIterator` for character encoding
errors, along with a new error message to be given to the user for such
cases.

Fixes #4006
jonn-smith added a commit that referenced this issue Aug 16, 2018
Added a new catch block in `PathLineIterator` for character encoding
errors, along with a new error message to be given to the user for such
cases.

Added unit test for malformed xsv locatable files.

Fixes #4006
jonn-smith added a commit that referenced this issue Aug 20, 2018
Added a new catch block in `PathLineIterator` for character encoding
errors, along with a new error message to be given to the user for such
cases.

Added unit test for malformed xsv locatable files.

Fixes #4006
jonn-smith added a commit that referenced this issue Aug 22, 2018
* Added new catch block for character encoding error cases.

Added a new catch block in `PathLineIterator` for character encoding
errors, along with a new error message to be given to the user for such
cases.

Added unit test for malformed xsv locatable files.

Fixes #4006