Regression in CSV reader error handling #3656

Closed
andygrove opened this issue Feb 3, 2023 · 1 comment · Fixed by #3657
Labels: arrow (Changes to the arrow crate), bug, help wanted

andygrove (Member) commented Feb 3, 2023

Describe the bug

I have a tool that reads CSV files using DataFusion. One of my files apparently contains an invalid UTF-8 character.

With DataFusion 16.1.0 (Arrow 29), I get this error:

Arrow error: Parser error: Error parsing line 27: Error(Utf8 { pos: Some(Position { byte: 3458, line: 28, record: 27 }), err: Utf8Error { field: 14, valid_up_to: 1 } })"))

With DataFusion 17.0.0 (Arrow 31), I get this error:

ArrowError("underlying Arrow error: Csv error: Encountered invalid UTF-8 data: invalid utf-8 sequence of 1 bytes from index 2911"))

The first error is much more helpful because it gives me the line and field numbers, so I was able to locate the bad character quickly. The second reports only a byte index.

To Reproduce

Read a CSV file that contains invalid UTF-8 bytes (reading some random binary file as CSV may also work).
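While the error carries only a byte index, the bad byte can also be found by hand. The following is a minimal sketch using only the Rust standard library, not the Arrow CSV reader. The names `BadUtf8` and `find_bad_utf8` are hypothetical, and the scan is deliberately naive: it splits on newlines and commas and ignores CSV quoting and escaping.

```rust
use std::str;

/// Location of the first invalid UTF-8 sequence in a CSV byte buffer.
/// (Hypothetical helper type, not part of Arrow or DataFusion.)
#[derive(Debug, PartialEq)]
struct BadUtf8 {
    /// 1-based line number.
    line: usize,
    /// 0-based field index within that line.
    field: usize,
    /// Byte offset of the first invalid byte within the field.
    offset: usize,
}

/// Naive scan: splits on b'\n' and b',' and validates each field separately.
/// Splitting on ASCII bytes is safe because they never occur inside a valid
/// multi-byte UTF-8 sequence. Quoted fields are NOT handled.
fn find_bad_utf8(data: &[u8]) -> Option<BadUtf8> {
    for (line_idx, line) in data.split(|&b| b == b'\n').enumerate() {
        for (field_idx, field) in line.split(|&b| b == b',').enumerate() {
            if let Err(e) = str::from_utf8(field) {
                return Some(BadUtf8 {
                    line: line_idx + 1,
                    field: field_idx,
                    // valid_up_to() is the length of the valid prefix,
                    // i.e. the index of the first offending byte.
                    offset: e.valid_up_to(),
                });
            }
        }
    }
    None
}
```

For example, calling `find_bad_utf8(b"id,name\n1,alice\n2,b\xFFb\n")` returns line 3, field 1, offset 1, because the byte 0xFF can never appear in valid UTF-8. The same made-up buffer, written to a file, could serve as a small reproduction input.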

Expected behavior

Additional context

@andygrove andygrove added the bug label Feb 3, 2023
@tustvold tustvold self-assigned this Feb 3, 2023
tustvold added a commit to tustvold/arrow-rs that referenced this issue Feb 3, 2023
tustvold added a commit that referenced this issue Feb 4, 2023
* Include line and field number in CSV UTF-8 error (#3656)

* Additional test case
@tustvold tustvold added the arrow Changes to the arrow crate label Feb 10, 2023
tustvold (Contributor) commented:

label_issue.py automatically added labels {'arrow'} from #3657
