Describe the bug
I have a tool that reads CSV files using DataFusion. One of my files apparently contains an invalid UTF-8 character.
With DataFusion 16.1.0 (Arrow 29), I get this error:
Arrow error: Parser error: Error parsing line 27: Error(Utf8 { pos: Some(Position { byte: 3458, line: 28, record: 27 }), err: Utf8Error { field: 14, valid_up_to: 1 } })"))
With DataFusion 17.0.0 (Arrow 31), I get this error:
ArrowError("underlying Arrow error: Csv error: Encountered invalid UTF-8 data: invalid utf-8 sequence of 1 bytes from index 2911"))
The first error is much more helpful because it gives me the line and field numbers. I was quickly able to locate the bad character.
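Until the newer error message carries the same detail, the location can be recovered outside the reader. The following is a minimal sketch using only the Rust standard library, not Arrow's or DataFusion's API. The `BadByte` struct and `locate_invalid_utf8` function are hypothetical names introduced for illustration, and the field count naively splits on commas, so it assumes no quoted fields containing commas or newlines.

```rust
use std::str;

/// Location of the first invalid UTF-8 byte in a CSV buffer.
/// (Hypothetical helper type, not part of Arrow or DataFusion.)
#[derive(Debug, PartialEq)]
struct BadByte {
    byte: usize,  // 0-based byte offset into the buffer
    line: usize,  // 1-based line number
    field: usize, // 0-based field index within that line
}

/// Returns the position of the first invalid UTF-8 byte, or `None`
/// if the whole buffer is valid UTF-8.
/// Assumption: fields are separated by plain commas with no quoting.
fn locate_invalid_utf8(data: &[u8]) -> Option<BadByte> {
    let err = str::from_utf8(data).err()?;
    let byte = err.valid_up_to();
    let prefix = &data[..byte];

    // Lines are counted by the newlines that precede the bad byte.
    let line = prefix.iter().filter(|&&b| b == b'\n').count() + 1;

    // Fields are counted by the commas between the start of the line and the bad byte.
    let line_start = prefix
        .iter()
        .rposition(|&b| b == b'\n')
        .map_or(0, |i| i + 1);
    let field = prefix[line_start..]
        .iter()
        .filter(|&&b| b == b',')
        .count();

    Some(BadByte { byte, line, field })
}

fn main() {
    // Third line, second field contains the lone byte 0xFF, which is never valid UTF-8.
    let data = b"a,b,c\n1,2,3\n4,\xFF,6\n";
    match locate_invalid_utf8(data) {
        Some(pos) => println!("invalid UTF-8 at {:?}", pos),
        None => println!("buffer is valid UTF-8"),
    }
}
```

Note that the byte offset computed here is relative to the whole file, which may not match the index in the Arrow 31 message if that index is relative to an internal buffer.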
To Reproduce
Try reading a CSV file with invalid UTF-8 characters (perhaps try reading some random binary file).
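A small, deterministic input is easier to work with than a random binary file. The sketch below writes a three-line CSV whose last row contains a single invalid byte, using only the standard library. The file name, the column names, and the `write_bad_csv` helper are assumptions made up for this example; the resulting file can then be passed to whichever CSV reader is under test.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Writes a tiny CSV file whose last row contains the byte 0xFF,
/// which can never appear in valid UTF-8.
/// (Hypothetical helper for reproducing the error.)
fn write_bad_csv(path: &Path) -> io::Result<()> {
    let mut bytes = Vec::new();
    bytes.extend_from_slice(b"id,name\n");
    bytes.extend_from_slice(b"1,alice\n");
    bytes.extend_from_slice(b"2,b\xFFb\n"); // invalid byte in the second field
    fs::write(path, bytes)
}

fn main() -> io::Result<()> {
    // Assumed location: the platform's temporary directory.
    let path: PathBuf = std::env::temp_dir().join("invalid_utf8.csv");
    write_bad_csv(&path)?;

    // Sanity check: the file really is not valid UTF-8.
    let raw = fs::read(&path)?;
    assert!(std::str::from_utf8(&raw).is_err());

    println!("wrote {}", path.display());
    Ok(())
}
```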
Expected behavior
Additional context
Include line and field number in CSV UTF-8 error (apache#3656)
d51b829
Include line and field number in CSV UTF-8 error (#3656) (#3657)
7deb358
* Include line and field number in CSV UTF-8 error (#3656)
* Additional test case
label_issue.py automatically added labels {'arrow'} from #3657