DRILL-8486: ParquetDecodingException: could not read bytes at offset #2898

Merged
merged 1 commit into from
Apr 10, 2024

rymarm
Member

@rymarm rymarm commented Mar 28, 2024

DRILL-8486: fix handling of long variable length entries during bulk parquet reading

Description

During bulk reading of a Parquet file, Drill mishandles long variable-length entries. Drill reads the value, but when it finds that the value does not fit in the current batch, it moves on without retaining the value it has already read. Because the value is not pushed back to the reader object, the counts of records read and records left to read become inconsistent, which causes a later read to fail.

This issue has not been encountered before because the conditions that lead to this state are rare.

Solution

If the current batch can’t hold the value, push it back to the reader object so that it is read in the next iteration.
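To illustrate the push-back idea, here is a minimal, self-contained sketch. It is not Drill's actual reader code: `PushbackSketch`, `EntryReader`, `readBatch`, and the byte-capacity rule are all hypothetical names and assumptions made up for this example. It only shows how returning an already-read entry to the reader keeps the read count consistent across batches.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; none of these names come from Drill.
final class PushbackSketch {

    /** Reads entries one at a time and accepts one entry back if it was read too early. */
    static final class EntryReader {
        private final Iterator<byte[]> source;
        private byte[] pushedBack; // entry handed back by the caller, served first on the next read
        private int recordsRead;   // entries that have actually been consumed

        EntryReader(List<byte[]> entries) {
            this.source = entries.iterator();
        }

        boolean hasNext() {
            return pushedBack != null || source.hasNext();
        }

        byte[] next() {
            byte[] entry;
            if (pushedBack != null) {
                entry = pushedBack;
                pushedBack = null;
            } else {
                entry = source.next();
            }
            recordsRead++;
            return entry;
        }

        /** Returns an entry to the reader and undoes its contribution to the read count. */
        void pushBack(byte[] entry) {
            if (pushedBack != null) {
                throw new IllegalStateException("only one entry can be pushed back");
            }
            pushedBack = entry;
            recordsRead--;
        }

        int recordsRead() {
            return recordsRead;
        }
    }

    /**
     * Fills one batch of at most capacityBytes. An entry that does not fit is
     * pushed back instead of being dropped. Assumption for this sketch: an
     * oversized entry is accepted into an empty batch so the loop always makes progress.
     */
    static List<byte[]> readBatch(EntryReader reader, int capacityBytes) {
        List<byte[]> batch = new ArrayList<>();
        int used = 0;
        while (reader.hasNext()) {
            byte[] entry = reader.next();
            if (used + entry.length > capacityBytes && !batch.isEmpty()) {
                reader.pushBack(entry); // keep it for the next batch
                break;
            }
            batch.add(entry);
            used += entry.length;
        }
        return batch;
    }
}
```

Without the `pushBack` call, the long entry would be consumed but never stored, and `recordsRead` would no longer match the number of entries delivered in batches, which is the kind of mismatch described above.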

Documentation

-

Testing

Manual testing with a Parquet file from the Jira ticket DRILL-8486. This particular issue is hard to reproduce with random data.

@rymarm rymarm added the bug label Mar 28, 2024
@rymarm rymarm requested review from cgivre and jnturton March 28, 2024 18:35
@cgivre cgivre added the minor-update and backport-to-stable (This bug fix is applicable to the latest stable release and should be considered for inclusion there) labels Mar 31, 2024
Contributor

@cgivre cgivre left a comment

LGTM +1

@cgivre cgivre merged commit af7cfcd into apache:master Apr 10, 2024
8 checks passed
jnturton pushed a commit to jnturton/drill that referenced this pull request Apr 12, 2024
jnturton pushed a commit that referenced this pull request Apr 12, 2024