Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle trailing padding when skipping repetition levels (#3911) #4319

Merged
merged 2 commits into from
Jun 1, 2023

Conversation

tustvold
Copy link
Contributor

Which issue does this PR close?

Closes #3911

Rationale for this change

The length of RLE encoded data is ambiguous if the final block is bit packed. As such the decoder might think it has more level data than it actually has when skipping repetition levels, which conceivably could cause the decoder to get out of sync

What changes are included in this PR?

Are there any user-facing changes?

No, these APIs are not public

@github-actions github-actions bot added the parquet Changes to the parquet crate label May 31, 2023
@tustvold
Copy link
Contributor Author

for _ in 0..10 {
let mut rng = thread_rng();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving this into the for loop makes it more likely to encounter issues inherent to the RLE data, as opposed to the selection of it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might make sense to add a note to the test giving the rationale for the loop (and maybe name the test 'test_ski-_fuzz` or something

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't say I really understand the guts of the parquet decoder, but I reviewed the test coverage and that looks solid to me. 👍 Thanks for the cleanup @tustvold

Also thank you for the improvements to readability (comments and more descriptive variable names)

for _ in 0..10 {
let mut rng = thread_rng();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might make sense to add a note to the test giving the rationale for the loop (and maybe name the test 'test_ski-_fuzz` or something

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't say I really understand the guts of the parquet decoder, but I reviewed the test coverage and that looks solid to me. 👍 Thanks for the cleanup @tustvold

Also thank you for the improvements to readability (comments and more descriptive variable names)

@tustvold tustvold merged commit 0baf99a into apache:master Jun 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Intermittent Parquet test_skip Failures
2 participants