-
Notifications
You must be signed in to change notification settings - Fork 875
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle trailing padding when skipping repetition levels (#3911) #4319
Handle trailing padding when skipping repetition levels (#3911) #4319
Conversation
for _ in 0..10 { | ||
let mut rng = thread_rng(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving this into the for loop makes it more likely to encounter issues inherent to the RLE data, as opposed to the selection of it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might make sense to add a note to the test giving the rationale for the loop (and maybe name the test 'test_ski-_fuzz` or something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't say I really understand the guts of the parquet decoder, but I reviewed the test coverage and that looks solid to me. 👍 Thanks for the cleanup @tustvold
Also thank you for the improvements to readability (comments and more descriptive variable names)
for _ in 0..10 { | ||
let mut rng = thread_rng(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might make sense to add a note to the test giving the rationale for the loop (and maybe name the test 'test_ski-_fuzz` or something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't say I really understand the guts of the parquet decoder, but I reviewed the test coverage and that looks solid to me. 👍 Thanks for the cleanup @tustvold
Also thank you for the improvements to readability (comments and more descriptive variable names)
Which issue does this PR close?
Closes #3911
Rationale for this change
The length of RLE encoded data is ambiguous if the final block is bit packed. As such the decoder might think it has more level data than it actually has when skipping repetition levels, which conceivably could cause the decoder to get out of sync
What changes are included in this PR?
Are there any user-facing changes?
No, these APIs are not public