DeltaBitPackEncoder Pads Miniblock BitWidths With Arbitrary Values #1416
tustvold added a commit to tustvold/arrow-rs that referenced this issue on Mar 10, 2022: Ignore non-zero padded bit widths in DeltaBitPackDecoder (apache#1417)
alamb pushed a commit that referenced this issue on Mar 14, 2022
Describe the bug
The encoder code at https://github.com/apache/arrow-rs/blob/master/parquet/src/encodings/encoding.rs#L577 skips over the miniblock bit widths, then goes back and writes a bit width only for the miniblocks that contain at least one value. The bit widths of empty miniblocks are left with whatever value happens to be in the encoder's buffer.
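To illustrate the pattern, here is a minimal sketch; it is not the actual arrow-rs code. The function `write_bit_widths`, its parameters, and the `zero_padding` flag are all hypothetical names invented for this example. It assumes a reused scratch buffer in which the bit-width bytes are reserved first and filled in afterwards, and it shows how zeroing the reserved range first would make the output deterministic.

```rust
/// Hypothetical stand-in for the encoder's bit-width handling (not the arrow-rs API).
///
/// `buffer` is a reused scratch buffer that may hold stale bytes.
/// `offset` is where the reserved miniblock bit-width bytes start.
/// `widths` holds one bit width per non-empty miniblock.
fn write_bit_widths(
    buffer: &mut [u8],
    offset: usize,
    num_mini_blocks: usize,
    widths: &[u8],
    zero_padding: bool,
) {
    assert!(widths.len() <= num_mini_blocks);

    if zero_padding {
        // One possible fix: clear every reserved slot before filling any in.
        buffer[offset..offset + num_mini_blocks].fill(0);
    }

    // Only miniblocks that contain values get a bit width written.
    // Without the zeroing above, the remaining slots keep their stale bytes.
    for (i, width) in widths.iter().enumerate() {
        buffer[offset + i] = *width;
    }
}
```

With a buffer pre-filled with `0xAB`, four reserved slots, and two non-empty miniblocks of widths 3 and 5, the reserved range ends up as `[3, 5, 0xAB, 0xAB]` when `zero_padding` is false and `[3, 5, 0, 0]` when it is true.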
To Reproduce
This is one of the underlying bugs behind apache/datafusion#1976
Expected behavior
Whilst the specification technically allows for arbitrary padding, it seems like a good idea to avoid non-deterministic output where possible.