Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Parquet V2 page reading when max def/rep levels are 0 #9939

Closed
wants to merge 1 commit into from

Conversation

yingsu00
Copy link
Collaborator

@yingsu00 yingsu00 commented May 27, 2024

The existing code did not skip the definition and repetition level arrays if the maxDefine_ and maxRepeat_ are both zero but these arrays were actually more than 0 bytes. This commit fixes this problem and always uses pageHeader.data_page_header_v2.definition_levels_byte_length and pageHeader.data_page_header_v2.repetition_levels_byte_length as the array lengths because these values are guaranteed to exist.

Fixes #9924

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 27, 2024
Copy link

netlify bot commented May 27, 2024

Deploy Preview for meta-velox ready!

Name Link
🔨 Latest commit 01d536b
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6656b47349b4f900081ca4ea
😎 Deploy Preview https://deploy-preview-9939--meta-velox.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

The existing code did not skip the definition and repetition level
arrays if the maxDefine_ and maxRepeat_ are both zero but these arrays
were actually more than 0 bytes. This commit fixes this problem and
always uses pageHeader.data_page_header_v2.definition_levels_byte_length
and pageHeader.data_page_header_v2.repetition_levels_byte_length as
the array lengths because these values are guranteed to exist.
@yingsu00 yingsu00 force-pushed the ParquetV2PageFix branch from b2200f1 to 01d536b Compare May 29, 2024 04:52
@yingsu00 yingsu00 requested review from majetideepak and Yuhta May 29, 2024 21:10
@yingsu00 yingsu00 marked this pull request as ready for review May 29, 2024 21:10
@facebook-github-bot
Copy link
Contributor

@Yuhta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@Yuhta merged this pull request in 0b4f3e2.

Copy link

Conbench analyzed the 1 benchmark run on commit 0b4f3e27.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

karteekmurthys pushed a commit to karteekmurthys/velox that referenced this pull request Jun 4, 2024
…cubator#9939)

Summary:
The existing code did not skip the definition and repetition level arrays if the `maxDefine_` and `maxRepeat_` are both zero but these arrays were actually more than 0 bytes. This commit fixes this problem and always uses `pageHeader.data_page_header_v2.definition_levels_byte_length` and `pageHeader.data_page_header_v2.repetition_levels_byte_length` as the array lengths because these values are guaranteed to exist.

Fixes facebookincubator#9924

Pull Request resolved: facebookincubator#9939

Reviewed By: pedroerp

Differential Revision: D57975496

Pulled By: Yuhta

fbshipit-source-id: d9a8dde1973942a3b9cc5212db24e38c2ff3983d
Joe-Abraham pushed a commit to Joe-Abraham/velox that referenced this pull request Jun 7, 2024
…cubator#9939)

Summary:
The existing code did not skip the definition and repetition level arrays if the `maxDefine_` and `maxRepeat_` are both zero but these arrays were actually more than 0 bytes. This commit fixes this problem and always uses `pageHeader.data_page_header_v2.definition_levels_byte_length` and `pageHeader.data_page_header_v2.repetition_levels_byte_length` as the array lengths because these values are guaranteed to exist.

Fixes facebookincubator#9924

Pull Request resolved: facebookincubator#9939

Reviewed By: pedroerp

Differential Revision: D57975496

Pulled By: Yuhta

fbshipit-source-id: d9a8dde1973942a3b9cc5212db24e38c2ff3983d
Joe-Abraham pushed a commit to Joe-Abraham/velox that referenced this pull request Jun 7, 2024
…cubator#9939)

Summary:
The existing code did not skip the definition and repetition level arrays if the `maxDefine_` and `maxRepeat_` are both zero but these arrays were actually more than 0 bytes. This commit fixes this problem and always uses `pageHeader.data_page_header_v2.definition_levels_byte_length` and `pageHeader.data_page_header_v2.repetition_levels_byte_length` as the array lengths because these values are guaranteed to exist.

Fixes facebookincubator#9924

Pull Request resolved: facebookincubator#9939

Reviewed By: pedroerp

Differential Revision: D57975496

Pulled By: Yuhta

fbshipit-source-id: d9a8dde1973942a3b9cc5212db24e38c2ff3983d
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged parquet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parquet PageReader incorrectly skips rep/def levels when the max values are 0
3 participants