
Fix reading empty Parquet DataPage #10121

Closed
wants to merge 4 commits into from

majetideepak
Collaborator

@majetideepak commented Jun 10, 2024

Resolves compression issue 7 listed in #9560.
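The PR text does not describe the fix itself. As background, a Parquet V2 data page stores its repetition and definition levels uncompressed ahead of the values section, and the header's `is_compressed` flag applies only to the values. Given the branch name (`fix-snappy`), the following is a minimal sketch that *assumes* the failure came from handing a zero-length values section to the codec. A Snappy stream always begins with a varint holding the uncompressed length, so a zero-byte buffer is not valid Snappy input. `PageV2Sizes`, `decompress`, and `readValuesSection` are hypothetical names, not Velox APIs, and this is not the merged change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the size fields a reader takes from a V2 page
// header (PageHeader.compressed_page_size plus DataPageHeaderV2 fields).
struct PageV2Sizes {
  int32_t compressedPageSize;          // levels + (possibly compressed) values
  int32_t repetitionLevelsByteLength;  // stored uncompressed in V2
  int32_t definitionLevelsByteLength;  // stored uncompressed in V2
  bool isCompressed;                   // applies to the values section only
};

// Hypothetical codec stand-in. Like Snappy, it rejects a zero-byte input;
// otherwise it returns the bytes unchanged (no real decompression here).
std::string decompress(const std::string& input) {
  if (input.empty()) {
    throw std::runtime_error("decompress: zero-byte input");
  }
  return input;
}

// Extracts the values section of a V2 page, skipping the codec when empty.
std::string readValuesSection(const PageV2Sizes& sizes,
                              const std::string& payload) {
  const int32_t levelsBytes =
      sizes.repetitionLevelsByteLength + sizes.definitionLevelsByteLength;
  const int32_t valuesBytes = sizes.compressedPageSize - levelsBytes;
  if (levelsBytes < 0 || valuesBytes < 0 ||
      static_cast<std::size_t>(sizes.compressedPageSize) > payload.size()) {
    throw std::runtime_error("corrupt page header");
  }
  std::string values = payload.substr(levelsBytes, valuesBytes);
  // Guard: an empty values section (or an uncompressed one) needs no codec call.
  if (valuesBytes == 0 || !sizes.isCompressed) {
    return values;
  }
  return decompress(values);
}
```

Without the `valuesBytes == 0` check, a compressed page whose payload consists only of level bytes would reach `decompress` with a zero-byte buffer and throw.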

@facebook-github-bot added the CLA Signed label Jun 10, 2024

@majetideepak
Collaborator Author

@yingsu00 can you take a look at this fix? Thanks!

@majetideepak force-pushed the fix-snappy branch 3 times, most recently from ea56126 to fb84bca, June 10, 2024 10:38
@majetideepak
Collaborator Author

@yingsu00 can you take a look at this PR?

@majetideepak
Collaborator Author

@yingsu00, @nmahadevuni can you review this fix as well?

TEST_F(ParquetReaderTest, testEmptyDataPage) {
const std::string sample(getExampleFilePath("snappy.parquet"));

facebook::velox::dwio::common::ReaderOptions readerOptions{leafPool_.get()};
Contributor

@pedroerp Jun 21, 2024

nit: we should put the entire test suite under a facebook::velox::dwio::test namespace so you don't need to specify the full path to everything.

Collaborator Author

I changed this to dwio::common::ReaderOptions; there are a couple of using namespace ... directives at the top of the file. This is now consistent with the velox/dwio/dwrf/test/ReaderTest.cpp test. Neither file is under a namespace.

Collaborator Author

I cleaned up the remaining as well.
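The nit and the reply contrast two ways of avoiding fully qualified names in a test file. The self-contained C++17 sketch below uses a hypothetical `ReaderOptions` struct in place of the real Velox type, so it only illustrates name lookup. Inside `namespace facebook::velox::dwio::test`, an unqualified `common::` resolves to `facebook::velox::dwio::common`. Alternatively, assuming one of the file's directives is `using namespace facebook::velox;`, the shorter `dwio::common::ReaderOptions` resolves at file scope, which is the form the PR kept.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the real Velox type, used only to show name lookup.
namespace facebook::velox::dwio::common {
struct ReaderOptions {
  std::string tag = "dwio::common::ReaderOptions";
};
}  // namespace facebook::velox::dwio::common

// Reviewer's suggestion: enclose the tests in a nested namespace, so
// unqualified lookup of `common` finds facebook::velox::dwio::common.
namespace facebook::velox::dwio::test {
inline std::string tagViaEnclosingNamespace() {
  common::ReaderOptions options;
  return options.tag;
}
}  // namespace facebook::velox::dwio::test

// Alternative: a file-scope using-directive, after which
// dwio::common::ReaderOptions resolves without the facebook::velox prefix.
using namespace facebook::velox;

inline std::string tagViaUsingDirective() {
  dwio::common::ReaderOptions options;
  return options.tag;
}
```

Both functions name the same type; the choice is about consistency with neighboring test files such as velox/dwio/dwrf/test/ReaderTest.cpp.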

Collaborator

@yingsu00 left a comment

@majetideepak Thanks for fixing the bug! It looks good, just one nit.

@@ -1215,3 +1208,24 @@ TEST_F(ParquetReaderTest, testLzoDataPage) {
.str(),
"31232");
}

TEST_F(ParquetReaderTest, testEmptyDataPage) {
const std::string sample(getExampleFilePath("snappy.parquet"));
Collaborator

Maybe rename this file to empty_v2datapage.parquet?

Collaborator Author

Fixed!

Collaborator

@yingsu00 left a comment

Thanks @majetideepak !

@majetideepak added the ready-to-merge label Jun 25, 2024
@facebook-github-bot
Contributor

@bikramSingh91 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@bikramSingh91 merged this pull request in 06c8a0c.

Conbench analyzed the 1 benchmark run on commit 06c8a0c6.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

@majetideepak majetideepak deleted the fix-snappy branch June 26, 2024 01:40
Labels: CLA Signed, Merged, ready-to-merge

5 participants