Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support to read varbinary column from Parquet fixed length byte array #9887

Closed

Conversation

majetideepak
Copy link
Collaborator

Resolves: #9757

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 22, 2024
Copy link

netlify bot commented May 22, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit e0a21df
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/664ebfd90452260008b22091

@majetideepak majetideepak force-pushed the support-flba-varbinary branch from 7681904 to cb13fbe Compare May 22, 2024 06:19
@majetideepak majetideepak marked this pull request as draft May 22, 2024 06:20
@majetideepak majetideepak force-pushed the support-flba-varbinary branch from cb13fbe to 35199e3 Compare May 22, 2024 06:21
@majetideepak majetideepak marked this pull request as ready for review May 22, 2024 07:25
@majetideepak majetideepak force-pushed the support-flba-varbinary branch from ee0e977 to 61e0667 Compare May 22, 2024 07:27
@majetideepak majetideepak requested review from yingsu00, Yuhta and yma11 May 22, 2024 07:28
Copy link
Contributor

@yma11 yma11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the fix!

velox/dwio/parquet/reader/StringDecoder.h Outdated Show resolved Hide resolved
@Yuhta Yuhta added the ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall label May 22, 2024
Copy link
Collaborator

@yingsu00 yingsu00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM just some nits.


lastSafeWord_(end - simd::kPadding) {}
StringDecoder(const char* start, const char* end, int length)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Just set default value -1 for length parameter and no need to add a second constructor.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

const char* bufferStart_;
const char* bufferEnd_;
const char* const lastSafeWord_;
const int length_;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we rename it to fixedLength_ to distinguish between the local var length in readString()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@majetideepak majetideepak force-pushed the support-flba-varbinary branch from 6c6402b to e0a21df Compare May 23, 2024 04:02
@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mbasmanova
Copy link
Contributor

Seeing test failure:

[ RUN      ] ParquetReaderTest.readVarbinaryFromFLBA

E0524 07:50:27.656497  8491 Exceptions.h:67] Line: fbcode/velox/dwio/parquet/reader/ParquetReader.cpp:136, Function:ReaderBase, Expression: fileLength_ > 0 (0 vs. 0) Parquet file is empty, Source: RUNTIME, ErrorCode: INVALID_STATE
terminate called after throwing an instance of 'facebook::velox::VeloxRuntimeError'
  what():  Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: (0 vs. 0) Parquet file is empty
Retriable: False
Expression: fileLength_ > 0
Function: ReaderBase
File: fbcode/velox/dwio/parquet/reader/ParquetReader.cpp
Line: 136

@majetideepak
Copy link
Collaborator Author

majetideepak commented May 25, 2024

Parquet file is empty

@mbasmanova, The CI test passed. The test passes locally as well. The file being empty is strange since I can see that the file is not empty in this PR. Likely the file did not download correctly in the Meta test infrastructure. Can you please check again?

@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mbasmanova
Copy link
Contributor

@majetideepak Deepak, I'm seeing that after importing this PR, the new file varbinary_flba.parquet is empty. I'll need some time to figure out why this is happening and how to fix that.

@facebook-github-bot
Copy link
Contributor

@mbasmanova merged this pull request in 188f5b9.

Copy link

Conbench analyzed the 1 benchmark run on commit 188f5b9d.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

@majetideepak majetideepak deleted the support-flba-varbinary branch May 30, 2024 10:40
Joe-Abraham pushed a commit to Joe-Abraham/velox that referenced this pull request Jun 7, 2024
…rray (facebookincubator#9887)

Summary:
Resolves: facebookincubator#9757

Pull Request resolved: facebookincubator#9887

Reviewed By: Yuhta, kgpai

Differential Revision: D57776408

Pulled By: mbasmanova

fbshipit-source-id: 9a282b68be810b1b99391105157b0777db7e568f
Joe-Abraham pushed a commit to Joe-Abraham/velox that referenced this pull request Jun 7, 2024
…rray (facebookincubator#9887)

Summary:
Resolves: facebookincubator#9757

Pull Request resolved: facebookincubator#9887

Reviewed By: Yuhta, kgpai

Differential Revision: D57776408

Pulled By: mbasmanova

fbshipit-source-id: 9a282b68be810b1b99391105157b0777db7e568f
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Core dump happens in StringColumnReader::processFilter during parquet read
6 participants