-
Notifications
You must be signed in to change notification settings - Fork 3.9k
GH-47838: [C++][Parquet] Set Variant specification version to 1 to align with the variant spec #47835
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format? or See also: |
|
@wgtmac Can you take a look? |
|
|
a7c4d0d to
3450f1c
Compare
wgtmac
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it not the case for our own reader? |
You mean Parquet C++ reader? Currently Parquet C++ reader hasn't implemented Variant read/write yet. The only thing it added is to support Variant logical type and we are not doing the check. Currently Variant spec version is version 1 (and this is the only version) and the other readers may/may not add the check. |
4b65ada to
38a919d
Compare
Sorry, because I might be missing something obvious, I am not too familiar with this part of the codebase, but if we haven't implemented Parquet C++ Variant write yet, I am not sure I understand how can a user would be able to create Variant files with a logical type annotation that will be incorrect with Parquet C++ if we release without this fix. |
The engines will implement the reader/writer parts but will use the variant type defined in Arrow Parquet. That would cause the engines to write incorrect annotation. That's what I'm seeing internally. |
|
We haven't been able to publish Python 3.14 wheels for PyArrow and the community is eager to get those. @pitrou @wgtmac can you help with this issue? |
| } | ||
|
|
||
| std::shared_ptr<const LogicalType> VariantLogicalType::Make(const int8_t specVersion) { | ||
| auto logical_type = std::shared_ptr<VariantLogicalType>(new VariantLogicalType()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use std::make_shared? It's more terse and more efficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has this been addressed or resolved?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately the VariantLogicalType constructor is private, so std::make_shared cannot work with it.
You mean help with the 3.14 wheels? I'm not sure I understand your message correctly.
I would not call it a blocker entirely, but we would certainly rather have it. |
No, sorry, I meant this issue which is currently holding the release and holding things like publishing the Python 3.14 wheels. |
38a919d to
cc1a45b
Compare
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 now that comments are addressed. Let's wait for CI.
|
CI failures are unrelated, I'll merge. Thanks for spotting and fixing this @aihuaxu ! |
|
@raulcd I'll let you make the final decision, but it would be nice if a new RC could be issued with this fix. |
|
I'll create a new RC. Thanks everyone for the quick fix! |
…ign with the variant spec (#47835) ### Rationale for this change According to the [Variant specification](https://github.com/apache/parquet-format/blob/master/VariantEncoding.md), the specification_version field must be set to 1 to indicate Variant encoding version 1. Currently, this field defaults to 0, which violates the specification. Parquet readers that strictly enforce specification version validation will fail to read files containing Variant types. <img width="624" height="185" alt="image" src="https://github.com/user-attachments/assets/b0f1deb9-0301-4b94-a472-17fd9cc0df5d" /> ### What changes are included in this PR? The change includes defaulting the specification version to 1. ### Are these changes tested? The change is covered by unit test. ### Are there any user-facing changes? The Parquet files produced the variant logical type annotation `VARIANT(1)`. ``` Schema: message schema { optional group V (VARIANT(1)) = 1 { required binary metadata; required binary value; } } ``` * GitHub Issue: #47838 Lead-authored-by: Aihua <aihua.xu@snowflake.com> Co-authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
|
After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 5f616db. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 6 possible false positives for unstable benchmarks that are known to sometimes produce them. |
… to align with the variant spec (apache#47835) ### Rationale for this change According to the [Variant specification](https://github.com/apache/parquet-format/blob/master/VariantEncoding.md), the specification_version field must be set to 1 to indicate Variant encoding version 1. Currently, this field defaults to 0, which violates the specification. Parquet readers that strictly enforce specification version validation will fail to read files containing Variant types. <img width="624" height="185" alt="image" src="https://github.com/user-attachments/assets/b0f1deb9-0301-4b94-a472-17fd9cc0df5d" /> ### What changes are included in this PR? The change includes defaulting the specification version to 1. ### Are these changes tested? The change is covered by unit test. ### Are there any user-facing changes? The Parquet files produced the variant logical type annotation `VARIANT(1)`. ``` Schema: message schema { optional group V (VARIANT(1)) = 1 { required binary metadata; required binary value; } } ``` * GitHub Issue: apache#47838 Lead-authored-by: Aihua <aihua.xu@snowflake.com> Co-authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
Rationale for this change
According to the Variant specification, the specification_version field must be set to 1 to indicate Variant encoding version 1. Currently, this field defaults to 0, which violates the specification. Parquet readers that strictly enforce specification version validation will fail to read files containing Variant types.

What changes are included in this PR?
The change includes defaulting the specification version to 1.
Are these changes tested?
The change is covered by unit test.
Are there any user-facing changes?
The Parquet files produced the variant logical type annotation
VARIANT(1).