Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-468: Clarify MAP logical type #469
GH-468: Clarify MAP logical type #469
Changes from 1 commit
de7f06d
bc0ba89
fa44174
8516fbc
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized that the value can be emitted 🤔 When would a case be like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I've never seen an actual use-case implemented for this one either. I am open to require having a value but since it is out there for a while now, I am not sure if we can do it. Maybe a suggestion to always specify a value and let it be an optional with all nulls if one would not use it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we move omitted value field to the backward compatibility section, which is all about cases that are not suggested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure backward compatibility section would be the best place for this. That section is mainly for supporting Parquet files that were written before the proper spec.
Anyway, it seems similar to the nested key question. @JFinis, WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps when using a
MAP
to encode a set? 🤷There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parquet-cpp does not support set, it uses a list instead: https://github.com/apache/arrow/blob/c3601a97a0718ae47726e6c134cbed4b98bd1a36/cpp/src/parquet/arrow/schema.cc#L578-L583
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wgtmac, @etseidl, WDYT, shall I simply remove this last sentence about handling the case of "missing" values? I don't think we can change the original spec because it is clearly stated that value can be omitted. Then, it is up to the implementation how they support/not support this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the below one?
In case of not present, it can be represented as a map with all null values or as a set of keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original discussion backs up this interpretation. I think it's good to leave the explanation in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to have the one below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I don't think so. At the topmost element we need either a
MAP
or aMAP_KEY_VALUE
(for backward compatibility).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about this? I'm just deducing possible types from the new sentence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This new sentence should not change anything. Just wanted to highlight why it was originally invented. If it causes confusion, it does not worth it.
(At the repeated level we do not say anything about its potential logical type. So, a reader should ignore anything there anyway.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wgtmac, do you think this sentence is misleading? I'm happy to remove it. The implementation do not need to check the logical type of the repeated field anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is. Especially when we have said that the name of
key_value
is not required in the previous paragraph.