[SPARK-11692][SQL] Support for Parquet logical types, JSON and BSON (embedded types) #9658
|
retest this please |
|
cc @liancheng |
|
Test build #45724 has finished for PR 9658 at commit
|
|
Test build #45726 has finished for PR 9658 at commit
|
Hm, again, what's the +?
|
Test build #45732 has finished for PR 9658 at commit
|
|
Test build #45731 has finished for PR 9658 at commit
|
|
Test build #45804 has finished for PR 9658 at commit
|
|
retest this please |
|
Test build #45827 has finished for PR 9658 at commit
|
|
retest this please |
|
Test build #45835 has finished for PR 9658 at commit
|
|
All the builds pass all the tests at |
|
retest this please |
|
Test build #45843 has finished for PR 9658 at commit
|
Sorry that I missed this part during the last review. Please always use === instead of == for better assertion error messages.
|
Thanks! I changed this. |
|
Test build #45963 has finished for PR 9658 at commit
|
|
Thanks! Merging to master. |
|
We've added support for reading these types as strings, but we can't round-trip data without losing the annotation, which might be confusing for users. Perhaps we should also read and write this info to and from the metadata. |
|
Please note that #9754 unintentionally updated this as part of a cleanup on the master branch; however, that PR is supposed to be merged into branch 1.6, whereas this one is for version 1.7. |
|
Hm, I don't quite get it... So this PR is only for master (targeting 1.7). I don't think we need to backport this one to anywhere else. |
|
Ah. I just got confused for a bit. It doesn't need to. |
|
@marmbrus Did you mean the metadata stored in Parquet key-value user defined metadata, or the schema metadata in And for "support for reading these types as strings", are you referring to |
That is an unfortunate limitation of our metadata, but it does seem like it could be worked around. That said, this is a minor concern.
I'm just saying that's what this patch does: it reads them in as a text or binary string of opaque bytes. |
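To make the round-trip concern above concrete, here is a minimal toy model in plain Python. It is not Spark's or Parquet's code: `ParquetColumn`, `to_sql_type`, `to_parquet_column`, and `round_trip` are hypothetical names invented for illustration. It assumes the mapping described in this PR (JSON is read as a string, BSON as binary) and assumes that a string column is written back as `BINARY` annotated `UTF8` and a binary column as unannotated `BINARY`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParquetColumn:
    """Hypothetical stand-in for a Parquet column definition."""
    name: str
    physical_type: str          # e.g. "BINARY"
    annotation: Optional[str]   # e.g. "UTF8", "JSON", "BSON", or None


def to_sql_type(col: ParquetColumn) -> str:
    """Read side: map a BINARY column to a SQL type name (toy model)."""
    if col.physical_type != "BINARY":
        raise ValueError(f"unsupported physical type: {col.physical_type}")
    if col.annotation in ("UTF8", "JSON"):
        return "string"   # JSON is treated as UTF-8 text
    if col.annotation in ("BSON", None):
        return "binary"   # BSON is treated as opaque bytes
    raise ValueError(f"unsupported annotation: {col.annotation}")


def to_parquet_column(name: str, sql_type: str) -> ParquetColumn:
    """Write side (assumed): only the SQL type is known, not the original annotation."""
    if sql_type == "string":
        return ParquetColumn(name, "BINARY", "UTF8")
    if sql_type == "binary":
        return ParquetColumn(name, "BINARY", None)
    raise ValueError(f"unsupported SQL type: {sql_type}")


def round_trip(col: ParquetColumn) -> ParquetColumn:
    """Read a column and write it back, as a naive copy job would."""
    return to_parquet_column(col.name, to_sql_type(col))


if __name__ == "__main__":
    for col in (ParquetColumn("doc", "BINARY", "JSON"),
                ParquetColumn("blob", "BINARY", "BSON")):
        print(col, "->", round_trip(col))
```

In this model the `JSON` and `BSON` annotations do not survive a read followed by a write, because the SQL type alone carries no record of them. Keeping the annotation in column metadata, as suggested above, would be one way to close that gap.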
…embedded types)
Parquet supports some JSON and BSON datatypes. They are represented as binary for BSON and string (UTF-8) for JSON internally.
I searched a bit and found Apache drill also supports both in this way, [link](https://drill.apache.org/docs/parquet-format/).
Author: hyukjinkwon <gurwls223@gmail.com>
Author: Hyukjin Kwon <gurwls223@gmail.com>
Closes #9658 from HyukjinKwon/SPARK-11692.
(cherry picked from commit e388b39)
Signed-off-by: Michael Armbrust <michael@databricks.com>
Parquet supports JSON and BSON logical types (embedded types). Internally, BSON is represented as binary and JSON as a UTF-8 string.
I searched a bit and found that Apache Drill also supports both in this way: [link](https://drill.apache.org/docs/parquet-format/).