[SPARK-3569][SQL] Add metadata field to StructField #2701
mengxr wants to merge 29 commits into apache:master
QA tests have started for PR 2701 at commit

QA tests have started for PR 2701 at commit

QA tests have finished for PR 2701 at commit
Test FAILed.

QA tests have finished for PR 2701 at commit
Test FAILed.
I think using immutable Map for metadata is enough. We can add an API like

PySpark also needs to be updated.
QA tests have started for PR 2701 at commit

@liancheng The Python API is hard to update at this time because the schema SerDe is via https://github.com/apache/spark/blob/master/python/pyspark/sql.py#L1131

QA tests have finished for PR 2701 at commit
Test PASSed.

#2563 has already replaced

QA tests have started for PR 2701 at commit

QA tests have finished for PR 2701 at commit
Test FAILed.

QA tests have started for PR 2701 at commit

QA tests have finished for PR 2701 at commit
Test PASSed.
python/pyspark/sql.py
Using {} as the default value has a side effect: the default dict is created once and shared by every instance that does not pass its own metadata. For example:
>>> a = StructField('a', StringType(), True)
>>> b = StructField('b', StringType(), True)
>>> a.metadata['name'] = 'a'
>>> b.metadata['name']
'a'
So if the metadata can be modified anywhere, use None as the default value instead:
def xxx(xxx, metadata=None):
    ...
    self.metadata = metadata or {}
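The shared-default problem described in this review comment can be reproduced without Spark. The following is a minimal sketch, not the PySpark code under review: `SharedDefaultField` and `SafeField` are hypothetical stand-ins for a field class, kept small enough to run on their own.

```python
class SharedDefaultField:
    """Hypothetical stand-in for a field class; not PySpark's StructField."""

    def __init__(self, name, metadata={}):  # the {} is built once, when def runs
        self.name = name
        self.metadata = metadata


class SafeField:
    """Same hypothetical class, using None as the sentinel default."""

    def __init__(self, name, metadata=None):
        self.name = name
        # A fresh dict is created per instance when no metadata is passed.
        self.metadata = {} if metadata is None else metadata


# With the mutable default, both instances point at the same dict.
a = SharedDefaultField("a")
b = SharedDefaultField("b")
a.metadata["name"] = "a"
leaked = b.metadata.get("name")

# With the None default, each instance owns its own dict.
c = SafeField("c")
d = SafeField("d")
c.metadata["name"] = "c"
isolated = d.metadata.get("name")
```

The sketch uses an explicit `is None` check rather than `metadata or {}`. The two behave the same when no argument is given, but `metadata or {}` also swaps out an empty dict that the caller passed in deliberately, which may or may not be what is wanted.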
QA tests have started for PR 2701 at commit

QA tests have finished for PR 2701 at commit
Test PASSed.

Test build #443 has started for PR 2701 at commit

Test build #443 has finished for PR 2701 at commit

Test build #458 has started for PR 2701 at commit

Test build #465 has started for PR 2701 at commit

Test build #458 has finished for PR 2701 at commit
@mengxr, thanks for working on this! Overall LGTM. One minor thing: I think we should expose Metadata as a type variable in the

Test build #465 has finished for PR 2701 at commit

Here's a PR to fix the package visibility. If that looks good to you I think this is ready to merge: mengxr#1
Expose Metadata and MetadataBuilder through the public scala and java packages.

QA tests have started for PR 2701 at commit

QA tests have finished for PR 2701 at commit
Test FAILed.

Test build #22671 has started for PR 2701 at commit

Test build #22671 has finished for PR 2701 at commit
Test PASSed.

Thanks! Merged to master.
Add metadata: Metadata to StructField to store extra information of columns. Metadata is a simple wrapper over Map[String, Any] with value types restricted to Boolean, Long, Double, String, Metadata, and arrays of those types. SerDe is via JSON.

Metadata is preserved through simple operations like SELECT.

@marmbrus @liancheng
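To make the description concrete, here is a rough Python sketch of the idea: a wrapper over a string-keyed map that rejects unsupported value types and round-trips through JSON. It is an illustration only, not the Scala class added by this PR. The class name is reused for readability, while `to_json`, `from_json`, and the helper methods are assumptions made for this example.

```python
import json


class Metadata:
    """Hypothetical Python analogue of the wrapper described above."""

    _SCALARS = (bool, int, float, str)  # stand-ins for Boolean, Long, Double, String

    def __init__(self, values=None):
        self._values = dict(values or {})
        for key, value in self._values.items():
            if not isinstance(key, str):
                raise TypeError("metadata keys must be strings")
            self._check(value)

    @classmethod
    def _check(cls, value):
        allowed = cls._SCALARS + (Metadata,)
        if isinstance(value, allowed):
            return
        # Arrays are allowed one level deep, holding only the types above.
        if isinstance(value, (list, tuple)) and all(isinstance(v, allowed) for v in value):
            return
        raise TypeError("unsupported metadata value: %r" % (value,))

    def _plain(self):
        """Convert nested Metadata objects into plain dicts for JSON."""
        def convert(v):
            if isinstance(v, Metadata):
                return v._plain()
            if isinstance(v, (list, tuple)):
                return [convert(x) for x in v]
            return v
        return {k: convert(v) for k, v in self._values.items()}

    def to_json(self):
        return json.dumps(self._plain(), sort_keys=True)

    @classmethod
    def from_json(cls, text):
        """Rebuild a Metadata object, turning JSON objects back into Metadata."""
        def revive(v):
            if isinstance(v, dict):
                return cls({k: revive(x) for k, x in v.items()})
            if isinstance(v, list):
                return [revive(x) for x in v]
            return v
        return cls({k: revive(v) for k, v in json.loads(text).items()})

    def __eq__(self, other):
        return isinstance(other, Metadata) and self._plain() == other._plain()


meta = Metadata({
    "comment": "user age",
    "max": 120,
    "tags": ["pii", "numeric"],
    "source": Metadata({"table": "users"}),
})
restored = Metadata.from_json(meta.to_json())
```

Restricting the value types is what keeps the JSON round trip lossless in this sketch: every allowed value maps onto a JSON scalar, array, or object, so `restored` compares equal to `meta`.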