[SPARK-52651][SQL] Handle User Defined Type in Nested ColumnVector #51349
val targetType = sparkReadType.map {
  case udt: UserDefinedType[_] => udt.sqlType
  case otherType => otherType
  _.transformRecursively { case t: UserDefinedType[_] => t.sqlType }
Just a question. What about ORC file format?
Hi @dongjoon-hyun, good question. I have an umbrella ticket for UDT improvements. Let me check the other formats and readers, and I will send follow-ups if necessary.
cc @peter-toth
Merged to master, thank you @dongjoon-hyun @peter-toth
…ith null DataType

### What changes were proposed in this pull request?
Check whether the parameter DataType is null in ColumnVector constructor before transforming it

### Why are the changes needed?
A subclass of ColumnVector, e.g. Iceberg's [ConstantColumnVector](https://github.com/apache/iceberg/blob/main/spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/ConstantColumnVector.java#L41), could be created with null `DataType`. It throws NPE after #51349, which can be verified by failed tests in [integrating Spark 4.1.0-preview1 in Iceberg](apache/iceberg#14155)

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
UT.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #52423 from manuzhang/SPARK-53678.

Authored-by: manuzhang <owenzhang1990@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
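The follow-up fix described above can be illustrated with a minimal, self-contained Java sketch. It is not Spark's actual `ColumnVector` code: `TypeTag`, `Vector`, `ConstantVector`, `TypedVector`, and `normalize` are hypothetical stand-ins, chosen only to show why a base-class constructor that transforms its type argument needs a null guard when subclasses may pass `null`.

```java
public class NullGuardSketch {
    // Hypothetical stand-in for a data type. A non-null `underlying`
    // marks a user-defined type wrapping another type.
    static final class TypeTag {
        final String name;
        final TypeTag underlying;

        TypeTag(String name, TypeTag underlying) {
            this.name = name;
            this.underlying = underlying;
        }

        // Replace a user-defined type by the type it wraps, recursively.
        TypeTag normalize() {
            return underlying == null ? this : underlying.normalize();
        }
    }

    // Hypothetical stand-in for the vector base class.
    static abstract class Vector {
        final TypeTag type;

        Vector(TypeTag type) {
            // Guard: only transform when a type was actually supplied.
            // Calling type.normalize() unconditionally would throw a
            // NullPointerException for subclasses that pass null.
            this.type = (type == null) ? null : type.normalize();
        }
    }

    // Mirrors the situation in the commit message: a subclass created with a null type.
    static final class ConstantVector extends Vector {
        ConstantVector() {
            super(null);
        }
    }

    static final class TypedVector extends Vector {
        TypedVector(TypeTag type) {
            super(type);
        }
    }

    public static void main(String[] args) {
        System.out.println(new ConstantVector().type == null); // prints true
        TypeTag point = new TypeTag("Point", new TypeTag("double", null));
        System.out.println(new TypedVector(point).type.name);  // prints double
    }
}
```

Keeping the guard in the base constructor means existing subclasses that pass `null` continue to work without changes on their side.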
What changes were proposed in this pull request?
When I read a map column with a nested UDT, I encountered:
This PR recursively walks the read type and replaces every nested UDT with its underlying `sqlType`.
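As a rough illustration of the recursive replacement, here is a self-contained Java sketch over a toy type hierarchy. The classes `Primitive`, `ArrayType`, `MapType`, and `Udt` and the method `stripUdt` are hypothetical and only model the idea; they are not Spark's `DataType` classes or its `transformRecursively` method. The sketch also assumes that a UDT's underlying type may itself contain UDTs, so it recurses into it.

```java
public class UdtStripSketch {
    // Toy stand-in for a data type (hypothetical, not a Spark class).
    interface DataType {
        DataType stripUdt(); // same type with every UDT replaced by its underlying type
        String describe();
    }

    static final class Primitive implements DataType {
        final String name;
        Primitive(String name) { this.name = name; }
        public DataType stripUdt() { return this; }
        public String describe() { return name; }
    }

    static final class ArrayType implements DataType {
        final DataType element;
        ArrayType(DataType element) { this.element = element; }
        public DataType stripUdt() { return new ArrayType(element.stripUdt()); }
        public String describe() { return "array<" + element.describe() + ">"; }
    }

    static final class MapType implements DataType {
        final DataType key;
        final DataType value;
        MapType(DataType key, DataType value) { this.key = key; this.value = value; }
        // Recurse into both sides, so a UDT nested in a map is handled too.
        public DataType stripUdt() { return new MapType(key.stripUdt(), value.stripUdt()); }
        public String describe() { return "map<" + key.describe() + "," + value.describe() + ">"; }
    }

    static final class Udt implements DataType {
        final String name;
        final DataType sqlType;
        Udt(String name, DataType sqlType) { this.name = name; this.sqlType = sqlType; }
        // The UDT itself disappears; its underlying type takes its place.
        public DataType stripUdt() { return sqlType.stripUdt(); }
        public String describe() { return "udt(" + name + ")"; }
    }

    public static void main(String[] args) {
        DataType point = new Udt("Point", new ArrayType(new Primitive("double")));
        DataType nested = new MapType(new Primitive("string"), point);
        System.out.println(nested.describe());            // prints map<string,udt(Point)>
        System.out.println(nested.stripUdt().describe()); // prints map<string,array<double>>
    }
}
```

A top-level-only check, like the `case udt: UserDefinedType[_]` match in the diff, would leave the `Point` UDT inside the map untouched, which is the case this PR addresses.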
Why are the changes needed?
To add missing UDT support for nested types.
Does this PR introduce any user-facing change?
No
How was this patch tested?
New Tests
Was this patch authored or co-authored using generative AI tooling?
no