[EPIC] Improved support for nested / structured types (Struct, List, ListArray, and other Composite types) #2326
Comments
This check (https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/src/physical_plan/file_format/mod.rs#L238) is one cause of the errors related to column projection. It compares the complete enum, so it fails when the field order differs. Arrow has a method to compare data types (https://github.com/apache/arrow-rs/blob/master/arrow/src/datatypes/datatype.rs#L674). I think this method should be made public and used in the check above. Currently DataFusion uses match_field_names (default true), https://github.com/apache/arrow-rs/blob/master/arrow/src/record_batch.rs#L153, which causes the error. |
Thanks for the investigation @nl5887 -- that definitely sounds plausible. Feel free to file a PR with proposed changes -- we would love to review them |
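To make the failure mode concrete, here is a minimal, self-contained sketch. It does not use arrow-rs: `ToyType`, `ToyField`, `field`, and `same_type_ignoring_field_order` are hypothetical stand-ins invented for illustration, and the helper is one possible comparison policy, not what Arrow's method does. It only shows why a derived `==` on a nested enum rejects two structs that differ solely in field order.

```rust
// Toy stand-ins for a nested type system. These are NOT the arrow-rs types.
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Int64,
    Utf8,
    Struct(Vec<ToyField>),
}

#[derive(Debug, Clone, PartialEq)]
struct ToyField {
    name: String,
    data_type: ToyType,
}

// Small constructor to keep the examples short.
fn field(name: &str, data_type: ToyType) -> ToyField {
    ToyField { name: name.to_string(), data_type }
}

/// Hypothetical helper: two structs match when they contain the same named
/// fields with matching types, regardless of declaration order.
/// Assumes field names within a struct are unique.
fn same_type_ignoring_field_order(a: &ToyType, b: &ToyType) -> bool {
    match (a, b) {
        (ToyType::Struct(xs), ToyType::Struct(ys)) => {
            xs.len() == ys.len()
                && xs.iter().all(|x| {
                    ys.iter().any(|y| {
                        x.name == y.name
                            && same_type_ignoring_field_order(&x.data_type, &y.data_type)
                    })
                })
        }
        // Non-struct types fall back to plain equality.
        _ => a == b,
    }
}

fn main() {
    let file_type = ToyType::Struct(vec![
        field("a", ToyType::Int64),
        field("b", ToyType::Utf8),
    ]);
    let table_type = ToyType::Struct(vec![
        field("b", ToyType::Utf8),
        field("a", ToyType::Int64),
    ]);

    // Derived equality compares the field vectors position by position.
    println!("derived ==: {}", file_type == table_type);
    // The order-insensitive helper pairs fields up by name instead.
    println!(
        "ignoring field order: {}",
        same_type_ignoring_field_order(&file_type, &table_type)
    );
}
```

Whether DataFusion should match fields by name, by position, or by some other rule is a design question for the actual fix; the sketch only shows the difference between the two comparisons.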
This one is also related: #2581 |
Reminder to write docs: #1222 |
Potential to add to list #7012 |
Hi, I think unnest support for struct can be an item in this epic, right? |
That would make sense to me -- is there a ticket that describes what this means? |
I created a ticket: #10264 |
Thank you. I added this to the list in the ticket description |
I added an issue to support recursive unnest: #10660. I think it should belong to this epic |
Added |
I added an issue to check for duplicate or null names in structs: #11438 |
I think #11445 is related to this epic |
Thank you -- added |
Right now datafusion doesn't support struct evolution very well. Imagine you have a struct named
Feels like we should handle this more gracefully. cc @alamb I'm happy to make contributions if someone can point me to the right places to look. |
I agree
My suggestion is to start by filing a ticket with a self-contained reproducer (either Rust code or SQL) that shows what you are trying to do. This would likely become part of the test for any code improvement we make, and it would give other contributors more detail so they can point you to the right place in the code |
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
This ticket is designed to capture the work needed to properly support Arrow Struct types in DataFusion.
https://arrow.apache.org/datafusion/user-guide/sql/sql_status.html says that nested types are not supported. They are not fully supported, but parts of the support are already present, such as a way to serialize them via ArrowWriter and access to nested fields using the field["nested_field"] syntax.
Describe the solution you'd like
Research, and describe / implement what else remains for proper support.
Array (ListArray) support:
- ARRAY #6980
- FixedSizeList in array methods #6560
- unnest function #6555
Map (MapArray) support:
- MAP DataType #11429
Struct (StructArray) support:
- Struct table with explicit type and name #10207
Union (UnionArray) support:
- Union as a function #10206
Other
Known issues so far:
- array_contains #6557
- Signature method for list datatypes. #6559
- AnyOp and AllOp operators #6602