You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Is your feature request related to a problem or challenge?
We're working on running some used-to-be-Spark pipelines through DataFusion. One case we've noticed where DataFusion doesn't support something is comparing lists. (Spark allows)[https://github.com/apache/spark/blame/d9394eee5ebbeb695baaec6122da2ed970842dfd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala#L1025] comparing (==, !=, <, >, <=, >=, ..) columns of structs and lists, while in DataFusion those seem to throw:
Note that the comparison for nested type is not supported in arrow-rs apache/arrow-rs#5942, so we should implement them in datafusion.
First attempt #11091
Ideally we should make it configurable so we can support both Spark and Postgres like behaviour for nulls
Is your feature request related to a problem or challenge?
We're working on running some used-to-be-Spark pipelines through DataFusion. One case we've noticed where DataFusion doesn't support something is comparing lists. (Spark allows)[https://github.com/apache/spark/blame/d9394eee5ebbeb695baaec6122da2ed970842dfd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala#L1025] comparing (==, !=, <, >, <=, >=, ..) columns of structs and lists, while in DataFusion those seem to throw:
For structs, from our internal testing:
For lists, this is shown in DataFusion's tests:
datafusion/datafusion/sqllogictest/test_files/array_query.slt
Line 44 in 3773fb7
Maybe this would need to be improved on Arrow directly, seeing that the error is coming from https://github.com/apache/arrow-rs/blob/087f34b70e97ee85e1a54b3c45c5ed814f500b0a/arrow-ord/src/cmp.rs#L219?
Describe the solution you'd like
Binary predicates to be allowed for structs and lists, preferably following same semantics as in Spark (mostly I think it's a DFS over all the fields https://github.com/apache/spark/blob/d9394eee5ebbeb695baaec6122da2ed970842dfd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/PhysicalDataType.scala#L285)
Describe alternatives you've considered
No response
Additional context
Related to #2326
The text was updated successfully, but these errors were encountered: