Joins on Structs fail at runtime #9254

Open
jacksonrnewhouse opened this issue Feb 16, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@jacksonrnewhouse

Describe the bug

If you join two tables on a struct field, the query plans successfully, although the struct equality ends up in the join filter rather than in the join's "on" vector. At execution time, however, it fails with "Invalid comparison operation". Specifically, it triggers this error from arrow-rs: https://github.com/apache/arrow-rs/blob/db811083669df66992008c9409b743a2e365adb0/arrow-ord/src/cmp.rs#L202.
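To illustrate what the failing join filter has to compute, here is a minimal std-only Rust sketch of a nested-loop join whose filter is equality on a struct with the same two fields as in the error message (value: Float64, time: Timestamp(Nanosecond)). This is not DataFusion or arrow-rs code. The ValueTime type, the helper functions, and the SQL-style three-valued null handling (a NULL field makes the comparison NULL unless another field is already unequal) are all assumptions made for illustration.

```rust
// Hypothetical row model of the struct column from the error message:
// Struct { value: Float64, time: Timestamp(Nanosecond) }, both fields nullable.
#[derive(Debug, Clone, Copy)]
struct ValueTime {
    value: Option<f64>,
    time: Option<i64>, // assumed: nanoseconds since the epoch
}

/// Compare one field; a NULL on either side yields NULL (None).
fn field_eq<T: PartialEq>(a: Option<T>, b: Option<T>) -> Option<bool> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x == y),
        _ => None,
    }
}

/// SQL three-valued AND: false dominates, then NULL, then true.
fn and3(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        _ => None,
    }
}

/// Field-by-field struct equality (assumed semantics, not arrow-rs's kernel).
fn struct_eq(l: &ValueTime, r: &ValueTime) -> Option<bool> {
    and3(field_eq(l.value, r.value), field_eq(l.time, r.time))
}

/// Nested-loop join: evaluate the filter for every (left, right) pair and
/// keep the index pairs where it is true; both NULL and false drop the pair.
fn nested_loop_join(left: &[ValueTime], right: &[ValueTime]) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    for (i, l) in left.iter().enumerate() {
        for (j, r) in right.iter().enumerate() {
            if struct_eq(l, r) == Some(true) {
                out.push((i, j));
            }
        }
    }
    out
}

fn main() {
    let rows = vec![
        ValueTime { value: Some(1.0), time: Some(10) },
        ValueTime { value: Some(2.0), time: Some(20) },
        ValueTime { value: Some(1.0), time: None },
    ];
    // Self join, as in the failing test; the row with a NULL field matches nothing.
    let pairs = nested_loop_join(&rows, &rows);
    println!("{:?}", pairs); // [(0, 0), (1, 1)]
}
```

Under these assumptions, a row whose struct contains a NULL field does not match itself, which mirrors how NULL join keys behave for scalar columns.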

To Reproduce

I wrote a failing test that performs a self join: 35.0.0...ArroyoSystems:arrow-datafusion:bug_report/struct_join_fails_at_execution. The failure message is:

thread 'user_defined::user_defined_aggregates::test_struct_join' panicked at datafusion/core/tests/user_defined/user_defined_aggregates.rs:172:60:
called `Result::unwrap()` on an `Err` value: Execution("Fail to build join indices in NestedLoopJoinExec, error:Arrow error: Invalid argument error: Invalid comparison operation: Struct([Field { name: \"value\", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: \"time\", data_type: Timestamp(Nanosecond, None), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]) == Struct([Field { name: \"value\", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: \"time\", data_type: Timestamp(Nanosecond, None), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }])")

Expected behavior

Either the join should fail at planning time with a clear error saying that joins on structs are not supported or, preferably, DataFusion should support joins on two structs of the same type.
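For the first option, a plan-time check could look roughly like the following std-only Rust sketch. DataType here is a hypothetical, simplified stand-in (not arrow-rs's DataType enum), and kernel_supports_eq simply encodes the assumption from this report that the comparison kernel rejects structs. A real check would ask the kernel itself rather than hard-code a list.

```rust
/// Hypothetical, simplified stand-in for Arrow's DataType (not the arrow-rs enum).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Float64,
    TimestampNs,
    Utf8,
    Struct(Vec<(String, DataType)>),
}

/// Whether the (hypothetical) comparison kernel can evaluate `==` for this type.
/// Assumed, as in this bug report, that structs are not supported.
fn kernel_supports_eq(dt: &DataType) -> bool {
    !matches!(dt, DataType::Struct(_))
}

/// Plan-time check for a join predicate `l = r`: reject it up front with a
/// clear message instead of failing later inside the join operator.
fn check_join_eq(left: &DataType, right: &DataType) -> Result<(), String> {
    if left != right {
        return Err(format!(
            "join keys have different types: {:?} vs {:?}",
            left, right
        ));
    }
    if !kernel_supports_eq(left) {
        return Err(format!("joins on {:?} are not supported", left));
    }
    Ok(())
}

fn main() {
    let window = DataType::Struct(vec![
        ("value".to_string(), DataType::Float64),
        ("time".to_string(), DataType::TimestampNs),
    ]);
    for (l, r) in [(&window, &window), (&DataType::Utf8, &DataType::Utf8)] {
        match check_join_eq(l, r) {
            Ok(()) => println!("join key accepted: {:?}", l),
            Err(e) => println!("planning error: {}", e),
        }
    }
}
```

The trade-off is that this only moves the failure earlier; the preferred fix is still for the comparison kernel to handle structs so the check can accept them.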

Additional context

This comes up in Arroyo, where we want to join on time windows, e.g. sliding and tumbling windows.

@alamb
Contributor

alamb commented Mar 3, 2024

👍 thanks -- looks like @jayzhan211 has been hard at work upstream in arrow-rs trying to get the appropriate support put in
