-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
consistent logical & physical NTILE
return types
#8270
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix looks good.
It might be good to fix some other bugs in ntile. Do you think a separate issue should be opened for these or is it something maybe worth addressing in this PR?
DataFusion CLI v33.0.0
❯ create table t1 (a int);
0 rows in set. Query took 0.005 seconds.
❯ insert into t1 values (1),(2),(3);
+-------+
| count |
+-------+
| 3 |
+-------+
1 row in set. Query took 0.006 seconds.
-- Do these results make sense? All other databases return ntile values 1,2,3.
-- Tested at https://dbfiddle.uk/
❯ select ntile(9223377) OVER(ORDER BY a) from t1;
+--------------------------------------------------------------------------------------------------------+
| NTILE(Int64(9223377)) ORDER BY [t1.a ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW |
+--------------------------------------------------------------------------------------------------------+
| 1 |
| 3074460 |
| 6148919 |
+--------------------------------------------------------------------------------------------------------+
-- This should return a regular error instead of an internal error
❯ select ntile(9223372036854775809) OVER(ORDER BY a) from t1;
Internal error: Cannot convert UInt64(9223372036854775809) to i64.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
-- This should not panic and crash the datafusion cli
❯ select ntile(-922337203685477580) OVER(ORDER BY a) from t1;
thread 'main' panicked at /home/ms/git/arrow-datafusion/datafusion/physical-expr/src/window/ntile.rs:100:23:
attempt to multiply with overflow
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
/home/ms/git/arrow-datafusion/datafusion-cli (ntile_output_type ✔) ᐅ
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @korowa
Thanks @msirek . I recommend we file a ticket and fix these issues in a follow on PR. My rationale is that this PR is strictly better than main (even though you have identified other areas where it can be better). |
Which issue does this PR close?
Closes #7639.
Rationale for this change
At this moment, while logical plan building, return type of
ntile
considered to beUInt32
, and physical expression ofntile
returnsUInt64
-- this leads to incompatible schemas of created memory table and record batches on insertion.FWIW: another schema incompatibility left (non breaking in case of CTAS, though) -- nullability of output fields, should be fixed in #7638
What changes are included in this PR?
Return type of
ntile
logical expression isUInt64
nowAre these changes tested?
Unit test for return type & minimal reproducer from the issue in sqllogictests
Are there any user-facing changes?
Return type of
ntile
logical expression isUInt64
now