-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix window spec to support scalars #8402
Conversation
2 1 1 | ||
|
||
# test window spec PARTITION BY/ORDER BY for scalar values should be ignored, only scalars in window spec | ||
# known current issue with RANK with empty WINDOW spec. Fix test after RANK issue resolved |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To fix it filed #8403
Filed #8403 to fix RANK function |
if col_type.is_numeric() | ||
|| is_utf8_or_large_utf8(col_type) | ||
|| matches!(col_type, DataType::Null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do added tests cover this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, the test tests also nulls in window spec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, the null must be the first order by expression, does it?
datafusion/sql/src/expr/function.rs
Outdated
// ignore window spec ORDER BY for scalar values | ||
.filter(|e| { | ||
!matches!( | ||
e, | ||
OrderByExpr { | ||
expr: sqlparser::ast::Expr::Value { .. }, | ||
.. | ||
} | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, as in coerce_window_frame
the first order by expression's datatype is used to coerce start bound and end bound of Range window frame. If the only one order by expr is scalar, which is filtered out here, won't it return error?
if let Some(col_type) = current_types.first() {
...
} else {
return internal_err!("ORDER BY column cannot be empty");
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its okay, it doesn't fail.... I tried to add a test with pure scalars but it is not stable because of other reasons. I'll decompose the test for now for single rows to prove it doesn't fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did you test it? Did you use RANGE frame? It must be RANGE that triggers the error:
query error DataFusion error: Error during planning: RANGE requires exactly one ORDER BY column
select a,
rank() over (partition by a order by 1 RANGE 10 PRECEDING) rnk
from (select 1 a union all select 2 a) q
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
initial tests are in the ticket description
#8386 (comment)
the small local test
❯ select rank() over (order by null) from (select 1 a union all select 2 a) q;
Error during planning: Sort operation is not applicable to scalar value NULL
In pg/duckdb
D select rank() over (order by null) from (select 1 a union all select 2 a) q;
┌─────────────────────────────┐
│ rank() OVER (ORDER BY NULL) │
│ int64 │
├─────────────────────────────┤
│ 1 │
│ 1 │
└─────────────────────────────┘
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if I run it now with pure null, I'm getting another issue
select rank() over (order by null) from (select 1 a union all select 2 a) q;
Arrow error: Invalid argument error: number of columns(2) must match number of fields(1) in schema
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weird, it works for
select rank() over (order by 1) from (select 1 a union all select 2 a) q;
but fails for
select rank() over (order by null) from (select 1 a union all select 2 a) q;
Digging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mustafasrepo can I have your input please. The reason is the query fails
select rank() over (order by 1) rnk from (select 1 a union all select 2 a) x
Arrow error: Invalid argument error: number of columns(2) must match number of fields(1) in schema
The reason for that is optimize_projections
removes the unused fields and this breaks the consistency. The question for you, what do you think is expected behavior for such rare case?
PS. If add a column name to projection or to ORDER BY it will expectedly work.
select a, | ||
rank() over (partition by 10, 'a', null order by 20, 10, 'a', null) rnk, | ||
row_number() over (partition by 10, 'a', null order by 20, 10, 'a', null) rn | ||
from (select 1 a) q |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe you can use order by a
to make the result stable.
select a,
rank() over (partition by 10, a, 'a', null order by 20, a, 10, 'a', null) rnk,
row_number() over (partition by 10, a, 'a', null order by 20, a, 10, 'a', null) rn
from (select 2 a union all select 1 a) q order by a;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @haohuaijin 'll do. Before that I was confused (#8403 (comment)) and thought it is another bug so made a row level workaround. I was able to reproduce the same in DuckDB so this is expected, not a bug I'll fix the test
Which issue does this PR close?
Closes #8386.
Rationale for this change
The window expression spec fails if
PARTITION BY
orORDER BY
contains scalar, whereas such query considered as valid in PGWhat changes are included in this PR?
Ignore scalars for
PARTITION BY
orORDER BY
as semantically it means nothingAre these changes tested?
Yes
Are there any user-facing changes?
No