-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Collapse sort into window expr and do sort within logical phase #571
Conversation
Codecov Report
@@ Coverage Diff @@
## master #571 +/- ##
==========================================
+ Coverage 76.02% 76.10% +0.07%
==========================================
Files 156 156
Lines 27063 27145 +82
==========================================
+ Hits 20575 20658 +83
+ Misses 6488 6487 -1
Continue to review full report at Codecov.
|
906b62d
to
f4341d4
Compare
performance comparison:
consider that the test data are generated per run, i would ignore 5%-ish variations. |
with 1million records and 8k batch size
|
eef960a
to
2f83560
Compare
this will make way for #569 |
I am sorry I am behind on reviews in DataFusion -- I plan to work through the backlog tomorrow |
7845ac4
to
e48fdd3
Compare
ce6a559
to
7553b89
Compare
|
||
// at this moment we are guaranteed by the logical planner | ||
// to have all the window_expr to have equal sort key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the design that the planner will group all WindowExpr that have the same window (ORDER BY, PARTITION BY) into the same LogicalPlan:Window
node? That sounds like a good plan to me 👍
It might (eventually) be worth validating that invariant here and assert
ing (or returning an Error` if the sort keys are not the same for multiple window exprs (mostly to catch potential bugs faster in the future)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i can add the assertion later. yes the invariant is real.
\n WindowAggr: windowExpr=[[MIN(#orders.qty)]]\ | ||
\n Sort: #orders.order_id DESC NULLS FIRST\ | ||
\n TableScan: orders projection=None"; | ||
\n WindowAggr: windowExpr=[[MAX(#orders.qty) ORDER BY [#orders.order_id ASC NULLS FIRST]]]\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is cool to see two different WindowAggr nodes with different ORDER BY clauses. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rust uses timsort, and it shall be quick (to reverse an already sorted vec).
@@ -1452,11 +1452,18 @@ impl fmt::Debug for Expr { | |||
} | |||
Expr::WindowFunction { | |||
fun, | |||
ref args, | |||
args, | |||
partition_by, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the partition_by and order_by more logically belong on the LogicalPlan::Window
node if they can be shared across Expr::WindowFunction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a separate construct not yet available in the sql parser.
see https://www.postgresql.org/docs/current/tutorial-window.html
When a query involves multiple window functions, it is possible to write out each one with a separate OVER clause, but this is duplicative and error-prone if the same windowing behavior is wanted for several functions. Instead, each windowing behavior can be named in a WINDOW clause and then referenced in OVER. For example:
SELECT sum(salary) OVER w, avg(salary) OVER w
FROM empsalary
WINDOW w AS (PARTITION BY depname ORDER BY salary DESC);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the partition_by and order_by more logically belong on the LogicalPlan::Window node if they can be shared across Expr::WindowFunction
for logically planned window function node it's not necessarily the case, because sort order is defined by the concatenations of partition_by
and then order_by
, but consider:
select max(c1) over (partition by c2 order by c3), max(c1) over (order by c2, c3) from test;
both window functions will have the same sort key of [c2 asc nulls first, c3 asc nulls first]
but they mean different things, and they might be logically planned together (meaning same pre-sorting happens) but not the same evaluation will happen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see -- this makes sense. Thank you for the clarification
Perhaps it is worth a comment (in a follow on PR) on the LogicalPlan
window to this effect
Looks like a good step forward to me |
7553b89
to
5b44dce
Compare
@@ -1452,11 +1452,18 @@ impl fmt::Debug for Expr { | |||
} | |||
Expr::WindowFunction { | |||
fun, | |||
ref args, | |||
args, | |||
partition_by, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see -- this makes sense. Thank you for the clarification
Perhaps it is worth a comment (in a follow on PR) on the LogicalPlan
window to this effect
@jimexist there appears to be a build failure now:
unless there are any other comments I'll plan to merge this tomorrow when the build issue gets fixed |
5b44dce
to
20c19ad
Compare
|
||
let sort_keys = get_sort_keys(&window_expr[0]); | ||
if window_expr.len() > 1 { | ||
debug_assert!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb added this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
||
let sort_keys = get_sort_keys(&window_expr[0]); | ||
if window_expr.len() > 1 { | ||
debug_assert!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Thanks again @jimexist |
Which issue does this PR close?
Based on #564 so review that first
Closes #573
Closes #526
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?