-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minor: Add additional docstrings to Window function implementations #6592
Conversation
@@ -69,6 +73,8 @@ pub trait BuiltInWindowFunctionExpr: Send + Sync + std::fmt::Debug { | |||
false | |||
} | |||
|
|||
/// If returns true, [`Self::create_evaluator`] must implement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was a little unclear on why use_window_frame
was part of BuiltInWindowFunctionExpr
and not PartitionEvaluator
but I haven't looked into it in more detail
@@ -32,6 +32,7 @@ mod window_frame_state; | |||
pub use aggregate::PlainAggregateWindowExpr; | |||
pub use built_in::BuiltInWindowExpr; | |||
pub use built_in_window_function_expr::BuiltInWindowFunctionExpr; | |||
pub use partition_evaluator::PartitionEvaluator; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PartitionEvaluator
now appears in public docstrings (and I propose to make it part of the user defined window function API -- see #5781 (comment))
/// There are two types of `PartitionEvaluator`: | ||
/// | ||
/// # Stateless `PartitionEvaluator` | ||
/// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mustafasrepo / @ozankabak if you have time to help me describe more clearly what Stateful and Stateless PartitionEvaluator
s are , and specifically what is different between them, I would be most appreciative
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some builtin window functions use window frame information inside the window expression (those are FIRST_VALUE
, LAST_VALUE
, NTH_VALUE
). However, for most of the window functions what is in the window frame is not important (those are ROW_NUMBER
, RANK
, DENSE_RANK
, PERCENT_RANK
, CUME_DIST
, LEAD
, LAG
). For the ones, using window_frame PartitionEvaluator::evaluate_inside_range
is called. For the ones that do not use window frame PartitionEvaluator::evaluate
is called (For rank calculations, PartitionEvaluator::evaluate_with_rank
is called since its API is quite different. However, it doesn't use window frame either.)
PartitionEvaluator::evaluate_stateful
is used only when we produce window result with bounded memory(When window functions are called from the BoundedWindowAggExec
). In this case window results are calculated in running fashion, hence we need to store previous state, to be able to calculate correct output (For instance, for ROW_NUMBER
function the current batch evaluator receive may not be the first batch. Hence we cannot start row_number from 0, we need to start from last ROW_NUMBER
produced for the previous batches received. Similarly, we need to store some information in the state. When we do not receive whole table as a single batch)
Currently, we have support for bounded(stateful) execution for FIRST_VALUE
, LAST_VALUE
, NTH_VALUE
, ROW_NUMBER
, RANK
, DENSE_RANK
, LEAD
, LAG
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @mustafasrepo -- this is super helpful. I am incorporating this information into this PR
/// A window expr that takes the form of an aggregate function that | ||
/// can be incrementally computed over sliding windows. | ||
/// | ||
/// See comments on [`WindowExpr`] for more details. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consolidated this description into WindowExpr
@mustafasrepo / @ozankabak I wonder if you have time to review this documentation and check if I got it right? This PR has no code changes in it |
/// Construct a new [`BuiltInWindowFunctionExpr`] that produces | ||
/// the same result as this function on a window with reverse | ||
/// order. The return value of this function is used by the | ||
/// DataFusion optimizer to avoid resorting the data when |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can use re-sorting
instead of resorting
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in d56a5b2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @alamb for this PR. This PR is LGTM!.
…pache#6592) * Add additional docstrings to Window function implementations * Update docs * updates * fix doc link * Change resorting --> re-sorting
Which issue does this PR close?
Related to #5781
Rationale for this change
As I begin working out how to expose User Defined Window Functions, I want to
What changes are included in this PR?
MakeUPDATE: This PR just adds documentation, it doesn't change any APIsPartitionEvaluator
publicAre these changes tested?
Yes, with CI
Are there any user-facing changes?
better rustdocs, hopefully