Minor: Add additional docstrings to Window function implementations #6592
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -32,6 +32,7 @@ mod window_frame_state; | |
pub use aggregate::PlainAggregateWindowExpr; | ||
pub use built_in::BuiltInWindowExpr; | ||
pub use built_in_window_function_expr::BuiltInWindowFunctionExpr; | ||
pub use partition_evaluator::PartitionEvaluator; | ||
|
||
pub use sliding_aggregate::SlidingAggregateWindowExpr; | ||
pub use window_expr::PartitionBatchState; | ||
pub use window_expr::PartitionBatches; | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -25,24 +25,70 @@ use datafusion_common::{DataFusionError, ScalarValue}; | |
use std::fmt::Debug; | ||
use std::ops::Range; | ||
|
||
/// Partition evaluator | ||
/// Partition evaluator for Window Functions | ||
/// | ||
/// An implementation of this trait is created and used for each | ||
/// partition defined by the OVER clause. | ||
/// | ||
/// For example, evaluating `window_func(val) OVER (PARTITION BY col)` | ||
/// on the following data: | ||
/// | ||
/// ```text | ||
/// col | val | ||
/// --- + ---- | ||
/// A | 1 | ||
/// A | 1 | ||
/// C | 2 | ||
/// D | 3 | ||
/// D | 3 | ||
/// ``` | ||
/// | ||
/// Will instantiate three `PartitionEvaluator`s, one each for the | ||
/// partitions defined by `col=A`, `col=C`, and `col=D`. | ||
/// | ||
/// There are two types of `PartitionEvaluator`: | ||
/// | ||
/// # Stateless `PartitionEvaluator` | ||
/// | ||
@mustafasrepo / @ozankabak if you have time to help me describe more clearly what Stateful and Stateless
Some builtin window functions use window frame information inside the window expression (those are
Currently, we have support for bounded (stateful) execution for
Thank you @mustafasrepo -- this is super helpful. I am incorporating this information into this PR
||
/// In this case, [`PartitionEvaluator::evaluate`] is called for the | ||
/// entire partition / window function. | ||
/// | ||
/// # Stateful `PartitionEvaluator` | ||
/// | ||
/// This is used for XXXX. In this case YYYYY | ||
/// | ||
pub trait PartitionEvaluator: Debug + Send { | ||
/// Whether the evaluator should be evaluated with rank | ||
/// | ||
/// If `include_rank` is true, then [`Self::evaluate_with_rank`] | ||
/// will be called for each partition, which includes the | ||
/// `rank`. For example: | ||
/// | ||
/// ```text | ||
/// col | rank | ||
/// --- + ---- | ||
/// A | 1 | ||
/// A | 1 | ||
/// C | 2 | ||
/// D | 3 | ||
/// D | 3 | ||
/// ``` | ||
fn include_rank(&self) -> bool { | ||
false | ||
} | ||
|
||
/// Returns state of the Built-in Window Function | ||
/// Returns state of the Built-in Window Function (only used for stateful evaluation) | ||
fn state(&self) -> Result<BuiltinWindowState> { | ||
// If we do not use state we just return Default | ||
Ok(BuiltinWindowState::Default) | ||
} | ||
|
||
/// Updates the internal state for Built-in window function | ||
// state is useful to update internal state for Built-in window function. | ||
// idx is the index of last row for which result is calculated. | ||
// range_columns is the result of order by column values. It is used to calculate rank boundaries | ||
// sort_partition_points is the boundaries of each rank in the range_column. It is used to update rank. | ||
/// Updates the internal state for Built-in window function, if desired. | ||
/// | ||
/// `state`: is useful to update internal state for Built-in window function. | ||
/// `idx`: is the index of last row for which result is calculated. | ||
/// `range_columns`: is the result of order by column values. It is used to calculate rank boundaries | ||
/// `sort_partition_points`: is the boundaries of each rank in the range_column. It is used to update rank. | ||
fn update_state( | ||
&mut self, | ||
_state: &WindowAggState, | ||
|
@@ -54,15 +100,17 @@ pub trait PartitionEvaluator: Debug + Send { | |
Ok(()) | ||
} | ||
|
||
/// Sets the internal state for Built-in window function, if supported | ||
fn set_state(&mut self, _state: &BuiltinWindowState) -> Result<()> { | ||
Err(DataFusionError::NotImplemented( | ||
"set_state is not implemented for this window function".to_string(), | ||
)) | ||
} | ||
|
||
/// Gets the range where Built-in window function result is calculated. | ||
// idx is the index of last row for which result is calculated. | ||
// n_rows is the number of rows of the input record batch (Used during bound check) | ||
/// | ||
/// `idx`: is the index of last row for which result is calculated. | ||
/// `n_rows`: is the number of rows of the input record batch (Used during bound check) | ||
fn get_range(&self, _idx: usize, _n_rows: usize) -> Result<Range<usize>> { | ||
Err(DataFusionError::NotImplemented( | ||
"get_range is not implemented for this window function".to_string(), | ||
|
@@ -83,7 +131,9 @@ pub trait PartitionEvaluator: Debug + Send { | |
)) | ||
} | ||
|
||
/// evaluate the partition evaluator against the partition but with rank | ||
/// Evaluate the partition evaluator against the partition but with rank | ||
/// | ||
/// See [`Self::include_rank`] for more details | ||
fn evaluate_with_rank( | ||
&self, | ||
_num_rows: usize, | ||
|
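To make the docstring tables above concrete, here is a minimal, self-contained sketch of how runs of equal values in a sorted column can give both the partitions (one evaluator each) and the rank numbering shown for `include_rank`. The helpers `runs_of_equal` and `ranks_from_runs` are hypothetical names written for this illustration and are not part of DataFusion; the sketch assumes the input column is already sorted.

```rust
use std::ops::Range;

/// Hypothetical helper (not DataFusion API): split an already sorted
/// column into runs of equal values. Each run can be read as one
/// partition (for `PARTITION BY`) or as one peer group (for ranking).
fn runs_of_equal<T: PartialEq>(sorted: &[T]) -> Vec<Range<usize>> {
    let mut runs = Vec::new();
    let mut start = 0;
    for i in 1..=sorted.len() {
        // Close the current run at the end of the input or when the value changes.
        if i == sorted.len() || sorted[i] != sorted[start] {
            runs.push(start..i);
            start = i;
        }
    }
    runs
}

/// Hypothetical helper (not DataFusion API): give every row of a run the
/// same rank, increasing by one per run, as in the `col | rank` table.
fn ranks_from_runs(runs: &[Range<usize>]) -> Vec<u64> {
    let mut ranks = Vec::new();
    for (i, run) in runs.iter().enumerate() {
        ranks.extend(std::iter::repeat(i as u64 + 1).take(run.len()));
    }
    ranks
}
```

For the column `A, A, C, D, D` from the docstring, `runs_of_equal` yields the ranges `0..2`, `2..3`, and `3..5` (three partitions), and `ranks_from_runs` on those ranges yields `1, 1, 2, 3, 3`.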
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -36,12 +36,10 @@ use crate::{ | |
expressions::PhysicalSortExpr, reverse_order_bys, AggregateExpr, PhysicalExpr, | ||
}; | ||
|
||
/// A window expr that takes the form of an aggregate function | ||
/// Aggregate Window Expressions that have the form | ||
/// `OVER({ROWS | RANGE| GROUPS} BETWEEN UNBOUNDED PRECEDING AND ...)` | ||
/// e.g cumulative window frames uses `PlainAggregateWindowExpr`. Where as Aggregate Window Expressions | ||
/// that have the form `OVER({ROWS | RANGE| GROUPS} BETWEEN M {PRECEDING| FOLLOWING} AND ...)` | ||
/// e.g sliding window frames uses `SlidingAggregateWindowExpr`. | ||
/// A window expr that takes the form of an aggregate function that | ||
/// can be incrementally computed over sliding windows. | ||
/// | ||
/// See comments on [`WindowExpr`] for more details. | ||
Consolidated this description into
||
#[derive(Debug)] | ||
pub struct SlidingAggregateWindowExpr { | ||
aggregate: Arc<dyn AggregateExpr>, | ||
|
@@ -72,10 +70,11 @@ impl SlidingAggregateWindowExpr { | |
} | ||
} | ||
|
||
/// peer based evaluation based on the fact that batch is pre-sorted given the sort columns | ||
/// and then per partition point we'll evaluate the peer group (e.g. SUM or MAX gives the same | ||
/// results for peers) and concatenate the results. | ||
|
||
/// Incrementally update window function using the fact that batch is | ||
/// pre-sorted given the sort columns and then per partition point. | ||
/// | ||
/// Evaluates the peer group (e.g. `SUM` or `MAX` gives the same results | ||
/// for peers) and concatenates the results. | ||
impl WindowExpr for SlidingAggregateWindowExpr { | ||
/// Return a reference to Any that can be used for downcasting | ||
fn as_any(&self) -> &dyn Any { | ||
|
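As a rough illustration of what "incrementally computed over sliding windows" can mean, the sketch below keeps a running sum and, for each row, adds the value entering the frame and subtracts the value leaving it, instead of re-summing the whole frame. The function `sliding_sum` and its `preceding` parameter are hypothetical and chosen for this example; it models a frame of the form `ROWS BETWEEN preceding PRECEDING AND CURRENT ROW` and is not the code used by `SlidingAggregateWindowExpr`.

```rust
/// Hypothetical sketch (not DataFusion code): SUM over a frame of
/// `ROWS BETWEEN preceding PRECEDING AND CURRENT ROW`, updated
/// incrementally rather than recomputed for every row.
fn sliding_sum(values: &[i64], preceding: usize) -> Vec<i64> {
    let mut out = Vec::with_capacity(values.len());
    let mut sum = 0i64;
    for (i, &v) in values.iter().enumerate() {
        // The current row enters the frame.
        sum += v;
        // Once the frame is full, the oldest row leaves it.
        if i > preceding {
            sum -= values[i - preceding - 1];
        }
        out.push(sum);
    }
    out
}
```

With `values = [1, 2, 3, 4]` and `preceding = 1`, the frames are `[1]`, `[1, 2]`, `[2, 3]`, `[3, 4]`, so the result is `[1, 3, 5, 7]`. The same add-and-remove idea only works for aggregates whose contribution can be undone; an aggregate such as `MAX` would need a different structure.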
I was a little unclear on why `use_window_frame` was part of `BuiltInWindowFunctionExpr` and not `PartitionEvaluator`, but I haven't looked into it in more detail.