-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify and encapsulate window function state management #6621
Changes from all commits
0e8ca7d
6decd54
fb0c17e
7d3e361
4542799
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -17,7 +17,6 @@ | |
|
||
//! Partition evaluation module | ||
|
||
use crate::window::window_expr::BuiltinWindowState; | ||
use crate::window::WindowAggState; | ||
use arrow::array::ArrayRef; | ||
use datafusion_common::Result; | ||
|
@@ -78,8 +77,7 @@ use std::ops::Range; | |
/// | ||
/// In this case, [`Self::evaluate_stateful`] is called to calculate | ||
/// the results of the window function incrementally for each new | ||
/// batch, saving and restoring any state needed to do so as | ||
/// [`BuiltinWindowState`]. | ||
/// batch. | ||
/// | ||
/// For example, when computing `ROW_NUMBER` incrementally, | ||
/// [`Self::evaluate_stateful`] will be called multiple times with | ||
|
@@ -91,14 +89,6 @@ use std::ops::Range; | |
/// [`BuiltInWindowFunctionExpr`]: crate::window::BuiltInWindowFunctionExpr | ||
/// [`BuiltInWindowFunctionExpr::create_evaluator`]: crate::window::BuiltInWindowFunctionExpr::create_evaluator | ||
pub trait PartitionEvaluator: Debug + Send { | ||
/// Returns the internal state of the window function | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The point of the PR is to get of the functions on this trait that serialized / set state (which has a single use) |
||
/// | ||
/// Only used for stateful evaluation | ||
fn state(&self) -> Result<BuiltinWindowState> { | ||
// If we do not use state we just return Default | ||
Ok(BuiltinWindowState::Default) | ||
} | ||
|
||
/// Updates the internal state for window function | ||
/// | ||
/// Only used for stateful evaluation | ||
|
@@ -118,13 +108,16 @@ pub trait PartitionEvaluator: Debug + Send { | |
Ok(()) | ||
} | ||
|
||
/// Sets the internal state for window function | ||
/// When the window frame has a fixed beginning (e.g UNBOUNDED | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. While writing this explanation it occurred to me that the term There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I will leave it named |
||
/// PRECEDING), some functions such as FIRST_VALUE, LAST_VALUE and | ||
/// NTH_VALUE do not need the (unbounded) input once they have | ||
/// seen a certain amount of input. | ||
/// | ||
/// Only used for stateful evaluation | ||
fn set_state(&mut self, _state: &BuiltinWindowState) -> Result<()> { | ||
Err(DataFusionError::NotImplemented( | ||
"set_state is not implemented for this window function".to_string(), | ||
)) | ||
/// `memoize` is called after each input batch is processed, and | ||
/// such functions can save whatever they need and modify | ||
/// [`WindowAggState`] appropriately to allow rows to be pruned | ||
fn memoize(&mut self, _state: &mut WindowAggState) -> Result<()> { | ||
Ok(()) | ||
} | ||
|
||
/// Gets the range where the window function result is calculated. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -327,16 +327,7 @@ pub struct LeadLagState { | |
pub idx: usize, | ||
} | ||
|
||
#[derive(Debug, Clone, Default)] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It turns out this structure is not used anywhere else |
||
pub enum BuiltinWindowState { | ||
Rank(RankState), | ||
NumRows(NumRowsState), | ||
NthValue(NthValueState), | ||
LeadLag(LeadLagState), | ||
#[default] | ||
Default, | ||
} | ||
|
||
/// Holds the state of evaluating a window function | ||
#[derive(Debug)] | ||
pub struct WindowAggState { | ||
/// The range that we calculate the window function | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is actually more readable as well as cleaning up the trait