Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify and encapsulate window function state management #6621

Merged
merged 5 commits into from
Jun 13, 2023

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Jun 9, 2023

Which issue does this PR close?

Related to #5781

Rationale for this change

PartitionEvaluator currently has some methods that expose an enum BuiltinWindowState specific to the built in functions. If we choose to expose PartitionEvaluator externally it needs to not include details specifically about the built i window functions

It turns out there is exactly one use for this state, which is memoization (for nth value calculation). We can move that functionality into the trait and avoid the need for BuiltinWindowState entirely

What changes are included in this PR?

  1. Move the memoization code to the window function to the window function that uses it (nth_value)
  2. Remove PartitionEvaluator::get_state, PartitionEvaluator::set_state() and BuiltinWindowState

Are these changes tested?

existing tests

Are there any user-facing changes?

No these are all (currently) internal APIs

@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Jun 9, 2023
memoize_nth_value(state, nth_value_state)?;
evaluator.set_state(&evaluator_state)?;
}
evaluator.memoize(state)?;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is actually more readable as well as cleaning up the trait

@@ -100,14 +99,6 @@ pub trait PartitionEvaluator: Debug + Send {
false
}

/// Returns the internal state of the window function
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of the PR is to get of the functions on this trait that serialized / set state (which has a single use)

@@ -327,16 +327,7 @@ pub struct LeadLagState {
pub idx: usize,
}

#[derive(Debug, Clone, Default)]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out this structure is not used anywhere else

@@ -127,13 +117,16 @@ pub trait PartitionEvaluator: Debug + Send {
Ok(())
}

/// Sets the internal state for window function
/// When the window frame has a fixed beginning (e.g UNBOUNDED
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While writing this explanation it occurred to me that the term memoize, while technically accurate is not the entire story here. I wonder if there is a better name like adjust_window_state 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think adjust_window_state feels like a more general functionality. However, this function is a special case for nth_value when window frame boundary has UNBOUNDED PRECEDING. Hence I think memoize is better for this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will leave it named memoize

@alamb alamb requested a review from mustafasrepo June 9, 2023 21:50
if let BuiltinWindowState::NthValue(nth_value_state) = state {
self.state = nth_value_state.clone()
/// When the window frame has a fixed beginning (e.g UNBOUNDED
/// PRECEDING), some functions such as FIRST_VALUE, LAST_VALUE and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can replace this line with PRECEDING), for some functions such as FIRST_VALUE, LAST_VALUE and

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (BTW if you do "CTRL-G" you can get inline suggestions, to get the github UI to show the suggestion inline):

Suggested change
/// PRECEDING), some functions such as FIRST_VALUE, LAST_VALUE and
/// PRECEDING), for some functions such as FIRST_VALUE, LAST_VALUE and
Screenshot 2023-06-12 at 8 11 45 PM

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the tip

Copy link
Contributor

@mustafasrepo mustafasrepo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alamb. I have left some minor comments. Other than these this PR is LGTM!.

fn memoize(&mut self, state: &mut WindowAggState) -> Result<()> {
let out = &state.out_col;
let size = out.len();
let (is_prunable, new_prunable) = match self.state.kind {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that is is not introduced here (and introduced by me) but while reading the code maybe instead of new_prunable we can use is_last flag. I think, that way intent will be more clear.

NthValueKind::First => {
let n_range =
state.window_frame_range.end - state.window_frame_range.start;
(n_range > 0 && size > 0, true)
Copy link
Contributor

@mustafasrepo mustafasrepo Jun 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in case we use is_last flag, second flag should be false

state.window_frame_range.end - state.window_frame_range.start;
(n_range > 0 && size > 0, true)
}
NthValueKind::Last => (true, false),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in case we use is_last flag, second flag should be true

NthValueKind::Nth(n) => {
let n_range =
state.window_frame_range.end - state.window_frame_range.start;
(n_range >= (n as usize) && size >= (n as usize), true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case we use is_last flag, second flag should be false

}
};
if is_prunable {
if self.state.finalized_result.is_none() && new_prunable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in case we use is_last flag, condition should be if self.state.finalized_result.is_none() && !is_last {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the intent is clearer using the term is_last. Thank you for that suggestion . Done in 4542799

@alamb alamb merged commit a56ae74 into apache:main Jun 13, 2023
@alamb alamb deleted the alamb/simplify_state branch June 13, 2023 01:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
physical-expr Changes to the physical-expr crates
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants