implement rank and dense_rank function and refactor built-in window function evaluation #631

jimexist · 2021-06-27T07:23:15Z

Which issue does this PR close?

Closes #555

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

alamb

This looks good to me @jimexist -- I had some suggestions on the code structure but I think this is also just fine as written.

The only thing I suggest is adding end-to-end tests (maybe as a postgres integration test)

alamb · 2021-06-27T11:24:36Z

datafusion/src/physical_plan/expressions/nth_value.rs

+}
+
+impl PartitionEvaluator for NthValueEvaluator {
+    fn evaluate_partition(&self, partition: Range<usize>) -> Result<ArrayRef> {


Passing in the range of rows makes sense as an interface, though it may be better to pass in values: Vec<ArrayRef> explicitly rather than assume that whatever implements the evaluator was constructed in a way that lets it find them.
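To make the suggestion concrete, here is a minimal sketch of an evaluator that receives its input columns explicitly. It is not the code in this PR: `Column` is a hypothetical std-only stand-in for arrow's `ArrayRef`, `ExplicitPartitionEvaluator` and `NthValue` are illustrative names, and returning an error for an out-of-range `n` is a simplification made for the sketch.

```rust
use std::ops::Range;

// Hypothetical stand-in for arrow's `ArrayRef`, so the sketch needs only std.
type Column = Vec<i64>;

/// Hypothetical variant of the evaluator trait: the input columns are passed
/// explicitly instead of being captured when the evaluator is constructed.
trait ExplicitPartitionEvaluator {
    fn evaluate_partition(
        &self,
        values: &[Column],
        partition: Range<usize>,
    ) -> Result<Column, String>;
}

/// Repeats the n-th (1-based) value of the first column for every row of the partition.
struct NthValue {
    n: usize,
}

impl ExplicitPartitionEvaluator for NthValue {
    fn evaluate_partition(
        &self,
        values: &[Column],
        partition: Range<usize>,
    ) -> Result<Column, String> {
        let column = values
            .first()
            .ok_or_else(|| "expected one input column".to_string())?;
        // Slice out the rows of this partition, failing if the range is out of bounds.
        let rows = column
            .get(partition)
            .ok_or_else(|| "partition out of bounds".to_string())?;
        // Simplification for the sketch: an out-of-range `n` is reported as an error.
        let value = self
            .n
            .checked_sub(1)
            .and_then(|i| rows.get(i))
            .ok_or_else(|| "n is outside the partition".to_string())?;
        Ok(vec![*value; rows.len()])
    }
}

fn main() {
    let values = vec![vec![10, 20, 30, 40, 50]];
    let evaluator = NthValue { n: 2 };
    println!("{:?}", evaluator.evaluate_partition(&values, 0..3));
}
```

With this shape the evaluator holds only its own parameters (here `n`), and the caller decides which columns it sees.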

datafusion/src/physical_plan/expressions/rank.rs

alamb · 2021-06-27T11:35:01Z

datafusion/src/physical_plan/window_functions.rs

+    }
+
+    /// evaluate the partition evaluator against the partition
+    fn evaluate_partition(&self, _partition: Range<usize>) -> Result<ArrayRef>;


Another potential way to model this with a single evaluate function might be:

Suggested change

fn evaluate_partition(&self, _partition: Range<usize>) -> Result<ArrayRef>;

fn evaluate_partition(&self, _partition: Range<usize>, _ranks_in_partition: Option<&[Range<usize>]>) -> Result<ArrayRef>;

Rather than having two separate functions with different signatures

Actually I was trying to avoid generating sort partition points, because the majority of the functions do not need them: nth value does not need them, and row number does not need values at all, just the partition length.

If it were to be consistent, the interface wouldn't need to exist at all; it would reuse the code for aggregation window functions.

Having thought about this for a while, I think we should merge this as is.

Once arrow 4.4 is released, the partition points will be exposed as an iterator. At that point I can unify both functions and let laziness do its work (i.e. pass in the iterator in all cases and let the consumer decide whether to use it).
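The laziness argument can be sketched as follows. This is an illustration under assumptions, not the planned implementation: `UnifiedEvaluator`, `RowNumber`, `DenseRank`, and `lazy_rank_ranges` are hypothetical names, and `lazy_rank_ranges` is a std-only stand-in for an iterator-based partition-points API rather than the arrow function itself.

```rust
use std::iter;
use std::ops::Range;

/// Hypothetical unified interface: every evaluator is handed the same lazy
/// iterator of rank ranges and decides for itself whether to consume it.
trait UnifiedEvaluator {
    fn evaluate_partition(
        &self,
        partition: Range<usize>,
        ranks: &mut dyn Iterator<Item = Range<usize>>,
    ) -> Vec<u64>;
}

struct RowNumber;

impl UnifiedEvaluator for RowNumber {
    fn evaluate_partition(
        &self,
        partition: Range<usize>,
        _ranks: &mut dyn Iterator<Item = Range<usize>>,
    ) -> Vec<u64> {
        // Only the partition length is needed; the iterator is never advanced,
        // so no rank boundary is ever computed.
        (1..=(partition.end - partition.start) as u64).collect()
    }
}

struct DenseRank;

impl UnifiedEvaluator for DenseRank {
    fn evaluate_partition(
        &self,
        _partition: Range<usize>,
        ranks: &mut dyn Iterator<Item = Range<usize>>,
    ) -> Vec<u64> {
        // Consumes the iterator: each range of tied rows gets the next rank.
        ranks
            .enumerate()
            .flat_map(|(index, range)| iter::repeat(index as u64 + 1).take(range.end - range.start))
            .collect()
    }
}

/// Lazily yields ranges of adjacent equal sort keys (assumed to be already sorted).
fn lazy_rank_ranges<'a>(keys: &'a [i64]) -> impl Iterator<Item = Range<usize>> + 'a {
    let mut start = 0;
    iter::from_fn(move || {
        if start >= keys.len() {
            return None;
        }
        let mut end = start + 1;
        while end < keys.len() && keys[end] == keys[start] {
            end += 1;
        }
        let range = start..end;
        start = end;
        Some(range)
    })
}

fn main() {
    let keys = [1, 1, 2, 5, 5, 5];
    let mut ranks = lazy_rank_ranges(&keys);
    println!("{:?}", RowNumber.evaluate_partition(0..keys.len(), &mut ranks));
    println!("{:?}", DenseRank.evaluate_partition(0..keys.len(), &mut ranks));
}
```

Because `RowNumber` never advances the iterator, the same iterator can still be handed to `DenseRank` afterwards and yields every range, which is the sense in which the consumer decides.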

alamb · 2021-06-27T11:37:05Z

datafusion/src/physical_plan/expressions/rank.rs

+            UInt64Array::from_iter_values(ranks_in_partition.iter().enumerate().flat_map(
+                |(index, range)| {
+                    let len = range.end - range.start;
+                    iter::repeat((index as u64) + 1).take(len)


👨‍🍳 ❤️ -- nice
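For readers following along, the snippet above assigns one rank value per range of tied rows, which matches dense_rank semantics. A std-only sketch of that idea next to a gap-preserving rank is shown below; the function names and the use of `Vec<u64>` in place of `UInt64Array` are assumptions made for illustration, and the `rank` variant is one possible formulation rather than the code merged in this PR.

```rust
use std::iter;
use std::ops::Range;

/// Dense rank: the i-th range of tied rows (0-based) gets rank i + 1, with no gaps.
fn dense_rank(ranks_in_partition: &[Range<usize>]) -> Vec<u64> {
    ranks_in_partition
        .iter()
        .enumerate()
        .flat_map(|(index, range)| {
            let len = range.end - range.start;
            iter::repeat((index as u64) + 1).take(len)
        })
        .collect()
}

/// Rank with gaps: each range of tied rows gets 1 + the number of rows before it.
fn rank(ranks_in_partition: &[Range<usize>]) -> Vec<u64> {
    ranks_in_partition
        .iter()
        .scan(1u64, |next_rank, range| {
            let len = range.end - range.start;
            let current = *next_rank;
            // The next group starts after all rows of this group.
            *next_rank += len as u64;
            Some(iter::repeat(current).take(len))
        })
        .flatten()
        .collect()
}

fn main() {
    // Three groups of tied rows: two rows, one row, three rows.
    let ranges = vec![0..2, 2..3, 3..6];
    println!("dense_rank = {:?}", dense_rank(&ranges));
    println!("rank       = {:?}", rank(&ranges));
}
```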

jimexist · 2021-06-28T03:48:21Z

This looks good to me @jimexist -- I had some suggestions on the code structure but I think this is also just fine as written.

The only thing I suggest is adding end-to-end tests (maybe as a postgres integration test)

integration tests added in #638

datafusion/src/physical_plan/expressions/rank.rs

github-actions bot added the datafusion (Changes in the datafusion crate) label Jun 27, 2021

jimexist force-pushed the impl-rank branch from 32790c4 to ba68e1a on June 27, 2021 09:01

jimexist changed the title ~~WIP implement rank function and refactor built-in window function evaluation~~ implement rank function and refactor built-in window function evaluation Jun 27, 2021

jimexist marked this pull request as ready for review June 27, 2021 09:14

alamb changed the title ~~implement rank function and refactor built-in window function evaluation~~ implement rank and dense_rank function and refactor built-in window function evaluation Jun 27, 2021

alamb approved these changes Jun 27, 2021


jimexist force-pushed the impl-rank branch from 73405e2 to b07b203 on June 27, 2021 15:23

jimexist mentioned this pull request Jun 28, 2021

add integration tests for rank, dense_rank, fix last_value evaluation with rank #638

Merged

Dandandan reviewed Jun 28, 2021


datafusion/src/physical_plan/expressions/rank.rs (outdated, resolved)

Dandandan reviewed Jun 28, 2021


datafusion/src/physical_plan/expressions/rank.rs (outdated, resolved)

jimexist force-pushed the impl-rank branch 2 times, most recently from 18ce48f to 6d4cf41 on June 28, 2021 09:48

add rank and dense rank and refactor window built in functions

198e88f

jimexist force-pushed the impl-rank branch from 6d4cf41 to 198e88f on June 28, 2021 13:01

alamb approved these changes Jun 28, 2021


alamb merged commit 8e12e48 into apache:master Jun 28, 2021

houqp added the enhancement (New feature or request) label Jul 31, 2021
