implement rank and dense_rank function and refactor built-in window function evaluation #631

Merged
merged 1 commit into from
Jun 28, 2021

jimexist
Member

@jimexist jimexist commented Jun 27, 2021

Which issue does this PR close?

Closes #555

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the datafusion Changes in the datafusion crate label Jun 27, 2021
@jimexist jimexist changed the title WIP implement rank function and refactor built-in window function evaluation implement rank function and refactor built-in window function evaluation Jun 27, 2021
@jimexist jimexist marked this pull request as ready for review June 27, 2021 09:14
@alamb alamb changed the title implement rank function and refactor built-in window function evaluation implement rank and dense_rank function and refactor built-in window function evaluation Jun 27, 2021
Contributor

@alamb alamb left a comment

This looks good to me @jimexist -- I had some suggestions on the code structure but I think this is also just fine as written.

The only thing I suggest is adding end-to-end tests (maybe as a postgres integration test)

}

impl PartitionEvaluator for NthValueEvaluator {
fn evaluate_partition(&self, partition: Range<usize>) -> Result<ArrayRef> {
Contributor

This interface (passing in the range of rows) makes sense, though it may make more sense to explicitly pass in values: Vec<ArrayRef> rather than assume that whatever implements the Evaluator was constructed in a way that lets it find them.
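A minimal sketch of what passing the values explicitly could look like. Everything here is hypothetical and simplified for illustration: `Column` (a plain `Vec<i64>`) stands in for `ArrayRef`, the error type is a `String`, and the `ExplicitEvaluator` trait and `FirstValue` struct are not part of the PR.

```rust
use std::ops::Range;

/// Hypothetical stand-in for an Arrow column (`ArrayRef` in the PR).
type Column = Vec<i64>;

/// Hypothetical evaluator that receives its input columns on every call
/// instead of capturing them when it is constructed.
trait ExplicitEvaluator {
    fn evaluate_partition(
        &self,
        values: &[Column],
        partition: Range<usize>,
    ) -> Result<Column, String>;
}

/// Repeats the first value of the partition for every row in it.
struct FirstValue;

impl ExplicitEvaluator for FirstValue {
    fn evaluate_partition(
        &self,
        values: &[Column],
        partition: Range<usize>,
    ) -> Result<Column, String> {
        // The evaluator holds no state; the caller supplies the column.
        let column = values
            .first()
            .ok_or_else(|| "expected one input column".to_string())?;
        // `get` returns None when the range falls outside the column.
        let rows = column
            .get(partition)
            .ok_or_else(|| "partition out of bounds".to_string())?;
        match rows.first() {
            Some(first) => Ok(vec![*first; rows.len()]),
            None => Ok(Vec::new()),
        }
    }
}
```

With this shape the evaluator carries no hidden state, at the cost of every call site having to thread the columns through.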

}

/// evaluate the partition evaluator against the partition
fn evaluate_partition(&self, _partition: Range<usize>) -> Result<ArrayRef>;
Contributor

Another potential way to model this with a single evaluate function might be:

Suggested change
fn evaluate_partition(&self, _partition: Range<usize>) -> Result<ArrayRef>;
fn evaluate_partition(&self, _partition: Range<usize>, _ranks_in_partition: Option<&[Range<usize>]>) -> Result<ArrayRef>;

Rather than having two separate functions with different signatures
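For illustration, a sketch of how a single function with an optional ranks argument might be implemented by two kinds of evaluators. The trait below is a simplified stand-in, not the PR's code: it returns `Vec<u64>` instead of `ArrayRef`, uses `String` as the error type, and assumes the rank ranges are contiguous and relative to the start of the partition.

```rust
use std::iter;
use std::ops::Range;

/// Simplified stand-in for the suggested single-function interface.
trait PartitionEvaluator {
    fn evaluate_partition(
        &self,
        partition: Range<usize>,
        ranks_in_partition: Option<&[Range<usize>]>,
    ) -> Result<Vec<u64>, String>;
}

struct RowNumber;
struct Rank;

impl PartitionEvaluator for RowNumber {
    fn evaluate_partition(
        &self,
        partition: Range<usize>,
        _ranks_in_partition: Option<&[Range<usize>]>,
    ) -> Result<Vec<u64>, String> {
        // Only the partition length matters; the ranks are ignored.
        Ok((1..=partition.len() as u64).collect())
    }
}

impl PartitionEvaluator for Rank {
    fn evaluate_partition(
        &self,
        _partition: Range<usize>,
        ranks_in_partition: Option<&[Range<usize>]>,
    ) -> Result<Vec<u64>, String> {
        let ranks = ranks_in_partition
            .ok_or_else(|| "rank needs the ranges of tied rows".to_string())?;
        // Assumption: ranges are relative to the partition start, so the
        // rank of a group is the offset of its first row plus one.
        Ok(ranks
            .iter()
            .flat_map(|range| iter::repeat(range.start as u64 + 1).take(range.len()))
            .collect())
    }
}
```

The trade-off is visible in `Rank`: a missing argument becomes a runtime error rather than something the type system rules out.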

Member Author

Actually, I was trying to avoid generating the sort partition points, because the majority of the functions do not need them: nth value does not need them, and row number does not need the values at all, just the length info.

Member Author

If it were to be consistent, then the interface wouldn't need to exist; it would reuse code with the aggregation window functions.

Member Author

Having thought about this for a while, I think we should merge this as is.

When arrow 4.4 is released, the partition points will be migrated to an iterator. At that point I can unify both functions and let laziness do its work (i.e. pass in the iterator in all cases and let the consumer decide whether to use it).
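A rough sketch of the lazy approach described above, using only the standard library. The function names and the `lazy_tied_ranges` helper are hypothetical; the helper is not the arrow API, only an iterator that finds each run of equal values in a sorted slice on demand.

```rust
use std::iter;
use std::ops::Range;

/// Hypothetical lazy source of tied-row ranges over a sorted column.
/// No range is computed until the consumer asks for the next item.
fn lazy_tied_ranges<'a>(sorted: &'a [i64]) -> impl Iterator<Item = Range<usize>> + 'a {
    let mut start = 0;
    iter::from_fn(move || {
        if start >= sorted.len() {
            return None;
        }
        // Length of the run of values equal to the one at `start`.
        let run = sorted[start..]
            .iter()
            .take_while(|v| **v == sorted[start])
            .count();
        let range = start..start + run;
        start += run;
        Some(range)
    })
}

/// Row number never pulls from the iterator, so no range is computed.
fn row_number(partition: Range<usize>, _ranks: impl Iterator<Item = Range<usize>>) -> Vec<u64> {
    (1..=partition.len() as u64).collect()
}

/// Dense rank consumes the iterator, paying for the ranges only here.
fn dense_rank(_partition: Range<usize>, ranks: impl Iterator<Item = Range<usize>>) -> Vec<u64> {
    ranks
        .enumerate()
        .flat_map(|(index, range)| iter::repeat(index as u64 + 1).take(range.len()))
        .collect()
}
```

Both functions share one signature, and the cost of finding the ranges is paid only by the function that iterates.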

UInt64Array::from_iter_values(ranks_in_partition.iter().enumerate().flat_map(
|(index, range)| {
let len = range.end - range.start;
iter::repeat((index as u64) + 1).take(len)
Contributor

👨‍🍳 ❤️ -- nice
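For readers without the full diff, a self-contained sketch of the pattern in the snippet above. It collects into a plain `Vec<u64>` rather than a `UInt64Array` so that it runs without the arrow crate; the function name `dense_rank` and the slice parameter are assumptions made for illustration.

```rust
use std::iter;
use std::ops::Range;

/// Dense rank: every row in the n-th group of tied rows gets rank n,
/// with no gaps between consecutive ranks.
fn dense_rank(ranks_in_partition: &[Range<usize>]) -> Vec<u64> {
    ranks_in_partition
        .iter()
        .enumerate()
        .flat_map(|(index, range)| {
            // Each group contributes `len` copies of its 1-based index.
            let len = range.end - range.start;
            iter::repeat((index as u64) + 1).take(len)
        })
        .collect()
}
```

For groups of sizes 2, 1 and 3 this yields 1, 1, 2, 3, 3, 3.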

@jimexist
Member Author

> This looks good to me @jimexist -- I had some suggestions on the code structure but I think this is also just fine as written.
>
> The only thing I suggest is adding end-to-end tests (maybe as a postgres integration test)

integration tests added in #638

@jimexist jimexist force-pushed the impl-rank branch 2 times, most recently from 18ce48f to 6d4cf41 Compare June 28, 2021 09:48
@alamb alamb merged commit 8e12e48 into apache:master Jun 28, 2021
@houqp houqp added the enhancement New feature or request label Jul 31, 2021
Labels
datafusion Changes in the datafusion crate enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

implement rank and dense rank window functions
4 participants