-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implement rank and dense_rank function and refactor built-in window function evaluation #631
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me @jimexist -- I had some suggestions on the code structure but I think this is also just fine as written.
The only thing I suggest is adding end-to-end tests (maybe as a postgres integration test)
} | ||
|
||
impl PartitionEvaluator for NthValueEvaluator { | ||
fn evaluate_partition(&self, partition: Range<usize>) -> Result<ArrayRef> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This interface makes sense (to pass in the range of rows), though it may make more sense to explicitly pass in values: Vec<ArrayRef>
rather than assume whatever implements the Evaluator was constructed in a way they can be found
} | ||
|
||
/// evaluate the partition evaluator against the partition | ||
fn evaluate_partition(&self, _partition: Range<usize>) -> Result<ArrayRef>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another potential way to model this with a single evaluate function might be:
fn evaluate_partition(&self, _partition: Range<usize>) -> Result<ArrayRef>; | |
fn evaluate_partition(&self, _partition: Range<usize>, _ranks_in_partition: Option<&[Range<usize>])) -> Result<ArrayRef>; |
Rather than having two separate functions with different signatures
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I was trying to avoid generation sort partition points because a majority of the functions do not need that. Nth value not needing them, row number not needing values at all - just length info
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it were to be consistent then the interface wouldn't need to exist - would reuse code with aggregation window functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
having thought of this for a while, i think let's merge this as is.
when arrow 4.4 is released, the partition points is migrated to be an iterator. at that time i can unify both functions and let the laziness do its work (i.e. pass in the iterator in all cases, letting the consumer to decide).
UInt64Array::from_iter_values(ranks_in_partition.iter().enumerate().flat_map( | ||
|(index, range)| { | ||
let len = range.end - range.start; | ||
iter::repeat((index as u64) + 1).take(len) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👨🍳 ❤️ -- nice
18ce48f
to
6d4cf41
Compare
Which issue does this PR close?
Closes #555
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?