Avoid allocating vector of indices in lexicographical_partition_ranges #998
Codecov Report
@@           Coverage Diff           @@
##           master    #998    +/-   ##
========================================
  Coverage   82.31%   82.32%
========================================
  Files         168      168
  Lines       48763    49060   +297
========================================
+ Hits        40139    40388   +249
- Misses       8624     8672    +48
@jimexist I'd like to get your feedback on this change, especially whether the comment in […]
I reviewed this code carefully, and structurally it looks equivalent to me. However, I agree it would be great to get @jimexist's take on it as well.
@@ -77,24 +74,51 @@ impl<'a> LexicographicalPartitionIterator<'a> {
/// see <https://en.wikipedia.org/wiki/Exponential_search>
#[inline]
fn exponential_search(
Can you please document what `start`, `end`, and `bound` are? Specifically, how do they relate to each other? As written, it seems like `bound` is some starting index, and the search starts at `start+bound` and stops at `end`. Is that right?
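To make the question concrete, here is a minimal sketch of how the three values could relate in an exponential search. This is not the code from this PR: the function name `partition_end` and the closure `same_as_start` are hypothetical, and the sketch assumes the rows in `start..end` are sorted so that all rows equal to the row at `start` come first. Under those assumptions, `start` and `end` delimit the half-open search range, and `bound` is an offset from `start` that doubles on every probe.

```rust
/// Returns the first index in `start..end` whose row differs from the row
/// at `start`, or `end` if every row in the range matches.
///
/// Hypothetical helper for illustration only; `same_as_start(i)` is assumed
/// to report whether row `i` belongs to the same partition as row `start`.
fn partition_end<F>(start: usize, end: usize, same_as_start: F) -> usize
where
    F: Fn(usize) -> bool,
{
    if start >= end {
        return end;
    }

    // Exponential phase: probe start+1, start+2, start+4, ... until the
    // probe leaves the partition or runs past `end`.
    let mut bound = 1;
    while start + bound < end && same_as_start(start + bound) {
        bound *= 2;
    }

    // `start + bound / 2` is the last probe known to be in the partition
    // (it is `start` itself when bound == 1), so the answer lies in
    // (start + bound / 2, min(start + bound, end)].
    let mut lo = start + bound / 2 + 1;
    let mut hi = (start + bound).min(end);

    // Binary phase: find the first non-matching index in [lo, hi).
    // If there is none, the answer is `hi`.
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if same_as_start(mid) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    lo
}
```

With sorted values `[1, 1, 1, 2, 2, 3, 3, 3, 3]`, calling `partition_end(0, 9, |i| v[i] == v[0])` probes offsets 1, 2 and 4, then binary-searches the single remaining candidate and returns 3.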
I plan to merge this PR later this week (arrow 6.5 would be created at the end of next week, so I think we have plenty of time to land this one). Perhaps @jimexist will have a chance to review by then.
Which issue does this PR close?
Closes #997.
Draft for now since I still have to update the code comments and maybe add some more tests.
Are there any user-facing changes?
I haven't run any benchmarks yet, but this should improve the performance of window functions with a high-cardinality partition-by in DataFusion.
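For readers unfamiliar with the idea in the title, the following is a rough sketch of producing partition ranges lazily, without first materializing a vector of row indices. It is not the implementation in this PR: the `PartitionRanges` type is hypothetical, it works on a plain sorted slice instead of Arrow sort columns, and it uses the standard library's `slice::partition_point` (a binary search) in place of the exponential search used in the real code.

```rust
use std::ops::Range;

/// Hypothetical iterator that yields one `Range<usize>` per run of equal
/// values in a sorted slice. Only the current start offset is stored, so
/// no per-row index vector is allocated.
struct PartitionRanges<'a, T> {
    values: &'a [T],
    start: usize,
}

impl<'a, T: PartialEq> Iterator for PartitionRanges<'a, T> {
    type Item = Range<usize>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.start >= self.values.len() {
            return None;
        }
        let rest = &self.values[self.start..];
        let first = &rest[0];
        // Because the input is assumed sorted, all values equal to `first`
        // form a prefix of `rest`, which is what `partition_point` requires.
        let len = rest.partition_point(|v| v == first);
        let range = self.start..self.start + len;
        self.start = range.end;
        Some(range)
    }
}
```

Collecting `PartitionRanges { values: &[1, 1, 1, 2, 2, 3, 3, 3, 3], start: 0 }` yields `0..3`, `3..5` and `5..9`. With many small partitions, the saving comes from keeping only a single offset as state.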