Avoid allocating vector of indices in lexicographical_partition_ranges #998

jhorstmann · 2021-12-03T18:19:43Z

Which issue does this PR close?

Closes #997.

Draft for now since I still have to update the code comments and maybe add some more tests.

Are there any user-facing changes?

I haven't run any benchmarks yet but this should improve performance of window functions with high-cardinality partition-by in datafusion.

codecov-commenter · 2021-12-03T18:38:07Z

Codecov Report

Merging #998 (caa05c7) into master (e9be49d) will increase coverage by 0.00%.
The diff coverage is 97.36%.

@@           Coverage Diff            @@
##           master     #998    +/-   ##
========================================
  Coverage   82.31%   82.32%            
========================================
  Files         168      168            
  Lines       48763    49060   +297     
========================================
+ Hits        40139    40388   +249     
- Misses       8624     8672    +48

Impacted Files	Coverage Δ
arrow/src/compute/kernels/partition.rs	`97.45% <97.36%> (-0.21%)`	⬇️
arrow/src/array/array_string.rs	`97.08% <0.00%> (-0.83%)`	⬇️
parquet/src/record/reader.rs	`89.83% <0.00%> (-0.63%)`	⬇️
parquet/src/schema/printer.rs	`72.47% <0.00%> (-0.55%)`	⬇️
arrow/src/datatypes/datatype.rs	`65.95% <0.00%> (-0.43%)`	⬇️
arrow/src/datatypes/field.rs	`53.37% <0.00%> (-0.31%)`	⬇️
arrow/src/datatypes/schema.rs	`66.66% <0.00%> (-0.27%)`	⬇️
parquet/src/arrow/arrow_array_reader.rs	`77.87% <0.00%> (-0.15%)`	⬇️
arrow/src/compute/kernels/take.rs	`95.21% <0.00%> (-0.01%)`	⬇️
arrow/src/buffer/immutable.rs	`97.84% <0.00%> (ø)`
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e9be49d...caa05c7. Read the comment docs.

jhorstmann · 2021-12-07T16:04:16Z

@jimexist I'd like to get your feedback on this change, especially whether the comment in LexicographicalPartitionIterator::next sounds correct.

alamb

I reviewed this code carefully and structurally it looks equivalent to me. However, I agree it would be great to get @jimexist 's take on it as well

alamb · 2021-12-09T18:08:29Z

arrow/src/compute/kernels/partition.rs

@@ -77,24 +74,51 @@ impl<'a> LexicographicalPartitionIterator<'a> {
 /// see <https://en.wikipedia.org/wiki/Exponential_search>
 #[inline]
 fn exponential_search(


Can you please document what start and end and bound are? (specifically how the relate to each other -- I think as written it seems like bound is some starting index and the search starts at start+bound and stops at end indexes?

alamb · 2021-12-14T20:19:36Z

I'll plan to merge this PR in later this week (arrow 6.5 would get created at the end of next week, so I think we have plenty of time to land this one). Perhaps @jimexist will have a chance to review by then

Avoid allocating vector of indices in lexicographical_partition_ranges

8899191

github-actions bot added the arrow Changes to the arrow crate label Dec 3, 2021

Adjust comments

caa05c7

jhorstmann marked this pull request as ready for review December 7, 2021 16:01

alamb approved these changes Dec 9, 2021

View reviewed changes

Improve comments and remove one unneeded parameter

828965c

alamb merged commit fc343e7 into apache:master Dec 15, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoid allocating vector of indices in lexicographical_partition_ranges #998

Avoid allocating vector of indices in lexicographical_partition_ranges #998

jhorstmann commented Dec 3, 2021

codecov-commenter commented Dec 3, 2021 •

edited

Loading

jhorstmann commented Dec 7, 2021

alamb left a comment

alamb Dec 9, 2021

alamb commented Dec 14, 2021

Avoid allocating vector of indices in lexicographical_partition_ranges #998

Avoid allocating vector of indices in lexicographical_partition_ranges #998

Conversation

jhorstmann commented Dec 3, 2021

Which issue does this PR close?

Are there any user-facing changes?

codecov-commenter commented Dec 3, 2021 • edited Loading

Codecov Report

jhorstmann commented Dec 7, 2021

alamb left a comment

Choose a reason for hiding this comment

alamb Dec 9, 2021

Choose a reason for hiding this comment

alamb commented Dec 14, 2021

codecov-commenter commented Dec 3, 2021 •

edited

Loading