perf: optimize array_distinct with batched row conversion #20364
Open
lyne7-sc wants to merge 2 commits into apache:main
Added 2 commits on February 15, 2026, 12:56
Dandandan reviewed on Feb 15, 2026
```rust
// Convert all values to row format in a single batch for performance
let converter = RowConverter::new(vec![SortField::new(dt.clone())])?;
let rows = converter.convert_columns(&[Arc::clone(array.values())])?;
```
Dandandan (Contributor): I think, as a follow-up, we could reuse the `Rows` and `HashSet` allocations between batches.
lyne7-sc (Contributor, Author): Thanks for the suggestion! I'll look into reusing the allocations. I'm not sure whether `thread_local` would help; do you have any tips or a better approach you'd recommend?
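One possible shape for the reuse suggested above, sketched with the standard library only: keep the row buffer and the set in a struct that outlives a single batch, and call `clear()` on them instead of allocating new ones. This is an alternative to `thread_local`, not what the PR implements. `DistinctScratch`, `encode_row`, and the fixed-width 8-byte `Row` type are hypothetical simplifications: arrow's `Rows` are variable-length, and a fixed-size owned key lets the set live across batches without holding borrowed slices. Nulls are ignored.

```rust
use std::collections::HashSet;

/// Hypothetical fixed-width "row" for an i64 value (arrow's rows are variable-length).
type Row = [u8; 8];

/// Hypothetical stand-in for row conversion: big-endian bytes of the value.
fn encode_row(v: i64) -> Row {
    v.to_be_bytes()
}

/// Scratch buffers kept alive between batches so their capacity is reused.
#[derive(Default)]
struct DistinctScratch {
    rows: Vec<Row>,
    seen: HashSet<Row>,
}

impl DistinctScratch {
    /// Deduplicates each list of one batch, where list `i` is
    /// `values[offsets[i]..offsets[i + 1]]`. Only `rows` and `seen` are
    /// reused; the returned vectors are still allocated per batch.
    fn distinct_batch(&mut self, values: &[i64], offsets: &[usize]) -> (Vec<i64>, Vec<usize>) {
        // `Vec::clear` drops the contents but keeps the allocated capacity.
        self.rows.clear();
        self.rows.extend(values.iter().map(|&v| encode_row(v)));

        let mut out_values = Vec::with_capacity(values.len());
        let mut out_offsets = vec![0];
        for w in offsets.windows(2) {
            // `HashSet::clear` also keeps the allocated memory for reuse.
            self.seen.clear();
            for i in w[0]..w[1] {
                // Keep a value only the first time its row is seen in this list.
                if self.seen.insert(self.rows[i]) {
                    out_values.push(values[i]);
                }
            }
            out_offsets.push(out_values.len());
        }
        (out_values, out_offsets)
    }
}
```

A caller would create one `DistinctScratch::default()` per operator instance and pass every incoming batch through `distinct_batch`. Later batches then only allocate scratch space when they need more room than earlier ones did.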
Dandandan approved these changes on Feb 15, 2026
Which issue does this PR close?
Rationale for this change
This PR optimizes the `array_distinct` function by converting all values to row format in a single batch and using a `HashSet` for deduplication. It is a follow-up to #20243.
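As a rough illustration of the two ideas above (one up-front conversion of all child values, then a per-list `HashSet` pass), here is a standard-library-only sketch. It is not the PR's code: it assumes a list array flattened into a `values` slice plus `offsets`, ignores nulls, and uses a hypothetical `encode_row` helper as a stand-in for the `RowConverter::convert_columns` call quoted in the review.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for arrow's row format: encodes one i64 into
/// bytes that can be hashed and compared. Only handles i64.
fn encode_row(v: i64) -> Vec<u8> {
    v.to_be_bytes().to_vec()
}

/// Returns deduplicated values and new offsets for a flattened list array,
/// where list `i` is `values[offsets[i]..offsets[i + 1]]`.
fn distinct_lists(values: &[i64], offsets: &[usize]) -> (Vec<i64>, Vec<usize>) {
    // Batched conversion: encode every child value once, up front,
    // rather than converting each list separately.
    let rows: Vec<Vec<u8>> = values.iter().map(|&v| encode_row(v)).collect();

    let mut out_values = Vec::with_capacity(values.len());
    let mut out_offsets = Vec::with_capacity(offsets.len());
    out_offsets.push(0);

    // A single set of borrowed rows, cleared per list so its allocation is reused.
    let mut seen: HashSet<&[u8]> = HashSet::new();
    for w in offsets.windows(2) {
        seen.clear();
        for i in w[0]..w[1] {
            // `insert` returns true only the first time a row is seen,
            // so each value is emitted at its first occurrence.
            if seen.insert(rows[i].as_slice()) {
                out_values.push(values[i]);
            }
        }
        out_offsets.push(out_values.len());
    }
    (out_values, out_offsets)
}
```

For the lists `[1, 3, 1, 2, 3]` and `[5, 5, 4]` (values `[1, 3, 1, 2, 3, 5, 5, 4]`, offsets `[0, 5, 8]`), this sketch yields values `[1, 3, 2, 5, 4]` with offsets `[0, 3, 5]`. Because `HashSet::insert` reports whether a value was newly added, a single pass both deduplicates and keeps the input order.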
What changes are included in this PR?
This PR optimizes `array_distinct` by:

Benchmark
Are these changes tested?
Yes, unit tests exist and pass.
Are there any user-facing changes?
Yes, there is a slight change in the output order. The new behavior is consistent with `array_union` and `array_intersect`, whose output is more intuitive because it preserves the original order of elements in the array.