-
Notifications
You must be signed in to change notification settings - Fork 1.7k
perf: Faster string_agg()
aggregate function (1000x speed for no DISTINCT and ORDER case)
#17837
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Amazing! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice!!
pub(crate) struct SimpleStringAggAccumulator { | ||
delimiter: String, | ||
// Updating during `update_batch()`. e.g. "foo,bar" | ||
in_progress_string: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just thinking out loud about the name here: I think acc
or accumulated
would also be conventional. But this name is fine!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, updated.
size_of_val(self) + self.delimiter.capacity() + self.in_progress_string.capacity() | ||
} | ||
|
||
fn state(&mut self) -> Result<Vec<ScalarValue>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just asking to understand the Accumulator trait: I see that this and evaluate are the same except for what they return - what is the difference between the two and when they are used, do you know?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
state is for per-partition intermediate result, and evaluate()
is the final result.
e.g. for group key1
, it's getting executed in 2 partitions.
partition 1:
-- [INPUT] (foo, bar) --state()--> "foo, bar"
partition 2:
-- [input] (baz) --state--> "baz"
and evaluate()
is called after merge_batch
to combine the above intermediates from all partitions, and get the final result "foo, bar, baz"
I think there is a detailed doc in the Accumulator
interface
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation! And oh yeah, I should read the doc comments on the trait!
fn update_batch(&mut self, values: &[ArrayRef]) -> Result<()> { | ||
let string_arr = values.first().ok_or_else(|| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there only one element in values
? That was surprising
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's an array of arg1. The arg is validated during the planning time, and we can also assume it's the right type here (values[0]
is a string array)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Co-authored-by: Vegard Stikbakke <vegard.stikbakke@gmail.com>
Co-authored-by: Vegard Stikbakke <vegard.stikbakke@gmail.com>
CI should pass after #17855 is merged, we can re-run afterwards |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @2010YOUY01 -- I agree this is great . I left a few suggestions.
I tested doing this:
# created data
tpchgen-cli -v --tables=partsupp --format=parquet --parts=2 -s 0.1
# run query:
time datafusion-cli -c "select ps_partkey, string_agg(ps_comment, ';') from 'partsupp' group by ps_partkey;"
main:
real 0m50.296s
This branch 😮
real 0m0.706s
// Case `SimpleStringAggAccumulator` | ||
Ok(vec![Field::new( | ||
format_state_name(args.name, "string_agg"), | ||
DataType::LargeUtf8, | ||
true, | ||
) | ||
.into()]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to put this as part of SimpleStringAggAccumulator, something like
SimpleStringAggAccumulator::state_fields(args)
Ok(Box::new(SimpleStringAggAccumulator::new(delimiter))) | ||
} else { | ||
// general case | ||
let array_agg_acc = self.array_agg.accumulator(AccumulatorArgs { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto here for encapsulating this
pub(crate) struct SimpleStringAggAccumulator { | ||
delimiter: String, | ||
/// Updated during `update_batch()`. e.g. "foo,bar" | ||
accumulated_string: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rater than has_value
perhaps using an option would be better / more rust idomatic and harder to misuse
accumulated_string: Option<String>,
/// because it accumulates the string directly, | ||
/// whereas `StringAggAccumulator` uses `ArrayAggAccumulator`. | ||
#[derive(Debug)] | ||
pub(crate) struct SimpleStringAggAccumulator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is likely much better than what we have. We can probably do better still with a GroupsAccumulator as well
if self.has_value { | ||
self.accumulated_string.push_str(&self.delimiter); | ||
} | ||
|
||
self.accumulated_string.push_str(value); | ||
self.has_value = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you used an option, this could be like
if self.has_value { | |
self.accumulated_string.push_str(&self.delimiter); | |
} | |
self.accumulated_string.push_str(value); | |
self.has_value = true; | |
if let Some(accumulated_value) = self.accumulated_value.as_mut() { | |
accumulated_string.push_str(&self.delimiter); | |
} else { | |
self.accumulated_valie = Some(String::from(&value)) | |
} |
self.accumulated_string.push_str(&self.delimiter); | ||
} | ||
|
||
self.accumulated_string.push_str(value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
push_str
is 💪
Which issue does this PR close?
part of #17789
Rationale for this change
string_agg
is slow, see the tracking issue for details.This PR added a new
Accumulator
implementation for the simple case: if there is noDISTINCT
andORDER BY
likestring_agg(distinct str, ',' ORDER BY str)
, useSimpleStringAggAccumulator
instead. The originalStringAggAccumulator
is used for the general case with potentialDISTINCT
andORDER BY
s.While @vegarsti is working on a
GroupsAccumulator
solution that can further speed it up potentially, I think this PR is still necessary because the single group case likeselect string_agg(str, ',') from t1
is still using theAccumulator
interface instead ofGroupsAccumulator
I haven't checked the original implementation yet, and I don't know why is it so slow. It's using
array_agg
internally, and memory bloat can be observed. I guess the reason is redundant transcoding, and incorrect operations onStringView
buffers.There are several ongoing work to improve
arrray_agg
: #17829Benchmark
It's around 1000x faster. See the original issue for data setup, scaling factor 0.1 is used for the table.
Before: ~50s
PR: 0.05s
What changes are included in this PR?
SimpleStringAggAccumulator
forstring_agg
DISTINCT
andORDER BY
in thestring_agg()
aggregate function.Are these changes tested?
Existing tests
Are there any user-facing changes?