-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Fix array_agg memory over use #16346
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix array_agg memory over use #16346
Conversation
/// Compacts ([ScalarValue::compact]) the current [ScalarValue] and returns it. | ||
pub fn compacted(mut self) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would there be ay benefit in adding #[inline] this since its a small function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 I don't have enough evidence to justify that #[inline] is better here, the function is not really in the hot path of any operation, if you ask me I'd just trust the compiler to do what's right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @gabotechs and @LiaCastaneda -- this makes sense to me.
I also made a small PR to improve the docs too
Ok(()) | ||
} | ||
|
||
#[test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I verified these tests cover the code in this PR -- they fail without the changes in the PR
assertion `left == right` failed
left: 2652
right: 732
// storing it here directly copied/compacted avoids over accounting memory | ||
// not used here. | ||
self.values | ||
.push(make_array(copy_array_data(&val.to_data()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found this code confusing at first too so I tried to add some additional documentation
Another thing I found might make this code easier to understand would be to refactor this into a function so it looks more like
.push(make_array(copy_array_data(&val.to_data()))); | |
.push(copy_array(val)) |
Or something like that
/// Copies an array to a new array with mimimal memory overhead
fn copy_array(array: &dyn Array) -> ArrayRef {
..
}
Or something like that .
This is definitely not required just something that occured to me while reviewing
Thanks again @gabotechs and @LiaCastaneda |
* Fix array_agg memory over accounting * Add comment (cherry picked from commit 8a2d618)
* Fix array_agg memory over use (apache#16346) * Fix array_agg memory over accounting * Add comment (cherry picked from commit 8a2d618) * Fix test
Which issue does this PR close?
Rationale for this change
The different accumulators of the
array_agg
function store certain scalar values as part of their state, and for the same reason that the following PRwas needed for the first/last functions, it is also needed here.
What changes are included in this PR?
Reuses the tooling shipped in #15924 for compacting scalar values for the different
array_agg
accumulatorsAre these changes tested?
yes, by new and existing tests
Are there any user-facing changes?
If users are using a bounded memory pool, they might stop seeing certain errors due to failed memory allocations