Contributor

@gabotechs gabotechs commented Jun 9, 2025

Which issue does this PR close?

  • Closes #.

Rationale for this change

The different accumulators of the array_agg function store certain scalar values as part of their state, and for the same reason that the following PR

was needed for the first/last functions, it is also needed here.

What changes are included in this PR?

Reuses the tooling shipped in #15924 to compact the scalar values stored by the different array_agg accumulators.

Are these changes tested?

Yes, by new and existing tests.

Are there any user-facing changes?

Users with a bounded memory pool might stop seeing certain errors caused by failed memory allocations.

Comment on lines +3529 to +3530
/// Compacts ([ScalarValue::compact]) the current [ScalarValue] and returns it.
pub fn compacted(mut self) -> Self {
Contributor

Would there be any benefit in adding #[inline] to this, since it's a small function?

Contributor Author

🤔 I don't have enough evidence to justify that #[inline] is better here. The function is not really in the hot path of any operation, so if you ask me, I'd just trust the compiler to do what's right.
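
For readers without the diff at hand, here is a minimal, standard-library-only sketch of the consume-and-return pattern that `compacted` appears to follow. `Scalar`, its fields, `bytes`, and `retained_bytes` are hypothetical stand-ins and not DataFusion's `ScalarValue`. The body of `compacted` is an assumption based on its doc comment (call the in-place `compact`, then return `self`).

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a scalar that views part of a shared buffer.
struct Scalar {
    buffer: Arc<Vec<u8>>,
    offset: usize,
    len: usize,
}

impl Scalar {
    /// Bytes kept alive by this value: the whole backing buffer.
    fn retained_bytes(&self) -> usize {
        self.buffer.len()
    }

    /// The bytes this value logically represents.
    fn bytes(&self) -> &[u8] {
        &self.buffer[self.offset..self.offset + self.len]
    }

    /// In-place variant: copy only the viewed bytes into a fresh buffer.
    fn compact(&mut self) {
        let owned = self.bytes().to_vec();
        self.buffer = Arc::new(owned);
        self.offset = 0;
    }

    /// Consuming variant (assumed body): compact, then hand the value back.
    fn compacted(mut self) -> Self {
        self.compact();
        self
    }
}

fn main() {
    let big = Arc::new(vec![7u8; 4096]);
    let view = Scalar { buffer: Arc::clone(&big), offset: 10, len: 4 };
    println!("before: {} bytes retained", view.retained_bytes());

    // The consuming form reads naturally inside an expression.
    let small = view.compacted();
    println!("after: {} bytes retained", small.retained_bytes());
}
```

A wrapper this small is usually an easy inlining candidate within its own crate, which fits the "trust the compiler" position above. Whether an explicit `#[inline]` would help callers in other crates is not something this sketch measures.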

@github-actions github-actions bot added common Related to common crate functions Changes to functions implementation labels Jun 9, 2025
@alamb alamb changed the title Fix array_agg memory over accounting Fix array_agg memory over use Jun 11, 2025
Contributor

@alamb alamb left a comment

Thanks @gabotechs and @LiaCastaneda -- this makes sense to me.

I also made a small PR to improve the docs.

Ok(())
}

#[test]
Contributor

I verified these tests cover the code in this PR -- they fail without the changes in the PR


assertion `left == right` failed
  left: 2652
 right: 732

// storing it here directly copied/compacted avoids over accounting memory
// not used here.
self.values
.push(make_array(copy_array_data(&val.to_data())));
Contributor

I found this code confusing at first too, so I tried to add some additional documentation.

Another thing that might make this code easier to understand would be to refactor it into a function, so it looks more like:

Suggested change
- .push(make_array(copy_array_data(&val.to_data())));
+ .push(copy_array(val))

Or something like that

/// Copies an array to a new array with minimal memory overhead
fn copy_array(array: &dyn Array) -> ArrayRef {
..
}

This is definitely not required, just something that occurred to me while reviewing.
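
As a rough illustration of why storing a copy changes the accounted size, the sketch below models the idea with the standard library only. `View`, `copy_view`, and `Accumulator` are hypothetical analogues of an Arrow array, the suggested `copy_array` helper, and an array_agg accumulator. The sketch assumes, for illustration only, that a sliced value is charged for the entire buffer it shares. The byte counts are toy numbers and are unrelated to the 2652 and 732 in the test failure above.

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Hypothetical stand-in for an Arrow array: a window into a shared buffer.
#[derive(Clone)]
struct View {
    buffer: Arc<Vec<i64>>,
    offset: usize,
    len: usize,
}

impl View {
    /// Zero-copy slice: shares the backing buffer with `self`.
    fn slice(&self, offset: usize, len: usize) -> View {
        assert!(offset + len <= self.len);
        View { buffer: Arc::clone(&self.buffer), offset: self.offset + offset, len }
    }

    /// The values visible through this window.
    fn values(&self) -> &[i64] {
        &self.buffer[self.offset..self.offset + self.len]
    }

    /// Memory attributed to this view: the entire backing buffer.
    fn memory_size(&self) -> usize {
        self.buffer.len() * size_of::<i64>()
    }
}

/// Analogue of the suggested `copy_array`: copy only the visible values.
fn copy_view(view: &View) -> View {
    View { buffer: Arc::new(view.values().to_vec()), offset: 0, len: view.len }
}

/// Toy accumulator that keeps every input it sees.
#[derive(Default)]
struct Accumulator {
    values: Vec<View>,
}

impl Accumulator {
    fn update(&mut self, val: &View, copy: bool) {
        self.values.push(if copy { copy_view(val) } else { val.clone() });
    }

    /// Sum of the memory attributed to each stored view.
    fn size(&self) -> usize {
        self.values.iter().map(View::memory_size).sum()
    }
}

fn main() {
    let batch = View { buffer: Arc::new((0..1000i64).collect::<Vec<i64>>()), offset: 0, len: 1000 };
    let group = batch.slice(100, 5);

    let mut sharing = Accumulator::default();
    sharing.update(&group, false);

    let mut copying = Accumulator::default();
    copying.update(&group, true);

    println!("without copy: {} bytes", sharing.size());
    println!("with copy: {} bytes", copying.size());
}
```

In this toy model the sharing accumulator is charged for all 1000 values although it only needs 5, which is the kind of gap the copy on `push` is meant to close.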

@alamb alamb merged commit 8a2d618 into apache:main Jun 11, 2025
29 checks passed
Contributor

alamb commented Jun 11, 2025

Thanks again @gabotechs and @LiaCastaneda

gabotechs added a commit to DataDog/datafusion that referenced this pull request Jun 12, 2025
* Fix array_agg memory over accounting

* Add comment

(cherry picked from commit 8a2d618)
gabotechs added a commit to DataDog/datafusion that referenced this pull request Jun 12, 2025
* Fix array_agg memory over use (apache#16346)

* Fix array_agg memory over accounting

* Add comment

(cherry picked from commit 8a2d618)

* Fix test