Add a fine grain memory usage tracking / break down of memory usage *within* each operator #16904

@rluvaton

Is your feature request related to a problem or challenge?

Yes, debugging memory problems is hard. When running DataFusion in production, if the memory pool is unable to grow a reservation, it returns a ResourcesExhausted error or panics, depending on the memory pool implementation. However, we don't know what is taking all the memory.

In my case I had to manually patch the DataFusion row_hash file to print, on error, how much memory every accumulator and its internals take.

Describe the solution you'd like

I want to add a new API with two functions: a size function (not strictly required, but I think it belongs together with explain_memory) and an explain_memory function (or a similar name) that, like the Debug trait, returns a string with a breakdown of everything that takes memory.

So for GroupedHashAggregateStream it would report how much memory the group values take, and then call explain_memory on each accumulator.
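
To make the idea concrete, here is a rough sketch of what such an API could look like. Everything in it is hypothetical: the ExplainMemory trait, the SketchAccumulator and SketchHashAggregate stand-in types, and the example_operator helper are illustrative names only, not existing DataFusion items, and the byte counts are made up.

```rust
use std::fmt::Write;

/// Hypothetical trait (not part of DataFusion): reports a size and a
/// human-readable breakdown of where that memory goes.
pub trait ExplainMemory {
    /// Total bytes held by this value, including nested parts.
    fn size(&self) -> usize;
    /// Multi-line breakdown; `indent` is the nesting depth.
    fn explain_memory(&self, indent: usize) -> String;
}

/// Stand-in for an aggregate accumulator (illustrative only).
pub struct SketchAccumulator {
    pub name: String,
    pub state_bytes: usize,
}

impl ExplainMemory for SketchAccumulator {
    fn size(&self) -> usize {
        self.state_bytes
    }

    fn explain_memory(&self, indent: usize) -> String {
        format!("{}{}: {} bytes\n", "  ".repeat(indent), self.name, self.size())
    }
}

/// Stand-in for GroupedHashAggregateStream (illustrative only).
pub struct SketchHashAggregate {
    pub group_values_bytes: usize,
    pub accumulators: Vec<Box<dyn ExplainMemory>>,
}

impl ExplainMemory for SketchHashAggregate {
    fn size(&self) -> usize {
        let accumulators: usize = self.accumulators.iter().map(|a| a.size()).sum();
        self.group_values_bytes + accumulators
    }

    fn explain_memory(&self, indent: usize) -> String {
        let pad = "  ".repeat(indent);
        let mut out = String::new();
        // Writing to a String cannot fail, so unwrap is safe here.
        writeln!(out, "{pad}GroupedHashAggregateStream: {} bytes", self.size()).unwrap();
        writeln!(out, "{pad}  group values: {} bytes", self.group_values_bytes).unwrap();
        // Delegate to each accumulator, one nesting level deeper.
        for acc in &self.accumulators {
            out.push_str(&acc.explain_memory(indent + 1));
        }
        out
    }
}

/// Builds a small example operator with made-up sizes.
pub fn example_operator() -> SketchHashAggregate {
    SketchHashAggregate {
        group_values_bytes: 1024,
        accumulators: vec![
            Box::new(SketchAccumulator { name: "sum(x)".to_string(), state_bytes: 256 }),
            Box::new(SketchAccumulator { name: "count(y)".to_string(), state_bytes: 512 }),
        ],
    }
}
```

With something like this, the operator could append the output of explain_memory to the ResourcesExhausted error message, so the breakdown is available without patching DataFusion.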

Describe alternatives you've considered

No response

Additional context

No response

Labels

enhancement (New feature or request)