Is your feature request related to a problem or challenge?
Yes, debugging memory problems is hard. When running DataFusion in production and the memory pool is unable to grow, it returns a ResourcesExhausted error or panics, depending on the memory pool implementation. However, we don't know what is taking all the memory.
In my case I had to manually patch DataFusion's row_hash file to print, on error, what every accumulator takes and its internals.
Describe the solution you'd like
I want to add a new API with a size function (not strictly required; I just think it should be combined with explain_memory) and an explain_memory function or similar (akin to the Debug trait) that returns a string breaking down the size of everything that takes memory.
So for GroupedHashAggregateStream it would report how much the group values take, and it would call explain_memory on each of its accumulators.
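A minimal sketch of what such an API could look like. The trait name `MemoryExplain`, the method signatures, and the toy types standing in for `GroupedHashAggregateStream` and its accumulators are all hypothetical, not part of the current DataFusion API:

```rust
use std::fmt::Write;

/// Hypothetical trait: report total size plus a human-readable breakdown.
trait MemoryExplain {
    /// Total bytes used by this component (mirrors the existing `size()` idea).
    fn size(&self) -> usize;

    /// Append a breakdown of what uses the memory, indented per nesting level.
    fn explain_memory(&self, indent: usize, out: &mut String);
}

/// Toy stand-in for an accumulator inside an aggregation stream.
struct ToyAccumulator {
    name: &'static str,
    bytes: usize,
}

impl MemoryExplain for ToyAccumulator {
    fn size(&self) -> usize {
        self.bytes
    }
    fn explain_memory(&self, indent: usize, out: &mut String) {
        writeln!(out, "{}{}: {} bytes", " ".repeat(indent), self.name, self.bytes).unwrap();
    }
}

/// Toy stand-in for GroupedHashAggregateStream: reports the group values'
/// size, then recursively asks each accumulator to explain its memory.
struct ToyAggregateStream {
    group_values_bytes: usize,
    accumulators: Vec<ToyAccumulator>,
}

impl MemoryExplain for ToyAggregateStream {
    fn size(&self) -> usize {
        self.group_values_bytes + self.accumulators.iter().map(|a| a.size()).sum::<usize>()
    }
    fn explain_memory(&self, indent: usize, out: &mut String) {
        let pad = " ".repeat(indent);
        writeln!(out, "{}GroupedHashAggregateStream: {} bytes total", pad, self.size()).unwrap();
        writeln!(out, "{}  group values: {} bytes", pad, self.group_values_bytes).unwrap();
        for acc in &self.accumulators {
            acc.explain_memory(indent + 2, out);
        }
    }
}

fn main() {
    let stream = ToyAggregateStream {
        group_values_bytes: 1024,
        accumulators: vec![
            ToyAccumulator { name: "sum", bytes: 256 },
            ToyAccumulator { name: "count", bytes: 128 },
        ],
    };
    let mut report = String::new();
    stream.explain_memory(0, &mut report);
    print!("{}", report);
}
```

On a ResourcesExhausted error, the operator could render this report into the error message so the breakdown is visible without patching DataFusion.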