[Epic] Generate runtime errors if the memory budget is exceeded #3941
Comments
For the join operator, we already have sort-merge join. When the memory budget is exceeded, should we try sort-merge join first?
Yes, that would be amazing @xudong963 -- I think it is covered by #1599. Getting the MemoryManager hooked into the join (so it knows when it would run out of memory) is probably a prerequisite to automatically switching to sort-merge join during execution.
Should the memory limit be optimistic? What I mean is that, in the case of aggregation, we could first process a record batch, compare memory before and after the batch is processed, and request the delta from the memory manager. Otherwise we'd need to do two passes over the record batch (one to calculate the required memory, another to do the actual processing), or request memory for every record, which may lead to contention on the memory manager and trigger a spill in the middle of batch processing. The end of batch processing would be a "safe point" at which memory usage should be correct, or a spill should be triggered.
I think it's OK to have some slack room on top of the limit, which is somewhat controlled by the batch size. It's unlikely that we are going to account for every single byte anyway, since there might be some auxiliary data structures here and there that are heap-allocated. So I would treat this as a "best-effort limit without impacting performance (too much)". We could later (if there's demand for it) add a "strict memory enforcement" config option that does impact performance.
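A rough sketch of what this batch-level, "optimistic" accounting could look like; all types and size estimates here are illustrative stand-ins, not DataFusion APIs:

```rust
use std::collections::HashMap;

/// Illustrative stand-in for a memory-manager handle; only `try_grow` matters here.
struct MemoryTracker {
    used: usize,
    budget: usize,
}

impl MemoryTracker {
    /// Request `delta` more bytes; error if that would exceed the budget.
    fn try_grow(&mut self, delta: usize) -> Result<(), String> {
        if self.used + delta > self.budget {
            Err(format!(
                "Resources exhausted: needed {delta} more bytes, budget is {}",
                self.budget
            ))
        } else {
            self.used += delta;
            Ok(())
        }
    }
}

/// Illustrative aggregation state: group key -> running count.
#[derive(Default)]
struct AggState {
    groups: HashMap<String, u64>,
}

impl AggState {
    fn estimated_size(&self) -> usize {
        // Very rough per-group size estimate, for illustration only.
        self.groups.len() * (std::mem::size_of::<String>() + 32 + 8)
    }
}

/// Process one "batch" of keys first, then account for the memory delta once,
/// at the end-of-batch "safe point".
fn aggregate_batch(
    tracker: &mut MemoryTracker,
    state: &mut AggState,
    batch: &[&str],
) -> Result<(), String> {
    let before = state.estimated_size();
    for key in batch {
        *state.groups.entry((*key).to_string()).or_insert(0) += 1;
    }
    let delta = state.estimated_size().saturating_sub(before);
    // One memory-manager interaction per batch instead of per row.
    tracker.try_grow(delta)
}

fn main() {
    let mut tracker = MemoryTracker { used: 0, budget: 10_000 };
    let mut state = AggState::default();
    let batch = ["a", "b", "a", "c"];
    aggregate_batch(&mut tracker, &mut state, &batch).unwrap();
    println!("groups: {}, bytes used: {}", state.groups.len(), tracker.used);
}
```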
One thing I can't really wrap my head around is the use of async here. The issue I have is with code like this:

```rust
impl Stream for GroupedHashAggregateStreamV2 {
    type Item = ArrowResult<RecordBatch>;

    fn poll_next(
        mut self: std::pin::Pin<&mut Self>,
        cx: &mut Context<'_>,
    ) -> Poll<Option<Self::Item>> {
        let this = &mut *self;
        // error: `await` is only allowed inside `async` functions and blocks
        this.mem_manager.try_grow(200).await;
    }
}
```

Does anybody have a better idea how to integrate it?
https://docs.rs/futures/latest/futures/stream/fn.unfold.html maybe?
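For reference, futures::stream::unfold builds a Stream from an async closure, so an .await (for example on try_grow) can live inside the stream body instead of a hand-written poll_next. A minimal, self-contained sketch with a stand-in try_grow (not the DataFusion API):

```rust
use futures::stream::{self, StreamExt};

/// Stand-in for an async memory-manager call such as `try_grow`.
async fn try_grow(_bytes: usize) -> Result<(), String> {
    Ok(())
}

#[tokio::main]
async fn main() {
    // State threaded through the stream: which batch we are on.
    let batches = stream::unfold(0usize, |batch_idx| async move {
        if batch_idx >= 3 {
            return None; // no more batches
        }
        // `.await` is fine here because `unfold` takes an async closure.
        try_grow(200).await.ok()?;
        let item = format!("batch {batch_idx}");
        Some((item, batch_idx + 1))
    });

    let produced: Vec<_> = batches.collect().await;
    println!("{produced:?}");
}
```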
Yes, I agree this would be the best strategy: as memory is needed, we request more memory from the memory manager incrementally.
Yes. In general I think upgrading / improving the memory manager would likely be fine.
There is also
Thanks @alamb. As I said in a previous comment, the biggest problem with integrating aggregation with the memory manager is the async try_grow interface, and finding easy ways to work around it.
wdyt?
I think decoupling try_grow and spill would be my preference. In general, if memory is exceeded we may not actually want to spill for all operators (e.g. an initial hash table might simply flush its output if it exceeds memory rather than trying to spill). cc @yjshen in case he has some additional thoughts.
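One way that decoupling could be expressed (illustrative names only, not an existing DataFusion API): try_grow only answers whether an allocation fits in the budget, and each operator chooses its own reaction when it does not:

```rust
/// Illustrative: reserving memory is a separate concern from reacting to
/// memory pressure, so `try_grow` never spills on its own.
trait MemoryReservation {
    /// Try to reserve `bytes` more; `false` means the budget would be exceeded.
    fn try_grow(&mut self, bytes: usize) -> bool;
}

/// What the operator decided to do after a reservation attempt.
#[derive(Debug)]
enum Decision {
    Proceed,
    /// A sort would typically spill sorted runs to disk and merge later.
    Spill,
    /// An initial hash table might simply flush its partial output instead.
    FlushPartialOutput,
}

/// The operator, not the memory manager, picks the reaction.
fn reserve_or_react(
    reservation: &mut dyn MemoryReservation,
    bytes: usize,
    can_flush_instead_of_spill: bool,
) -> Decision {
    if reservation.try_grow(bytes) {
        Decision::Proceed
    } else if can_flush_instead_of_spill {
        Decision::FlushPartialOutput
    } else {
        Decision::Spill
    }
}

/// Simple fixed-budget reservation for the example.
struct FixedBudget {
    used: usize,
    budget: usize,
}

impl MemoryReservation for FixedBudget {
    fn try_grow(&mut self, bytes: usize) -> bool {
        if self.used + bytes > self.budget {
            false
        } else {
            self.used += bytes;
            true
        }
    }
}

fn main() {
    let mut budget = FixedBudget { used: 0, budget: 1024 };
    println!("{:?}", reserve_or_react(&mut budget, 512, true)); // Proceed
    println!("{:?}", reserve_or_react(&mut budget, 4096, true)); // FlushPartialOutput
}
```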
Hi, sorry to join this party late. I don't quite get why async is a problem for the memory manager while implementing aggregation. We could do memory-limited aggregation as follows:
The spill procedure would be:
Also, one could refer to Apache Spark's hash aggregate impl if interested.
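A rough, self-contained sketch of the general shape of such a memory-limited aggregation with spilling (all names and size estimates are illustrative, and this is not necessarily the exact procedure proposed above): accumulate groups in memory, spill sorted partial aggregates when the budget is hit, and merge the spilled runs with the in-memory state at the end.

```rust
use std::collections::BTreeMap;

/// Illustrative external aggregation: in-memory groups plus spilled runs.
struct ExternalAgg {
    groups: BTreeMap<String, u64>,    // sorted, so spilled runs come out sorted
    spills: Vec<Vec<(String, u64)>>,  // stand-in for spill files on disk
    budget: usize,
}

impl ExternalAgg {
    fn estimated_size(&self) -> usize {
        self.groups.len() * 48 // rough per-group estimate, illustration only
    }

    fn update(&mut self, key: &str) {
        *self.groups.entry(key.to_string()).or_insert(0) += 1;
        if self.estimated_size() > self.budget {
            // Spill procedure: write out the (already sorted) partial
            // aggregates and start over with an empty in-memory state.
            let spilled: Vec<_> = std::mem::take(&mut self.groups).into_iter().collect();
            self.spills.push(spilled);
        }
    }

    fn finish(mut self) -> BTreeMap<String, u64> {
        // Merge all spilled runs with the in-memory groups; a real
        // implementation would stream-merge the sorted runs instead.
        let mut result = std::mem::take(&mut self.groups);
        for run in self.spills {
            for (k, v) in run {
                *result.entry(k).or_insert(0) += v;
            }
        }
        result
    }
}

fn main() {
    let mut agg = ExternalAgg { groups: BTreeMap::new(), spills: Vec::new(), budget: 96 };
    for key in ["a", "b", "c", "a", "d", "e", "b"] {
        agg.update(key);
    }
    println!("{:?}", agg.finish());
}
```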
Thanks for your comment @yjshen. The concern raised in previous comments is not the spill algorithm for aggregation; it is about the interaction between the Stream implementation and the async memory manager API. As it is implemented:

```rust
impl Stream for GroupedHashAggregateStreamV2 {
    type Item = ArrowResult<RecordBatch>;

    fn poll_next(
        mut self: std::pin::Pin<&mut Self>,
        cx: &mut Context<'_>,
    ) -> Poll<Option<Self::Item>> {
        let this = &mut *self;
        // error: `await` is only allowed inside `async` functions and blocks
        this.mem_manager.try_grow(200).await;
        // rest of the code which does aggregation and spill
    }
}
```

so the question is how to bridge the gap.
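One standard way to bridge that gap is to store the pending try_grow future on the stream itself and drive it from poll_next, rather than trying to .await there. A self-contained sketch, with a stand-in async try_grow (not the DataFusion API):

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

use futures::{FutureExt, Stream};

/// Stand-in for the memory manager's async `try_grow`.
async fn try_grow(_bytes: usize) -> Result<(), String> {
    Ok(())
}

type GrowFut = Pin<Box<dyn Future<Output = Result<(), String>> + Send>>;

struct BudgetedStream {
    remaining_batches: usize,
    /// The in-flight memory request, if any; polled from `poll_next`.
    pending_grow: Option<GrowFut>,
}

impl Stream for BudgetedStream {
    type Item = Result<String, String>;

    fn poll_next(
        mut self: Pin<&mut Self>,
        cx: &mut Context<'_>,
    ) -> Poll<Option<Self::Item>> {
        let this = &mut *self;
        loop {
            // 1. If a memory request is in flight, drive it to completion first.
            if let Some(fut) = this.pending_grow.as_mut() {
                return match fut.as_mut().poll(cx) {
                    Poll::Pending => Poll::Pending,
                    Poll::Ready(Err(e)) => {
                        this.pending_grow = None;
                        Poll::Ready(Some(Err(e)))
                    }
                    Poll::Ready(Ok(())) => {
                        // Memory granted: produce the next batch.
                        this.pending_grow = None;
                        this.remaining_batches -= 1;
                        Poll::Ready(Some(Ok("batch".to_string())))
                    }
                };
            }
            // 2. Otherwise, start a new request (or finish the stream).
            if this.remaining_batches == 0 {
                return Poll::Ready(None);
            }
            this.pending_grow = Some(try_grow(200).boxed());
        }
    }
}

#[tokio::main]
async fn main() {
    use futures::StreamExt;
    let stream = BudgetedStream { remaining_batches: 2, pending_grow: None };
    let out: Vec<_> = stream.collect().await;
    println!("{out:?}");
}
```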
Thanks for the explanation! Sort in DataFusion is currently memory-limited, and I think we could apply a similar approach in aggregation to the one used in sort: https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/src/physical_plan/sorts/sort.rs#L799-L813. We could use async there in a similar way.
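For comparison, the sort code linked above avoids a hand-written poll_next altogether: the operator's work lives in an ordinary async fn, where awaiting try_grow is natural, and the result is adapted back into a stream. A minimal sketch of that shape (try_grow is again a stand-in, and a Vec of strings stands in for the output batches):

```rust
use futures::stream::{self, StreamExt};
use futures::Stream;

/// Stand-in for the memory manager's async `try_grow`.
async fn try_grow(_bytes: usize) -> Result<(), String> {
    Ok(())
}

/// All the awaiting happens inside a plain `async fn`, where enforcing
/// the budget with `try_grow(...).await` is natural.
async fn do_aggregate(batches: Vec<String>) -> Result<Vec<String>, String> {
    let mut out = Vec::new();
    for batch in batches {
        try_grow(200).await?;
        out.push(format!("aggregated {batch}"));
    }
    Ok(out)
}

/// The async result is adapted back into a `Stream` with `stream::once`,
/// so no hand-written `poll_next` is needed.
fn aggregate_stream(
    batches: Vec<String>,
) -> impl Stream<Item = Result<Vec<String>, String>> {
    stream::once(do_aggregate(batches))
}

#[tokio::main]
async fn main() {
    let mut results = Box::pin(aggregate_stream(vec!["b1".into(), "b2".into()]));
    while let Some(item) = results.next().await {
        println!("{item:?}");
    }
}
```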
I think we should consider reducing the repetition in our hash implementations prior to implementing spilling: #2723. In general I think @yjshen's algorithm for grouping in limited memory is 💯. I think we can implement it in stages, however, with the first stage being tracking the current memory used and erroring when that is exceeded. Then, in the second stage, rather than erroring we can implement the externalized / spilling strategy.
Also, as one might imagine given our renewed interest in this ticket, someone from the IOx team may start working on generating runtime errors in the next few days as well.
An update here: we have added memory limit enforcement for Sort / Grouping, and I plan to write some sql-level tests for this shortly. The only remaining work item is memory-limiting Joins; given that joins are currently undergoing some serious rework (@liukun4515 @jackwener @mingmwang and others, e.g. #4377), we don't plan to add memory limits there until that settles down.
I believe we have completed all the initially planned work for generating runtime errors if the memory budget is exceeded, so closing this issue 🎉
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The basic challenge is that DataFusion can use an unbounded amount of memory for running a plan, which typically results in DataFusion being killed by some system memory protection limit (e.g. the OOM Killer on Linux). See #587 for more details.
As a first step towards supporting larger datasets in DataFusion, if a plan would exceed the overall budget, it should generate a runtime error (resource exhausted) rather than exceeding the budget and risking being killed.
There should be a way to keep the current behavior as well (i.e. do not error due to resource exhaustion).
Describe the solution you'd like
Use the existing MemoryManagerConfig and MemoryManager, which operators interact with via methods like try_grow.
Needed:
- Hook up the MemoryManager in SortExec, and return errors if the memory budget is exceeded: Add ability to disable DiskManager #4330
- Hook up the MemoryManager in Aggregate operators, and return errors if the memory budget is exceeded: Throw a runtime error if the memory allocated to GroupByHash exceeds a limit #3940
- Hook up the MemoryManager in Join operators, and return errors if the memory budget is exceeded #5220

Describe alternatives you've considered
We can always increase the accuracy of the memory allocation accounting (e.g. RecordBatches internal to operators, etc.). However, for this initial epic I would like to get the major consumers of memory instrumented and using the MemoryManager interface. Hopefully this will also allow

Additional context
cc @yjshen @crepererum
related to issues like https://github.com/influxdata/influxdb_iox/issues/5776 (and some internal issues of our own)
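As a concrete illustration of the solution described above, a hedged sketch of configuring a per-query memory budget when building the runtime. Only MemoryManagerConfig, MemoryManager, and try_grow come from this issue; the exact constructor names, module paths, and signatures below are assumptions and may differ between DataFusion versions, so check the current runtime configuration API for the precise form.

```rust
// Hedged sketch, not verified against a specific DataFusion version.
use std::sync::Arc;

use datafusion::execution::context::SessionContext;
use datafusion::execution::memory_manager::MemoryManagerConfig;
use datafusion::execution::runtime_env::{RuntimeConfig, RuntimeEnv};
use datafusion::prelude::SessionConfig;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    // Cap the memory DataFusion may use for query execution at ~100 MB
    // (constructor name and arguments are assumptions).
    let rt_config = RuntimeConfig::new()
        .with_memory_manager(MemoryManagerConfig::try_new_limit(100 * 1024 * 1024, 1.0)?);
    let runtime = Arc::new(RuntimeEnv::new(rt_config)?);
    let ctx = SessionContext::with_config_rt(SessionConfig::new(), runtime);

    // A query whose state exceeds the budget should now fail with a
    // "resources exhausted" runtime error instead of being OOM-killed.
    let df = ctx.sql("SELECT 1").await?;
    df.show().await?;
    Ok(())
}
```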