Tracking issue for oom=panic (RFC 2116) #43596

Open
1 of 6 tasks
Gankra opened this issue Aug 1, 2017 · 54 comments
Labels
A-allocators Area: Custom and system allocators
B-RFC-approved Blocker: Approved by a merged RFC but not yet implemented.
C-tracking-issue Category: A tracking issue for an RFC or an unstable feature.
Libs-Tracked Libs issues that are tracked on the team's project board.
S-tracking-ready-to-stabilize Status: This is ready to stabilize; it may need a stabilization report and a PR
T-lang Relevant to the language team, which will review and decide on the PR/issue.
T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@Gankra
Contributor

Gankra commented Aug 1, 2017

Update: This is now the tracking issue for a mechanism to make OOM / memory allocation errors panic (and so by default unwind the thread) instead of aborting the process. This feature was accepted in RFC rust-lang/rfcs#2116.

It was separated from #48043, which tracks another feature of the same RFC, since the two features will likely be stabilized at different times.

Blockers

This should be blocked until we can audit all places in the code that allocate to ensure they safely handle unwinding on OOM.

Steps:

Original feature request:


Several users who are invested in the "thread is a task" model that Rust was originally designed around have expressed a desire to unwind in the case of OOM. Specifically, each task has large and unpredictable memory usage, and tearing down the first task that encounters OOM is considered a reasonable strategy.

Aborting on OOM is considered unacceptable because it would be expensive to discard all the work that the other tasks have managed to do.

We already (accidentally?) specified that all allocators can panic on OOM, including the global one, with the introduction of generic allocators on collections. So in theory no one "should" be relying on the default oom=abort semantics. This knob would only affect the default global allocator. Presumably there would just be a function somewhere in the stdlib which is cfg'd to either panic or abort based on this flag?
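To make the proposal concrete, here is a minimal sketch of the kind of cfg-gated function described above. The cfg name and function name are purely hypothetical, chosen only for illustration; this is not necessarily the mechanism that would actually be implemented.

```rust
use std::alloc::Layout;

// Hypothetical cfg flag standing in for an "oom=panic" knob.
#[cfg(oom_panic)]
fn default_alloc_error(layout: Layout) -> ! {
    // With the knob enabled, an allocation failure becomes a panic,
    // which by default unwinds the current thread.
    panic!("memory allocation of {} bytes failed", layout.size());
}

#[cfg(not(oom_panic))]
fn default_alloc_error(layout: Layout) -> ! {
    // Default behavior: print a message and abort the whole process.
    eprintln!("memory allocation of {} bytes failed", layout.size());
    std::process::abort();
}
```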

This is not a replacement for the proper fallible allocation handling routines requested in #29802, because most of the potential users of that API build with panic=abort. Also, the requesters of this API are unwilling to go through the effort of ensuring that all their dependencies use fallible allocations everywhere.

I am dubious about this API, but I'm filing an issue so that it's written down and everyone who wants it has a place to go and assert that it is in fact good and desirable.

@Gankra
Contributor Author

Gankra commented Aug 1, 2017

Note that in previous topics I stated that libunwind doesn't handle OOM well, but this appears to be outdated or incorrect information. They preallocate their memory before an unwind is even requested. In the limit they might run out of memory and give up, but it's not like we've lost anything by trying.

@Mark-Simulacrum Mark-Simulacrum added C-feature-request Category: A feature request, i.e: not implemented / a PR. T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Aug 3, 2017
@alexcrichton
Member

I figured I'd comment on this with respect to #42808 and the outcome there, along with the @rust-lang/libs team discussion. It was discovered in #42808 that the integration of the Alloc trait caused a number of code-size regressions which, when diagnosed, were almost exclusively related to unwind tables and landing pads. This investigation then led to #44049, since the only two stable allocator implementations don't panic.

Note, though, that this is not necessarily directly related to OOM but just to allocator implementations themselves. Should we forbid, in general, panics from allocators? Should we only allow the oom method to panic?

Unsure!

@retep998
Member

I personally think it is reasonable to forbid the allocator itself from panicking and instead have the oom method be responsible for handling allocation failures and panicking or aborting as necessary. I also think it would be useful if oom was told the size of the allocation that failed so it could heuristically determine whether it is an actual memory exhaustion event or just someone accidentally specifying a silly large number.

@pnkfelix
Member

@retep998 fn oom does receive the AllocErr, which for AllocErr::Exhausted will carry the input Layout that was requested. Does that not suffice for what you describe?

(I personally wanted to try to stuff the Layout onto both AllocErr variants, but I worried about bloating the size of the error type after it was no longer an associated item for the allocator...)

@retep998
Member

@pnkfelix Yep, that would do exactly what I want. It's my fault for not looking at the existing API and seeing that we already had that 😛

@SimonSapin SimonSapin added B-RFC-approved Blocker: Approved by a merged RFC but not yet implemented. A-allocators Area: Custom and system allocators C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. labels Jun 27, 2018
@SimonSapin SimonSapin changed the title oom=panic knob Tracking issue for oom=panic (RFC 2116) Jun 27, 2018
@SimonSapin SimonSapin removed the C-feature-request Category: A feature request, i.e: not implemented / a PR. label Jun 27, 2018
@SimonSapin
Contributor

RFC rust-lang/rfcs#2116 was accepted; I’ve repurposed this feature request into a tracking issue. See more in this thread’s original message.

@SimonSapin
Contributor

We now have std::alloc::set_alloc_error_hook (tracked at #51245), which is documented as not being allowed to unwind. If that restriction could be lifted, this hook could be the mechanism to implement this feature (by having the program register, at start-up, a hook that panics).
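For illustration, a sketch of the start-up registration described above, assuming the no-unwind restriction were lifted (the hook API is unstable, gated behind the alloc_error_hook feature):

```rust
#![feature(alloc_error_hook)]

use std::alloc::{set_alloc_error_hook, Layout};

fn main() {
    // Register a hook that panics instead of aborting on allocation failure.
    // Today this is not allowed, since the hook is documented as not being
    // permitted to unwind.
    set_alloc_error_hook(|layout: Layout| {
        panic!("memory allocation of {} bytes failed", layout.size());
    });

    // ... rest of the program ...
}
```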

@bstrie
Contributor

bstrie commented Oct 15, 2018

Minor question: if someone sets oom=panic and panic=abort, should that function identically to oom=abort, or should it be a compilation error with the rationale that such a configuration is likely a mistake on the user's part?

@SimonSapin
Contributor

I expect it would be mostly identical, though the error message that’s printed might not be exactly the same. I don’t know if it’s worth forbidding explicitly even if it’s not particularly useful. An observable effect (though again not particularly useful since std::alloc::set_alloc_error_hook also exists) would be that a hook set through std::panic::set_hook is run on OOM.
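Purely as an illustration of that observable effect (a hypothetical scenario, since oom=panic is not a stable option): a panic hook installed the usual way would then also fire on allocation failure.

```rust
use std::panic;

fn main() {
    // Under oom=panic, this hook would also run when an allocation fails,
    // even with panic=abort (the process would still abort afterwards).
    panic::set_hook(Box::new(|info| {
        eprintln!("panic hook invoked: {}", info);
    }));

    // ... program continues; an OOM would now pass through the hook above ...
}
```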

@ehsanul
Contributor

ehsanul commented May 11, 2020

Making allocation errors default to panic has undergone FCP successfully: #66741

I wonder if there's a point to this cargo-level feature anymore?

@SimonSapin
Contributor

#66741 is only relevant in programs where libstd is not used at all. This issue is about the behavior of libstd.

@Ekleog

Ekleog commented May 16, 2020

I had sent my ideas to rust-lang/wg-allocators#62, but as it looks like there already is a tracking issue, let's move the comments here.

Assume a Rust server running multiple things on behalf of clients. It uses regular Rust code that is not written to care specifically about this particular use case.

The server wants to limit the memory consumption clients could force upon it. It does so by having each client's allocations be in an arena allocator, and then wants to abort just that client's connection in case of “OOM.”

Unfortunately, this means that OOM must not abort the whole process, yet AFAIU aborting the whole process is exactly what the current behavior is, even with set_alloc_error_hook -- or at least that is how the docs read.

I'm not sure oom=panic as a global setting could work properly in all cases, since panicking potentially requires memory allocation. But I think that just allowing set_alloc_error_hook to panic would let the user swap the error hook at the same time as the allocator, and in case of OOM revert to the global allocator and then, once it's possible to allocate again, perform the panic.

Does that make sense?

@Lokathor
Contributor

Lokathor commented May 16, 2020

It's probably the case that (a) the amount of memory needed to alloc panic data is fairly low and (b) you'd hit oom from asking for some large amount that can't be offered. In this situation, you'd probably be able to allocate the panic data.

However, if you fail to allocate while panicking you'll just get a panic, and a panic during a panic is an abort. So at least you're not trapped in a loop ;P

But also: you can't carelessly change the global allocator while allocated things are present in the world, or they'll potentially get freed by the wrong allocator, which is obviously bad.

Basically, the entire scenario you want can't be done with Rust's current conventions regarding memory allocation. You'd need specialized code that uses a local allocator, or to wait for the alloc-wg to make the entire ecosystem more allocation-aware over time.

@Ekleog

Ekleog commented May 16, 2020

Just for clarification: I was assuming the existence of a global allocator that is able to use different memory pools for allocation depending on some thread-local. Entering a task's context would set the thread-local to the task's memory pool, and either exiting it or hitting an OOM during it would reset it to the global memory pool. The allocator knows which memory pool to free from, even when that pool is not the one currently configured, because it knows the address of the memory block being freed.

By doing this, I think that what I want to do can be achieved with the current conventions regarding memory allocation, but only if set_alloc_error_hook can be set not to abort but to panic instead :)
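A rough sketch of the pool-switching global allocator described above. All names are hypothetical, and the actual pool bookkeeping, budget accounting, and hook/panic interaction are left out; it only shows the thread-local dispatch shape.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;

thread_local! {
    // Which per-task pool the current thread should allocate from, if any.
    static CURRENT_POOL: Cell<Option<usize>> = Cell::new(None);
}

struct PoolSwitchingAlloc;

unsafe impl GlobalAlloc for PoolSwitchingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        match CURRENT_POOL.with(|p| p.get()) {
            // A real implementation would charge the task's pool here and
            // return null (triggering the alloc error hook) once the pool's
            // budget is exhausted.
            Some(_pool_id) => System.alloc(layout),
            None => System.alloc(layout),
        }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // Which pool to free into would be derived from `ptr` itself, not
        // from the currently configured pool, as described above.
        System.dealloc(ptr, layout);
    }
}

#[global_allocator]
static GLOBAL: PoolSwitchingAlloc = PoolSwitchingAlloc;

fn main() {
    // Enter a task's context, allocate on its behalf, then leave it.
    CURRENT_POOL.with(|p| p.set(Some(1)));
    let data = vec![0u8; 1024];
    CURRENT_POOL.with(|p| p.set(None));
    drop(data);
}
```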

@Ekleog

Ekleog commented May 16, 2020

Oh, and I forgot to answer your initial points: unfortunately, a client able to make the server abort is a DoS vulnerability, so… that's not a good thing :( And I can totally see a client being able to fill the pool by making the server allocate one byte at a time -- the server could try to prevent that by keeping its own counter of how much memory each task has used, but that requires potentially large changes in every library used by the server, while the solution based on allocators would work without requiring cooperation from every piece of the code (including the parts I don't control) :)

bors added a commit to rust-lang-ci/rust that referenced this issue Mar 18, 2022
Implement -Z oom=panic

This PR removes the `#[rustc_allocator_nounwind]` attribute on `alloc_error_handler` which allows it to unwind with a panic instead of always aborting. This is then used to implement `-Z oom=panic` as per RFC 2116 (tracking issue rust-lang#43596).

Perf and binary size tests show negligible impact.
bjorn3 pushed a commit to bjorn3/rust that referenced this issue Mar 20, 2022
Implement -Z oom=panic
bjorn3 pushed a commit to bjorn3/rust that referenced this issue Mar 26, 2022
Implement -Z oom=panic
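As an illustration of how the flag from these commits is intended to behave (nightly-only; whether and how quickly the allocation fails depends on the platform allocator and overcommit settings):

```rust
// Build with something like: RUSTFLAGS="-Z oom=panic" cargo +nightly run
use std::panic;

fn main() {
    let result = panic::catch_unwind(|| {
        // Deliberately oversized request that the allocator cannot satisfy.
        let _huge: Vec<u8> = Vec::with_capacity(usize::MAX / 2);
    });

    if result.is_err() {
        // Under -Z oom=panic the allocation failure unwinds and is caught
        // here, instead of aborting the whole process.
        eprintln!("allocation failure observed as a panic");
    }
}
```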
@yaahc
Member

yaahc commented Jun 15, 2022

@Amanieu what's the current status of this tracking issue? It looks like you implemented the feature in #88098; is this just waiting on an eventual push for stabilization?

@Amanieu
Member

Amanieu commented Jun 16, 2022

I think this just needs to be stabilized. We might need to audit the standard library to check that we stay sound in the face of unwinding from allocations. Also I'm not sure what the state of the documentation is.

@joshtriplett joshtriplett added S-tracking-ready-to-stabilize Status: This is ready to stabilize; it may need a stabilization report and a PR and removed S-tracking-impl-incomplete Status: The implementation is incomplete. labels Jul 27, 2022
@kornelski
Contributor

What are the chances of stabilizing this soon?
I wrote a workaround for it (#84612) a year ago, and I wonder if I should still push for the workaround.

@nbdd0121
Contributor

I don't think std is ready for oom=panic yet. The unwinding path is not allocation-free, so currently oom=panic likely only works when a large allocation fails (while the allocator can still satisfy a few small ones).

@Lokathor
Contributor

That still seems like a strict improvement over the current situation.

@kornelski
Contributor

@nbdd0121 As long as this behavior is safe (I assume it causes an abort, like a panic during a panic), I don't think this issue is a blocker to stabilization, because this deficiency doesn't change the public interface of oom=panic. It's a quality-of-implementation issue, but even aborting only some of the time is better than aborting every time.

@nbdd0121
Contributor

Currently we panic with an error message, and that requires allocation, which I consider publicly visible behaviour.

@Aaron1011
Member

If the error message is fixed, we should be able to pre-allocate the necessary object (though it might require some hacky code in the panic internals).

@nbdd0121
Contributor

Even for a fixed error message we need to allocate a Box<dyn Any> for it, and since &str is not a ZST, an actual heap allocation is required. We couldn't preallocate it because we would need to allocate one for each thread (and not all threads are created by std APIs).

C++ doesn't have this issue because it uses special __cxa_allocate_exception and __cxa_free_exception instead of normal heap allocation, which we can't do because our API is just a plain Box.

We could make the panic payload a ZST, and that would avoid the allocation problem on the unwind path entirely.

All other allocations that we currently have can be avoided, I think.
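A small sketch of the ZST-payload idea (the payload type here is hypothetical): boxing a zero-sized type performs no heap allocation, so the payload itself would not need memory on the unwind path.

```rust
use std::panic;

// Hypothetical marker type carried as the OOM panic payload.
struct AllocationFailure;

fn oom_panic() -> ! {
    // Box::new of a zero-sized type does not touch the heap, so producing
    // this payload requires no allocation.
    panic::panic_any(AllocationFailure)
}

fn main() {
    let result = panic::catch_unwind(|| {
        oom_panic();
    });
    let err = result.unwrap_err();
    assert!(err.downcast_ref::<AllocationFailure>().is_some());
    eprintln!("caught a panic carrying a zero-sized payload");
}
```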

@SimonSapin
Contributor

We’re constrained to Box<dyn Any> because it’s part of std::thread::Result, right?

@Amanieu
Member

Amanieu commented Oct 13, 2022

There are 2 levels of allocation when panicking: first, a Box<String> or Box<&str> must be allocated to hold the panic message. Then (with DWARF unwinding) an exception object needs to be allocated on the heap, which holds the Box<Any>.

@nbdd0121
Contributor

The exception object doesn't need to be allocated on the heap. It can be #[thread_local].

@fogti
Contributor

fogti commented Oct 13, 2022

@nbdd0121 What if the exception object is subsequently passed across a thread boundary, and the thread later terminates while the exception object is still present on some other thread's stack or heap?

@Amanieu
Member

Amanieu commented Oct 13, 2022

Libunwind exception objects can't be sent to another thread. They are freed when the exception is caught and not rethrown.

However, the process of unwinding could trigger additional memory allocations due to drop impls, which could cause another out-of-memory exception to be thrown. C++ handles this by having only one OOM exception object in TLS: if that one is already taken, then the exception is allocated normally (with an abort if that allocation fails).

@fogti
Contributor

fogti commented Oct 13, 2022

Wouldn't it make more sense to collapse multiple out-of-memory exceptions into one (or, say, keep a simple counter of the occurrences, not that it matters much) and stop executing that drop impl? (I'm not sure how that would affect invariants and such, but the unwinding would happen either way, so I don't think it would change much.) Are there downsides to that, besides even more unpredictable control flow in OOM cases?

@bjorn3
Member

bjorn3 commented Oct 13, 2022

and stop executing that drop impl

This breaks the pinning invariant for things on the stack that are pinned and as such is unsound.
