
Tracking issue for oom=panic (RFC 2116) #43596

Open
1 of 6 tasks
Gankra opened this issue Aug 1, 2017 · 54 comments
Labels
A-allocators Area: Custom and system allocators B-RFC-approved Blocker: Approved by a merged RFC but not yet implemented. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC Libs-Tracked Libs issues that are tracked on the team's project board. S-tracking-ready-to-stabilize Status: This is ready to stabilize; it may need a stabilization report and a PR T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@Gankra
Contributor

Gankra commented Aug 1, 2017

Update: This is now the tracking issue for a mechanism to make OOM / memory allocation errors panic (and so, by default, unwind the thread) instead of aborting the process. This feature was accepted in RFC rust-lang/rfcs#2116.

It was separated from #48043, which tracks another feature of the same RFC, since the two features will likely be stabilized at different times.

Blockers

This should be blocked until we can audit all places in the code that allocate to ensure they safely handle unwinding on OOM.

Steps:

Original feature request:


Several users who are invested in the "thread is a task" model that Rust was originally designed around have expressed a desire to unwind in the case of OOM. Specifically, each task has large and unpredictable memory usage, and tearing down the first task that encounters OOM is considered a reasonable strategy.

Aborting on OOM is considered unacceptable because it would be expensive to discard all the work that the other tasks have managed to do.

We already (accidentally?) specified that all allocators can panic on OOM, including the global one, with the introduction of generic allocators on collections. So in theory no one "should" be relying on the default oom=abort semantics. This knob would only affect the default global allocator. Presumably there would just be a function somewhere in the stdlib which is cfg'd to be a panic or abort based on this flag?
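
A minimal sketch of what such a cfg-dispatched function could look like; the `oom_panic` cfg name and the `handle_oom` function are hypothetical, not actual stdlib items:

```rust
use std::alloc::Layout;

// Hypothetical cfg name; the real knob would be set by the compiler / Cargo.
#[cfg(oom_panic)]
fn handle_oom(layout: Layout) -> ! {
    panic!("memory allocation of {} bytes failed", layout.size());
}

#[cfg(not(oom_panic))]
fn handle_oom(layout: Layout) -> ! {
    eprintln!("memory allocation of {} bytes failed", layout.size());
    std::process::abort();
}
```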

This is not a replacement for the proper fallible allocation handling routines requested in #29802, because most of the potential users of that API build with panic=abort. Also, the requesters of this API are unwilling to go through the effort of ensuring that all their dependencies use fallible allocations everywhere.

I am dubious about this API, but I'm filing an issue so that it's written down and everyone who wants it has a place to go and assert that it is in fact good and desirable.

@Gankra
Contributor Author

Gankra commented Aug 1, 2017

Note that in previous topics I stated that libunwind doesn't handle OOM well, but this appears to be outdated or incorrect information. It preallocates its memory before an unwind is even requested. In the limit it might run out of memory and give up, but nothing is lost by trying.

@Mark-Simulacrum Mark-Simulacrum added C-feature-request Category: A feature request, i.e: not implemented / a PR. T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Aug 3, 2017
@alexcrichton
Member

I figured I'd comment on this with respect to #42808 and the outcome there, along with the @rust-lang/libs team discussion. It was discovered in #42808 that the integration of the Alloc trait caused a number of code-size regressions which, when diagnosed, were almost exclusively related to unwind tables and landing pads. This investigation then led to #44049, since the only two stable allocator implementations don't panic.

Note, though, that this is not necessarily directly related to OOM but just allocator implementations themselves. Should we forbid, in general, panics from allocators? Should we only allow the oom method to panic?

Unsure!

@retep998
Member

I personally think it is reasonable to forbid the allocator itself from panicking and instead have the oom method be responsible for handling allocation failures and panicking or aborting as necessary. I also think it would be useful if oom was told the size of the allocation that failed so it could heuristically determine whether it is an actual memory exhaustion event or just someone accidentally specifying a silly large number.
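
A tiny sketch of that heuristic, assuming the failed Layout is available; the free-standing oom function and the threshold here are illustrative, not the actual Alloc trait API:

```rust
use std::alloc::Layout;

fn oom(layout: Layout) -> ! {
    // Anything this large is almost certainly a bogus size computation
    // rather than genuine memory exhaustion (threshold is arbitrary).
    const SUSPICIOUS: usize = 1 << 40; // 1 TiB
    if layout.size() >= SUSPICIOUS {
        panic!("allocation of {} bytes looks like a bug, not OOM", layout.size());
    } else {
        eprintln!("memory exhausted while allocating {} bytes", layout.size());
        std::process::abort();
    }
}
```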

@pnkfelix
Member

@retep998 fn oom does receive the AllocErr, which for AllocErr::Exhausted will carry the input Layout that was requested. Does that not suffice for what you describe?

(I personally wanted to try to stuff the Layout onto both AllocErr variants, but I worried about bloating the size of the error type after it was no longer an associated item for the allocator...)

@retep998
Member

@pnkfelix Yep, that would do exactly what I want. It's my fault for not looking at the existing API and seeing that we already had that 😛

@SimonSapin SimonSapin added B-RFC-approved Blocker: Approved by a merged RFC but not yet implemented. A-allocators Area: Custom and system allocators C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC labels Jun 27, 2018
@SimonSapin SimonSapin changed the title oom=panic knob Tracking issue for oom=panic (RFC 2116) Jun 27, 2018
@SimonSapin SimonSapin removed the C-feature-request Category: A feature request, i.e: not implemented / a PR. label Jun 27, 2018
@SimonSapin
Contributor

RFC rust-lang/rfcs#2116 was accepted, I’ve repurposed this feature request into a tracking issue. See more in this thread’s original message.

@SimonSapin
Contributor

We now have std::alloc::set_alloc_error_hook (tracked at #51245), which is documented as not being allowed to unwind. If that restriction could be lifted, this hook could be the mechanism to implement this feature (by having the program register, at start-up, a hook that panics).
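
A sketch of how that could look if the restriction were lifted; set_alloc_error_hook is unstable, and today its hook must not unwind, so this is illustrative only:

```rust
#![feature(alloc_error_hook)]
use std::alloc::{set_alloc_error_hook, Layout};

fn panicking_oom_hook(layout: Layout) {
    // Only valid if the "must not unwind" restriction is lifted.
    panic!("memory allocation of {} bytes failed", layout.size());
}

fn main() {
    set_alloc_error_hook(panicking_oom_hook);
    // ... rest of the program; allocation failures would now unwind.
}
```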

@bstrie
Contributor

bstrie commented Oct 15, 2018

Minor question: if someone sets oom=panic and panic=abort, should that function identically to oom=abort, or should it be a compilation error with the rationale that such a configuration is likely a mistake on the user's part?

@SimonSapin
Contributor

I expect it would be mostly identical, though the error message that’s printed might not be exactly the same. I don’t know if it’s worth forbidding the combination explicitly even if it’s not particularly useful. An observable effect (though again not particularly useful, since std::alloc::set_alloc_error_hook also exists) would be that a hook set through std::panic::set_hook is run on OOM.
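
For concreteness, a small sketch of that observable effect (assuming an oom=panic build; the hook body is arbitrary):

```rust
use std::panic;

fn main() {
    // With oom=panic, an allocation failure goes through the normal panic
    // machinery, so this hook would also run on OOM; under panic=abort the
    // process still aborts right after the hook returns.
    panic::set_hook(Box::new(|info| {
        eprintln!("panic (possibly OOM): {info}");
    }));
}
```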

@ehsanul
Contributor

ehsanul commented May 11, 2020

Making allocation errors default to panic has undergone FCP successfully: #66741

I wonder if there's a point to this cargo-level feature anymore?

@SimonSapin
Contributor

#66741 is only relevant in programs where libstd is not used at all. This issue is about the behavior of libstd.

@Ekleog

Ekleog commented May 16, 2020

I had sent my ideas to rust-lang/wg-allocators#62, but as it looks like there already is a tracking issue, let's move the comments here.

Assume a Rust server running multiple things on behalf of clients. It uses regular Rust code that is not written to care specifically about this particular use case.

The server wants to limit the memory consumption clients could force upon it. It does so by having each client's allocations be in an arena allocator, and then wants to abort just that client's connection in case of “OOM.”

Unfortunately, this means that OOM must not abort the whole process, and AFAIU aborting is what the current behavior is, even with set_alloc_error_hook -- or at least that is how the docs read.

I'm not sure oom=panic as a global setting could work properly in all cases, since panicking itself potentially requires memory allocation. But I think that merely allowing set_alloc_error_hook to panic would let the user swap the error hook at the same time as the allocator, and, in case of OOM, revert to the global allocator and then perform the panic once it's possible to allocate again.

Does that make sense?
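
A minimal sketch of that setup, assuming oom=panic (or an alloc error hook that is allowed to panic), where an OOM only tears down the client whose work triggered it; handle_client is a stand-in for the per-client work:

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

// Stand-in for the per-client work: ordinary Rust code that allocates freely.
fn handle_client(request: &[u8]) -> Vec<u8> {
    request.to_vec()
}

fn serve_one(request: &[u8]) -> Option<Vec<u8>> {
    // If an allocation fails inside handle_client, the panic unwinds to here
    // and only this client's connection is dropped, not the whole server.
    catch_unwind(AssertUnwindSafe(|| handle_client(request))).ok()
}
```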

@Lokathor
Contributor

Lokathor commented May 16, 2020

It's probably the case that (a) the amount of memory needed to allocate panic data is fairly low and (b) you'd hit OOM by asking for some large amount that can't be provided. In this situation, you'd probably still be able to allocate the panic data.

However, if you fail to allocate while panicking you'll just get another panic, and a panic during a panic is an abort. So at least you're not trapped in a loop ;P

But also: you can't carelessly change the global allocator while allocated things are present in the world, or they could get freed by the wrong allocator, which is obviously bad.

Basically, the entire scenario you want can't be done with Rust's current conventions regarding memory allocation. You'd need specialized code that uses local allocators, or to wait for the alloc-wg to make the entire ecosystem more allocator-aware over time.

@Ekleog

Ekleog commented May 16, 2020

Just for clarification: I was assuming the existence of a global allocator that is able to use different memory pools for allocation depending on some thread-local. Entering a task's context would set the thread-local to the task's memory pool, and either exiting the task or hitting an OOM during it would reset it to the global memory pool. The allocator knows which memory pool to free from based on the address of the memory block being freed, so that pool doesn't have to be the one that was configured when the block was allocated.

By doing this, I think that what I want can be achieved with the current conventions regarding memory allocation, but only if set_alloc_error_hook can be set not to abort, but to panic instead :)
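
A rough sketch of that allocator shape (the names and the pool-selection details are hypothetical; this version simply delegates everything to the system allocator):

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;

thread_local! {
    // Which task's memory pool new allocations should come from, if any.
    static CURRENT_POOL: Cell<Option<usize>> = Cell::new(None);
}

struct PoolDispatchAlloc;

unsafe impl GlobalAlloc for PoolDispatchAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // A real implementation would allocate from the selected pool and
        // return null when that pool's budget is exhausted.
        let _pool = CURRENT_POOL.with(|p| p.get());
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // Decide which pool owns `ptr` from its address, not from the
        // thread-local, so cross-task frees stay correct.
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static ALLOC: PoolDispatchAlloc = PoolDispatchAlloc;
```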

@Ekleog

Ekleog commented May 16, 2020

Oh, and I forgot to answer your initial points: unfortunately, a client being able to make the server abort is a DoS vulnerability, so… that's not a good thing :( And I can totally see the client being able to fill the pool by making the server allocate one byte at a time. The server could try to prevent that by keeping its own counter of how much memory each task has used, but that requires potentially large changes to every library used by the server, while the allocator-based solution would work without requiring cooperation from every piece of the code (including the pieces I don't control) :)

bors added a commit to rust-lang-ci/rust that referenced this issue Mar 18, 2022
Implement -Z oom=panic

This PR removes the `#[rustc_allocator_nounwind]` attribute on `alloc_error_handler` which allows it to unwind with a panic instead of always aborting. This is then used to implement `-Z oom=panic` as per RFC 2116 (tracking issue rust-lang#43596).

Perf and binary size tests show negligible impact.
bjorn3 pushed a commit to bjorn3/rust that referenced this issue Mar 20, 2022
Implement -Z oom=panic

bjorn3 pushed a commit to bjorn3/rust that referenced this issue Mar 26, 2022
Implement -Z oom=panic

@yaahc
Member

yaahc commented Jun 15, 2022

@Amanieu what's the current status of this tracking issue? It looks like you implemented the feature in #88098, is this just waiting on an eventual push for stabilization?

@Amanieu
Member

Amanieu commented Jun 16, 2022

I think this just needs to be stabilized. We might need to audit the standard library to check that we stay sound in the face of unwinding from allocations. Also I'm not sure what the state of the documentation is.
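
An illustrative sketch (hypothetical Tracker type, not std code) of the kind of invariant such an audit has to check: with oom=panic, any allocating call may unwind, so bookkeeping must stay consistent if it does:

```rust
struct Tracker {
    items: Vec<u64>,
    total: u64, // invariant: equals the sum of `items`
}

impl Tracker {
    fn add(&mut self, value: u64) {
        // Push first: if the allocation inside `push` unwinds under
        // oom=panic, `total` is untouched and the invariant still holds
        // while destructors run.
        self.items.push(value);
        self.total += value;
    }
}
```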

@joshtriplett joshtriplett added S-tracking-ready-to-stabilize Status: This is ready to stabilize; it may need a stabilization report and a PR and removed S-tracking-impl-incomplete Status: The implementation is incomplete. labels Jul 27, 2022
@kornelski
Contributor

What are the chances of stabilizing this soon?
I wrote a workaround for it in #84612 a year ago, and I wonder if I should still push for the workaround.

@nbdd0121
Contributor

I don't think std is ready for oom=panic yet. The unwinding path is not allocation-free, so currently oom=panic likely only works when a large allocation fails (but the allocator can still satisfy a few small ones).

@Lokathor
Contributor

That still seems like a strict improvement over the current situation.

@kornelski
Contributor

@nbdd0121 As long as this behavior is safe (I assume it causes abort like a panic during a panic), I don't think this issue is a blocker to stabilization, because this deficiency doesn't change the public interface of oom=panic. It's a quality-of-implementation issue, but even aborting sometimes is better than aborting every time.

@nbdd0121
Contributor

Currently we panic with an error message, which requires allocation, and I consider that a publicly visible behaviour.

@Aaron1011
Member

If the error message is fixed, we should be able to pre-allocate the necessary object (though it might require some hacky code in the panic internals).

@nbdd0121
Contributor

Even for a fixed error message we need to allocate a Box<dyn Any> for it, and since &str is not a ZST, an actual heap allocation is required. We couldn't preallocate it because we would need to allocate one for each thread (and not all threads are created by std APIs).

C++ doesn't have this issue because it uses the special __cxa_allocate_exception and __cxa_free_exception functions instead of normal heap allocation, which we can't do because our API is just a plain Box.

We could make the panic payload a ZST, and that would avoid the allocation problem on the unwind path entirely.

All other allocations that we currently have can be avoided, I think.
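
A sketch of the ZST-payload idea: Box of a zero-sized type doesn't actually allocate, so a panic carrying a zero-sized marker avoids the payload allocation (the AllocErrorPayload name is hypothetical):

```rust
use std::panic;

// Hypothetical zero-sized marker type for "allocation failed".
struct AllocErrorPayload;

fn oom_panic() -> ! {
    // panic_any boxes the payload, but boxing a ZST performs no heap
    // allocation.
    panic::panic_any(AllocErrorPayload);
}

fn main() {
    let err = panic::catch_unwind(|| {
        oom_panic();
    })
    .unwrap_err();
    assert!(err.downcast_ref::<AllocErrorPayload>().is_some());
}
```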

@SimonSapin
Contributor

We’re constrained to Box<dyn Any> because it’s part of std::thread::Result, right?

@Amanieu
Member

Amanieu commented Oct 13, 2022

There are two levels of allocation when panicking: first, a Box<String> or Box<&str> must be allocated to hold the panic message. Then (with DWARF unwinding) an exception object needs to be allocated on the heap, which holds the Box<dyn Any>.

@nbdd0121
Contributor

The exception object doesn't need to be allocated on the heap. It can be #[thread_local].

@fogti
Contributor

fogti commented Oct 13, 2022

@nbdd0121 What if the exception object is subsequently passed across a thread boundary, and the thread later terminates while the exception object is still present on some other thread's stack or heap?

@Amanieu
Member

Amanieu commented Oct 13, 2022

Libunwind exception objects can't be sent to another thread. They are freed when the exception is caught and not rethrown.

However, the process of unwinding could trigger additional memory allocations due to drop impls, which could cause another out-of-memory exception to be thrown. C++ handles this by only having one OOM exception object in TLS; if that one is already taken then the exception is allocated normally (with an abort if that allocation fails).

@fogti
Contributor

fogti commented Oct 13, 2022

Wouldn't it make more sense to collapse multiple out-of-memory exceptions into one (or, say, keep a simple counter of the occurrences, not that it matters much) and stop executing that drop impl? (I'm not sure how that would affect invariants and such, but the unwinding would happen either way, so I don't think it would change much.) Are there downsides to that besides even more unpredictable control flow in OOM cases?

@bjorn3
Member

bjorn3 commented Oct 13, 2022

and stop executing that drop impl

This breaks the pinning invariant for things on the stack that are pinned and as such is unsound.
