Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Abort on some large allocation requests, Panic on other #26951

Open
bluss opened this issue Jul 10, 2015 · 38 comments
Open

Abort on some large allocation requests, Panic on other #26951

bluss opened this issue Jul 10, 2015 · 38 comments
Labels
A-allocators Area: Custom and system allocators C-bug Category: This is a bug. E-hard Call for participation: Hard difficulty. Experience needed to fix: A lot. P-low Low priority T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@bluss
Copy link
Member

bluss commented Jul 10, 2015

Compare these two:

let v = vec![0u8; !0];
let u = vec![0u16; !0];

We request a vector with 18446744073709551615 elements.

  • For u8 we receive out of memory (null) from the allocator, and call alloc::oom, which aborts the program: application terminated abnormally with signal 4 (Illegal instruction)
  • For u16, we get an assertion: thread '<main>' panicked at 'capacity overflow', ../src/libcore/option.rs:330

This is inconsistent. Why don't we abort on both of these cases? We abort on too large allocation requests, so those that are even larger could abort too?

@arielb1
Copy link
Contributor

arielb1 commented Jul 11, 2015

@bluss

For the record, your code is the equivalent of

fn main() { let v = ::std::vec::from_elem(0u16, !0); }

In the u16 case, the amount of memory that is supposed to be allocated is 2*(2^64 - 1), which exceeds the size of usize. That is checked for rather explicitly in Vec::with_capacity. Maybe we should abort instead of panicking in that case too.

@bluss
Copy link
Member Author

bluss commented Jul 11, 2015

Additional info: already !0 bytes is more than we can safely allow allocating (due to known fact #22104). We should treat that as capacity overflow, but decline to do so, because it will abort anyway.

@steveklabnik
Copy link
Member

/cc @rust-lang/compiler , I think?

@Gankra
Copy link
Contributor

Gankra commented Jul 16, 2015

This is a @rust-lang/libs problem.

I am of the opinion that aborts should only occur on problems that unwinding can't address. That is, if you overflow the capacity we have to check for that, and we can panic right away. Everything will be cleaned up and the world will be happy.

However if an allocation makes it past our checks and then fails in the allocator, we have historically conservatively assumed that you are legitimately OOM and abort; unwinding can lead to arbitrary other allocations which is presumably bad to do during OOM.

In practice, the platforms we support basically can't OOM naturally (by which I mean, you're asking for the last few bits of free memory), so I reckon if you ever managed to trigger an OOM it's because you managed to request All The Memory, in which case there's plenty of slack space for unwinding. If it's a legit OOM then we'll double panic and crash anyway.

However #14674 mentions that aborting on OOM enables LLVM to better optimize around allocations. I don't know the details.

Edit: Regardless, I do not want to add any kind of "heuristic checks" for "will probably OOM" or "was probably a real OOM".

@Aatch
Copy link
Contributor

Aatch commented Jul 17, 2015

I think that, ultimately, abort vs. unwind behaviour should be left up to the implementation, but the standard library should abort on failed allocations. This particular example is just a red herring. It's only because u8 is a single byte that it doesn't trigger any kind of arithmetic overflow behaviour. It's an interesting quirk, but doesn't really reflect any real inconsistent behaviour.

@Gankra
Copy link
Contributor

Gankra commented Jul 17, 2015

Also to be completely clear to those not super familiar with the issue: the "odd" behaviour described in this issue is completely by design, and not a mistake. Rather this issue is positing that the behaviour should be changed to be more consistent.

@bluss
Copy link
Member Author

bluss commented Jul 24, 2015

Of course. I don't think it's completely by design, it's patched up to be this way. For example, if we know a capacity of > isize::MAX is always too large, why does that abort and not panic?

@Gankra
Copy link
Contributor

Gankra commented Jul 24, 2015

@bluss any time we detect this, I believe we panic. Only alloc(..) == null aborts to my knowledge: https://github.com/rust-lang/rust/blob/master/src/liballoc/raw_vec.rs

We simply don't even check in some cases where we know any degeneracies will be caught by the allocator (e.g. when doubling capacity on 64-bit).

@Gankra
Copy link
Contributor

Gankra commented Jul 24, 2015

Note that checking for > isize::MAX on 64-bit is almost surely useless since your system will OOM you far before that point using normal growth. If you explicit ask for > isize::MAX then you're basically arbitrarily "lucky". It's almost inconsistent to check, in that regard.

@nagisa
Copy link
Member

nagisa commented Jul 24, 2015

checking for > isize::MAX on 64-bit is almost surely useless since your system will OOM you far before that point using normal growth.

That’s what they said about 32 bits timestamps and addressing too.

@Gankra
Copy link
Contributor

Gankra commented Jul 24, 2015

@nagisa Today this is simply a hard hardware issue: you only get 48 bits of the address space.

@nagisa
Copy link
Member

nagisa commented Jul 24, 2015

Yes, but it is always useful to be aware of future prospects.

@Gankra
Copy link
Contributor

Gankra commented Jul 24, 2015

@nagisa I have thankfully architected the current design to make such an upgrade trivial. In fact, all you have to do is delete some code!

https://github.com/rust-lang/rust/blob/master/src/liballoc/raw_vec.rs#L445-L453

@steveklabnik
Copy link
Member

So is this not a bug?

@Gankra
Copy link
Contributor

Gankra commented Aug 4, 2015

It's not an accident, at least.

On Tue, Aug 4, 2015 at 2:54 PM, Steve Klabnik notifications@github.com
wrote:

So is this not a bug?


Reply to this email directly or view it on GitHub
#26951 (comment).

@bluss
Copy link
Member Author

bluss commented Aug 4, 2015

I think all conversations I've had about this have said that performance improvement from aborting instead of panic are unlikely; it saves code size in the binary with likely very little to no runtime impact. We don't have numbers.

So I reported this because I'd like to use abort() for both of the "oom" cases mentioned in the issue's description. It is sensible to do after we get a message on abort working (this is a bug).

@Gankra Gankra added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. I-needs-decision Issue: In need of a decision. labels Aug 4, 2015
@Gankra
Copy link
Contributor

Gankra commented Aug 4, 2015

I think the @rust-lang/libs team just needs to make a final call on this. I think @bluss and I have both made our stances fairly clear, but to summarize (hopefully I'm not misrepresenting bluss):

Always Abort:

  • More consistent (always aborts when too much memory requested)
  • Potential theoretical perf gains to be had (llvm can optimize around straight-up ending the program)

Sometimes Abort:

  • Does "the most friendly thing we can" without incurring unnecessarily overhead (panics can recover -- aborts can't)

Never Abort (wild card!):

  • There is no such thing as a "true" oom. You will be killed by your OS first.
  • Destructors unlikely to allocate and will just trigger a double panic => abort if they do.

@bluss
Copy link
Member Author

bluss commented Aug 5, 2015

Code size may have an impact (from unrelated Servo discussion)

1:

if it takes even 5% off of our code size, that could have some significant impact on Servo's performance, particularly on embedded hardware with relatively small i-cache sizes.

@alexcrichton
Copy link
Member

triage: P-medium

As an action item the libs team has decided to see if it's possible to panic on OOM and we can possibly reevaluate from there.

@rust-highfive rust-highfive added P-medium Medium priority and removed I-nominated I-needs-decision Issue: In need of a decision. labels Aug 5, 2015
@alexcrichton alexcrichton added I-nominated P-medium Medium priority and removed P-medium Medium priority I-nominated labels Aug 5, 2015
@eddyb
Copy link
Member

eddyb commented Aug 6, 2015

Would be interesting to check is a singleton pointer value is enough to unwind without allocating.
The catching mechanism would box an "OOM happened" ZST (which doesn't actually allocate) as Box<Any> for that specific value, instead of transmuting the pointer to Box<Box<Any>>.

In the case of true OOM, it's possible that secondary allocations during unwinding (which should be rare, I can't recall ever seeing an allocating destructor) could succeed due to deallocations earlier in the unwinding process.

@Gankra
Copy link
Contributor

Gankra commented Aug 6, 2015

@eddyb I think our main concern is libunwind allocating. A quick look seems to reveal they never check their mallocs. However maybe there's some macro shenanigans or maybe this is all utility code that isn't called by us. I didn't really dig deep into it (also maybe I checked out the wrong version...).

@Gankra
Copy link
Contributor

Gankra commented Aug 6, 2015

In particular if Rust allocates during OOM that should be fine -- we always check and a double panic is just an abort.

@sfackler
Copy link
Member

sfackler commented Aug 6, 2015

Java pre-allocates an OutOfMemory exception at initialization to ensure it can properly throw when out of memory - it might make sense to do the same thing here if possible.

@steveklabnik
Copy link
Member

IIRC Python does something similar as well.

Sent from my iPhone

On Aug 6, 2015, at 12:06, Steven Fackler notifications@github.com wrote:

Java pre-allocates an OutOfMemory exception at initialization to ensure it can properly throw when out of memory - it might make sense to do the same thing here if possible.


Reply to this email directly or view it on GitHub.

@Aatch
Copy link
Contributor

Aatch commented Aug 7, 2015

Preallocating might be difficult given that Rust code might not be called from inside a Rust main. I guess we could use pre-allocation for when it is called from a Rust program, and then fallback to something else for the called by foreign code case.

@pnkfelix
Copy link
Member

pnkfelix commented Aug 7, 2015

cc me

@eddyb
Copy link
Member

eddyb commented Aug 7, 2015

@Aatch the problem of preallocation seems pointless, since we control what the pointer value means, and if it points to some static in liballoc, then we can treat it as an OOM panic instead of a Box<Box<Any+Send>> value.

@Gankra
Copy link
Contributor

Gankra commented Aug 7, 2015

Note that this is a memory safety issue for a few types: Box::new can't panic today -- this would introduce more exception-safety nonsense.

@pixel27
Copy link

pixel27 commented Aug 19, 2015

I don't really have an opinion on how the problem is solved, I would just recommend fixing the illegal instruction error soonest. I had bad code that was incorrectly calculating a vector length that only kicked in when I tried to run the release mode without arguments. Having the program crash repeatedly with "Illegal Instruction" had me incorrectly assuming the rust compiler was fubar and generating bad code.

Least I'm assuming it's the same issue with Rust 1.2 on a 64 bit machine calling something like:
Vec::with_capacity(33333333333333)

@pythonesque
Copy link
Contributor

Please don't panic on OOM, especially now that catch_panic is safe.

@bluss
Copy link
Member Author

bluss commented Sep 17, 2015

@pixel27 I imagine a good fix would be a better way to report the issue. Maybe an output message before the abort (which I believe is an issue we have filed).

@brson brson added A-allocators Area: Custom and system allocators E-hard Call for participation: Hard difficulty. Experience needed to fix: A lot. P-low Low priority I-needs-decision Issue: In need of a decision. and removed P-medium Medium priority labels Aug 4, 2016
@lilith
Copy link

lilith commented Aug 28, 2016

@pythonesque Why? If it's not possible to deal with OOM in Rust, I can't use Rust. See https://internals.rust-lang.org/t/could-we-support-unwinding-from-oom-at-least-for-collections/3673/21 for an exploration of why handling OOM is important.

@eddyb
Copy link
Member

eddyb commented Aug 28, 2016

@nathanaeljones IMO the real problem here is that we don't have an allocator abstraction in libstd yet.

@steveklabnik
Copy link
Member

@nathanaeljones the default, std allocator aborts, but given Rust's low-level nature, you could replace it with one that doesn't. It is a lot more work right now, but as @eddyb said, this is something that will come along in time.

@lilith
Copy link

lilith commented Aug 29, 2016

@steveklabnik Thanks; I'm not too concerned with 'lots of work', but rather if I can produce well-behaved binaries for both FFI and server use.

I started porting Imageflow over to Rust in June, but failed to verify that OOM panics (as widely reported) instead of aborting. I'd assumed that the stabilization of std::panic::catch_unwind meant Rust was ready for my use case. I decided to write a test today and, well, panicked.

It's very unclear to me where the anti-OOM-panic sentiment arises from. Double panics abort.

I've been trying to track down blockers around this, but haven't been able to discover the reasons why movement on graceful OOM handling has been slow. I think I ?might? have found the right issue to discuss: #27700 (comment)

@Mark-Simulacrum Mark-Simulacrum added the C-bug Category: This is a bug. label Jul 22, 2017
@steveklabnik
Copy link
Member

Triage: still blocked on a libs team decision.

@ehsanul
Copy link
Contributor

ehsanul commented May 11, 2020

Looks like there is now an unstable alloc_error_handler feature gate. Tracking issue: #51540

Mentioned in that issue, there is also an accepted RFC to set a cargo attribute for oom=panic, tracking issue: #43596

And another issue to make the default for handle_alloc_error panic, which has survived the FCP with disposition to merge: #66741

@Amanieu
Copy link
Member

Amanieu commented Sep 22, 2021

We discussed this in the Libs API meeting today: the consensus was that we should pipe "invalid layout" errors through to alloc_error_handler by making it take a enum with either AllocErr and LayoutErr. This would be similar to the existing unstable TryReserveError type, and in fact we could reuse the same type in both places.

@m-ou-se m-ou-se removed the I-needs-decision Issue: In need of a decision. label Sep 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-allocators Area: Custom and system allocators C-bug Category: This is a bug. E-hard Call for participation: Hard difficulty. Experience needed to fix: A lot. P-low Low priority T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests