OOM during std::vector allocation doesn't terminate the program #11042

Closed
zeux opened this issue Apr 30, 2020 · 22 comments · Fixed by #11079

Comments

@zeux

zeux commented Apr 30, 2020

The behavior of code that OOMs right now is odd and unexpected:

  • By default, the error thrown by WebAssembly.memory.grow is caught and swallowed in emscripten_realloc_buffer; some other heap access in the future dies with "out of bounds memory access".

  • When assertions are enabled, an error message is printed to the output but it doesn't halt execution either.

Is there a reason why the exception is being caught here in the first place? It would be good to not do that at least if assertions are disabled, and maybe even if they are enabled (& use finally to log the error instead). This will simplify root cause analysis for OOMs.

@kripken
Member

kripken commented Apr 30, 2020

We have to catch errors there to know if allocation failed? If it failed then the VM has no more memory to give us. Then we return an error code to our caller, which ends up as a malloc failure. By default ABORTING_MALLOC is set, so it will throw and halt the application (if it is unset, then malloc returns 0 and the application can handle it).

But maybe I don't understand your question?

@kripken
Member

kripken commented Apr 30, 2020

Do you see a case where it should halt but doesn't? (or should return 0 from malloc and doesn't?)

@zeux
Author

zeux commented Apr 30, 2020

We have to catch errors there to know if allocation failed?

Maybe I was misled by the comment -

  // Grows the asm.js/wasm heap to the given byte size, and updates both JS and asm.js/wasm side views to the buffer.
  // Returns 1 on success, or undefined if growing failed.

Does undefined here mean "JS undefined" or "undefined behavior"?

Then we return an error code to our caller, which ends up as a malloc failure. By default ABORTING_MALLOC is set, so it will throw and halt the application (if it is unset, then malloc returns 0 and the application can handle it).

I am not sure I see this happening... With these command line options, targeting a .js file (so .js + .wasm are produced):

emcc $^ -o $@ -Os -DNDEBUG -s ALLOW_MEMORY_GROWTH=1 -s NODERAWFS=1 -g2

The behavior that I observe is that on an OOM coming from STL (std::vector => operator new), I see the following output:

exception thrown: RuntimeError: invalid index into function table,RuntimeError: invalid index into function table
    at operator new(unsigned long) (wasm-function[60]:0x1d7a)
    at std::__2::allocator_traits<std::__2::allocator<char> >::allocate(std::__2::allocator<char>&, unsigned long) (wasm-function[157]:0x4f02)
    at std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >::__grow_by_and_replace(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, char const*) (wasm-function[214]:0x5ac5)
    at std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >::append(char const*, unsigned long) (wasm-function[46]:0x19ef)
    at writeJointBindMatrices(std::__2::vector<BufferView, std::__2::allocator<BufferView> >&, std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >&, unsigned long&, cgltf_skin const&, QuantizationPosition const&, Settings const&) (wasm-function[512]:0x1102a)
    at process(cgltf_data*, char const*, char const*, std::__2::vector<Mesh, std::__2::allocator<Mesh> >&, std::__2::vector<Animation, std::__2::allocator<Animation> >&, Settings const&, std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >&, std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >&, std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >&) (wasm-function[717]:0x1732c)
    at gltfpack(char const*, char const*, Settings const&) (wasm-function[445]:0xbfa8)
    at main (wasm-function[795]:0x1c604)
    at Module._main (C:\work\meshoptimizer\gltf\bin\gltfpack.js:4207:56)
    at callMain (C:\work\meshoptimizer\gltf\bin\gltfpack.js:4250:13)

Does that indicate that abort was triggered?

This is with 1.39.12; with an earlier Emscripten version I get this instead:

exception thrown: RuntimeError: memory access out of bounds,RuntimeError: memory access out of bounds
    at wasm-function[705]:0x1d60a
    at wasm-function[793]:0x21a33
    at wasm-function[410]:0xd57c
    at wasm-function[786]:0x20e32
    at Module._main (C:\Users\Arseny\AppData\Roaming\npm\node_modules\gltfpack\bin\gltfpack.js:2:84300)
    at callMain (C:\Users\Arseny\AppData\Roaming\npm\node_modules\gltfpack\bin\gltfpack.js:2:85315)
    at doRun (C:\Users\Arseny\AppData\Roaming\npm\node_modules\gltfpack\bin\gltfpack.js:2:85893)
    at run (C:\Users\Arseny\AppData\Roaming\npm\node_modules\gltfpack\bin\gltfpack.js:2:86065)
    at runCaller (C:\Users\Arseny\AppData\Roaming\npm\node_modules\gltfpack\bin\gltfpack.js:2:84974)
    at removeRunDependency (C:\Users\Arseny\AppData\Roaming\npm\node_modules\gltfpack\bin\gltfpack.js:2:9665)

But maybe there was a fix in 1.39.12 around OOM error handling, unsure.

@kripken
Member

kripken commented Apr 30, 2020

Does undefined here mean "JS undefined" or "undefined behavior"?

Oh, heh, sorry for the confusion! It returns a JS undefined. It could also have a return 0 but it would have the same effect (the caller converts a JS undefined into a 0). So we save a little code size here.

If your example is OOMing around the 2GB or 4GB limits, then recent fixes may be necessary. One has landed in #10811 (for 2GB) and one is in #11047 (for 4GB).

Both of the call stacks you mention would be bugs in emscripten - the error should be clearly from malloc (or sbrk, I forget). If those 2 PRs don't fix things, let me know!

@zeux
Author

zeux commented Apr 30, 2020

With 1.39.13 the behavior seems to depend on the presence of -g for some reason.

Without -g:

exception thrown: RuntimeError: memory access out of bounds,RuntimeError: memory access out of bounds
    at wasm-function[706]:0x1d6f1
    at wasm-function[794]:0x21b14
    at wasm-function[411]:0xd622
    at wasm-function[787]:0x20f18
    at Module._main (C:\work\meshoptimizer\gltf\bin\gltfpack.js:2:84440)
    at callMain (C:\work\meshoptimizer\gltf\bin\gltfpack.js:2:85455)
    at doRun (C:\work\meshoptimizer\gltf\bin\gltfpack.js:2:86033)
    at run (C:\work\meshoptimizer\gltf\bin\gltfpack.js:2:86205)
    at runCaller (C:\work\meshoptimizer\gltf\bin\gltfpack.js:2:85114)
    at removeRunDependency (C:\work\meshoptimizer\gltf\bin\gltfpack.js:2:9665)

With -g (not sure what prints the 'undefined' text):

undefined
undefined
exception thrown: RuntimeError: abort(undefined). Build with -s ASSERTIONS=1 for more info.,RuntimeError: abort(undefined). Build with -s ASSERTIONS=1 for more info.
    at abort (C:\work\meshoptimizer\gltf\bin\gltfpack.js:1583:9)
    at _abort (C:\work\meshoptimizer\gltf\bin\gltfpack.js:4743:7)
    at dlfree (wasm-function[2437]:0x2d205)
    at operator delete(void*) (wasm-function[2352]:0x29dca)
    at std::__2::_DeallocateCaller::__do_call(void*) (wasm-function[341]:0x7768)
    at std::__2::_DeallocateCaller::__do_deallocate_handle_size(void*, unsigned long) (wasm-function[340]:0x7760)
    at std::__2::_DeallocateCaller::__do_deallocate_handle_size_align(void*, unsigned long, unsigned long) (wasm-function[339]:0x7758)
    at std::__2::__libcpp_deallocate(void*, unsigned long, unsigned long) (wasm-function[338]:0x774e)
    at std::__2::allocator<char>::deallocate(char*, unsigned long) (wasm-function[1272]:0xf2f5)
    at std::__2::allocator_traits<std::__2::allocator<char> >::deallocate(std::__2::allocator<char>&, char*, unsigned long) (wasm-function[1271]:0xf2e9)

With -g2:

exception thrown: RuntimeError: memory access out of bounds,RuntimeError: memory access out of bounds
    at remapNodes(cgltf_data*, std::__2::vector<NodeInfo, std::__2::allocator<NodeInfo> >&, unsigned long&) (wasm-function[713]:0x162fa)
    at process(cgltf_data*, char const*, char const*, std::__2::vector<Mesh, std::__2::allocator<Mesh> >&, std::__2::vector<Animation, std::__2::allocator<Animation> >&, Settings const&, std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >&, std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >&, std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >&) (wasm-function[717]:0x1729f)
    at gltfpack(char const*, char const*, Settings const&) (wasm-function[445]:0xbfa7)
    at main (wasm-function[795]:0x1c603)
    at Module._main (C:\work\meshoptimizer\gltf\bin\gltfpack.js:4207:56)
    at callMain (C:\work\meshoptimizer\gltf\bin\gltfpack.js:4250:13)
    at doRun (C:\work\meshoptimizer\gltf\bin\gltfpack.js:4286:21)
    at run (C:\work\meshoptimizer\gltf\bin\gltfpack.js:4298:3)
    at runCaller (C:\work\meshoptimizer\gltf\bin\gltfpack.js:4235:18)
    at removeRunDependency (C:\work\meshoptimizer\gltf\bin\gltfpack.js:483:4)

The changes you linked don't seem to be in 1.39.13 so I can check back after 1.39.14.

@kripken
Member

kripken commented May 1, 2020

Ok, 1.39.14 was just tagged, and all the changes should be in there.

@zeux
Author

zeux commented May 1, 2020

Still happens with 1.39.14:

exception thrown: RuntimeError: memory access out of bounds,RuntimeError: memory access out of bounds
    at parseMeshesObj(fastObjMesh*, cgltf_data*, std::__2::vector<Mesh, std::__2::allocator<Mesh> >&) (wasm-function[733]:0x18861)
    at parseObj(char const*, std::__2::vector<Mesh, std::__2::allocator<Mesh> >&, char const**) (wasm-function[732]:0x1863c)
    at gltfpack(char const*, char const*, Settings const&) (wasm-function[445]:0xbf0e)
    at main (wasm-function[795]:0x1c619)
    at Module._main (C:\work\meshoptimizer\gltf\bin\gltfpack.js:4206:56)
    at callMain (C:\work\meshoptimizer\gltf\bin\gltfpack.js:4249:13)
    at doRun (C:\work\meshoptimizer\gltf\bin\gltfpack.js:4285:21)
    at run (C:\work\meshoptimizer\gltf\bin\gltfpack.js:4297:3)
    at runCaller (C:\work\meshoptimizer\gltf\bin\gltfpack.js:4234:18)
    at removeRunDependency (C:\work\meshoptimizer\gltf\bin\gltfpack.js:481:4)

I'll try to debug this to see what's going on.

@kripken
Member

kripken commented May 1, 2020

Ah, can you maybe get me a testcase to debug, or simple steps to build + reproduce?

@zeux
Author

zeux commented May 1, 2020

This seems to reproduce the problem:

#include <vector>
#include <stdio.h>

int main()
{
	std::vector<std::vector<int>> v;

	for (int i = 0; ; ++i)
	{
		printf("iteration %d\n", i);
		std::vector<int> vv(1024*1024, i);
		v.push_back(std::move(vv));
	}
}

Compile with:

emcc test.cpp  -s ALLOW_MEMORY_GROWTH=1 -g2 -Os

Running with latest node I get this output:

iteration 495
iteration 496
iteration 497
iteration 498
iteration 499
iteration 500
iteration 501
iteration 502
exception thrown: RuntimeError: invalid index into function table,RuntimeError: invalid index into function table
    at __fwritex (wasm-function[83]:0x3d0e)
    at out (wasm-function[9]:0x24b)
    at printf_core (wasm-function[22]:0xbe3)
    at __vfprintf_internal (wasm-function[80]:0x3a8c)
    at iprintf (wasm-function[74]:0x38ed)
    at main (wasm-function[73]:0x38a4)
    at Module._main (C:\work\meshoptimizer\a.out.js:731:56)
    at callMain (C:\work\meshoptimizer\a.out.js:764:13)
    at doRun (C:\work\meshoptimizer\a.out.js:800:21)
    at run (C:\work\meshoptimizer\a.out.js:812:3)

@kripken
Member

kripken commented May 2, 2020

I see, thanks @zeux ... so the issue here is exceptions. The libc++ std::vector error handling for a failed malloc is done using exceptions. So this works as expected with -fexceptions, but without exceptions things go wrong in libc++. In other words, with exceptions disabled libc++ assumes allocations never fail.
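A minimal sketch of the exceptions-enabled path described above, assuming the program is built with exceptions on (for example with -fexceptions). The helper name `try_reserve` is hypothetical and is not part of libc++ or Emscripten; it only illustrates that a failed allocation inside std::vector surfaces as std::bad_alloc in that configuration.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical helper: report allocation failure as a bool instead of
// letting std::bad_alloc propagate. This requires exceptions to be enabled;
// with exceptions disabled the try/catch is not available at all.
// Note: a request above max_size() throws std::length_error, which is
// deliberately not caught here.
template <typename T>
bool try_reserve(std::vector<T>& v, std::size_t n) {
    try {
        v.reserve(n);
        return true;
    } catch (const std::bad_alloc&) {
        return false;
    }
}
```

A caller could use the return value to stop growing a container and report the OOM at the point where it happened.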

I'm not totally sure what to do here. On the one hand, this seems to work as intended. But it also seems like a problem that might be common (but I don't recall it being).

Separately, it looks like growth disables ABORTING_MALLOC; the docs say:

// Setting this option [ALLOW_MEMORY_GROWTH] on will disable ABORTING_MALLOC, in other words,
// ALLOW_MEMORY_GROWTH enables fully standard behavior, of both malloc
// returning 0 when it fails, and also of being able to allocate more
// memory from the system as necessary.

Perhaps we should change that - it seems like we should, but this is very old behavior that hasn't been an issue til now...

@zeux
Author

zeux commented May 2, 2020

So if exceptions are disabled and memory allocation fails, I'd expect something like std::terminate to get called from the new handler? https://en.cppreference.com/w/cpp/memory/new/set_new_handler

That is, I'd expect that if exception support is compiled out, the (throwing) new operator terminates the program instead of returning 0. If this did happen I'd be happy because I just want the OOM to be prominently highlighted as such at the correct source location.

Alternatively, supporting aborting malloc in memory growth mode would work for me as well.

zeux changed the title from "emscripten_realloc_buffer swallows exceptions by default" to "OOM during std::vector allocation doesn't terminate the program" on May 2, 2020

@kripken
Member

kripken commented May 2, 2020

Interesting, yeah, the docs you linked to say

std::set_new_handler [..] The default implementation throws std::bad_alloc. The user can install his own new-handler, which may offer behavior different than the default one.

But the libc++ code has this:

https://github.com/llvm/llvm-project/blob/9ed6f03189ce21a609e4d6933ece5e3fb77ba0e8/libcxx/src/new.cpp#L62-L82

If exceptions are disabled, that does not throw std::bad_alloc, it just ends up returning 0. That seems intentional? (and no place in libc++ calls set_new_handler to set up a default one)

Hopefully people that understand the c++ spec better than me can clarify, I'm confused 😕
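The control flow being described can be paraphrased roughly as follows. This is a sketch written as a standalone function with a hypothetical name (`sketch_operator_new`), not the actual libc++ source; it uses the `__cpp_exceptions` feature-test macro as a stand-in for libc++'s own configuration macro, which is an assumption made for illustration.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for the throwing operator new; the name is illustrative.
void* sketch_operator_new(std::size_t size) {
    if (size == 0) size = 1;  // a zero-size request still gets a real allocation
    void* p;
    while ((p = std::malloc(size)) == nullptr) {
        std::new_handler handler = std::get_new_handler();
        if (handler) {
            handler();  // may free memory, abort, or (with exceptions) throw
        } else {
#if defined(__cpp_exceptions)
            throw std::bad_alloc();
#else
            break;  // exceptions disabled: fall through and return a null pointer
#endif
        }
    }
    return p;
}
```

With no handler installed and exceptions disabled, the loop exits through the `break` and the caller receives a null pointer, which is the behavior under discussion.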

@zeux
Author

zeux commented May 2, 2020

Using set_new_handler that calls abort() produces this btw:

undefined
undefined
exception thrown: RuntimeError: abort(undefined). Build with -s ASSERTIONS=1 for more info.,RuntimeError: abort(undefined). Build with -s ASSERTIONS=1 for more info.
    at abort (C:\work\meshoptimizer\gltf\bin\gltfpack.js:2:9909)
    at _abort (C:\work\meshoptimizer\gltf\bin\gltfpack.js:2:78356)
    at wasm-function[788]:0x207ff
    at wasm-function[59]:0x1d57
    at wasm-function[474]:0xf098
    at wasm-function[231]:0x69e2
    at wasm-function[885]:0x24db7
    at wasm-function[126]:0x4928
    at wasm-function[813]:0x225e4
    at wasm-function[815]:0x22c40

Which is much better, although I'd love to not see undefined and the -s ASSERTIONS=1 (since calling C abort() doesn't really benefit from -s ASSERTIONS). This works for me though - I don't know to what extent the C++ standard defines the behavior of new & new handler interaction in absence of exception support, since I think -fno-exceptions is effectively a non-standard language dialect anyway.
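For reference, a minimal sketch of a new handler that calls abort(), along the lines described above. The function names `oom_abort_handler` and `install_oom_abort_handler` are hypothetical, and the message text is illustrative; only std::set_new_handler, std::get_new_handler and std::abort are standard.

```cpp
#include <cstdio>
#include <cstdlib>
#include <new>

// Hypothetical handler: print a clear message, then abort at the failing
// allocation instead of letting operator new return a null pointer.
[[noreturn]] inline void oom_abort_handler() {
    std::fputs("out of memory in operator new\n", stderr);
    std::abort();
}

// Hypothetical helper: install the handler above for the whole program.
inline void install_oom_abort_handler() {
    std::set_new_handler(oom_abort_handler);
}
```

Calling `install_oom_abort_handler()` at the start of main would make every failed throwing `new` go through the handler before anything else in the program sees the failure.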

@zeux
Author

zeux commented May 2, 2020

Ah, I guess the JS abort prints undefined twice (to stdout & stderr), and C abort is mapped to JS abort.

function abort(what) {
 if (Module["onAbort"]) {
  Module["onAbort"](what);
 }
 what += "";
 out(what);
 err(what);
 ABORT = true;
 EXITSTATUS = 1;
 var output = "abort(" + what + ") at " + stackTrace();
 what = output;
 throw new WebAssembly.RuntimeError(what);
}

So I suppose the verdict is that things work mostly as they should, it's just that the behavior isn't particularly intuitive. I'm not sure that there are good options here, since changing libc++ or installing a default new handler seem slightly odd, and aborting malloc doesn't seem ideal either as that's not what you'd expect a C malloc to do in OOM cases.

@sbc100
Collaborator

sbc100 commented May 2, 2020

Yeah, that double printing of error messages I find annoying... we should look into fixing that.

Looks like it could even print 3 times if you have an onAbort handler that happens to print too :)

@sbc100
Collaborator

sbc100 commented May 2, 2020

I think the recommendation to build with -s ASSERTIONS=1 whenever an abort occurs is probably a good thing, since there could be many things that lead to the abort that might give more information with assertions enabled.

Perhaps we could rephrase it as "Building with -s ASSERTIONS=1 may yield more information"? Anecdotally, in my experience it normally does.

@zeux
Author

zeux commented May 2, 2020

might give more information with assertions enabled.

This makes sense to me for any case where abort is called from the Emscripten JS library itself, but not necessarily if it's called from C - if a user says abort(); in their program, it feels like the best thing to do is to simply abort - if there's any additional information to print, it can be printed before calling abort().

Of course if the Emscripten C standard library has any conditional prints before calling C abort, then passing ASSERTIONS=1 might still be valuable.

@zeux
Author

zeux commented May 2, 2020

I thought about this a bit more and I feel like there needs to be some change in Emscripten. The reason for this is that I realized that the problem isn't merely that operator new returns 0 - it's that 0 is a valid pointer in Wasm.

So consider the default 2GB heap, and a program like the above that allocates many vectors that add up to more than 2 GB. The vectors that don't get enough memory get allocated with array data = 0, but then the code proceeds to write to that pointer.

Effectively this results in silently corrupting a lot of program state. I think this is why the errors that I've been getting from this are so random and odd - you aren't going to get an out-of-bounds access error on the vector that couldn't be allocated, it'll be a random error somewhere else.

What's worse, with a 4GB heap basically all pointers are valid, so you won't ever get an out-of-bounds access error.

Of course this also applies to malloc, but at least for malloc it's normal to expect the users to check for NULL, whereas for C++ containers OOM should lead to an exception or a program crash - and in libc++ in other environments there'd be a crash when dereferencing NULL shortly after allocation at least...
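To illustrate the malloc side of this, here is a minimal sketch of checking the result at the allocation site, so that a failed allocation is reported there instead of surfacing later as a write through address 0. The helper name `checked_malloc` is hypothetical and the message text is illustrative.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: abort at the allocation site rather than letting a
// null (address 0) pointer escape and be written through later.
inline void* checked_malloc(std::size_t size) {
    void* p = std::malloc(size);
    if (p == nullptr && size != 0) {
        std::fprintf(stderr, "out of memory allocating %zu bytes\n", size);
        std::abort();
    }
    return p;
}
```

C++ containers give the caller no equivalent place to perform this check, which is why the failure has to be handled inside operator new or a new handler.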

@sbc100
Collaborator

sbc100 commented May 2, 2020

I agree. It doesn't seem to make sense that we should disable exceptions and also not terminate when we would otherwise throw std::bad_alloc. I wonder if we, as a platform, should install a handler using set_new_handler when we disable exceptions? I'd want to look at other platforms that disable exceptions and see what they do. It's hard for me to believe that they return 0 and "hope for a segfault soon enough".

@sbc100
Collaborator

sbc100 commented May 2, 2020

It looks like chromium installs a new handler for this purpose: https://source.chromium.org/chromium/chromium/src/+/master:base/process/memory_linux.cc;drc=ccf17f2881e6c077f20d4421ebcee2bbd5486db2;bpv=1;bpt=1;l=58?q=set_new_handler&ss=chromium&originalUrl=https:%2F%2Fcs.chromium.org%2F

I wonder if it makes sense to install something like this by default in emscripten?

@kripken
Member

kripken commented May 2, 2020

Yeah, thinking on this some more, I strongly agree we should change this. In fact I remember a previous report of this issue, so it's recurring. It's just bad for new to return 0, especially when no-exceptions is our default!

Specifically, I think we should modify the one line in libc++/libc++abi to abort() instead of break;. That would be more compact than calling set_new_handler (which writes to memory). How does that sound?

@sbc100
Collaborator

sbc100 commented May 2, 2020

I'd rather do it the standard way via set_new_handler TBH, but let's look at the costs of doing that in some kind of static ctor versus hacking libc++.

And we need to remember to only do this when exceptions are disabled.

kripken added a commit that referenced this issue May 5, 2020
…ory growth (#11079)

When libc++/libc++abi are built with exceptions disabled, the
new implementation there does not throw an exception for
an error, but it also does nothing else. So new ends up returning
a zero if malloc did, which can break programs.

Technically libc++/libc++abi are doing a reasonable thing here,
just removing all exceptions-related code when exceptions are
disabled. The assumption is likely that a user program would
set a new_handler if an error is desired. For us, we have to change
this as our default mode is to have exceptions disabled, and we
don't want users to need to know they need to do anything.

This makes it abort instead. (Note that without growth this
happened to always work, since we abort on any failing allocation.
With growth enabled, though, malloc returns 0, and we end up
in this situation.)

Fixes #11042. As discussed there, I also looked at the option of
installing a set_new_handler that does an abort. That ends up
increasing code size by a little (a bit less than 1%) because it adds
a global constructor, a function to the table, and some memory
operations. It seems better to just modify new itself, which avoids
all that.