Make dlmalloc and emmalloc align to max_align_t #10110
|
Looks like the problem with emmalloc lies deeper than I thought. |
kripken
left a comment
In general this looks good, but we do need to estimate the cost of this change, which I'm not sure how best to do.
@juj is working on a large emmalloc refactor so we may want to wait for that to land first.
|
Perhaps we can resolve this with expanding to |
sbc100
left a comment
@juj do you have a use case in mind where using the wider/correct alignment would be a significant regression? I'd rather not add more options unless there is a real need.
@@ -0,0 +1,21 @@
/*
Put this file in tests/core?
I may just revert the emmalloc changes here and fix dlmalloc only, until the new emmalloc is in. |
f44c72d to e2e35ad
Required by C and C++ standards. emmalloc change incoming.
e2e35ad to d9429ae
|
The underlying issue came up again, so I remembered we didn't resolve this. It might be simplest to just switch to 16-byte alignment for now, and if there are perf issues, to consider a malloc variation for it. But hopefully that's not an issue. We should at least verify that the havlak and lua benchmarks, which are very malloc heavy, don't regress. I can do that if you want @Akaricchi |
I may be in the minority with the memory layouting concerns.. I can localmod this to 8 bytes in our project to avoid the trouble. |
|
Benchmarking this, it regresses Havlak by 5%, and also 1-2% on a few others like lua-binarytrees and poppler. That makes sense I guess since Havlak does tons of tiny allocations, and this increases the effective sizes of those. Given that, I think we maybe want to add this as an option. One possible way to go is to keep dlmalloc as it is as the default, and add a dlmalloc-16 option. Not sure what to do with emmalloc - as it is focused on code size anyhow and not speed, maybe the speed issue matters less and we can just make the change? On the other hand, keeping emmalloc as it is may be fine too, as if someone needs 16-byte alignment they can use dlmalloc. |
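To make the point about tiny allocations concrete, here is a minimal sketch of how rounding a request up to the allocator's alignment changes its effective size. The helper names `round_up_size` and `padding_for` are hypothetical and are not taken from dlmalloc or emmalloc; real allocators also add per-chunk bookkeeping that this simplification ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from dlmalloc/emmalloc): round `size` up to the
   next multiple of `align`. `align` must be a nonzero power of two. */
size_t round_up_size(size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Bytes of padding that the rounding adds to a request of `size` bytes. */
size_t padding_for(size_t size, size_t align) {
    return round_up_size(size, align) - size;
}
```

Under this simplified model, a 24-byte request stays at 24 bytes with 8-byte alignment but grows to 32 bytes with 16-byte alignment, which is the kind of overhead a benchmark doing tons of tiny allocations would feel.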
|
I wouldn't be opposed to a separate dlmalloc, but having it just do the right thing would be better, IMO. AFAICT this is blocking some UBSan usage as it triggers in lots of places. |
|
@trybka I think the regression is big enough that we shouldn't do this by default. But if it blocks something like ubsan specifically then perhaps we can enable it by default there (those wouldn't be production builds anyhow). That would add some complexity though. But we can figure out those details later. I think the first step is to land something for this that is off by default. Specifically, a new variation on dlmalloc, and with no changes to emmalloc for now. @Akaricchi does that make sense? And would you have time to update this PR for that? (If not I can look into it.) |
|
@kripken if you ask me, no, that does not make sense. It does not make any sense to intentionally violate the standard by default in the name of performance. There is a reason why stuff like The real proper fix would've been to make
This is a horrible idea. By doing this, you're allowing a class of bugs to sneak past debug builds undetected, and possibly make it into production builds. UBSan is supposed to make debugging easier by exposing problems with your code, not harder by masking them. The fact that UBSan complains about the current behavior is in itself evidence that this behavior is wrong. |
|
Fair points! Especially about UBSan.
I think we also need it for v128? If not, and if long double is the only reason, then I think this might be a good enough reason to reconsider our long double policy. We compromised there because we thought the only downside was some extra tooling work for printf of long doubles etc., but if it slows us down by up to 5% on malloc-heavy benchmarks, I think that's the wrong tradeoff.
I don't know that much about ICC, but please let's not be so negative about things. What I mean is, let's talk about other projects in a way that wouldn't insult them if they read what we say. (I think I might be insulted if things were reversed - not in a big way, but it's unnecessary.) |
No, malloc only needs to guarantee sufficient alignment for any of the basic C scalar types. Non-standard stuff like SIMD types aren't taken into account; you're expected to use Also, just to clarify (as there's been minor confusion over this in another discussion before): |
|
In any case, aligning to |
|
@Akaricchi I see, thanks! Given that, I think we should reduce the alignment of long doubles in the LLVM target. That seems like the best option by far. But let's see what LLVM people think, cc @dschuff @tlively @aheejin @sbc100 Separately from that, making them align to |
|
Changing |
|
I agree that we shouldn't keep malloc's alignment as less than On the face of it, aligning long doubles to 8 seems ok, since they are software-implemented there's no particular need for them to be further aligned. As @tlively says, that would be an ABI break though. As a side note, if we are going to be offering compiler flags, maybe it makes more sense to have one that changes the alignment of long double (e.g. like the x86 |
|
One of the considerations when this came up before was simd128. It doesn't require 16-byte alignment for correctness, but many hardware architectures have slowdowns when operating on misaligned data. Does Emscripten's benchmark suite include any SIMD benchmarks? |
No, we don't have any real-world SIMD benchmarks in the repo. @Akaricchi was saying above that max_align_t only considers standard C types and that SIMD programmers should use aligned_alloc if they need to allocate space for vectors. I don't know how often that is done in practice, but I wouldn't have a problem making this change and sticking a note about it in the SIMD docs. |
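As a rough illustration of the `aligned_alloc` route mentioned above, the sketch below requests 16-byte-aligned storage explicitly instead of relying on whatever alignment `malloc` happens to provide. The helper name `alloc_vectors` and the constant `VEC_ALIGN` are assumptions made for this example; they do not come from Emscripten or its SIMD docs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed for this sketch: one 128-bit vector is 16 bytes and should be
   16-byte aligned. */
enum { VEC_ALIGN = 16 };

/* Hypothetical helper: allocate storage for `count` 128-bit vectors.
   The requested size is a multiple of the alignment, as C11 expects for
   aligned_alloc. Returns NULL on failure; release with free(). */
void *alloc_vectors(size_t count) {
    assert(count > 0 && count <= SIZE_MAX / VEC_ALIGN);
    return aligned_alloc(VEC_ALIGN, count * VEC_ALIGN);
}
```

Code written this way keeps its 16-byte alignment regardless of which default the allocator ends up using.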
|
Documenting this for SIMD makes sense I think. If we go that route, we should probably have a (default-off) build option to increase the default alignment of |
Considering this change is standards compliant, I don't think folks will have to change much code. And even if they do, I don't think this is worth adding an option for. |
|
I don't feel strongly either way, but I am a little worried that while this would be standards-compliant, most platforms have |
|
|
@Akaricchi Yeah, you're right, I wasn't being precise I guess. I was thinking of 64-bit platforms on the desktop mostly. It's true nothing changes for SIMD code in emscripten, but I worry about new codebases being ported. But your point is valid that maybe that risk is low. So maybe we'll never need an option for 16. We can change to 8, and wait to see if there's a need I guess. |
|
FWIW a comment about the alignment of malloc in the documentation definitely wouldn't hurt. |
Hmm, are you sure? That contradicts my experiments, but maybe I am not measuring the correct thing: Is this different from the alignment of max_align_t? |
|
@sbc100 I'm not sure what |
self.set_setting('MALLOC', malloc)
self.emcc_args += ['-std=gnu11']
src = open(path_from_root('tests', 'core', 'test_malloc_alignment.c')).read()
self.do_run(src, '', basename='src.c')
These days you can just write `self.do_runf(path_from_root('tests', 'core', 'test_malloc_alignment.c'))`
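The contents of `test_malloc_alignment.c` are not shown in this conversation. Purely as an assumption about what such a check could look like, the sketch below allocates a range of sizes and counts how many returned pointers are not aligned to `alignof(max_align_t)`. The function name `count_misaligned_allocations` and the size range are hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical check (not the PR's actual test): returns the number of
   malloc results whose address is not a multiple of alignof(max_align_t). */
int count_misaligned_allocations(void) {
    int misaligned = 0;
    for (size_t size = 1; size <= 256; size++) {
        void *p = malloc(size);
        assert(p != NULL);
        if ((uintptr_t)p % alignof(max_align_t) != 0) {
            misaligned++;
        }
        free(p);
    }
    return misaligned;
}
```

A test's `main` could call this and fail if the result is nonzero. It needs C11 for `max_align_t` and `alignof`, which fits the `-std=gnu11` argument in the snippet under review.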
|
Just going to throw my uninformed opinion in here and say that allocators that align to <16 in the days of SIMD and 64-bit seem like a really bad idea. In a lot of code it may be impractical to guarantee all memory you use comes from I've spent waaaay too many unnecessary cycles on various game engines crashing on win 32-bit 8-byte aligned SIMD values in a past life. :) |
|
@aardappel Do you think that would still be an issue (e.g. in game engines) if your past life was now? Alignment issues aren't what they used to be; e.g. a very concrete example, on Intel the |
|
@dschuff those exact issues wouldn't be a problem anymore since we wouldn't ship a 32-bit Windows binary anymore :) But if emscripten defaults to an 8-byte aligned allocator and a SIMD value on such a boundary is way slower (in the worst case a trap that has to be caught), then the problem remains the same. In a game engine, it is common that things like positions, directions, colors, and velocities are types that underneath become SIMD types. These kinds of values are put in all sorts of objects, and objects inside other objects etc., to the point that knowing which objects require 16-byte alignment becomes very opaque. Mix in a variety of allocators and ways objects are allocated, and any 8-byte alignment becomes a minefield. In some sense it is worse in Wasm because some things just being slower is even harder to figure out. Your game could run 10% faster if you just changed the allocator, but you have no way of knowing, and you just shrug and assume it's normal. |
|
Re instructions that will fault: ARMv7 and ARMv8 (both 32 and 64) support unaligned loads and stores of SIMD registers (depending on an SCTLR bit that can be controlled). |
|
Is there any hope that this can get finished? Been waiting a year unable to build my project. |
By default we now use `alignof(max_align_t)` in both `emmalloc` and `dlmalloc`. This is a requirement of the C and C++ standards. Developers can opt into the old behaviour using `-s MALLOC=dlmalloc-align8` or `-s MALLOC=emmalloc-align8`. Based on #10110 which was authored by @Akaricchi. Fixes: #10072
|
OK, I took a stab at a version of this change that should make everybody happy: #14456. I've added |
|
Closing as superseded by #14456 |
Fixes #10072
emmalloc also had a small bug: it could return misaligned pointers if base alignment is greater than 8. I fixed that as well.