Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2.1.2: Assertion failure on arm64 (16 vCPU graviton2) #851

Closed
jkriegshauser opened this issue Feb 14, 2024 · 5 comments
Closed

2.1.2: Assertion failure on arm64 (16 vCPU graviton2) #851

jkriegshauser opened this issue Feb 14, 2024 · 5 comments

Comments

@jkriegshauser
Copy link

jkriegshauser commented Feb 14, 2024

This seems similar to #354 and I suspect the solution is similar.

Mimalloc v2.1.2

Assertion:

mimalloc: assertion failed: at "/builds/omniverse/externals/conan-transition/_build/conancache/67342464/.conan/data/mimalloc/2.1.2/_/_/build/b4cf5d3864c7b8c3fbf572c574794ebe20e5ce13/src/src/alloc.c":109, mi_heap_malloc_small_zero
  assertion: "heap->thread_id == 0 || heap->thread_id == tid"

This is running on a m6g.4xlarge AWS instance with a graviton2 CPU.

At application startup, _mi_heap_main.thread_id is initialized to 0xfffff7fefcd0 and that is what _mi_thread_id() returns as well.

However, at some point the return value of _mi_thread_id() changes:

(gdb) p /x _mi_thread_id()
$12 = 0x2995c90

It's not immediately apparent what triggers it.

Worth pointing out that neither value matches what GDB thinks the thread ID or the LWP is:

(gdb) i thr
  Id   Target Id         Frame
* 1    Thread 0xfffff7fee880 (LWP 27765) "test.unit" __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51

The library is built on a different host using Conan, and I'm not sure what the params of that host are.

Kernel is 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP and GLIBC is 2.27-3ubuntu1.5 on the host that it's running on.

@jkriegshauser jkriegshauser changed the title Assertion failure on arm64 (16 vCPU graviton2) 2.1.2: Assertion failure on arm64 (16 vCPU graviton2) Feb 14, 2024
jkriegshauser added a commit to jkriegshauser/mimalloc that referenced this issue Feb 14, 2024
@daanx
Copy link
Collaborator

daanx commented Mar 2, 2024

That assertion should not fail -- I pushed a possible fix (to dev for now). Can you test it?

@jkriegshauser
Copy link
Author

No dice. Seems like for whatever reason __has_builtin(__builtin_thread_pointer) is never evaluated on my compiler (g++ (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0) even though __builtin_thread_pointer() absolutely compiles and works (see godbolt links below).

When i remove the defined(__has_builtin) && __has_builtin(__builtin_thread_pointer) part of that conditional it builds on this platform, and runs... kinda. I get to a point where mi_free() is called and crashes because segment is 0x0. Looks like I'm calling free() and mi_free() is being called instead somehow. I'm not trying to override malloc/free, in fact I don't want to, so i'm not sure what's going on there, but that's a separate issue.

I did a quick test on godbolt with variations of:

#include <stdio.h>

#if defined(__has_builtin) && !__has_builtin(__builtin_thread_pointer)
#error "Don't have"
#endif

int main()
{
    printf("%p\n", __builtin_thread_pointer());
}

[godbolt] x86-64, gcc 7.1 - 9.5 don't like the compound preprocessor statement #if defined(__has_builtin) && !__has_builtin(__builtin_thread_pointer), but this part succeeds if these are split to separate #if statements. Also odd that it succeeds, but then gives an error on the printf line: error: '__builtin_thread_pointer' is not supported on this target (even though the __has_builtin succeeds)

[godbolt] x86-64 gcc 10.x are okay with the compound preprocessor statement, but still gives the error on the printf line: error: '__builtin_thread_pointer' is not supported on this target (even though the __has_builtin succeeds)

[godbolt] x86-64 gcc 11.1 - 13.2 built succeesfully.

[godbolt] ARM64 gcc 7.5 still doesn't like the compound statement, but will compile separated out, but oddly the interior of the #if __has_builtin(__builtin_thread_pointer) is never reached. In the linked example it should error but it doesn't. So I can call it but it doesn't report that it has it.

@daanx
Copy link
Collaborator

daanx commented Mar 25, 2024

Apologies for the late followup; wow-- thanks for trying all these variations .. but yikes, not sure how to deal with such inconsistent behaviour. Even without thread_pointer the assembly should work I think.. or otherwise the portable fall back to a thread local solution; can you try to run with that option in include/prim.h:248. ? (you would need to disable some of the preceding #ifs.

@jkriegshauser
Copy link
Author

jkriegshauser commented Mar 26, 2024 via email

@daanx
Copy link
Collaborator

daanx commented Apr 25, 2024

I think release v1.8.4 and v2.1.4 should work now without troubles? However, it seems the fix caused build errors on other platforms (issue #883) so I just changed the default to use TLS slots again --@jkriegshauser if you can, please try with the latest dev or dev-slice if this works for you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants