We're investigating some crashes we see when source-building .NET on a large ppc64le machine (100+ CPUs, 100+ GB of RAM).
The stacktrace has the following:
```
Thread 8 (Thread 0x20001300f080 (LWP 152780) ".NET ThreadPool"):
#0 0x00002000006142b4 in wait4 () from /lib64/libc.so.6
#1 0x000020000061411c in waitpid () from /lib64/libc.so.6
#2 0x0000200000b6fa50 in mono_dump_native_crash_info () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#3 0x0000200000b1efac in mono_handle_native_crash () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#4 0x0000200000b6edd8 in sigabrt_signal_handler () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#5 <signal handler called>
#6 0x00002000005b30cc in __pthread_kill_implementation () from /lib64/libc.so.6
#7 0x000020000055223c in raise () from /lib64/libc.so.6
#8 0x000020000052c70c in abort () from /lib64/libc.so.6
#9 0x0000200000bb6248 in monoeg_assert_abort () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#10 0x00002000009e07d4 in mono_log_write_logfile () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#11 0x00002000009dbb44 in structured_log_adapter () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#12 0x0000200000bb67fc in monoeg_g_logv_nofree () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#13 0x0000200000bb6960 in monoeg_assertion_message () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#14 0x0000200000bb69dc in mono_assertion_message () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#15 0x00002000009c3c04 in mono_marshal_ilgen_init () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#16 0x00002000009c3ad8 in mono_emit_marshal_ilgen () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#17 0x0000200000926db0 in mono_emit_marshal () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#18 0x00002000009cc074 in emit_native_wrapper_ilgen () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#19 0x0000200000928060 in mono_marshal_get_native_wrapper () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#20 0x0000200000a73cdc in mono_jit_compile_method_with_opt () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#21 0x0000200000a6d134 in mono_jit_compile_method () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#22 0x0000200000b22144 in common_call_trampoline () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#23 0x0000200000b21c38 in mono_magic_trampoline () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
...
```
The interesting part is:
```
...
#14 0x0000200000bb69dc in mono_assertion_message () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#15 0x00002000009c3c04 in mono_marshal_ilgen_init () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
...
```
Though there are no line numbers, after spelunking through the code I think we may be hitting the assert on the first line of this function (runtime/src/mono/mono/metadata/marshal-ilgen.c, lines 35 to 42 in 3286b32).
I suspect this can happen when multiple threads call mono_marshal_ilgen_init concurrently, which would be more likely on a machine with many cores? Or is there something that ensures only a single thread performs the initialization? cc @lambdageek @omajid @Swapnali911
The fix ensures the initialization happens only once.
Callbacks are now installed only once, before any user code executes, so there is no longer a race condition. See the discussion here: #77383 (comment)
We're observing the issue with 7.0. Can you backport the fix?