assert in mono_marshal_ilgen_init #83804

Closed
tmds opened this issue Mar 23, 2023 · 3 comments

@tmds
Member

tmds commented Mar 23, 2023

We're investigating some crashes we see when source-building .NET on a large ppc64le machine (100+ CPUs, 100+ GB RAM).

The stack trace of the crashing thread contains the following:

Thread 8 (Thread 0x20001300f080 (LWP 152780) ".NET ThreadPool"):
#0  0x00002000006142b4 in wait4 () from /lib64/libc.so.6
#1  0x000020000061411c in waitpid () from /lib64/libc.so.6
#2  0x0000200000b6fa50 in mono_dump_native_crash_info () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#3  0x0000200000b1efac in mono_handle_native_crash () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#4  0x0000200000b6edd8 in sigabrt_signal_handler () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#5  <signal handler called>
#6  0x00002000005b30cc in __pthread_kill_implementation () from /lib64/libc.so.6
#7  0x000020000055223c in raise () from /lib64/libc.so.6
#8  0x000020000052c70c in abort () from /lib64/libc.so.6
#9  0x0000200000bb6248 in monoeg_assert_abort () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#10 0x00002000009e07d4 in mono_log_write_logfile () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#11 0x00002000009dbb44 in structured_log_adapter () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#12 0x0000200000bb67fc in monoeg_g_logv_nofree () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#13 0x0000200000bb6960 in monoeg_assertion_message () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#14 0x0000200000bb69dc in mono_assertion_message () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#15 0x00002000009c3c04 in mono_marshal_ilgen_init () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#16 0x00002000009c3ad8 in mono_emit_marshal_ilgen () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#17 0x0000200000926db0 in mono_emit_marshal () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#18 0x00002000009cc074 in emit_native_wrapper_ilgen () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#19 0x0000200000928060 in mono_marshal_get_native_wrapper () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#20 0x0000200000a73cdc in mono_jit_compile_method_with_opt () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#21 0x0000200000a6d134 in mono_jit_compile_method () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#22 0x0000200000b22144 in common_call_trampoline () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#23 0x0000200000b21c38 in mono_magic_trampoline () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
...

The interesting part is:

...
#14 0x0000200000bb69dc in mono_assertion_message () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
#15 0x00002000009c3c04 in mono_marshal_ilgen_init () from /root/dotnet-v7.0.103/previous-sdk/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
...

Though there are no line numbers, after spelunking through the code, I think we may be hitting the assert on the first line of this function:

void
mono_install_marshal_callbacks_ilgen (MonoMarshalIlgenCallbacks *cb)
{
    g_assert (!ilgen_cb_inited);
    g_assert (cb->version == MONO_MARSHAL_CALLBACKS_VERSION);
    memcpy (&ilgen_marshal_cb, cb, sizeof (MonoMarshalIlgenCallbacks));
    ilgen_cb_inited = TRUE;
}

I imagine this may happen when multiple threads call mono_marshal_ilgen_init concurrently, which is more likely on a machine with many cores? Or is there something that ensures only a single thread performs the initialization?
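
To make the suspected race concrete: if two threads both reach an unsynchronized "install once" path before either has set the flag, the second install trips the first assert. Below is a minimal sketch, assuming POSIX threads, of one common way to guard such a path with pthread_once. Every demo_* name is hypothetical and stands in for the real code; this is not the Mono implementation and not necessarily how the runtime addresses it.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; these are not the Mono types. */
#define DEMO_CALLBACKS_VERSION 1

typedef struct {
    int version;
    int (*emit) (int);
} DemoCallbacks;

static DemoCallbacks demo_cb;
static int demo_install_count; /* number of times the install really ran */
static pthread_once_t demo_once = PTHREAD_ONCE_INIT;

static int
demo_emit (int x)
{
    return x + 1;
}

/* Same shape as the function quoted above: a second install would assert. */
static void
demo_install_callbacks (const DemoCallbacks *cb)
{
    assert (demo_install_count == 0);
    assert (cb->version == DEMO_CALLBACKS_VERSION);
    memcpy (&demo_cb, cb, sizeof (DemoCallbacks));
    demo_install_count++;
}

static void
demo_init_once (void)
{
    DemoCallbacks cb = { DEMO_CALLBACKS_VERSION, demo_emit };
    demo_install_callbacks (&cb);
}

/* Safe from any thread: pthread_once runs demo_init_once exactly once,
 * and every caller returns only after that run has completed. */
static void
demo_ensure_init (void)
{
    pthread_once (&demo_once, demo_init_once);
}

static void *
demo_worker (void *arg)
{
    (void) arg;
    demo_ensure_init ();
    return NULL;
}

/* Starts up to n threads that all race to initialize, then returns
 * how many times the install actually ran. */
static int
demo_run_threads (int n)
{
    pthread_t threads[64];
    int started = 0;

    if (n > 64)
        n = 64;
    for (int i = 0; i < n; i++)
        if (pthread_create (&threads[started], NULL, demo_worker, NULL) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join (threads[i], NULL);
    return demo_install_count;
}
```

Calling demo_run_threads (16) should leave the install count at 1 no matter how the threads interleave, whereas a bare "if (!inited) install ();" check gives no such guarantee.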

cc @lambdageek @omajid @Swapnali911

@ghost added the untriaged label Mar 23, 2023
@jandupej
Member

Could be related to #74603 @lambdageek

@tmds
Member Author

tmds commented Mar 23, 2023

Yes, it's the same issue.

It was fixed by: #77448.

The fix ensures the initialization happens only once:

Installation of callbacks occurs only once before any user code is executing, so there is no longer a race condition. See discussion here: #77383 (comment)
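
The approach described in that quote (install once, on a single thread, before anything else can run) can be sketched as follows. This is only an illustration of the general pattern, assuming POSIX threads; the sketch_* names are hypothetical and the code is not taken from #77448.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the Mono types. */
typedef struct {
    int version;
    int (*emit) (int);
} SketchCallbacks;

static SketchCallbacks sketch_cb;
static bool sketch_cb_inited;

static int
sketch_emit (int x)
{
    return x * 2;
}

/* Runs exactly once, on the startup thread, before any worker exists.
 * No locking is needed because nothing else can be running yet. */
static void
sketch_runtime_startup (void)
{
    assert (!sketch_cb_inited);
    sketch_cb.version = 1;
    sketch_cb.emit = sketch_emit;
    sketch_cb_inited = true;
}

/* Hot path: no lazy initialization here, only a sanity check. */
static int
sketch_emit_marshal (int x)
{
    assert (sketch_cb_inited);
    return sketch_cb.emit (x);
}

static void *
sketch_worker (void *out)
{
    *(int *) out = sketch_emit_marshal (21);
    return NULL;
}

/* Installs the callbacks, then starts up to n workers that only use them.
 * Returns the number of workers that ran, or -1 if any saw a wrong result. */
static int
sketch_run (int n)
{
    pthread_t threads[32];
    int results[32];
    int started = 0;
    int ok = 0;

    if (n > 32)
        n = 32;
    sketch_runtime_startup ();
    for (int i = 0; i < n; i++)
        if (pthread_create (&threads[started], NULL, sketch_worker,
                            &results[started]) == 0)
            started++;
    for (int i = 0; i < started; i++) {
        pthread_join (threads[i], NULL);
        ok += (results[i] == 42);
    }
    return ok == started ? started : -1;
}
```

Because pthread_create orders everything the creating thread did beforehand ahead of the new thread's start, the workers can read the plain flag and the callback table without further synchronization.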

We're observing the issue with 7.0. Can you backport the fix?

@jandupej removed the untriaged label Mar 23, 2023
@steveisok steveisok added this to the 7.0.x milestone Mar 24, 2023
@tmds
Member Author

tmds commented Apr 11, 2023

Closing, as the backport to 7.0 has been merged.

@tmds tmds closed this as completed Apr 11, 2023
@ghost ghost locked as resolved and limited conversation to collaborators May 11, 2023