Random crashes in System.Net.Mail test suites with mono interpreter #87271

Closed
vargaz opened this issue Jun 8, 2023 · 8 comments · Fixed by #87555
Labels
area-Codegen-Interpreter-mono blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' Known Build Error Use this to report build issues in the .NET Helix tab

Comments

@vargaz
Contributor

vargaz commented Jun 8, 2023

Description

This happens often on CI:
https://helixre107v0xdcypoyl9e7f.blob.core.windows.net/dotnet-runtime-refs-heads-main-f9cccf963c964959af/System.Net.Mail.Functional.Tests/1/console.a97473cd.log?helixlogtype=result

To reproduce:

while true; do MONO_ENV_OPTIONS=--interp ./dotnet.sh build /t:Test /p:Configuration=Release src/libraries/System.Net.Mail/tests/Functional/ || break; done

Environment:

  • dotnet/runtime master on osx arm64

Reproduction Steps

.

Expected behavior

.

Actual behavior

.

Regression?

No response

Known Workarounds

No response

Configuration

No response

Other information

No response

Known Issue Error Message

Fill the error message using step by step known issues guidance.

{
  "ErrorMessage": "",
  "ErrorPattern": "Console log: 'System\.Net\.Mail\.Functional\.Tests' from job(.|\n)*mono_dump_native_crash_info",
  "BuildRetry": false,
  "ExcludeConsoleLog": false
}

Report

Summary

| 24-Hour Hit Count | 7-Day Hit Count | 1-Month Count |
| --- | --- | --- |
| 0 | 0 | 0 |
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Jun 8, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jun 8, 2023
@vargaz vargaz added area-Codegen-Interpreter-mono and removed untriaged New issue has not been triaged by the area owner needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Jun 8, 2023
@ghost

ghost commented Jun 8, 2023

Tagging subscribers to this area: @BrzVlad, @kotlarmilos
See info in area-owners.md if you want to be subscribed.

@vargaz
Contributor Author

vargaz commented Jun 8, 2023

@BrzVlad

@BrzVlad
Member

BrzVlad commented Jun 8, 2023

----- start Thu 08 Jun 2023 08:40:01 AM UTC =============== To repro directly: =====================================================
pushd .
/root/helix/work/correlation/dotnet exec --runtimeconfig System.Net.Mail.Functional.Tests.runtimeconfig.json --depsfile System.Net.Mail.Functional.Tests.deps.json xunit.console.dll System.Net.Mail.Functional.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing 
popd
===========================================================================================================
/root/helix/work/workitem/e /root/helix/work/workitem/e
  Discovering: System.Net.Mail.Functional.Tests (method display = ClassAndMethod, method display options = None)

=================================================================
	Native Crash Reporting
=================================================================
Got a SIGSEGV while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries 
used by your application.
=================================================================

=================================================================
	Native stacktrace:
=================================================================
	0x7f252d988ee2 - Unknown
	0x7f252d93262e - Unknown
	0x7f252d899e96 - Unknown
	0x7f252e117730 - Unknown
	0x7f252e139d3d - Unknown
	0x7f252e1411ae - Unknown
	0x7f252dd538cf - Unknown
	0x7f252e140bca - Unknown
	0x7f252dd52e0d - Unknown
	0x7f252dd538cf - Unknown
	0x7f252dd5395f - Unknown
	0x7f252dd52ee7 - Unknown
	0x7f252dd52f76 - Unknown
	0x7f252e117c7b - Unknown
	0x7f252e117e94 - Unknown
	0x7f252e1160a0 - Unknown
	0x7f252e10e1f5 - Unknown
	0x7f252da3f356 - Unknown
	0x7f252dac8ba7 - Unknown
	0x7f252e10cfa3 - Unknown
	0x7f252dd1806f - Unknown

=================================================================
	External Debugger Dump:
=================================================================
[New LWP 25]
[New LWP 26]
[New LWP 27]
[New LWP 28]
[New LWP 29]
[New LWP 30]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Dwarf Error: Cannot handle DW_FORM_strx1 in DWARF reader [in module /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so.dbg]
Dwarf Error: Cannot handle DW_FORM_strx1 in DWARF reader [in module /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libSystem.Native.so.dbg]
Dwarf Error: Cannot handle DW_FORM_strx1 in DWARF reader [in module /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libSystem.Net.Security.Native.so.dbg]
__lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
103	../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
  Id   Target Id                                                  Frame 
* 1    Thread 0x7f252dc1a740 (LWP 24) "dotnet"                    __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
  2    Thread 0x7f252cfff700 (LWP 25) "SGen worker"               futex_wait_cancelable (private=0, expected=0, futex_word=0x7f252db81698 <work_cond+40>) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  3    Thread 0x7f252b1ab700 (LWP 26) ".NET EventPipe"            0x00007f252dd0d6f9 in __GI___poll (fds=0x7f2524002710, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
  4    Thread 0x7f252aea9700 (LWP 27) "Finalizer"                 0x00007f252e10e495 in __GI___pthread_timedjoin_ex (threadid=139797485049600, thread_return=0x7f252aea8d60, abstime=0x0, block=<optimized out>) at pthread_join_common.c:89
  5    Thread 0x7f2523fff700 (LWP 28) ".NET SigHandler"           __libc_read (nbytes=1, buf=0x7f2523ffeebf, fd=7) at ../sysdeps/unix/sysv/linux/read.c:26
  6    Thread 0x7f25282f0700 (LWP 29) ".NET Long Runni" (Exiting) __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
  7    Thread 0x7f25236fd700 (LWP 30) ".NET Long Runni" (Exiting) 0x00007f252e1170ca in __waitpid (pid=31, stat_loc=0x7f25236fae80, options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:30

Thread 7 (Thread 0x7f25236fd700 (LWP 30)):
#0  0x00007f252e1170ca in __waitpid (pid=31, stat_loc=0x7f25236fae80, options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:30
#1  0x00007f252d989012 in mono_dump_native_crash_info () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#2  0x00007f252d93262e in mono_handle_native_crash () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#3  0x00007f252d899e96 in mono_sigsegv_signal_handler_debug () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#4  <signal handler called>
#5  0x00007f252e139d3d in elf_machine_rela (skip_ifunc=0, reloc_addr_arg=0x7f252310a440, version=0x30, sym=0x7f2522e49d70, reloc=0x7f2522ea45a8, map=0x7f2510185f20) at ../sysdeps/x86_64/dl-machine.h:308
#6  elf_dynamic_do_Rela (skip_ifunc=0, lazy=<optimized out>, nrelative=<optimized out>, relsize=<optimized out>, reladdr=<optimized out>, map=0x7f2510185f20) at do-rel.h:137
#7  _dl_relocate_object (scope=<optimized out>, reloc_mode=reloc_mode@entry=0, consider_profiling=<optimized out>, consider_profiling@entry=0) at dl-reloc.c:258
#8  0x00007f252e1411ae in dl_open_worker (a=a@entry=0x7f25236fcb50) at dl-open.c:377
#9  0x00007f252dd538cf in __GI__dl_catch_exception (exception=exception@entry=0x7f25236fcb30, operate=operate@entry=0x7f252e140f70 <dl_open_worker>, args=args@entry=0x7f25236fcb50) at dl-error-skeleton.c:196
#10 0x00007f252e140bca in _dl_open (file=0x7f252e11aa0f "libgcc_s.so.1", mode=-2147483646, caller_dlopen=0x7f252e117c7b <pthread_cancel_init+43>, nsid=<optimized out>, argc=18, argv=0x7ffe76588998, env=0x7ffe76588a30) at dl-open.c:599
#11 0x00007f252dd52e0d in do_dlopen (ptr=ptr@entry=0x7f25236fcd90) at dl-libc.c:96
#12 0x00007f252dd538cf in __GI__dl_catch_exception (exception=exception@entry=0x7f25236fcd10, operate=operate@entry=0x7f252dd52dd0 <do_dlopen>, args=args@entry=0x7f25236fcd90) at dl-error-skeleton.c:196
#13 0x00007f252dd5395f in __GI__dl_catch_error (objname=objname@entry=0x7f25236fcd68, errstring=errstring@entry=0x7f25236fcd70, mallocedp=mallocedp@entry=0x7f25236fcd67, operate=operate@entry=0x7f252dd52dd0 <do_dlopen>, args=args@entry=0x7f25236fcd90) at dl-error-skeleton.c:215
#14 0x00007f252dd52ee7 in dlerror_run (operate=operate@entry=0x7f252dd52dd0 <do_dlopen>, args=args@entry=0x7f25236fcd90) at dl-libc.c:46
#15 0x00007f252dd52f76 in __GI___libc_dlopen_mode (name=name@entry=0x7f252e11aa0f "libgcc_s.so.1", mode=mode@entry=-2147483646) at dl-libc.c:195
#16 0x00007f252e117c7b in pthread_cancel_init () at ../sysdeps/nptl/unwind-forcedunwind.c:53
#17 0x00007f252e117e94 in _Unwind_ForcedUnwind (exc=0x7f25236fdd70, stop=0x7f252e115f30 <unwind_stop>, stop_argument=0x7f25236fcf10) at ../sysdeps/nptl/unwind-forcedunwind.c:127
#18 0x00007f252e1160a0 in __GI___pthread_unwind (buf=<optimized out>) at unwind.c:121
#19 0x00007f252e10e1f5 in __do_cancel () at pthreadP.h:305
#20 __pthread_exit (value=<optimized out>) at pthread_exit.c:28
#21 0x00007f252da3f356 in mono_threads_platform_exit () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#22 0x00007f252dac8ba7 in start_wrapper () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#23 0x00007f252e10cfa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#24 0x00007f252dd1806f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 6 (Thread 0x7f25282f0700 (LWP 29)):
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
#1  0x00007f252e10f7d1 in __GI___pthread_mutex_lock (mutex=0x7f252e156968 <_rtld_global+2312>) at ../nptl/pthread_mutex_lock.c:115
#2  0x00007f252e140b66 in _dl_open (file=0x7f252e11aa0f "libgcc_s.so.1", mode=-2147483646, caller_dlopen=0x7f252e117c7b <pthread_cancel_init+43>, nsid=-2, argc=18, argv=0x7ffe76588998, env=0x7ffe76588a30) at dl-open.c:548
#3  0x00007f252dd52e0d in do_dlopen (ptr=ptr@entry=0x7f25282efd90) at dl-libc.c:96
#4  0x00007f252dd538cf in __GI__dl_catch_exception (exception=exception@entry=0x7f25282efd10, operate=operate@entry=0x7f252dd52dd0 <do_dlopen>, args=args@entry=0x7f25282efd90) at dl-error-skeleton.c:196
#5  0x00007f252dd5395f in __GI__dl_catch_error (objname=objname@entry=0x7f25282efd68, errstring=errstring@entry=0x7f25282efd70, mallocedp=mallocedp@entry=0x7f25282efd67, operate=operate@entry=0x7f252dd52dd0 <do_dlopen>, args=args@entry=0x7f25282efd90) at dl-error-skeleton.c:215
#6  0x00007f252dd52ee7 in dlerror_run (operate=operate@entry=0x7f252dd52dd0 <do_dlopen>, args=args@entry=0x7f25282efd90) at dl-libc.c:46
#7  0x00007f252dd52f76 in __GI___libc_dlopen_mode (name=name@entry=0x7f252e11aa0f "libgcc_s.so.1", mode=mode@entry=-2147483646) at dl-libc.c:195
#8  0x00007f252e117c7b in pthread_cancel_init () at ../sysdeps/nptl/unwind-forcedunwind.c:53
#9  0x00007f252e117e94 in _Unwind_ForcedUnwind (exc=0x7f25282f0d70, stop=0x7f252e115f30 <unwind_stop>, stop_argument=0x7f25282eff10) at ../sysdeps/nptl/unwind-forcedunwind.c:127
#10 0x00007f252e1160a0 in __GI___pthread_unwind (buf=<optimized out>) at unwind.c:121
#11 0x00007f252e10e1f5 in __do_cancel () at pthreadP.h:305
#12 __pthread_exit (value=<optimized out>) at pthread_exit.c:28
#13 0x00007f252da3f356 in mono_threads_platform_exit () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#14 0x00007f252dac8ba7 in start_wrapper () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#15 0x00007f252e10cfa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#16 0x00007f252dd1806f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 5 (Thread 0x7f2523fff700 (LWP 28)):
#0  __libc_read (nbytes=1, buf=0x7f2523ffeebf, fd=7) at ../sysdeps/unix/sysv/linux/read.c:26
#1  __libc_read (fd=7, buf=0x7f2523ffeebf, nbytes=1) at ../sysdeps/unix/sysv/linux/read.c:24
#2  0x00007f252d42e09f in SignalHandlerLoop () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0//libSystem.Native.so
#3  0x00007f252e10cfa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#4  0x00007f252dd1806f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7f252aea9700 (LWP 27)):
#0  0x00007f252e10e495 in __GI___pthread_timedjoin_ex (threadid=139797485049600, thread_return=0x7f252aea8d60, abstime=0x0, block=<optimized out>) at pthread_join_common.c:89
#1  0x00007f252da3f539 in mono_native_thread_join () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#2  0x00007f252dac80c0 in mono_threads_join_threads () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#3  0x00007f252daef58b in finalizer_thread () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#4  0x00007f252dac8aa9 in start_wrapper () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#5  0x00007f252e10cfa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#6  0x00007f252dd1806f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7f252b1ab700 (LWP 26)):
#0  0x00007f252dd0d6f9 in __GI___poll (fds=0x7f2524002710, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007f252da153be in ds_ipc_poll () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#2  0x00007f252da127e8 in ds_ipc_stream_factory_get_next_available_stream () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#3  0x00007f252da10dcf in server_thread () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#4  0x00007f252da14811 in ep_rt_thread_mono_start_func () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#5  0x00007f252e10cfa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#6  0x00007f252dd1806f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7f252cfff700 (LWP 25)):
#0  futex_wait_cancelable (private=0, expected=0, futex_word=0x7f252db81698 <work_cond+40>) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x7f252db81648 <lock>, cond=0x7f252db81670 <work_cond>) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x7f252db81670 <work_cond>, mutex=0x7f252db81648 <lock>) at pthread_cond_wait.c:655
#3  0x00007f252db56723 in thread_func () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#4  0x00007f252e10cfa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#5  0x00007f252dd1806f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f252dc1a740 (LWP 24)):
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
#1  0x00007f252e10f7d1 in __GI___pthread_mutex_lock (mutex=0x7f252e156968 <_rtld_global+2312>) at ../nptl/pthread_mutex_lock.c:115
#2  0x00007f252e140b66 in _dl_open (file=0x55d1dfafc840 "/root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/System.Diagnostics.Debug.dll.so", mode=-2147483647, caller_dlopen=0x7f252da30c0a <mono_dl_open_full+122>, nsid=-2, argc=18, argv=0x7ffe76588998, env=0x7ffe76588a30) at dl-open.c:548
#3  0x00007f252e101256 in dlopen_doit (a=a@entry=0x7ffe76585f20) at dlopen.c:66
#4  0x00007f252dd538cf in __GI__dl_catch_exception (exception=exception@entry=0x7ffe76585ec0, operate=operate@entry=0x7f252e101200 <dlopen_doit>, args=args@entry=0x7ffe76585f20) at dl-error-skeleton.c:196
#5  0x00007f252dd5395f in __GI__dl_catch_error (objname=objname@entry=0x55d1dea870f0, errstring=errstring@entry=0x55d1dea870f8, mallocedp=mallocedp@entry=0x55d1dea870e8, operate=operate@entry=0x7f252e101200 <dlopen_doit>, args=args@entry=0x7ffe76585f20) at dl-error-skeleton.c:215
#6  0x00007f252e101995 in _dlerror_run (operate=operate@entry=0x7f252e101200 <dlopen_doit>, args=args@entry=0x7ffe76585f20) at dlerror.c:170
#7  0x00007f252e1012e6 in __dlopen (file=<optimized out>, mode=<optimized out>) at dlopen.c:87
#8  0x00007f252da30c0a in mono_dl_open_full () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#9  0x00007f252d9189d9 in load_aot_module () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#10 0x00007f252da49408 in mono_assembly_request_load_from () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#11 0x00007f252da48eca in mono_assembly_request_open () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#12 0x00007f252d9681ae in mono_core_preload_hook () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#13 0x00007f252da4b423 in invoke_assembly_preload_hook () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#14 0x00007f252da48185 in mono_assembly_request_byname () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#15 0x00007f252da47aa2 in mono_assembly_load_reference () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#16 0x00007f252da4c9a8 in mono_class_from_typeref_checked () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#17 0x00007f252da8d476 in method_from_memberref () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#18 0x00007f252da8b87e in mono_get_method_checked () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#19 0x00007f252dae3ab4 in mono_custom_attrs_from_index_checked () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#20 0x00007f252d9c5027 in interp_inline_method () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#21 0x00007f252d9bf740 in interp_inline_newobj () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#22 0x00007f252d9afeea in generate_code () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#23 0x00007f252d9c50c9 in interp_inline_method () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#24 0x00007f252d9bcc81 in interp_transform_call () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#25 0x00007f252d9ac3ff in generate_code () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#26 0x00007f252d9b3722 in generate () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#27 0x00007f252d9b31fd in mono_interp_transform_method () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#28 0x00007f252d9c77c3 in tier_up_method () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#29 0x00007f252d9c78cd in mono_interp_tier_up_frame_patchpoint () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#30 0x00007f252d998c05 in mono_interp_exec_method () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#31 0x00007f252d98a886 in interp_runtime_invoke () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#32 0x00007f252dab08d7 in mono_runtime_invoke_checked () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#33 0x00007f252dab7548 in mono_runtime_exec_main_checked () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#34 0x00007f252d8f32af in mono_jit_exec () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#35 0x00007f252d8f5ae2 in mono_main () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#36 0x00007f252d967e67 in monovm_execute_assembly () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.so
#37 0x00007f252dbb09fe in ?? () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libhostpolicy.so
#38 0x00007f252dbb1a79 in corehost_main () from /root/helix/work/correlation/shared/Microsoft.NETCore.App/8.0.0/libhostpolicy.so
#39 0x00007f252dbeeeb3 in ?? () from /root/helix/work/correlation/host/fxr/8.0.0/libhostfxr.so
#40 0x00007f252dbede7d in ?? () from /root/helix/work/correlation/host/fxr/8.0.0/libhostfxr.so
#41 0x00007f252dbea548 in hostfxr_main_startupinfo () from /root/helix/work/correlation/host/fxr/8.0.0/libhostfxr.so
#42 0x000055d1de453350 in ?? ()
#43 0x000055d1de45361f in ?? ()
#44 0x00007f252dc4309b in __libc_start_main (main=0x55d1de453590, argc=18, argv=0x7ffe76588998, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe76588988) at ../csu/libc-start.c:308
#45 0x000055d1de449fb9 in ?? ()

@BrzVlad
Member

BrzVlad commented Jun 8, 2023

Looks like pthread_exit crashes? That doesn't make much sense.

@vargaz
Contributor Author

vargaz commented Jun 8, 2023

I get random crashes inside the interpreter locally:

 
  =================================================================
  	Native stacktrace:
  =================================================================
  	0x105518538 - /Users/vargaz/git/runtime/artifacts/bin/testhost/net8.0-osx-Release-arm64/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.dylib : mono_dump_native_crash_info
  	0x1054c6a30 - /Users/vargaz/git/runtime/artifacts/bin/testhost/net8.0-osx-Release-arm64/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.dylib : mono_handle_native_crash
  	0x10542abb0 - /Users/vargaz/git/runtime/artifacts/bin/testhost/net8.0-osx-Release-arm64/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.dylib : mono_sigsegv_signal_handler_debug
  	0x18f436a24 - /usr/lib/system/libsystem_platform.dylib : _sigtramp
  	0x10551baf8 - /Users/vargaz/git/runtime/artifacts/bin/testhost/net8.0-osx-Release-arm64/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.dylib : mono_interp_exec_method
  	0x105519b54 - /Users/vargaz/git/runtime/artifacts/bin/testhost/net8.0-osx-Release-arm64/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.dylib : interp_runtime_invoke
  	0x105660b00 - /Users/vargaz/git/runtime/artifacts/bin/testhost/net8.0-osx-Release-arm64/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.dylib : mono_runtime_invoke_checked
  	0x10567a8b8 - /Users/vargaz/git/runtime/artifacts/bin/testhost/net8.0-osx-Release-arm64/shared/Microsoft.NETCore.App/8.0.0/libcoreclr.dylib : start_wrapper
  	0x18f407fa8 - /usr/lib/system/libsystem_pthread.dylib : _pthread_start
  	0x18f402da0 - /usr/lib/system/libsystem_pthread.dylib : thread_start

@steveharter steveharter added blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' Known Build Error Use this to report build issues in the .NET Helix tab labels Jun 9, 2023
@BrzVlad
Member

BrzVlad commented Jun 9, 2023

The issue reproduces like this: when doing a call, ip is fetched as null from cmethod->code. This points to incorrect memory synchronisation between imethod->transformed and imethod->code: a thread can see transformed as set before the write to code is visible to it.

The following fix solves the issue:

diff --git a/src/mono/mono/mini/interp/interp.c b/src/mono/mono/mini/interp/interp.c
index a6ee9e1f3ee..833920a1987 100644
--- a/src/mono/mono/mini/interp/interp.c
+++ b/src/mono/mono/mini/interp/interp.c
@@ -3606,6 +3606,8 @@ method_entry (ThreadContext *context, InterpFrame *frame,
                        frame->stack = (stackval*)context->stack_pointer;
                        return slow;
                }
+       } else {
+               mono_memory_barrier ();
        }
 
        return slow;

Adding a read barrier in the fast path for each call might be quite expensive. It might be worth benchmarking, though.
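To make the ordering requirement concrete, here is a minimal sketch of the flag-plus-payload pattern using C11 atomics instead of mono's own barrier helpers. `DemoMethod`, `demo_publish` and `demo_get_code` are hypothetical names for illustration only, not the interpreter's real types or functions; the acquire load stands in for the read barrier discussed above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-method structure; only the two
 * fields discussed above are modelled. */
typedef struct {
    const uint16_t *code;     /* payload, written by the transforming thread */
    atomic_bool transformed;  /* publication flag */
} DemoMethod;

/* Writer side: store the payload first, then publish the flag with
 * release ordering so the payload write cannot be reordered after it. */
static void demo_publish(DemoMethod *m, const uint16_t *code)
{
    assert(code != NULL);
    m->code = code;
    atomic_store_explicit(&m->transformed, true, memory_order_release);
}

/* Reader side: the acquire load pairs with the release store above.
 * Once the flag is seen as true, the read of m->code is guaranteed to
 * see the published pointer. Returns NULL when the method is not yet
 * transformed, in which case the caller would take the slow path. */
static const uint16_t *demo_get_code(DemoMethod *m)
{
    if (!atomic_load_explicit(&m->transformed, memory_order_acquire))
        return NULL;
    return m->code;
}
```

With a relaxed (or plain) load of the flag on the reader side, nothing stops a weakly ordered CPU such as arm64 from satisfying the read of `code` before the read of `transformed`, which matches the null ip described above. The cost concern is that the acquire sits on every call, not only on the first one.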

Another possible fix would be to rely on data dependency to ensure data is accessed correctly. We could extract the imethod compilation data into a separate structure that is published in the imethod when compilation is done, and then, during method execution, access the fields through it: imethod->transform_data->code, imethod->transform_data->alloca_size etc. This would add a few pointer dereferences in the fast path, so it would probably have some cost, though I assume lower than a read barrier for every call.
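A rough sketch of that data-dependency variant, again with C11 atomics and hypothetical names (`DemoTransformData`, `DemoImethod`, `demo_publish_data`, `demo_lookup`). The field names `code` and `alloca_size` come from the comment above; everything else is an assumption made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Everything produced by compilation lives in one separate structure. */
typedef struct {
    const uint16_t *code;
    uint32_t alloca_size;
} DemoTransformData;

/* The method only holds a pointer to that structure; NULL means
 * "not compiled yet". */
typedef struct {
    _Atomic(DemoTransformData *) transform_data;
} DemoImethod;

/* Writer: the structure must be fully initialised before this call.
 * The release store makes those writes visible before the pointer. */
static void demo_publish_data(DemoImethod *m, DemoTransformData *td)
{
    assert(td != NULL && td->code != NULL);
    atomic_store_explicit(&m->transform_data, td, memory_order_release);
}

/* Reader: every later field access goes through the loaded pointer, so
 * it is ordered by address dependency. memory_order_consume expresses
 * that intent; current compilers typically implement it as acquire. */
static const DemoTransformData *demo_lookup(DemoImethod *m)
{
    return atomic_load_explicit(&m->transform_data, memory_order_consume);
}
```

A caller would test the result of `demo_lookup` for NULL (slow path) and otherwise read `td->code` and `td->alloca_size` from the same loaded pointer, never from the method directly. That extra dereference is the cost mentioned above.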

Another fix, which would add no overhead for calls, would be to reuse the existing idea used for tiering and patch with new imethods. We would have a separate InterpMethod for the compiled method, and when we compile it we patch the data items with the new pointer, relying once again on data dependency to ensure the compilation data is accessed correctly.
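As an illustration of the patching idea only, the sketch below models a call site's data item as an atomic slot that initially points at an untransformed stub and is later swapped to a fully initialised method object. `DemoCompiledMethod`, `DemoDataItem` and the `demo_*` functions are hypothetical; this is not how the interpreter's tiering code is actually written.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* A method object is immutable once a call site can reach it. */
typedef struct {
    const uint16_t *code;   /* NULL in the untransformed stub */
    uint32_t alloca_size;
} DemoCompiledMethod;

/* A call site's data item: a pointer the caller dereferences on every
 * call anyway, so reading through it adds no extra indirection. */
typedef _Atomic(const DemoCompiledMethod *) DemoDataItem;

/* After compilation, build a new, complete object and swap the slot. */
static void demo_patch_call_site(DemoDataItem *slot,
                                 const DemoCompiledMethod *compiled)
{
    assert(compiled != NULL);
    atomic_store_explicit(slot, compiled, memory_order_release);
}

/* Callers see either the old stub or the new complete object, never a
 * half-initialised one, because all fields are read via this pointer. */
static const DemoCompiledMethod *demo_call_target(DemoDataItem *slot)
{
    return atomic_load_explicit(slot, memory_order_consume);
}

/* The stub is recognised by its missing code and sent to the slow path. */
static int demo_needs_slow_path(const DemoCompiledMethod *m)
{
    return m->code == NULL;
}
```

The difference from the previous variant is where the indirection lives: here the pointer that gets replaced is the one stored at the call site, so an already patched call pays nothing extra.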

@vargaz
Contributor Author

vargaz commented Jun 9, 2023

Wouldn't getting rid of transformed and checking code instead solve the issue?

@BrzVlad
Member

BrzVlad commented Jun 9, 2023

The same logical problem remains: if the method is transformed, we expect certain fields to have been properly initialized. We could check for code, and maybe that fixes this particular issue, but then during execution imethod->alloca_size, imethod->data_items etc. could theoretically still be invalid.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jun 14, 2023
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jun 16, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Jul 16, 2023