Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EventPipe hangs with a shared NativeAOT library #89346

Closed
LakshanF opened this issue Jul 22, 2023 · 5 comments
Closed

EventPipe hangs with a shared NativeAOT library #89346

LakshanF opened this issue Jul 22, 2023 · 5 comments

Comments

@LakshanF
Copy link
Contributor

LakshanF commented Jul 22, 2023

EventPipe seems to have an issue when events are written from a shared library. Using a tool like dotnet-trace to collect trace results in a hang.

Repro Steps

  • Use a sample similar to https://github.com/dotnet/samples/tree/main/core/nativeaot/NativeLibrary
  • Add EventSourceSupport property to the project
  • Add an EventSource to the shared library and write some events. This sample has an example of the additional steps above for the sample app above.
  • Start the C program and get a tool like dotnet-trace to collect (for the sample above, can use dotnet-trace collect --providers LaksDemoEventSource,Microsoft-Windows-DotNETRuntime --name MultiAotTest
    • Note that if no trace is collected, the application behaves as expected.
  • This result in the program hanging.
  • Couple of interesting call stacks are given below
[ERROR] Microsoft.Diagnostics.NETCore.Client.ServerNotAvailableException: Could not send Stop command. The target process may have exited.
   at Microsoft.Diagnostics.NETCore.Client.EventPipeSession.Stop() in /_/src/Microsoft.Diagnostics.NETCore.Client/DiagnosticsClient/EventPipeSession.cs:line 60
   at Microsoft.Diagnostics.Tools.Trace.CollectCommandHandler.Collect(CancellationToken ct, IConsole console, Int32 processId, FileInfo output, UInt32 buffersize, String providers, String profile, TraceFileFormat format, TimeSpan duration, String clrevents, String clreventlevel, String name, String diagnosticPort, Boolean showchildio, Boolean resumeRuntime) in /_/src/Tools/dotnet-trace/CommandLine/Commands/CollectCommand.cs:line 326

   0  Id: 5870.8a90 Suspend: 1 Teb: 000000d2`8df47000 Unfrozen
 # Child-SP          RetAddr               Call Site
00 000000d2`8dd3ee88 00007ffb`cb1bf499     ntdll!NtWaitForMultipleObjects+0x14
01 000000d2`8dd3ee90 00007ffb`02d71a8a     KERNELBASE!WaitForMultipleObjectsEx+0xe9
02 000000d2`8dd3f170 00007ffb`02d78a45     NativeLibrary!PalCompatibleWaitAny+0x3a [D:\a\_work\1\s\src\coreclr\nativeaot\Runtime\windows\PalRedhawkMinWin.cpp @ 288] 
03 (Inline Function) --------`--------     NativeLibrary!ThreadStore::GetCurrentThreadIfAvailable+0x98 [D:\a\_work\1\s\src\coreclr\nativeaot\Runtime\threadstore.inl @ 35] 
04 000000d2`8dd3f1b0 00007ffb`02dc934d     NativeLibrary!CLREventStatic::Wait+0xc5 [D:\a\_work\1\s\src\coreclr\nativeaot\Runtime\event.cpp @ 89] 
05 (Inline Function) --------`--------     NativeLibrary!ep_rt_wait_event_wait+0x11 [D:\a\_work\1\s\src\coreclr\nativeaot\Runtime\eventpipe\ep-rt-aot.h @ 609] 
06 (Inline Function) --------`--------     NativeLibrary!session_disable_streaming_thread+0x29 [D:\a\_work\1\s\src\native\eventpipe\ep-session.c @ 119] 
07 (Inline Function) --------`--------     NativeLibrary!ep_session_disable+0x43 [D:\a\_work\1\s\src\native\eventpipe\ep-session.c @ 426] 
08 000000d2`8dd3f1f0 00007ffb`02dc912f     NativeLibrary!disable_holding_lock+0x17d [D:\a\_work\1\s\src\native\eventpipe\ep.c @ 585] 
09 000000d2`8dd3f2e0 00007ffb`02dcc392     NativeLibrary!disable_helper+0x5f [D:\a\_work\1\s\src\native\eventpipe\ep.c @ 657] 
0a 000000d2`8dd3f340 00007ffb`02dd27ba     NativeLibrary!ep_disable+0x72 [D:\a\_work\1\s\src\native\eventpipe\ep.c @ 1147] 
0b 000000d2`8dd3f370 00007ffb`02d6cf08     NativeLibrary!ep_shutdown+0x8a [D:\a\_work\1\s\src\native\eventpipe\ep.c @ 1433] 
0c 000000d2`8dd3f3a0 00007ffb`cb623a83     NativeLibrary!OnProcessExit+0x28 [D:\a\_work\1\s\src\coreclr\nativeaot\Runtime\startup.cpp @ 510] 
0d 000000d2`8dd3f3d0 00007ffb`cb5f006e     ucrtbase!<lambda_f03950bc5685219e0bcd2087efbe011e>::operator()+0xab
0e 000000d2`8dd3f420 00007ffb`cb5eda1d     ucrtbase!__crt_seh_guarded_call<int>::operator()<<lambda_7777bce6b2f8c936911f934f8298dc43>,<lambda_f03950bc5685219e0bcd2087efbe011e> &,<lambda_3883c3dff614d5e0c5f61bb1ac94921c> >+0x36
0f 000000d2`8dd3f450 00007ffb`02e04c79     ucrtbase!execute_onexit_table+0x3d
10 000000d2`8dd3f490 00007ffb`02e04d9e     NativeLibrary!dllmain_crt_process_detach+0x45 [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp @ 182] 
11 000000d2`8dd3f4d0 00007ffb`cd93868f     NativeLibrary!dllmain_dispatch+0xe6 [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp @ 293] 
12 000000d2`8dd3f530 00007ffb`cd961096     ntdll!LdrpCallInitRoutine+0x6b
@LakshanF LakshanF self-assigned this Jul 22, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jul 22, 2023
@ghost
Copy link

ghost commented Jul 22, 2023

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

Issue Details

EventPipe seems to have an issue when events are written from a shared library. Using a tool like dotnet-trace to collect trace results in a hang.

Repro Steps

  • Use a sample similar to https://github.com/dotnet/samples/tree/main/core/nativeaot/NativeLibrary
  • Add EventSourceSupport property to the project
  • Add an EventSource to the shared library and write some events. This sample has an example of the additional steps above for the sample app above.
  • Start the C program and get a tool like dotnet-trace to collect (for the sample above, can use dotnet-trace collect --providers LaksDemoEventSource,Microsoft-Windows-DotNETRuntime --name MultiAotTest
  • This result in the program hanging.
  • Impacted callstack is given below
   0  Id: 5870.8a90 Suspend: 1 Teb: 000000d2`8df47000 Unfrozen
 # Child-SP          RetAddr               Call Site
00 000000d2`8dd3ee88 00007ffb`cb1bf499     ntdll!NtWaitForMultipleObjects+0x14
01 000000d2`8dd3ee90 00007ffb`02d71a8a     KERNELBASE!WaitForMultipleObjectsEx+0xe9
02 000000d2`8dd3f170 00007ffb`02d78a45     NativeLibrary!PalCompatibleWaitAny+0x3a [D:\a\_work\1\s\src\coreclr\nativeaot\Runtime\windows\PalRedhawkMinWin.cpp @ 288] 
03 (Inline Function) --------`--------     NativeLibrary!ThreadStore::GetCurrentThreadIfAvailable+0x98 [D:\a\_work\1\s\src\coreclr\nativeaot\Runtime\threadstore.inl @ 35] 
04 000000d2`8dd3f1b0 00007ffb`02dc934d     NativeLibrary!CLREventStatic::Wait+0xc5 [D:\a\_work\1\s\src\coreclr\nativeaot\Runtime\event.cpp @ 89] 
05 (Inline Function) --------`--------     NativeLibrary!ep_rt_wait_event_wait+0x11 [D:\a\_work\1\s\src\coreclr\nativeaot\Runtime\eventpipe\ep-rt-aot.h @ 609] 
06 (Inline Function) --------`--------     NativeLibrary!session_disable_streaming_thread+0x29 [D:\a\_work\1\s\src\native\eventpipe\ep-session.c @ 119] 
07 (Inline Function) --------`--------     NativeLibrary!ep_session_disable+0x43 [D:\a\_work\1\s\src\native\eventpipe\ep-session.c @ 426] 
08 000000d2`8dd3f1f0 00007ffb`02dc912f     NativeLibrary!disable_holding_lock+0x17d [D:\a\_work\1\s\src\native\eventpipe\ep.c @ 585] 
09 000000d2`8dd3f2e0 00007ffb`02dcc392     NativeLibrary!disable_helper+0x5f [D:\a\_work\1\s\src\native\eventpipe\ep.c @ 657] 
0a 000000d2`8dd3f340 00007ffb`02dd27ba     NativeLibrary!ep_disable+0x72 [D:\a\_work\1\s\src\native\eventpipe\ep.c @ 1147] 
0b 000000d2`8dd3f370 00007ffb`02d6cf08     NativeLibrary!ep_shutdown+0x8a [D:\a\_work\1\s\src\native\eventpipe\ep.c @ 1433] 
0c 000000d2`8dd3f3a0 00007ffb`cb623a83     NativeLibrary!OnProcessExit+0x28 [D:\a\_work\1\s\src\coreclr\nativeaot\Runtime\startup.cpp @ 510] 
0d 000000d2`8dd3f3d0 00007ffb`cb5f006e     ucrtbase!<lambda_f03950bc5685219e0bcd2087efbe011e>::operator()+0xab
0e 000000d2`8dd3f420 00007ffb`cb5eda1d     ucrtbase!__crt_seh_guarded_call<int>::operator()<<lambda_7777bce6b2f8c936911f934f8298dc43>,<lambda_f03950bc5685219e0bcd2087efbe011e> &,<lambda_3883c3dff614d5e0c5f61bb1ac94921c> >+0x36
0f 000000d2`8dd3f450 00007ffb`02e04c79     ucrtbase!execute_onexit_table+0x3d
10 000000d2`8dd3f490 00007ffb`02e04d9e     NativeLibrary!dllmain_crt_process_detach+0x45 [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp @ 182] 
11 000000d2`8dd3f4d0 00007ffb`cd93868f     NativeLibrary!dllmain_dispatch+0xe6 [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp @ 293] 
12 000000d2`8dd3f530 00007ffb`cd961096     ntdll!LdrpCallInitRoutine+0x6b
Author: LakshanF
Assignees: LakshanF
Labels:

area-NativeAOT-coreclr

Milestone: -

@agocke agocke added this to the 8.0.0 milestone Aug 9, 2023
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Aug 9, 2023
@agocke
Copy link
Member

agocke commented Aug 10, 2023

@LakshanF Has this been fixed?

@agocke
Copy link
Member

agocke commented Aug 16, 2023

@LakshanF seems unlikely we can fix this in .NET 8. Can we fix the SDK to hard error if you're publishing a library with EventSource enabled?

@agocke agocke modified the milestones: 8.0.0, 9.0.0 Aug 28, 2023
MichalStrehovsky added a commit to MichalStrehovsky/runtime that referenced this issue Sep 7, 2023
MichalStrehovsky added a commit that referenced this issue Sep 14, 2023
Works around #89346.

We blocked event pipe completely in shared libraries in #90811 but:

* Elinor found out the shutdown problem is actually Windows specific, so blocking Linux (our most important target) is unnecessary.
* We can avoid the hang at the cost of corrupting the ongoing event pipe session by just not doing the shutdown. This can be worked around by simply stopping the event pipe session at dotnet-trace side manually.

I validated the shared library scenario no longer hangs at shutdown with this.

#89346 still tracks if we can do better here.
github-actions bot pushed a commit that referenced this issue Sep 14, 2023
@LakshanF
Copy link
Contributor Author

We no longer hang on windows after #91715, since we no longer call EventPipe shutdown routines on Windows when the shared library gets unloaded. This means that the trace file will be corrupted if trace collection is still active when the process exits. The recommended workaround for shared library tracing is to stop tracing earlier in the application (only collect for a specific time period or use the ctrl-C option with a tool like dotnet-trace when the interesting traces are collected).

This is likely going to be the guidance for shared library trace collection and we are unlikely to have a better solution than this.

@agocke
Copy link
Member

agocke commented May 28, 2024

Makes sense, thanks. I think the outstanding work item here is just docs.

@agocke agocke closed this as completed May 28, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Jun 28, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
Archived in project
Development

No branches or pull requests

3 participants