Skip to content
This repository has been archived by the owner on Nov 1, 2020. It is now read-only.

sched_yield infinite loop #7816

Open
yunekit opened this issue Oct 7, 2019 · 11 comments
Open

sched_yield infinite loop #7816

yunekit opened this issue Oct 7, 2019 · 11 comments

Comments

@yunekit
Copy link

yunekit commented Oct 7, 2019

The sched_yield()==0 infinite loop occurred during the running of the application, causing the process to block and fail to work.

@yunekit
Copy link
Author

yunekit commented Oct 7, 2019

The application lost its response, and it was observed with strace that an infinite loop occurred during the call to sched_yield, suspected to occur during the GC.

@Suchiman
Copy link
Contributor

Suchiman commented Oct 7, 2019

Might this be due to #7807 (comment) ?

@yunekit
Copy link
Author

yunekit commented Oct 7, 2019

Thank you. The specific cause of this problem has not yet been found. In my source code, all loops have other calls, but the problem thread that has an infinite loop has only sched_yield calls.

@jkotas
Copy link
Member

jkotas commented Oct 7, 2019

Could you please share the stacktrace that leads to the sched_yield()==0 infinite loop that you are seeing?

@yunekit
Copy link
Author

yunekit commented Oct 8, 2019

The stack information obtained using GDB is as follows:

Thread 7 (Thread 0x7fd4cafd0700 (LWP 4739)):
#0  0x00007fd4f53aa827 in sched_yield () at ../sysdeps/unix/syscall-template.S:84
#1  0x00000000004611e6 in PalSwitchToThread () at /root/corert_3114860/src/Native/Runtime/unix/PalRedhawkUnix.cpp:571
#2  0x0000000000421455 in ThreadStore::SuspendAllThreads (this=<optimized out>, waitForGCEvent=<optimized out>, fireDebugEvent=<optimized out>) at /root/corert_3114860/src/Native/Runtime/threadstore.cpp:279
#3  0x000000000042fed5 in WKS::GCHeap::GarbageCollectGeneration (this=<optimized out>, gen=<optimized out>, reason=<optimized out>) at /root/corert_3114860/src/Native/Runtime/Full/../../gc/gc.cpp:36525
#4  0x0000000000431e62 in WKS::gc_heap::try_allocate_more_space (acontext=<optimized out>, size=<optimized out>, flags=<optimized out>, gen_number=<optimized out>) at /root/corert_3114860/src/Native/Runtime/Full/../../gc/gc.cpp:13915
#5  0x0000000000455db0 in WKS::gc_heap::allocate_more_space (size=<optimized out>, flags=0, alloc_generation_number=0, acontext=<optimized out>) at /root/corert_3114860/src/Native/Runtime/Full/../../gc/gc.cpp:14350
#6  WKS::gc_heap::allocate (jsize=<optimized out>, acontext=<optimized out>, flags=<optimized out>) at /root/corert_3114860/src/Native/Runtime/Full/../../gc/gc.cpp:14381
#7  WKS::GCHeap::Alloc (this=<optimized out>, context=<optimized out>, size=<optimized out>, flags=<optimized out>) at /root/corert_3114860/src/Native/Runtime/Full/../../gc/gc.cpp:35807
#8  0x0000000000417dab in RhpGcAlloc (pEEType=<optimized out>, uFlags=0, cbSize=24, pTransitionFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/gcrhenv.cpp:288
#9  0x000000000046de9c in RhpNewObject () at /root/corert_3114860/src/Native/Runtime/unix/unixasmmacrosamd64.inc:429
#10 0x0000000000624f59 in jwsd_jwsHttpd_RequestDataPreprocessor__RecvComplete (sck=..., size=81, o=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/HttpdServer/RequestDataPreprocessor.cs:154
#11 0x00000000005f88f8 in jwsd_JNet_Sockets_JwsSocketReader__SocketReaderNetEvent (sck=..., eve=<optimized out>, o=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/JNet/Sockets/JwsSocketReader.cs:228
#12 0x00000000005cf337 in jwsd_JNet_EpollNetDataMonitor__SocketWorkThreadFunc (this=..., o=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/JNet/EpollNetDataMonitor.cs:545
#13 0x00000000007afdae in S_P_CoreLib_System_Threading_ThreadPoolWorkQueue__Dispatch () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\FacetChecker.cs:1167
#14 0x0000000000858d4c in S_P_CoreLib_System_Threading_ClrThreadPool_WorkerThread__WorkerThreadStart () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Dom\XmlNamedNodeMap.SmallXmlNodeList.cs:178
#15 0x000000000079b2cd in S_P_CoreLib_System_Threading_Thread__StartThread () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:183
#16 0x000000000079b779 in S_P_CoreLib_System_Threading_Thread__ThreadEntryPoint () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:249
#17 0x00007fd4f56976ba in start_thread (arg=0x7fd4cafd0700) at pthread_create.c:333
#18 0x00007fd4f53c741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

After the thread 7 completes the program transaction, it triggers the GC action. After suspending all the threads, PalSwitchToThread() --> sched_yield () enters the infinite loop.

@jkotas
Copy link
Member

jkotas commented Oct 8, 2019

This thread is waiting for other threads to reach GC safe point. Could you please check the stacktraces of the the other threads? You should fine a thread that is running inside a managed code and that this thread is waiting on. That other thread running inside managed code is the problem.

@yunekit
Copy link
Author

yunekit commented Oct 8, 2019

This app is a web service program. It runs very stable on the mono and .net core platforms.

Since the GC keeps all threads in a suspended state, other threads cannot continue to work.
All threads are as follows:

Thread 14 (Thread 0x7fd4aafd0700 (LWP 4753)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000462e11 in GCEvent::Impl::Wait (this=<optimized out>, milliseconds=<optimized out>, alertable=<optimized out>) at /root/corert_3114860/src/Native/gc/unix/events.cpp:159
#2  0x0000000000420930 in Thread::WaitForGC (this=<optimized out>, pTransitionFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:89
#3  RhpWaitForGC2 (pFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:965
#4  0x0000000000799646 in S_P_CoreLib_System_Threading_LowLevelMonitor__Acquire ()
#5  0x0000000000859ab8 in S_P_CoreLib_System_Threading_WaitSubsystem_ThreadWaitInfo__Wait ()
#6  0x00000000007991f4 in S_P_CoreLib_System_Threading_LowLevelLifoSemaphore__WaitForSignal ()
#7  0x0000000000799033 in S_P_CoreLib_System_Threading_LowLevelLifoSemaphore__Wait ()
#8  0x0000000000858f4f in S_P_CoreLib_System_Threading_ClrThreadPool_WorkerThread__WaitForRequest ()
#9  0x0000000000858d61 in S_P_CoreLib_System_Threading_ClrThreadPool_WorkerThread__WorkerThreadStart ()
#10 0x000000000079b2cd in S_P_CoreLib_System_Threading_Thread__StartThread ()
#11 0x000000000079b779 in S_P_CoreLib_System_Threading_Thread__ThreadEntryPoint ()
#12 0x00007fd4f56976ba in start_thread (arg=0x7fd4aafd0700) at pthread_create.c:333
#13 0x00007fd4f53c741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 13 (Thread 0x7fd4c97a0700 (LWP 4752)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000462e11 in GCEvent::Impl::Wait (this=<optimized out>, milliseconds=<optimized out>, alertable=<optimized out>) at /root/corert_3114860/src/Native/gc/unix/events.cpp:159
#2  0x0000000000420930 in Thread::WaitForGC (this=<optimized out>, pTransitionFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:89
#3  RhpWaitForGC2 (pFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:965
#4  0x0000000000859bf9 in S_P_CoreLib_System_Threading_WaitSubsystem_ThreadWaitInfo__Wait ()
#5  0x00000000007991f4 in S_P_CoreLib_System_Threading_LowLevelLifoSemaphore__WaitForSignal ()
#6  0x0000000000799033 in S_P_CoreLib_System_Threading_LowLevelLifoSemaphore__Wait ()
#7  0x0000000000858f4f in S_P_CoreLib_System_Threading_ClrThreadPool_WorkerThread__WaitForRequest ()
#8  0x0000000000858d61 in S_P_CoreLib_System_Threading_ClrThreadPool_WorkerThread__WorkerThreadStart ()
#9  0x000000000079b2cd in S_P_CoreLib_System_Threading_Thread__StartThread ()
#10 0x000000000079b779 in S_P_CoreLib_System_Threading_Thread__ThreadEntryPoint ()
#11 0x00007fd4f56976ba in start_thread (arg=0x7fd4c97a0700) at pthread_create.c:333
#12 0x00007fd4f53c741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 12 (Thread 0x7fd4ab7e0700 (LWP 4751)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000462e11 in GCEvent::Impl::Wait (this=<optimized out>, milliseconds=<optimized out>, alertable=<optimized out>) at /root/corert_3114860/src/Native/gc/unix/events.cpp:159
#2  0x0000000000420930 in Thread::WaitForGC (this=<optimized out>, pTransitionFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:89
#3  RhpWaitForGC2 (pFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:965
#4  0x0000000000859bf9 in S_P_CoreLib_System_Threading_WaitSubsystem_ThreadWaitInfo__Wait ()
#5  0x00000000007991f4 in S_P_CoreLib_System_Threading_LowLevelLifoSemaphore__WaitForSignal ()
#6  0x0000000000799033 in S_P_CoreLib_System_Threading_LowLevelLifoSemaphore__Wait ()
#7  0x0000000000858f4f in S_P_CoreLib_System_Threading_ClrThreadPool_WorkerThread__WaitForRequest ()
#8  0x0000000000858d61 in S_P_CoreLib_System_Threading_ClrThreadPool_WorkerThread__WorkerThreadStart ()
#9  0x000000000079b2cd in S_P_CoreLib_System_Threading_Thread__StartThread ()
#10 0x000000000079b779 in S_P_CoreLib_System_Threading_Thread__ThreadEntryPoint ()
#11 0x00007fd4f56976ba in start_thread (arg=0x7fd4ab7e0700) at pthread_create.c:333
#12 0x00007fd4f53c741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 11 (Thread 0x7fd4abff0700 (LWP 4744)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000461f01 in UnixEvent::Wait (this=<optimized out>, milliseconds=<optimized out>) at /root/corert_3114860/src/Native/Runtime/unix/PalRedhawkUnix.cpp:297
#2  0x000000000046fd8b in CLREventStatic::Wait (this=0x33adbc0, dwMilliseconds=<optimized out>, bAlertable=false, bAllowReentrantWait=false) at /root/corert_3114860/src/Native/Runtime/event.cpp:105
#3  0x0000000000421673 in ThreadStore::WaitForSuspendComplete (this=<optimized out>) at /root/corert_3114860/src/Native/Runtime/threadstore.cpp:322
#4  0x00000000004208d3 in Thread::WaitForSuspend (this=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:77
#5  RhpWaitForSuspend2 () at /root/corert_3114860/src/Native/Runtime/thread.cpp:946
#6  0x000000000079ad1c in S_P_CoreLib_System_Threading_Thread__Yield ()
#7  0x0000000000859d91 in S_P_CoreLib_System_Threading_WaitSubsystem_ThreadWaitInfo__UninterruptibleSleep0 ()
#8  0x00000000007ad276 in S_P_CoreLib_System_Threading_SpinWait__SpinOnceCore ()
#9  0x00000000007ab43f in S_P_CoreLib_System_Threading_SemaphoreSlim__Wait_4 ()
#10 0x00000000005fe173 in jwsd_JNet_Sockets_SocketCloseHelper__DoClose () at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/JNet/Sockets/SocketCloseHelper.cs:204
#11 0x000000000079b2cd in S_P_CoreLib_System_Threading_Thread__StartThread () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:183
#12 0x000000000079b779 in S_P_CoreLib_System_Threading_Thread__ThreadEntryPoint () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:249
#13 0x00007fd4f56976ba in start_thread (arg=0x7fd4abff0700) at pthread_create.c:333
#14 0x00007fd4f53c741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 10 (Thread 0x7fd4c8f90700 (LWP 4743)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000462e11 in GCEvent::Impl::Wait (this=<optimized out>, milliseconds=<optimized out>, alertable=<optimized out>) at /root/corert_3114860/src/Native/gc/unix/events.cpp:159
#2  0x0000000000420930 in Thread::WaitForGC (this=<optimized out>, pTransitionFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:89
#3  RhpWaitForGC2 (pFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:965
#4  0x00000000005010b1 in System_IO_FileSystem_Interop_Sys__LStat_0 () at /_/src/Common/src/CoreLib/System/Runtime/Versioning/NonVersionableAttribute.cs:29
#5  0x0000000000501c27 in System_IO_FileSystem_Interop_Sys__LStat_1 (path=..., output=...) at /_/src/Common/src/Interop/Unix/System.Native/Interop.Stat.Span.cs:36
#6  0x00000000005b5943 in System_IO_FileSystem_System_IO_FileSystem__FileExists_0 (fullPath=..., fileType=32768, errorInfo=...) at /_/src/System.IO.FileSystem/src/System/IO/FileSystem.Exists.Unix.cs:41
#7  0x0000000000558acd in System_IO_FileSystem_System_IO_File__Exists (path=...) at /_/src/System.IO.FileSystem/src/System/IO/File.cs:136
#8  0x0000000000639e88 in jwsd_jwsHttpd_SiteHostManager__TargetFilePathAndName (this=..., urlPath=..., pPathAndName=..., newUrlPath=..., toDivFix=@0x7fd4c8f8f928: false) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/HttpdServer/SiteHostManager.cs:791
#9  0x0000000000639632 in jwsd_jwsHttpd_SiteHostManager__TryGetTargetFilePathAndName (this=..., rdt=..., toDirFix=@0x7fd4c8f8fa18: false) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/HttpdServer/SiteHostManager.cs:661
#10 0x000000000062a865 in jwsd_jwsHttpd_RequestWorker__RealRun (reqss=..., rdt=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/HttpdServer/RequestWorker.cs:357
#11 0x0000000000629ae4 in jwsd_jwsHttpd_RequestWorker__Run (reqs=..., rdate=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/HttpdServer/RequestWorker.cs:28
#12 0x0000000000624c2d in jwsd_jwsHttpd_RequestDataPreprocessor__ParseData_0 (reqs=..., handlerSize=<optimized out>) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/HttpdServer/RequestDataPreprocessor.cs:110
#13 0x00000000006251a9 in jwsd_jwsHttpd_RequestDataPreprocessor__RecvComplete (sck=..., size=<optimized out>, o=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/HttpdServer/RequestDataPreprocessor.cs:253
#14 0x00000000005f88f8 in jwsd_JNet_Sockets_JwsSocketReader__SocketReaderNetEvent (sck=..., eve=<optimized out>, o=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/JNet/Sockets/JwsSocketReader.cs:228
#15 0x00000000005cf337 in jwsd_JNet_EpollNetDataMonitor__SocketWorkThreadFunc (this=..., o=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/JNet/EpollNetDataMonitor.cs:545
#16 0x00000000007afdae in S_P_CoreLib_System_Threading_ThreadPoolWorkQueue__Dispatch () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\FacetChecker.cs:1167
#17 0x0000000000858d4c in S_P_CoreLib_System_Threading_ClrThreadPool_WorkerThread__WorkerThreadStart () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Dom\XmlNamedNodeMap.SmallXmlNodeList.cs:178
#18 0x000000000079b2cd in S_P_CoreLib_System_Threading_Thread__StartThread () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:183
#19 0x000000000079b779 in S_P_CoreLib_System_Threading_Thread__ThreadEntryPoint () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:249
#20 0x00007fd4f56976ba in start_thread (arg=0x7fd4c8f90700) at pthread_create.c:333
#21 0x00007fd4f53c741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 9 (Thread 0x7fd4c9fb0700 (LWP 4741)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000462e11 in GCEvent::Impl::Wait (this=<optimized out>, milliseconds=<optimized out>, alertable=<optimized out>) at /root/corert_3114860/src/Native/gc/unix/events.cpp:159
#2  0x0000000000420930 in Thread::WaitForGC (this=<optimized out>, pTransitionFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:89
#3  RhpWaitForGC2 (pFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:965
#4  0x00000000004ff941 in System_IO_FileSystem_Interop_Sys__Stat_0 () at /_/artifacts/obj/System.IO.Compression.Brotli/netcoreapp-Unix-Release/System.SR.cs:20
#5  0x0000000000500337 in System_IO_FileSystem_Interop_Sys__Stat_1 (path=..., output=...) at /_/src/Common/src/Interop/Unix/System.Native/Interop.Stat.Span.cs:24
#6  0x00000000005b5927 in System_IO_FileSystem_System_IO_FileSystem__FileExists_0 (fullPath=..., fileType=16384, errorInfo=...) at /_/src/System.IO.FileSystem/src/System/IO/FileSystem.Exists.Unix.cs:41
#7  0x00000000005280ba in System_IO_FileSystem_System_IO_Directory__Exists (path=...) at /_/src/System.IO.FileSystem/src/System/IO/Directory.cs:64
#8  0x000000000063a38d in jwsd_jwsHttpd_SiteHostManager__UrlToPhysicsPath (this=..., urlPath=..., isDir=@0x7fd4c9faf870: false) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/HttpdServer/SiteHostManager.cs:908
#9  0x0000000000639d86 in jwsd_jwsHttpd_SiteHostManager__TargetFilePathAndName (this=..., urlPath=..., pPathAndName=..., newUrlPath=..., toDivFix=@0x7fd4c9faf928: false) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/HttpdServer/SiteHostManager.cs:760
#10 0x0000000000639632 in jwsd_jwsHttpd_SiteHostManager__TryGetTargetFilePathAndName (this=..., rdt=..., toDirFix=@0x7fd4c9fafa18: false) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/HttpdServer/SiteHostManager.cs:661
#11 0x000000000062a865 in jwsd_jwsHttpd_RequestWorker__RealRun (reqss=..., rdt=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/HttpdServer/RequestWorker.cs:357
#12 0x0000000000629ae4 in jwsd_jwsHttpd_RequestWorker__Run (reqs=..., rdate=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/HttpdServer/RequestWorker.cs:28
#13 0x0000000000624c2d in jwsd_jwsHttpd_RequestDataPreprocessor__ParseData_0 (reqs=..., handlerSize=<optimized out>) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/HttpdServer/RequestDataPreprocessor.cs:110
#14 0x00000000006251a9 in jwsd_jwsHttpd_RequestDataPreprocessor__RecvComplete (sck=..., size=<optimized out>, o=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/HttpdServer/RequestDataPreprocessor.cs:253
#15 0x00000000005f88f8 in jwsd_JNet_Sockets_JwsSocketReader__SocketReaderNetEvent (sck=..., eve=<optimized out>, o=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/JNet/Sockets/JwsSocketReader.cs:228
#16 0x00000000005cf337 in jwsd_JNet_EpollNetDataMonitor__SocketWorkThreadFunc (this=..., o=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/JNet/EpollNetDataMonitor.cs:545
#17 0x00000000007afdae in S_P_CoreLib_System_Threading_ThreadPoolWorkQueue__Dispatch () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\FacetChecker.cs:1167
#18 0x0000000000858d4c in S_P_CoreLib_System_Threading_ClrThreadPool_WorkerThread__WorkerThreadStart () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Dom\XmlNamedNodeMap.SmallXmlNodeList.cs:178
#19 0x000000000079b2cd in S_P_CoreLib_System_Threading_Thread__StartThread () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:183
#20 0x000000000079b779 in S_P_CoreLib_System_Threading_Thread__ThreadEntryPoint () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:249
#21 0x00007fd4f56976ba in start_thread (arg=0x7fd4c9fb0700) at pthread_create.c:333
#22 0x00007fd4f53c741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 8 (Thread 0x7fd4ca7c0700 (LWP 4740)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000462e11 in GCEvent::Impl::Wait (this=<optimized out>, milliseconds=<optimized out>, alertable=<optimized out>) at /root/corert_3114860/src/Native/gc/unix/events.cpp:159
#2  0x0000000000420930 in Thread::WaitForGC (this=<optimized out>, pTransitionFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:89
#3  RhpWaitForGC2 (pFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:965
#4  0x0000000000859bf9 in S_P_CoreLib_System_Threading_WaitSubsystem_ThreadWaitInfo__Wait () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Dom\XmlNamedNodeMap.SmallXmlNodeList.cs:178
#5  0x0000000000859e16 in S_P_CoreLib_System_Threading_WaitSubsystem_ThreadWaitInfo(long,...)(...)( *) static () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Dom\XmlNamedNodeMap.SmallXmlNodeList.cs:178
#6  0x0000000000857572 in S_P_CoreLib_System_Threading_ClrThreadPool_GateThread__GateThreadStart () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Dom\XmlNamedNodeMap.SmallXmlNodeList.cs:178
#7  0x000000000079b2cd in S_P_CoreLib_System_Threading_Thread__StartThread () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:183
#8  0x000000000079b779 in S_P_CoreLib_System_Threading_Thread__ThreadEntryPoint () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:249
#9  0x00007fd4f56976ba in start_thread (arg=0x7fd4ca7c0700) at pthread_create.c:333
#10 0x00007fd4f53c741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 7 (Thread 0x7fd4cafd0700 (LWP 4739)):
#0  0x00007fd4f53aa827 in sched_yield () at ../sysdeps/unix/syscall-template.S:84
#1  0x00000000004611e6 in PalSwitchToThread () at /root/corert_3114860/src/Native/Runtime/unix/PalRedhawkUnix.cpp:571
#2  0x0000000000421455 in ThreadStore::SuspendAllThreads (this=<optimized out>, waitForGCEvent=<optimized out>, fireDebugEvent=<optimized out>) at /root/corert_3114860/src/Native/Runtime/threadstore.cpp:279
#3  0x000000000042fed5 in WKS::GCHeap::GarbageCollectGeneration (this=<optimized out>, gen=<optimized out>, reason=<optimized out>) at /root/corert_3114860/src/Native/Runtime/Full/../../gc/gc.cpp:36525
#4  0x0000000000431e62 in WKS::gc_heap::try_allocate_more_space (acontext=<optimized out>, size=<optimized out>, flags=<optimized out>, gen_number=<optimized out>) at /root/corert_3114860/src/Native/Runtime/Full/../../gc/gc.cpp:13915
#5  0x0000000000455db0 in WKS::gc_heap::allocate_more_space (size=<optimized out>, flags=0, alloc_generation_number=0, acontext=<optimized out>) at /root/corert_3114860/src/Native/Runtime/Full/../../gc/gc.cpp:14350
#6  WKS::gc_heap::allocate (jsize=<optimized out>, acontext=<optimized out>, flags=<optimized out>) at /root/corert_3114860/src/Native/Runtime/Full/../../gc/gc.cpp:14381
#7  WKS::GCHeap::Alloc (this=<optimized out>, context=<optimized out>, size=<optimized out>, flags=<optimized out>) at /root/corert_3114860/src/Native/Runtime/Full/../../gc/gc.cpp:35807
#8  0x0000000000417dab in RhpGcAlloc (pEEType=<optimized out>, uFlags=0, cbSize=24, pTransitionFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/gcrhenv.cpp:288
#9  0x000000000046de9c in RhpNewObject () at /root/corert_3114860/src/Native/Runtime/unix/unixasmmacrosamd64.inc:429
#10 0x0000000000624f59 in jwsd_jwsHttpd_RequestDataPreprocessor__RecvComplete (sck=..., size=81, o=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/HttpdServer/RequestDataPreprocessor.cs:154
#11 0x00000000005f88f8 in jwsd_JNet_Sockets_JwsSocketReader__SocketReaderNetEvent (sck=..., eve=<optimized out>, o=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/JNet/Sockets/JwsSocketReader.cs:228
#12 0x00000000005cf337 in jwsd_JNet_EpollNetDataMonitor__SocketWorkThreadFunc (this=..., o=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/JNet/EpollNetDataMonitor.cs:545
#13 0x00000000007afdae in S_P_CoreLib_System_Threading_ThreadPoolWorkQueue__Dispatch () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\FacetChecker.cs:1167
#14 0x0000000000858d4c in S_P_CoreLib_System_Threading_ClrThreadPool_WorkerThread__WorkerThreadStart () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Dom\XmlNamedNodeMap.SmallXmlNodeList.cs:178
#15 0x000000000079b2cd in S_P_CoreLib_System_Threading_Thread__StartThread () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:183
#16 0x000000000079b779 in S_P_CoreLib_System_Threading_Thread__ThreadEntryPoint () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:249
#17 0x00007fd4f56976ba in start_thread (arg=0x7fd4cafd0700) at pthread_create.c:333
#18 0x00007fd4f53c741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 6 (Thread 0x7fd4cb7e0700 (LWP 4738)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000462e11 in GCEvent::Impl::Wait (this=<optimized out>, milliseconds=<optimized out>, alertable=<optimized out>) at /root/corert_3114860/src/Native/gc/unix/events.cpp:159
#2  0x0000000000420930 in Thread::WaitForGC (this=<optimized out>, pTransitionFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:89
#3  RhpWaitForGC2 (pFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:965
#4  0x000000000094e1eb in <PInvokeDelegateWrapper>jwsd_JNet_TcpAccept_delegateAccept__<ForwardNativeFunctionWrapper>jwsd_JNet_TcpAccept_delegateAccept () at /_/src/System.Net.Http/src/System/Net/Http/Headers/ObjectCollection.cs:15
#5  0x00000000005ea8fd in jwsd_JNet_TcpAccept__AcceptLoop (this=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/JNet/TcpAccept.cs:168
#6  0x000000000079b2cd in S_P_CoreLib_System_Threading_Thread__StartThread () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:183
#7  0x000000000079b779 in S_P_CoreLib_System_Threading_Thread__ThreadEntryPoint () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:249
#8  0x00007fd4f56976ba in start_thread (arg=0x7fd4cb7e0700) at pthread_create.c:333
#9  0x00007fd4f53c741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 5 (Thread 0x7fd4cbff0700 (LWP 4737)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000462e11 in GCEvent::Impl::Wait (this=<optimized out>, milliseconds=<optimized out>, alertable=<optimized out>) at /root/corert_3114860/src/Native/gc/unix/events.cpp:159
#2  0x0000000000420930 in Thread::WaitForGC (this=<optimized out>, pTransitionFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:89
#3  RhpWaitForGC2 (pFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:965
#4  0x0000000000859bf9 in S_P_CoreLib_System_Threading_WaitSubsystem_ThreadWaitInfo__Wait () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Dom\XmlNamedNodeMap.SmallXmlNodeList.cs:178
#5  0x000000000079d0f9 in S_P_CoreLib_System_Threading_WaitHandle__WaitOneNoCheck () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\ContentValidator.cs:573
#6  0x00000000007b2d65 in S_P_CoreLib_System_Threading_TimerQueue__TimerThread () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\NamespaceList.cs:159
#7  0x000000000079b2cd in S_P_CoreLib_System_Threading_Thread__StartThread () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:183
#8  0x000000000079b779 in S_P_CoreLib_System_Threading_Thread__ThreadEntryPoint () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:249
#9  0x00007fd4f56976ba in start_thread (arg=0x7fd4cbff0700) at pthread_create.c:333
#10 0x00007fd4f53c741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 4 (Thread 0x7fd4d8b90700 (LWP 4736)):
#0  0x00007fd4f53c7a13 in epoll_wait () at ../sysdeps/unix/syscall-template.S:84
#1  0x000000000094e160 in <PInvokeDelegateWrapper>jwsd_JNet_EpollNetDataMonitor_DelegateEpollWait__<ForwardNativeFunctionWrapper>jwsd_JNet_EpollNetDataMonitor_DelegateEpollWait () at /_/src/System.Net.Http/src/System/Net/Http/Headers/ObjectCollection.cs:15
#2  0x00000000005cb061 in jwsd_JNet_EpollNetDataMonitor__ThreadWaitEpollEvent (this=..., idObj=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/JNet/EpollNetDataMonitor.cs:179
#3  0x000000000079b28d in S_P_CoreLib_System_Threading_Thread__StartThread () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:183
#4  0x000000000079b779 in S_P_CoreLib_System_Threading_Thread__ThreadEntryPoint () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:249
#5  0x00007fd4f56976ba in start_thread (arg=0x7fd4d8b90700) at pthread_create.c:333
#6  0x00007fd4f53c741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 3 (Thread 0x7fd4d93a0700 (LWP 4735)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000461f01 in UnixEvent::Wait (this=<optimized out>, milliseconds=<optimized out>) at /root/corert_3114860/src/Native/Runtime/unix/PalRedhawkUnix.cpp:297
#2  0x000000000046fd8b in CLREventStatic::Wait (this=0x33adbc0, dwMilliseconds=<optimized out>, bAlertable=false, bAllowReentrantWait=false) at /root/corert_3114860/src/Native/Runtime/event.cpp:105
#3  0x0000000000421673 in ThreadStore::WaitForSuspendComplete (this=<optimized out>) at /root/corert_3114860/src/Native/Runtime/threadstore.cpp:322
#4  0x00000000004208d3 in Thread::WaitForSuspend (this=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:77
#5  RhpWaitForSuspend2 () at /root/corert_3114860/src/Native/Runtime/thread.cpp:946
#6  0x0000000000507392 in System_Console_Interop_Sys__Write () at /mnt/e/Jexus-Core-V1.0/JwsShared/JexusShreadFunctions.cs:55
#7  0x0000000000575bc4 in System_Console_System_ConsolePal__Write_0 (fd=..., bufPtr=0x7fd4dbec6448 "Epoll出现可读写事件并纳入工作线程,socket:8, 状态:CanRead\nRead\n", count=75, mayChangeCursorPosition=true) at /_/src/System.Console/src/System/ConsolePal.Unix.cs:1246
#8  0x000000000057566c in System_Console_System_ConsolePal__Write (fd=..., buffer=..., offset=<optimized out>, count=<optimized out>, mayChangeCursorPosition=<optimized out>) at /_/src/System.Console/src/System/ConsolePal.Unix.cs:1238
#9  0x0000000000809791 in S_P_CoreLib_System_IO_StreamWriter__Flush_0 () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\XsdDuration.cs:658
#10 0x000000000080a54d in S_P_CoreLib_System_IO_StreamWriter__WriteFormatHelper () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\XsdDuration.cs:658
#11 0x000000000080a6a9 in S_P_CoreLib_System_IO_StreamWriter__WriteLine_2 () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\XsdDuration.cs:658
#12 0x000000000086740e in S_P_CoreLib_System_IO_TextWriter_SyncTextWriter__WriteLine_16 () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Dom\XmlNamedNodeMap.SmallXmlNodeList.cs:178
#13 0x00000000005cec09 in jwsd_JNet_EpollNetDataMonitor__ToWorkCallback (this=..., info_id=<optimized out>, value=<optimized out>, fromRead=<optimized out>) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/JNet/EpollNetDataMonitor.cs:514
#14 0x00000000005cb1a5 in jwsd_JNet_EpollNetDataMonitor__ThreadWaitEpollEvent (this=..., idObj=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/JNet/EpollNetDataMonitor.cs:239
#15 0x000000000079b28d in S_P_CoreLib_System_Threading_Thread__StartThread () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:183
#16 0x000000000079b779 in S_P_CoreLib_System_Threading_Thread__ThreadEntryPoint () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:249
#17 0x00007fd4f56976ba in start_thread (arg=0x7fd4d93a0700) at pthread_create.c:333
#18 0x00007fd4f53c741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 2 (Thread 0x7fd4dbe80700 (LWP 4734)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000461f01 in UnixEvent::Wait (this=<optimized out>, milliseconds=<optimized out>) at /root/corert_3114860/src/Native/Runtime/unix/PalRedhawkUnix.cpp:297
#2  0x00000000004166c5 in PalWaitForSingleObjectEx (arg1=0x33b34c0, arg2=4294967295, arg3=0) at /root/corert_3114860/src/Native/Runtime/Full/../PalRedhawkFunctions.h:172
#3  FinalizerStart (pContext=0x33b34c0) at /root/corert_3114860/src/Native/Runtime/FinalizerHelpers.cpp:59
#4  0x00007fd4f56976ba in start_thread (arg=0x7fd4dbe80700) at pthread_create.c:333
#5  0x00007fd4f53c741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 1 (Thread 0x7fd4f6ca0880 (LWP 4733)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000462e11 in GCEvent::Impl::Wait (this=<optimized out>, milliseconds=<optimized out>, alertable=<optimized out>) at /root/corert_3114860/src/Native/gc/unix/events.cpp:159
#2  0x0000000000420930 in Thread::WaitForGC (this=<optimized out>, pTransitionFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:89
#3  RhpWaitForGC2 (pFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:965
#4  0x0000000000859bf9 in S_P_CoreLib_System_Threading_WaitSubsystem_ThreadWaitInfo__Wait () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Dom\XmlNamedNodeMap.SmallXmlNodeList.cs:178
#5  0x0000000000859e16 in S_P_CoreLib_System_Threading_WaitSubsystem_ThreadWaitInfo(long,...)(...)( *) static () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Dom\XmlNamedNodeMap.SmallXmlNodeList.cs:178
#6  0x00000000006460df in jwsd_jwsHttpd_SvrMain__RealMain (args=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/SvrMain.cs:116
#7  0x000000000063f53c in jwsd_jwsHttpd_JwsWorkerServer__HttpdMain (args=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/JwsWorkerServer.cs:103
#8  0x0000000000689c0e in jwsd_jws_JwsMain__Main1 (args=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsMain.cs:209
#9  0x0000000000689959 in jwsd_jws_JwsMain__Main (args=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsMain.cs:177
#10 0x0000000000952abd in jwsd__Module___StartupCodeMain () at <stdin>:15732480
#11 0x0000000000415be5 in main (argc=1, argv=0x7fffd24ac6e8) at /root/corert_3114860/src/Native/Bootstrap/main.cpp:395

@jkotas
Copy link
Member

jkotas commented Oct 8, 2019

Hmm, the stack traces do not ring any bells. It looks all normal.

Could you please step through the loop in ThreadStore::SuspendAllThreads and find the Thread* that causes keepWaiting to be set to true? Then look at the m_uPalThreadIdForLogging field of this Thread* to identify the threads that causing the problem via Thread ID.

@yunekit
Copy link
Author

yunekit commented Oct 8, 2019

thank you for your help.
This program uses function calls such as epoll_wait, and I suspect that thread hangs are related to this function.

@jkotas
Copy link
Member

jkotas commented Oct 8, 2019

Do you see that the loop inside ThreadStore::SuspendAllThreads is waiting for the thread that is calling epoll_wait ?

Would you mind sharing the bit of code in EpollNetDataMonitor.cs that calls the epoll_wait? One possibility is that the hang is caused by a bad PInvoke definition for epoll_wait.

@yunekit
Copy link
Author

yunekit commented Oct 8, 2019

This problem was temporarily resolved by modifying the epoll_wait call parameter. During the stress test, the program was not found to be stuck.

Old code:

var nfds = epoll_wait(epfd,  events,  max, -1);

New code:

var nfds = epoll_wait(epfd,  events,  max, 100);
if (nfds == 0){
    Thread.Sleep(1);
    continue;  //goto epoll_wait...
}

Whether this improvement can completely solve the problem, still need to continue to observe.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants