Calling RequestReJIT in the profiler API can cause a deadlock #97771

Closed
kalikin opened this issue Jan 31, 2024 · 5 comments · Fixed by #98400

kalikin commented Jan 31, 2024

Description

We experience an occasional deadlock while using ICorProfilerInfo4::RequestReJIT.

Profiler thread:

#14 0x00007ffff6c869ed in Holder<CrstBase*, &CrstBase::AcquireLock, &CrstBase::ReleaseLock, 0ul, &(int CompareDefault<CrstBase*>(CrstBase*, CrstBase*)), true>::Holder (this=0x7fbe71ff4a30, value=0x7ffff793ce10 <CodeVersionManager::s_lock>) at /runtime/src/coreclr/inc/holder.h:746
#15 0x00007ffff6cc1cfc in CodeVersionManager::LockHolder::LockHolder (this=0x7fbe71ff4a30) at /runtime/src/coreclr/vm/codeversion.h:673
#16 0x00007ffff6d12997 in CodeVersionManager::SetActiveILCodeVersions (this=0x555555682dc0, pActiveVersions=0x7fbe6033f230, cActiveVersions=42, pErrors=0x7fbe71ff4c50) at /runtime/src/coreclr/vm/codeversion.cpp:1549
#17 0x00007ffff6e609e8 in ReJitManager::UpdateActiveILVersions (cFunctions=42, rgModuleIDs=0x7fbe71ff4ee0, rgMethodDefs=0x7fbe71ff8d60, rgHrStatuses=0x0, fIsRevert=0, flags=0) at /runtime/src/coreclr/vm/rejit.cpp:662
#18 0x00007ffff6e60320 in ReJitManager::RequestReJIT (cFunctions=42, rgModuleIDs=0x7fbe71ff4ee0, rgMethodDefs=0x7fbe71ff8d60, flags=0) at /runtime/src/coreclr/vm/rejit.cpp:529
#19 0x00007ffff708d63a in ProfToEEInterfaceImpl::RequestReJIT (this=0x7fff70009560, cFunctions=42, moduleIds=0x7fbe71ff4ee0, methodIds=0x7fbe71ff8d60) at /runtime/src/coreclr/vm/proftoeeinterfaceimpl.cpp:9287

CLR internal thread:

#9  0x00007ffff6d21a16 in CrstBase::Enter (this=0x7ffff794ec28 <ReJitManager::s_csGlobalRequest>, noLevelCheckFlag=CrstBase::CRST_LEVEL_CHECK) at /runtime/src/coreclr/vm/crst.cpp:322
#10 0x00007ffff6c8d7b7 in CrstBase::AcquireLock (c=0x7ffff794ec28 <ReJitManager::s_csGlobalRequest>) at /runtime/src/coreclr/vm/crst.h:187
#11 0x00007ffff6c86c54 in CrstBase::CrstHolder::CrstHolder (this=0x7fff76638e90, pCrst=0x7ffff794ec28 <ReJitManager::s_csGlobalRequest>) at /runtime/src/coreclr/vm/crst.h:378
#12 0x00007ffff6e60381 in ReJitManager::UpdateActiveILVersions (cFunctions=1, rgModuleIDs=0x7fff76639320, rgMethodDefs=0x7fff7663931c, rgHrStatuses=0x0, fIsRevert=0, flags=0) at /runtime/src/coreclr/vm/rejit.cpp:551
#13 0x00007ffff6e60320 in ReJitManager::RequestReJIT (cFunctions=1, rgModuleIDs=0x7fff76639320, rgMethodDefs=0x7fff7663931c, flags=0) at /runtime/src/coreclr/vm/rejit.cpp:529
#14 0x00007ffff6dbfea6 in CEEInfo::reportInliningDecision (this=0x7fff7663b228, inlinerHnd=0x7fff7aabd310, inlineeHnd=0x7fff7aab9080, inlineResult=INLINE_PASS, reason=0x7fff76aef5bf "profitable inline") at /runtime/src/coreclr/vm/jitinterface.cpp:7981
#15 0x00007fff767f759e in InlineResult::Report (this=0x7fff76639758) at /runtime/src/coreclr/jit/inline.cpp:812
#16 0x00007fff766d71c5 in InlineResult::~InlineResult (this=0x7fff76639758) at /runtime/src/coreclr/jit/inline.h:463
#17 0x00007fff76720fbf in Compiler::fgInline (this=0x7fbec4363728) at /runtime/src/coreclr/jit/fginline.cpp:738
#18 0x00007fff766d9063 in CompilerPhaseWithStatus::DoPhase (this=0x7fff766398f8) at /runtime/src/coreclr/jit/phase.h:124
#19 0x00007fff768e35dc in Phase::Run (this=0x7fff766398f8) at /runtime/src/coreclr/jit/phase.cpp:61
#20 0x00007fff766d5fac in DoPhase (_compiler=0x7fbec4363728, _phase=PHASE_MORPH_INLINE, _action=(PhaseStatus (Compiler::*)(Compiler * const)) 0x7fff76720c50 <Compiler::fgInline()>) at /runtime/src/coreclr/jit/phase.h:136
#21 0x00007fff766c5940 in Compiler::compCompile (this=0x7fbec4363728, methodCodePtr=0x7fff7663abb0, methodCodeSize=0x7fff7663b004, compileFlags=0x7fff7663abd8) at /runtime/src/coreclr/jit/compiler.cpp:4550

The profiler thread holds one lock while waiting for CodeVersionManager::s_lock, which is held by an internal CLR thread that is itself waiting for ReJitManager::s_csGlobalRequest.
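
The lock pattern described above can be reproduced outside the runtime. The following is a minimal sketch, not CLR code: `global_request_lock` and `code_version_lock` are hypothetical stand-ins for ReJitManager::s_csGlobalRequest and CodeVersionManager::s_lock, and the assumption that the two threads take them in opposite order is inferred from the traces. It uses `try_lock_for` so the inversion is reported instead of hanging.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for the two runtime locks named in the traces.
static std::timed_mutex global_request_lock;
static std::timed_mutex code_version_lock;

// Takes `first`, waits until both threads hold their first lock, then tries
// `second` with a timeout. Returns true if the second lock was acquired.
static bool take_in_order(std::timed_mutex& first, std::timed_mutex& second,
                          std::atomic<int>& holding, std::atomic<int>& tried)
{
    std::lock_guard<std::timed_mutex> hold_first(first);
    holding.fetch_add(1);
    while (holding.load() < 2) {
        std::this_thread::yield();
    }

    // With a blocking lock() here, both threads would wait forever.
    bool got_second = second.try_lock_for(std::chrono::milliseconds(50));
    if (got_second) {
        second.unlock();
    }

    // Keep holding `first` until the other thread has also finished trying.
    tried.fetch_add(1);
    while (tried.load() < 2) {
        std::this_thread::yield();
    }
    return got_second;
}

// Runs both acquisition orders concurrently.
// Returns true if neither thread could get its second lock.
static bool inversion_detected()
{
    std::atomic<int> holding{0};
    std::atomic<int> tried{0};
    bool profiler_ok = false;
    bool jit_ok = false;
    std::thread profiler([&] {
        profiler_ok = take_in_order(global_request_lock, code_version_lock,
                                    holding, tried);
    });
    std::thread jit([&] {
        jit_ok = take_in_order(code_version_lock, global_request_lock,
                               holding, tried);
    });
    profiler.join();
    jit.join();
    return !profiler_ok && !jit_ok;
}
```

Calling `inversion_detected()` from `main` returns true in this setup, because each thread keeps its first lock until the other has given up on its second.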

Setting pfShouldInline in the ICorProfilerCallback::JITInlining callback seems to affect the issue.

Bigger thread dump

thread-dump.txt

Reproduction Steps

Expected behavior

No deadlock.

Actual behavior

Deadlock. Application hangs.

Regression?

No response

Known Workarounds

Set pfShouldInline to FALSE in the ICorProfilerCallback::JITInlining callback.
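
A minimal sketch of that workaround follows. The typedefs and constants are stand-ins so the snippet compiles on its own (a real profiler takes them from corprof.h and the platform headers), and `SketchProfiler` is a hypothetical class name. The workaround presumably helps because the CEEInfo::reportInliningDecision path with INLINE_PASS, seen in the CLR thread trace, is then not reached.

```cpp
#include <cstdint>

// Stand-in definitions, assumed here only to keep the sketch self-contained.
using HRESULT = long;
using BOOL = int;
using FunctionID = std::uintptr_t;
constexpr HRESULT S_OK = 0;
constexpr BOOL FALSE = 0;

// Hypothetical profiler class; only the one callback is shown.
struct SketchProfiler {
    // Same parameter list as ICorProfilerCallback::JITInlining.
    HRESULT JITInlining(FunctionID callerId, FunctionID calleeId,
                        BOOL* pfShouldInline)
    {
        (void)callerId;  // unused in this sketch
        (void)calleeId;  // unused in this sketch
        if (pfShouldInline != nullptr) {
            // Ask the JIT not to inline the callee into the caller.
            *pfShouldInline = FALSE;
        }
        return S_OK;
    }
};
```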

Configuration

A debug runtime was built from the .NET 7 branch.

git rev-parse HEAD
342951dec5dc326a4b0ba0d0949b7a4155154e90

Other information

No response

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jan 31, 2024

ghost commented Jan 31, 2024

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: kalikin
Assignees: -
Labels:

area-Diagnostics-coreclr, untriaged

Milestone: -

@tommcdon tommcdon added this to the 9.0.0 milestone Feb 5, 2024
@tommcdon tommcdon removed the untriaged New issue has not been triaged by the area owner label Feb 5, 2024

tommcdon commented Feb 5, 2024

@davmason PTAL
cc @noahfalk

davmason commented

Hi @kalikin, thanks for this report. We shouldn't hold the code version lock when calling into RequestReJIT.

Would you be able to add this commit to your branch and see if it fixes the deadlock for you?
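
The general idea of not holding one lock while calling into code that takes another can be sketched as below. This is an illustration under stated assumptions, not the referenced commit or the change in #98400: the lock names, `needs_rejit`, `request_rejit`, and `report_inlining_decision` are all hypothetical stand-ins.

```cpp
#include <mutex>

// Hypothetical stand-ins for the two runtime locks.
static std::mutex code_version_lock;
static std::mutex global_request_lock;

static bool needs_rejit = true;  // guarded by code_version_lock
static int rejit_requests = 0;   // guarded by global_request_lock

// Takes the global request lock first, then the code version lock.
static void request_rejit()
{
    std::lock_guard<std::mutex> request_guard(global_request_lock);
    ++rejit_requests;
    std::lock_guard<std::mutex> version_guard(code_version_lock);
    needs_rejit = false;
}

// Reads the state under the code version lock, releases that lock,
// and only then calls into request_rejit().
static void report_inlining_decision()
{
    bool should_request = false;
    {
        std::lock_guard<std::mutex> version_guard(code_version_lock);
        should_request = needs_rejit;
    }  // code_version_lock is released here, before the call below

    if (should_request) {
        request_rejit();
    }
}
```

With this shape every path acquires the two locks in the same order. The trade-off is that the decision read under the first lock may be stale by the time the call is made, so the callee has to tolerate a redundant request.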

kalikin commented Feb 13, 2024

@davmason I've tried the change from this commit and no longer see the deadlocks, regardless of the pfShouldInline value set in the JITInlining callback.

Looks good to me.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Feb 14, 2024
davmason commented

Thanks for validating. I opened #98400 to fix this issue.

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Feb 15, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Mar 16, 2024