Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable multiple nested "no GC" region requests #68105

Merged

Conversation

BruceForstall
Copy link
Member

When doing fast tail call on arm32, we disable GC reporting.
But some IR nodes, like unrolled STORE_BLK, also disable GC
reporting. If one of those is within a fast tail call argument
setup region, we'll attempt to disable GC reporting twice. Since
we didn't keep track of nesting, we end up marking some of the
tail call region after the STORE_BLK as interruptible, leading
to be GC info in the argument area.

Change the enable/disable GC calls to keep a nesting count,
and only re-enable GC reporting when the count reaches zero.

Fixes #67046

Also, don't remove GT_NO_OP "non-removable" NOP. These were being
removed by liveness. They were being inserted in at least some cases
by fast tail call code to ensure there was at least one instruction in a
function that was GC interruptible. All the diffs are due to this.

When doing fast tail call on arm32, we disable GC reporting.
But some IR nodes, like unrolled STORE_BLK, also disable GC
reporting. If one of those is within a fast tail call argument
setup region, we'll attempt to disable GC reporting twice. Since
we didn't keep track of nesting, we end up marking some of the
tail call region after the STORE_BLK as interruptible, leading
to be GC info in the argument area.

Change the enable/disable GC calls to keep a nesting count,
and only re-enable GC reporting when the count reaches zero.

Fixes dotnet#67046
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 16, 2022
@ghost ghost assigned BruceForstall Apr 16, 2022
@ghost
Copy link

ghost commented Apr 16, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

When doing fast tail call on arm32, we disable GC reporting.
But some IR nodes, like unrolled STORE_BLK, also disable GC
reporting. If one of those is within a fast tail call argument
setup region, we'll attempt to disable GC reporting twice. Since
we didn't keep track of nesting, we end up marking some of the
tail call region after the STORE_BLK as interruptible, leading
to be GC info in the argument area.

Change the enable/disable GC calls to keep a nesting count,
and only re-enable GC reporting when the count reaches zero.

Fixes #67046

Also, don't remove GT_NO_OP "non-removable" NOP. These were being
removed by liveness. They were being inserted in at least some cases
by fast tail call code to ensure there was at least one instruction in a
function that was GC interruptible. All the diffs are due to this.

Author: BruceForstall
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@BruceForstall
Copy link
Member Author

@jakobbotsch @dotnet/jit-contrib PTAL

@BruceForstall
Copy link
Member Author

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, runtime-coreclr gcstress0x3-gcstress0xc

@azure-pipelines
Copy link

Azure Pipelines successfully started running 3 pipeline(s).

Copy link
Member

@jakobbotsch jakobbotsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for tracking this down.
Do you know why this only happens on ARM32? Do we not use STORE_BLK for args on other platforms as well?

@BruceForstall
Copy link
Member Author

Do you know why this only happens on ARM32? Do we not use STORE_BLK for args on other platforms as well?

It should happen on any non-x86 platform. It's possible that there are reasons why this particular case didn't get optimized quite right. It also requires a GC to actually happen in exactly the right place to be noticed. I'm not sure why GCStress doesn't make it happen reliably, and in fact in my testing, setting GCStress made the failure not happen.

@BruceForstall
Copy link
Member Author

If I remove the "NO_OP" change and run asm diffs, I don't see any diffs on win-x64 or win-arm64, but I do see about a dozen on linux-x64 where we extend the no-GC range (as would be expected).

(examples for libraries.crossgen2.Linux.x64:

Microsoft.CodeAnalysis.CSharp.Symbols.Metadata.PE.PEParameterSymbol:Create(Microsoft.CodeAnalysis.CSharp.Symbols.Metadata.PE.PEModuleSymbol,Microsoft.CodeAnalysis.CSharp.Symbols.Metadata.PE.PEPropertySymbol,int,System.Reflection.Metadata.ParameterHandle,Microsoft.CodeAnalysis.ParamInfo`1[Microsoft.CodeAnalysis.CSharp.Symbols.TypeSymbol],byref):Microsoft.CodeAnalysis.CSharp.Symbols.Metadata.PE.PEParameterSymbol
ArraySet:GetType(Microsoft.CodeAnalysis.Emit.EmitContext):Microsoft.Cci.ITypeReference:this
System.TupleExtensions:ToTuple(System.ValueTuple`7[System.__Canon, System.__Canon, System.__Canon, System.__Canon, System.__Canon, System.__Canon, System.__Canon]):System.Tuple`7[System.__Canon, System.__Canon, System.__Canon, System.__Canon, System.__Canon, System.__Canon, System.__Canon]

and libraries_tests:

ProjectAccessor:TryGetDatabaseId(Microsoft.CodeAnalysis.SQLite.v2.Interop.SqlConnection,System.ValueTuple`2[[Microsoft.CodeAnalysis.Storage.ProjectKey, Microsoft.CodeAnalysis.Workspaces, Version=4.3.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35],[System.String, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]],bool,byref):bool:this

)

(looks like there's a problem running spmi replay on linux-arm: we run out of memory and crash)

@BruceForstall
Copy link
Member Author

Most gcstress failures are #68112

@BruceForstall
Copy link
Member Author

The NOP change is triggering an x86 GCStress=C failure in baseservices\threading\generics\Monitor\EnterExit12\EnterExit12.dll:

Assert failure(PID 107252 [0x0001a2f4], Thread: 108464 [0x1a7b0]): CheckInstrBytePattern(epilogBase[offset] & X86_INSTR_RET, X86_INSTR_RET, epilogBase[offset]) || CheckInstrBytePattern(epilogBase[offset], X86_INSTR_JMP_NEAR_REL32, epilogBase[offset]) || CheckInstrWord(*PTR_WORD(epilogBase + offset), X86_INSTR_w_JMP_FAR_IND_IMM)

CORECLR! UnwindEspFrameEpilog + 0x1DC (0x5a8eab4c)
CORECLR! UnwindStackFrame + 0x230 (0x5a8eaf50)
CORECLR! DoGcStress + 0x263 (0x5aad21fb)
CORECLR! OnGcCoverageInterrupt + 0x160 (0x5aad284e)
CORECLR! IsGcMarker + 0x93 (0x5a8ff26f)
CORECLR! CLRVectoredExceptionHandlerShim + 0xC5 (0x5a8f46f5)
NTDLL! RtlIpv4AddressToStringA + 0x132 (0x778ae5a2)
NTDLL! RtlUnwind + 0x1D1 (0x778a9261)
NTDLL! KiUserExceptionDispatcher + 0x26 (0x778b7026)
SYSTEM.PRIVATE.CORELIB! <no symbol> + 0x0 (0x59c0de2c)
    File: C:\gh\runtime\src\coreclr\vm\eetwain.cpp Line: 3531
    Image: c:\gh\runtime\artifacts\tests\coreclr\windows.x86.Checked\Tests\Core_Root\corerun.exe

@BruceForstall
Copy link
Member Author

I'm going to split out the NOP change to a separate PR.

@BruceForstall
Copy link
Member Author

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, runtime-coreclr gcstress0x3-gcstress0xc

@azure-pipelines
Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@BruceForstall BruceForstall merged commit 54e82d3 into dotnet:main Apr 18, 2022
@BruceForstall BruceForstall deleted the FixNestedNoGcRegionRequests branch April 18, 2022 16:12
directhex pushed a commit to directhex/runtime that referenced this pull request Apr 21, 2022
When doing fast tail call on arm32, we disable GC reporting.
But some IR nodes, like unrolled STORE_BLK, also disable GC
reporting. If one of those is within a fast tail call argument
setup region, we'll attempt to disable GC reporting twice. Since
we didn't keep track of nesting, we end up marking some of the
tail call region after the STORE_BLK as interruptible, leading
to be GC info in the argument area.

Change the enable/disable GC calls to keep a nesting count,
and only re-enable GC reporting when the count reaches zero.

Fixes dotnet#67046
@ghost ghost locked as resolved and limited conversation to collaborators May 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Assert failure: !CREATE_CHECK_STRING(pMT && pMT->Validate())
2 participants