Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assertion failed '(igInLoop->igLoopBackEdge == nullptr) || (igInLoop->igLoopBackEdge == igLoopHeader)' #82729

Closed
AndyAyersMS opened this issue Feb 27, 2023 · 22 comments · Fixed by #83286
Assignees
Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@AndyAyersMS
Copy link
Member

Libraries pgo, possibly recent regression in weekend runs (I had clean private runs on Friday, but this is PGO so may not always be testing the exact same codegen)

https://dev.azure.com/dnceng-public/public/_build/results?buildId=185788&view=ms.vss-test-web.build-test-results-tab

export DOTNET_TieredCompilation=1
export DOTNET_DbgEnableMiniDump=1
export DOTNET_EnableCrashReport=1
export DOTNET_DbgMiniDumpName=$HELIX_DUMP_FOLDER/coredump.%d.dmp
export DOTNET_ReadyToRun=0
export DOTNET_TC_QuickJitForLoops=1
export DOTNET_TieredPGO=1
export DOTNET_JitRandomGuardedDevirtualization=1
export DOTNET_JitRandomEdgeCounts=1
export DOTNET_JitRandomlyCollect64BitCounts=1

dotnet exec --runtimeconfig System.Runtime.Serialization.Xml.Tests.runtimeconfig.json --depsfile System.Runtime.Serialization.Xml.Tests.deps.json xunit.console.dll System.Runtime.Serialization.Xml.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing 
popd
===========================================================================================================
/root/helix/work/workitem/e /root/helix/work/workitem/e
  Discovering: System.Runtime.Serialization.Xml.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Runtime.Serialization.Xml.Tests (found 278 of 279 test cases)
  Starting:    System.Runtime.Serialization.Xml.Tests (parallel test collections = on, max threads = 2)

Assert failure(PID 27 [0x0000001b], Thread: 33 [0x0021]): Assertion failed '(igInLoop->igLoopBackEdge == nullptr) || (igInLoop->igLoopBackEdge == igLoopHeader)' in 'System.Enum:TryParseByValueOrName[int](System.RuntimeType,System.ReadOnlySpan`1[ushort],bool,bool,byref):bool' during 'Generate code' (IL size 724; hash 0xef12aa33; Tier1)

    File: /__w/1/s/src/coreclr/jit/emit.cpp Line: 5713
    Image: /root/helix/work/correlation/dotnet
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 27, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Feb 27, 2023
@AndyAyersMS AndyAyersMS added arch-arm64 and removed area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Feb 27, 2023
@ghost
Copy link

ghost commented Feb 27, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

Libraries pgo, possibly recent regression in weekend runs (I had clean private runs on Friday, but this is PGO so may not always be testing the exact same codegen)

https://dev.azure.com/dnceng-public/public/_build/results?buildId=185788&view=ms.vss-test-web.build-test-results-tab

export DOTNET_TieredCompilation=1
export DOTNET_DbgEnableMiniDump=1
export DOTNET_EnableCrashReport=1
export DOTNET_DbgMiniDumpName=$HELIX_DUMP_FOLDER/coredump.%d.dmp
export DOTNET_ReadyToRun=0
export DOTNET_TC_QuickJitForLoops=1
export DOTNET_TieredPGO=1
export DOTNET_JitRandomGuardedDevirtualization=1
export DOTNET_JitRandomEdgeCounts=1
export DOTNET_JitRandomlyCollect64BitCounts=1

dotnet exec --runtimeconfig System.Runtime.Serialization.Xml.Tests.runtimeconfig.json --depsfile System.Runtime.Serialization.Xml.Tests.deps.json xunit.console.dll System.Runtime.Serialization.Xml.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing 
popd
===========================================================================================================
/root/helix/work/workitem/e /root/helix/work/workitem/e
  Discovering: System.Runtime.Serialization.Xml.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Runtime.Serialization.Xml.Tests (found 278 of 279 test cases)
  Starting:    System.Runtime.Serialization.Xml.Tests (parallel test collections = on, max threads = 2)

Assert failure(PID 27 [0x0000001b], Thread: 33 [0x0021]): Assertion failed '(igInLoop->igLoopBackEdge == nullptr) || (igInLoop->igLoopBackEdge == igLoopHeader)' in 'System.Enum:TryParseByValueOrName[int](System.RuntimeType,System.ReadOnlySpan`1[ushort],bool,bool,byref):bool' during 'Generate code' (IL size 724; hash 0xef12aa33; Tier1)

    File: /__w/1/s/src/coreclr/jit/emit.cpp Line: 5713
    Image: /root/helix/work/correlation/dotnet
Author: AndyAyersMS
Assignees: -
Labels:

area-CodeGen-coreclr, untriaged

Milestone: -

@kunalspathak kunalspathak self-assigned this Feb 27, 2023
@jeffschwMSFT jeffschwMSFT added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 28, 2023
@ghost
Copy link

ghost commented Feb 28, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

Libraries pgo, possibly recent regression in weekend runs (I had clean private runs on Friday, but this is PGO so may not always be testing the exact same codegen)

https://dev.azure.com/dnceng-public/public/_build/results?buildId=185788&view=ms.vss-test-web.build-test-results-tab

export DOTNET_TieredCompilation=1
export DOTNET_DbgEnableMiniDump=1
export DOTNET_EnableCrashReport=1
export DOTNET_DbgMiniDumpName=$HELIX_DUMP_FOLDER/coredump.%d.dmp
export DOTNET_ReadyToRun=0
export DOTNET_TC_QuickJitForLoops=1
export DOTNET_TieredPGO=1
export DOTNET_JitRandomGuardedDevirtualization=1
export DOTNET_JitRandomEdgeCounts=1
export DOTNET_JitRandomlyCollect64BitCounts=1

dotnet exec --runtimeconfig System.Runtime.Serialization.Xml.Tests.runtimeconfig.json --depsfile System.Runtime.Serialization.Xml.Tests.deps.json xunit.console.dll System.Runtime.Serialization.Xml.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing 
popd
===========================================================================================================
/root/helix/work/workitem/e /root/helix/work/workitem/e
  Discovering: System.Runtime.Serialization.Xml.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Runtime.Serialization.Xml.Tests (found 278 of 279 test cases)
  Starting:    System.Runtime.Serialization.Xml.Tests (parallel test collections = on, max threads = 2)

Assert failure(PID 27 [0x0000001b], Thread: 33 [0x0021]): Assertion failed '(igInLoop->igLoopBackEdge == nullptr) || (igInLoop->igLoopBackEdge == igLoopHeader)' in 'System.Enum:TryParseByValueOrName[int](System.RuntimeType,System.ReadOnlySpan`1[ushort],bool,bool,byref):bool' during 'Generate code' (IL size 724; hash 0xef12aa33; Tier1)

    File: /__w/1/s/src/coreclr/jit/emit.cpp Line: 5713
    Image: /root/helix/work/correlation/dotnet
Author: AndyAyersMS
Assignees: kunalspathak
Labels:

arch-arm64, area-CodeGen-coreclr

Milestone: -

@BruceForstall BruceForstall added this to the 8.0.0 milestone Mar 4, 2023
@BruceForstall BruceForstall added the blocking-clean-ci-optional Blocking optional rolling runs label Mar 4, 2023
@kunalspathak
Copy link
Member

Will take a look.

@kunalspathak
Copy link
Member

@AndyAyersMS - Seems I am not able to repro this because of some randomness introduced? Is there a reliable way to force the situation to reproduce?

image

@kunalspathak
Copy link
Member

From the dumps it looks like we are somehow the loop we have identified has a back target in the middle of the loop:

IG40:    // loop_head

IG42:

IG46:
    branch to IG42

However, when I print the Disasm, I don't see so many IGs in the method which makes sense as why this issue is not reproducing. I think in the repro case, there are things getting inlined (due to DOTNET_JitRandom* variables) and I cannot reproduce it because things are not getting inlined, the way they should be to repro it.

I see this was also opened few months ago in #79986, but I was not able to reproduce it and had to close it. That one could have happened for the same reason.

@kunalspathak
Copy link
Member

kunalspathak commented Mar 10, 2023

After more poking in the dump, here is what I came up with.


IG15:
  ...
  jmp
  align for IG59       <--------- align instr# 1

-----LOOP# 1 START----------------------

IG59:

IG60:

  branch IG59
-----LOOP# 1 END----------------------
IG63:
   bhs
   align for IG64      <--------- align instr# 2

-----LOOP# 2 START ??? ----------------------
IG64: 		     <-- problematic head
   
   bhs ???
   ...
   ...
   b ... 
   align for IG66      <--------- align instr# 3
-----LOOP# 2 END----------------------

IG65:
  ...

-----LOOP# 3 START ----------------------
IG66:
  ...

IG70:
  ...
  b IG66
  align for IG76        <--------- align instr# 4
-----LOOP# 3 END----------------------
-----LOOP# 4 START---------------------
IG76:
  ...

IG77:
  ... 
  branch IG76
-----LOOP# 4 END----------------------

From emitAlignList, there are 4 loops that we want to align.
Loop# 1: IG59 - IG60
Loop# 2: IG64 - IG64 (this is questionable and where bug happens)
Loop# 3: IG66 - IG70
Loop# 4: IG76 - IG77

For IG64, it seems it has 2 branches bhs and b ( I was able to get this from emitJumpList). However, I am not sure where the target of the first branch bhs is. Things would be correct if it is to IG64 itself, but if not, there is something wrong and IG64 should not be treated as a loop to align. As of now, seems that either bhs doesn't point back to IG64 or if it does, then igLoopBackEdge of it is not pointing back to IG64. That is the reason, we go all the way up to IG70 and noticed that its backedge should point to IG64 (because we are still processing it), but instead it points to its own back edge target IG66.

Is there a way to find out the target of bhs instruction in IG64? @BruceForstall ?

@AndyAyersMS
Copy link
Member Author

Is there a reliable way to force the situation to reproduce?

No. All I can suggest is to run this repeatedly and either dump out the text PGO data or collect SPMI, so if it ever fails you can replay that particular compilation.

@BruceForstall
Copy link
Member

Is the JIT behavior deterministic if you force a particular random seed (and is it possible to force a particular random seed)? Then, you could run repeatedly with different random seeds and hopefully find one that reproduces the behavior.

I wonder if, for the random cases, we should someone print out (or otherwise expose) what random numbers / seeds were used. In the random jitstress/libraries-jitstress pipelines, the pipeline is responsible for choosing the random jitstress value, which then gets captured in the console log, to aid reproducibility.

@BruceForstall
Copy link
Member

Looks like DOTNET_JitRandomEdgeCounts=1 is used as the seed... maybe.

@BruceForstall
Copy link
Member

Is there a way to find out the target of bhs instruction in IG64? @BruceForstall ?

Follow the code from emitDispIns?

            if (id->idIsBound())
            {
                if (id->idAddr()->iiaHasInstrCount())
                {
                    printf("%3d instr", id->idAddr()->iiaGetInstrCount());
                }
                else
                {
                    emitPrintLabel(id->idAddr()->iiaIGlabel);
                }
            }
            else
            {
                printf("L_M%03u_" FMT_BB, emitComp->compMethodID, id->idAddr()->iiaBBlabel->bbNum);
            }

@kunalspathak
Copy link
Member

Follow the code from emitDispIns?

Yes, I realized that after asking. So here is the updated diagram with targets updated. Essentially, IG64 jumps back to IG63, so ideally IG63 should be the block that needs alignment, but somehow the align instruction is present to align IG64.


IG15:
  ...
  jmp
  align for IG59       <--------- align instr# 1

-----LOOP# 1 START----------------------

IG59:

IG60:

  branch IG59
-----LOOP# 1 END----------------------
IG63:
   bhs IG113
   ...
   bhi IG65
   align for IG64      <--------- align instr# 2

-----LOOP# 2 START ??? ----------------------
IG64: 		     <-- problematic head
   
   bhs IG73
   ...
   ...
   b IG63
   align for IG66      <--------- align instr# 3
-----LOOP# 2 END----------------------

IG65:
  ...

-----LOOP# 3 START ----------------------
IG66:
  ...

IG70:
  ...
  b IG66
  align for IG76        <--------- align instr# 4
-----LOOP# 3 END----------------------
-----LOOP# 4 START---------------------
IG76:
  ...

IG77:
  ... 
  branch IG76
-----LOOP# 4 END----------------------

@kunalspathak
Copy link
Member

Either we are missing an align instruction for IG63 and instead have it for IG64 or we should be not aligning IG63 / IG64 at all. It is hard to find out from the dumps. I will probably add some asserts in the code that will help us identify the cause, the next time it fails.

@kunalspathak
Copy link
Member

#83286 to print the state if this happens. It might be still hard to understand without a jitdump or a repro, but this should still solve some purpose.

@kunalspathak
Copy link
Member

This issue is not reproducible and consistent. I would remove the blocking label.

@kunalspathak kunalspathak removed the blocking-clean-ci-optional Blocking optional rolling runs label Mar 10, 2023
@jakobbotsch
Copy link
Member

@kunalspathak this repros for me on my Rpi box -- I can try to grab an SPMI context for you.

@jakobbotsch
Copy link
Member

Attached the context. Taken on bee217f

repro-24844.zip

@kunalspathak
Copy link
Member

Attached the context. Taken on bee217f

repro-24844.zip

Thanks @jakobbotsch . I will check it out. This is linux-arm64 context? I opened #83286 to print some traces, but this would be helpful in finding the root cause.

@jakobbotsch
Copy link
Member

This is linux-arm64 context?

Yep.

@kunalspathak
Copy link
Member

I was able to repro it on my local machine with that method context. Thanks once again @jakobbotsch . The problem is while optimizing the layout, we compact a block C into block A, where A is a predecessor block of a loop-block B, which gets the backward jump from block C.

Block_A:

Block_B:            ; needs alignment


Block_C:

    branch Block_B

After optimizing layout:


Block_A_C:

    FALL_THROUGH
Block B:                 ; needs alignment


Block NEW_BLOCK:

   branch Block_A_C

We should disable alignment for such blocks whose header comes before the block that needs alignment. I recall @AndyAyersMS fixed similar issue in #70936, but that would need revision. I will submit a fix later tomorrow.

@kunalspathak
Copy link
Member

I recall @AndyAyersMS fixed similar issue in #70936, but that would need revision.

The code does it job except that during compaction, we fail to update the bbNatLoopNum of the block into which we are compacting one of the loop block into. Hence, we do not realize that we are visiting a loop block before the top block and fail to mark the loop as "do not align".

@kunalspathak
Copy link
Member

Updated #83286 with the fix.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Mar 17, 2023
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Mar 20, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Apr 19, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants