Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assertion failed 'paddingToAdd == paddingNeeded' in 'System.SpanTests.IndexOfAnyExceptTests`1[Byte][System.Byte]:SomeElementsDontMatch_ReturnsOffset(int,System.Int32[]):this' #68510

Closed
BruceForstall opened this issue Apr 25, 2022 · 5 comments · Fixed by #68869
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI blocking-clean-ci-optional Blocking optional rolling runs
Milestone

Comments

@BruceForstall
Copy link
Member

pipeline: runtime-coreclr libraries-jitstress
job: net7.0-windows-Release-x86-CoreCLR_checked-jitstress1-Windows.10.Amd64.Open
test: System.Memory.Tests

https://dev.azure.com/dnceng/public/_build/results?buildId=1736001&view=ms.vss-test-web.build-test-results-tab&runId=46986296&paneView=debug

C:\h\w\A5DE098F\w\B5E40A10\e>set COMPlus 
COMPlus_JitStress=1
COMPlus_TieredCompilation=0

C:\h\w\A5DE098F\w\B5E40A10\e>call RunTests.cmd --runtime-path C:\h\w\A5DE098F\p 
----- start Mon 04/25/2022  8:03:28.98 ===============  To repro directly: ===================================================== 
pushd C:\h\w\A5DE098F\w\B5E40A10\e\
"C:\h\w\A5DE098F\p\dotnet.exe" exec --runtimeconfig System.Memory.Tests.runtimeconfig.json --depsfile System.Memory.Tests.deps.json xunit.console.dll System.Memory.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing 
popd
===========================================================================================================

C:\h\w\A5DE098F\w\B5E40A10\e>"C:\h\w\A5DE098F\p\dotnet.exe" exec --runtimeconfig System.Memory.Tests.runtimeconfig.json --depsfile System.Memory.Tests.deps.json xunit.console.dll System.Memory.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing  
  Discovering: System.Memory.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Memory.Tests (found 2406 of 2425 test cases)
  Starting:    System.Memory.Tests (parallel test collections = on, max threads = 4)

Assert failure(PID 5760 [0x00001680], Thread: 4896 [0x1320]): Assertion failed 'paddingToAdd == paddingNeeded' in 'System.SpanTests.IndexOfAnyExceptTests`1[Byte][System.Byte]:SomeElementsDontMatch_ReturnsOffset(int,System.Int32[]):this' during 'Emit code' (IL size 817; hash 0x8f2948ff; FullOpts)

    File: D:\a\_work\1\s\src\coreclr\jit\emitxarch.cpp Line: 10144
    Image: C:\h\w\A5DE098F\p\dotnet.exe

----- end Mon 04/25/2022  8:04:21.14 ----- exit code -1073740286 ----------------------------------------------------------

@kunalspathak @dotnet/jit-contrib

@BruceForstall BruceForstall added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 25, 2022
@BruceForstall BruceForstall added this to the 7.0.0 milestone Apr 25, 2022
@ghost
Copy link

ghost commented Apr 25, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

pipeline: runtime-coreclr libraries-jitstress
job: net7.0-windows-Release-x86-CoreCLR_checked-jitstress1-Windows.10.Amd64.Open
test: System.Memory.Tests

https://dev.azure.com/dnceng/public/_build/results?buildId=1736001&view=ms.vss-test-web.build-test-results-tab&runId=46986296&paneView=debug

C:\h\w\A5DE098F\w\B5E40A10\e>set COMPlus 
COMPlus_JitStress=1
COMPlus_TieredCompilation=0

C:\h\w\A5DE098F\w\B5E40A10\e>call RunTests.cmd --runtime-path C:\h\w\A5DE098F\p 
----- start Mon 04/25/2022  8:03:28.98 ===============  To repro directly: ===================================================== 
pushd C:\h\w\A5DE098F\w\B5E40A10\e\
"C:\h\w\A5DE098F\p\dotnet.exe" exec --runtimeconfig System.Memory.Tests.runtimeconfig.json --depsfile System.Memory.Tests.deps.json xunit.console.dll System.Memory.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing 
popd
===========================================================================================================

C:\h\w\A5DE098F\w\B5E40A10\e>"C:\h\w\A5DE098F\p\dotnet.exe" exec --runtimeconfig System.Memory.Tests.runtimeconfig.json --depsfile System.Memory.Tests.deps.json xunit.console.dll System.Memory.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing  
  Discovering: System.Memory.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Memory.Tests (found 2406 of 2425 test cases)
  Starting:    System.Memory.Tests (parallel test collections = on, max threads = 4)

Assert failure(PID 5760 [0x00001680], Thread: 4896 [0x1320]): Assertion failed 'paddingToAdd == paddingNeeded' in 'System.SpanTests.IndexOfAnyExceptTests`1[Byte][System.Byte]:SomeElementsDontMatch_ReturnsOffset(int,System.Int32[]):this' during 'Emit code' (IL size 817; hash 0x8f2948ff; FullOpts)

    File: D:\a\_work\1\s\src\coreclr\jit\emitxarch.cpp Line: 10144
    Image: C:\h\w\A5DE098F\p\dotnet.exe

----- end Mon 04/25/2022  8:04:21.14 ----- exit code -1073740286 ----------------------------------------------------------

@kunalspathak @dotnet/jit-contrib

Author: BruceForstall
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: 7.0.0

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Apr 25, 2022
@BruceForstall BruceForstall removed the untriaged New issue has not been triaged by the area owner label Apr 25, 2022
@kunalspathak kunalspathak self-assigned this Apr 25, 2022
@BruceForstall
Copy link
Member Author

Also seen in:

runtime-coreclr libraries-jitstress2-jitstressregs
net7.0-windows-Release-x64-CoreCLR_checked-jitstress2_jitstressregs0x80-Windows.10.Amd64.Open
System.Data.Common.Tests

https://dev.azure.com/dnceng/public/_build/results?buildId=1734215&view=ms.vss-test-web.build-test-results-tab&runId=46958230&paneView=debug&resultId=192407

C:\h\w\A76908CE\w\A7230904\e>set COMPlus 
COMPlus_JitStress=2
COMPlus_JitStressRegs=0x80
COMPlus_TieredCompilation=0

C:\h\w\A76908CE\w\A7230904\e>call RunTests.cmd --runtime-path C:\h\w\A76908CE\p 
----- start Sat 04/23/2022 10:40:00.03 ===============  To repro directly: ===================================================== 
pushd C:\h\w\A76908CE\w\A7230904\e\
"C:\h\w\A76908CE\p\dotnet.exe" exec --runtimeconfig System.Data.Common.Tests.runtimeconfig.json --depsfile System.Data.Common.Tests.deps.json xunit.console.dll System.Data.Common.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing 
popd
===========================================================================================================

C:\h\w\A76908CE\w\A7230904\e>"C:\h\w\A76908CE\p\dotnet.exe" exec --runtimeconfig System.Data.Common.Tests.runtimeconfig.json --depsfile System.Data.Common.Tests.deps.json xunit.console.dll System.Data.Common.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing  
  Discovering: System.Data.Common.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Data.Common.Tests (found 1814 of 1816 test cases)
  Starting:    System.Data.Common.Tests (parallel test collections = on, max threads = 4)

Assert failure(PID 5016 [0x00001398], Thread: 5416 [0x1528]): Assertion failed 'paddingToAdd == paddingNeeded' in 'System.Data.Tests.UniqueConstraintTest2:Ctor_NameDataColumnsPrimary():this' during 'Emit code' (IL size 169; hash 0x8ae9c3e5; FullOpts)

    File: D:\a\_work\1\s\src\coreclr\jit\emitxarch.cpp Line: 10144
    Image: C:\h\w\A76908CE\p\dotnet.exe

----- end Sat 04/23/2022 10:40:21.49 ----- exit code -1073740286 ----------------------------------------------------------

@kunalspathak
Copy link
Member

kunalspathak commented May 4, 2022

Here is what is happening here. We have a loop IG05 to IG07. IG06 is a block that ends with jmp and we add align instruction to align a future loop that starts at IG14. When setting the backedges, we realize that IG14 loop encloses another loop and hence we won't align it and remove the IGF_HAS_ALIGN from IG06. However, we do not remove the 15 bytes corresponding to the align from the igSize of IG06 because that will involve readjustments of offset and size of all IGs that follows it. Instead, we defer that to happen when we would calculate the alignment adjustment inside emitLoopAlignAdjustments().

We would first do the adjustment calculation of IG05 loop by calculating the loop size. The loop size includes the 15 bytes of IG06 as well. During adjustment calculation of IG06 loop, we get rid of those 15 bytes that are not needed. This mismatches the loop size calculation of IG05 when we verify it during outputting the align instruction and hence hit the assert.

IG04:
      align ; for IG05

IG05:
      ...
      jg IG07

IG06:
      ...
      jmp IG11
      align ; for IG14

IG07:    ; added by LSRA for resolution
      ...
      jmp IG05     

I did add an extra flag IGF_REMOVED_ALIGN as seen in #60787 (comment), but don't remember what workarounds I discussed with @BruceForstall and removed it. This issue might be there since then, but may be it was hidden until now because problematic loops were not aligned somehow and we never reached the point of assert. I will add few asserts to just see how many times we hit this problem in spmi.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label May 4, 2022
@kunalspathak
Copy link
Member

I will double check why placement phase decided to put align in a block that was part of a loop.

@kunalspathak
Copy link
Member

but don't remember what workarounds I discussed with @BruceForstall and removed it

Now I remember this. We decided to add a phase placeLoopAlignInstructions after LSRA that will rely upon bbNatLoopNum to decide which block is profitable to put align instruction into. However, the resolution blocks changes the ordering of the lexical flow and that disrupts our ability to decide the correct block to add align inside placeLoopAlignInstructions.

Here is an example for the repro. The loop L00 is BB04 ~ BB05 as seen below.

-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight   IBC  lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
...
BB04 [0011]  2       BB03,BB05             4         0 [000..000)-> BB07 ( cond )                     i internal Loop bwd bwd-target LIR align 
BB05 [0013]  1       BB04                  4         0 [000..000)-> BB04 ( cond )                     i internal bwd LIR 

During LSRA, we split the edge BB05 -> BB04 and add BB50. We place BB50 after BB06 as seen here. Since we would lexically go through the basic blocks during codegen, we end up having a non-loop block BB06 inside BB04 ~ BB50 loop.

-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight   IBC  lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
...
BB04 [0011]  2       BB03,BB50             4         0 [000..000)-> BB07 ( cond )                     i internal Loop bwd bwd-target LIR align 
BB05 [0013]  1       BB04                  4         0 [000..000)-> BB50 ( cond )                     i internal bwd LIR 
BB06 [0015]  2       BB05,BB49             0.50        [000..000)-> BB08 (always)                     i internal LIR 
BB50 [0205]  1       BB05                  2           [???..???)-> BB04 (always)                     internal bwd LIR 

Since it is not part of a loop, its bbNatLoopNum is NOT_IN_LOOP. In placeLoopAlignInstructions, we chose BB06 as the block to place align instruction. However, during loopSize calculation, since we calculate the code size lexically of all the blocks between top and bottom of the loop, we also account for BB06's size that leads to this assert.

To conclude, bbNatLoopNum is not 100% accurate (one of the place where this is fixed is in #68804), it might not be safe to rely on it all the time. I will think if we can fix it easily.

@BruceForstall BruceForstall added the blocking-clean-ci-optional Blocking optional rolling runs label May 5, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label May 11, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Jun 11, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI blocking-clean-ci-optional Blocking optional rolling runs
Projects
None yet
2 participants