Test failure: XDocumentTests.Streaming.XStreamingElementAPI.NestedXStreamingElementPlusIEnumerable #76636

Closed
BruceForstall opened this issue Oct 4, 2022 · 13 comments · Fixed by #76695
Labels: area-CodeGen-coreclr, blocking-clean-ci-optional, JitStress
Milestone: 8.0.0

@BruceForstall
Member

JitStress=1

pipeline: runtime-coreclr libraries-jitstress

https://dev.azure.com/dnceng-public/public/_build/results?buildId=39575&view=ms.vss-test-web.build-test-results-tab

C:\h\w\BA420A2D\w\A633097F\e>set COMPlus 
COMPlus_JitStress=1
COMPlus_TieredCompilation=0

C:\h\w\BA420A2D\w\A633097F\e>call RunTests.cmd --runtime-path C:\h\w\BA420A2D\p 
----- start Tue 10/04/2022  8:40:39.66 ===============  To repro directly: ===================================================== 
pushd C:\h\w\BA420A2D\w\A633097F\e\
"C:\h\w\BA420A2D\p\dotnet.exe" exec --runtimeconfig System.Xml.Linq.Streaming.Tests.runtimeconfig.json --depsfile System.Xml.Linq.Streaming.Tests.deps.json xunit.console.dll System.Xml.Linq.Streaming.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing 
popd
===========================================================================================================

C:\h\w\BA420A2D\w\A633097F\e>"C:\h\w\BA420A2D\p\dotnet.exe" exec --runtimeconfig System.Xml.Linq.Streaming.Tests.runtimeconfig.json --depsfile System.Xml.Linq.Streaming.Tests.deps.json xunit.console.dll System.Xml.Linq.Streaming.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing  
  Discovering: System.Xml.Linq.Streaming.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Xml.Linq.Streaming.Tests (found 39 test cases)
  Starting:    System.Xml.Linq.Streaming.Tests (parallel test collections = on, max threads = 4)
    XDocumentTests.Streaming.XStreamingElementAPI.NestedXStreamingElementPlusIEnumerable [FAIL]
      System.NullReferenceException : Object reference not set to an instance of an object.
      Stack Trace:
        /_/src/libraries/System.Private.Xml.Linq/tests/Streaming/StreamingOutput.cs(489,0): at XDocumentTests.Streaming.XStreamingElementAPI.NestedXStreamingElementPlusIEnumerable()
           at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
        /_/src/libraries/System.Private.CoreLib/src/System/Reflection/MethodInvoker.cs(64,0): at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)
  Finished:    System.Xml.Linq.Streaming.Tests
=== TEST EXECUTION SUMMARY ===

Passes without JitStress (or with JitStress=2).

@dotnet/jit-contrib

@BruceForstall BruceForstall added JitStress CLR JIT issues involving JIT internal stress modes area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI blocking-clean-ci-optional Blocking optional rolling runs labels Oct 4, 2022
@BruceForstall BruceForstall added this to the 8.0.0 milestone Oct 4, 2022
@ghost

ghost commented Oct 4, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@BruceForstall
Member Author

Similar failure in net7.0-windows-Release-arm64-CoreCLR_checked-jitstress2-Windows.10.Arm64v8.Open, System.Text.Json.Nodes.Tests.ParentPathRootTests.GetPathAndRoot:

D:\h\w\A93608B4\w\B7E40A25\e>set COMPlus 
COMPlus_JitStress=2
COMPlus_TieredCompilation=0

D:\h\w\A93608B4\w\B7E40A25\e>call RunTests.cmd --runtime-path D:\h\w\A93608B4\p 
----- start Tue 10/04/2022  3:09:17.05 ===============  To repro directly: ===================================================== 
pushd D:\h\w\A93608B4\w\B7E40A25\e\
"D:\h\w\A93608B4\p\dotnet.exe" exec --runtimeconfig System.Text.Json.Tests.runtimeconfig.json --depsfile System.Text.Json.Tests.deps.json xunit.console.dll System.Text.Json.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing 
popd
===========================================================================================================

D:\h\w\A93608B4\w\B7E40A25\e>"D:\h\w\A93608B4\p\dotnet.exe" exec --runtimeconfig System.Text.Json.Tests.runtimeconfig.json --depsfile System.Text.Json.Tests.deps.json xunit.console.dll System.Text.Json.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing  
  Discovering: System.Text.Json.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Text.Json.Tests (found 5473 of 5538 test cases)
  Starting:    System.Text.Json.Tests (parallel test collections = on, max threads = 8)
    System.Text.Json.Nodes.Tests.ParentPathRootTests.GetPathAndRoot [FAIL]
      System.NullReferenceException : Object reference not set to an instance of an object.
      Stack Trace:
        /_/src/libraries/System.Text.Json/tests/System.Text.Json.Tests/JsonNode/ParentPathRootTests.cs(55,0): at System.Text.Json.Nodes.Tests.ParentPathRootTests.GetPathAndRoot()
           at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
        /_/src/libraries/System.Private.CoreLib/src/System/Reflection/MethodInvoker.cs(64,0): at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)
  Finished:    System.Text.Json.Tests
=== TEST EXECUTION SUMMARY ===

@AndyAyersMS
Member

I'll take a look.

@AndyAyersMS AndyAyersMS self-assigned this Oct 5, 2022
@AndyAyersMS
Member

For the first failure above, on arm64 Windows at 144a33a, JitStressRange isolates this method:

JIT compiled XDocumentTests.Streaming.XStreamingElementAPI:AddIEnumerableOfXNodesPlusAttribute():this [FullOpts, IL size=199, code size=1288, hash=0x4322c3cc JitStress]

(Other methods may also lead to failures when stressed.)

@AndyAyersMS
Member

Fails with

set COMPlus_JitStressRange=4322c3cc
set COMPlus_JitStressModeNamesOnly=1
set COMPlus_JitStressModeNames=STRESS_BB_PROFILE, STRESS_RANDOM_INLINE

A phi-based RBO (redundant branch optimization) fires in this method, and disabling that optimization fixes the problem. Debugging now to see whether the transformation itself is wrong or whether it enables something downstream to go wrong.

--- Trying RBO in BB08 ---
Relop [000333] BB08 value unknown, trying inference
... JT-PHI [interestingVN] in BB08 relop first operand VN is PhiDef for V28:3 $1b7
N003 (  5,  5) [000333] J------N---                         *  EQ        int    $408
N001 (  3,  2) [000331] -----------                         +--*  LCL_VAR   ref    V28 tmp24        u:3 $284
N002 (  1,  2) [000332] -----------                         \--*  CNS_INT   ref    null $VN.Null
Found local PHI [000587] for V28
... substituting ($1a1,$0) for ($284,$0) in $408 gives $40
... substituted VN implies relop is 0 when coming from pred BB06
BB06 is a false pred
Could not map phi inputs from pred BB07
BB07 is an ambiguous pred
Could not map phi inputs from pred BB16
BB16 is an ambiguous pred
Optimizing via jump threading
Jump flow from pred BB06 -> BB08 implies predicate false; we can safely redirect flow to be BB06 -> BB09
Setting edge weights for BB06 -> BB09 to [0 .. 3.402823e+38]
Will retry RBO in BB08 after partial optimization

@AndyAyersMS
Member

The optimization itself seems correct: in BB06 the null check produces $1a1, which is a non-null VN, and BB08 tests that value for null, so the test is false whenever control comes from BB06.

***** BB06
STMT00093 ( 0x098[E-] ... ??? )
N002 (  2,  2) [000325] ---X-------                         *  NULLCHECK byte   $VN.Void
N001 (  1,  1) [000324] -----------                         \--*  LCL_VAR   ref    V02 loc1         u:2 $1a1

***** BB06
STMT00094 ( INL22 @ 0x000[E-] ... ??? ) <- INL21 @ 0x000[E-] <- INLRT @ 0x098[E-]
N003 (  5,  4) [000330] -A------R--                         *  ASG       ref    $VN.Void
N002 (  3,  2) [000329] D------N---                         +--*  LCL_VAR   ref    V28 tmp24        d:2 $VN.Void
N001 (  1,  1) [000320] -----------                         \--*  LCL_VAR   ref    V02 loc1         u:2 (last use) $1a1

--------------------

***** BB08
STMT00123 ( ??? ... ??? )
N006 (  0,  0) [000587] -A------R--                         *  ASG       ref    $VN.Void
N005 (  0,  0) [000585] D------N---                         +--*  LCL_VAR   ref    V28 tmp24        d:3 $VN.Void
N004 (  0,  0) [000586] -----------                         \--*  PHI       ref    $284
N001 (  0,  0) [000599] ----------- pred BB07                  +--*  PHI_ARG   ref    V28 tmp24        u:5
N002 (  0,  0) [000596] ----------- pred BB16                  +--*  PHI_ARG   ref    V28 tmp24        u:4
N003 (  0,  0) [000595] ----------- pred BB06                  \--*  PHI_ARG   ref    V28 tmp24        u:2 $1a1

***** BB08
STMT00095 ( INL22 @ 0x00B[E-] ... ??? ) <- INL21 @ 0x000[E-] <- INLRT @ 0x098[E-]
N004 (  7,  7) [000334] -----------                         *  JTRUE     void   $VN.Void
N003 (  5,  5) [000333] J------N---                         \--*  EQ        int    $408
N001 (  3,  2) [000331] -----------                            +--*  LCL_VAR   ref    V28 tmp24        u:3 $284
N002 (  1,  2) [000332] -----------                            \--*  CNS_INT   ref    null $VN.Null
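
As a rough illustration of the inference in the dump above, here is a minimal sketch of how each predecessor of the test block can be classified by what is known about its phi input. The types and function names (`Nullness`, `PredKind`, `PhiArg`, `ClassifyPred`, `ClassifyPreds`) are hypothetical and are not the JIT's actual data structures; the real code works on value numbers rather than a three-state flag.

```cpp
#include <map>
#include <vector>

// Hypothetical, simplified model for illustration; these are not JIT types.
enum class Nullness { Unknown, Null, NonNull };
enum class PredKind { TruePred, FalsePred, Ambiguous };

struct PhiArg
{
    int      predBlock; // predecessor the value flows in from (e.g. 6 for BB06)
    Nullness nullness;  // what value numbering can prove about the incoming value
};

// Decide what the test "phi == null" evaluates to along one incoming edge.
PredKind ClassifyPred(const PhiArg& arg)
{
    switch (arg.nullness)
    {
        case Nullness::Null:
            return PredKind::TruePred; // test is always true from this pred
        case Nullness::NonNull:
            return PredKind::FalsePred; // test is always false from this pred
        default:
            return PredKind::Ambiguous; // outcome unknown; leave the edge alone
    }
}

// Classify every predecessor named by the phi's arguments.
std::map<int, PredKind> ClassifyPreds(const std::vector<PhiArg>& phiArgs)
{
    std::map<int, PredKind> result;
    for (const PhiArg& arg : phiArgs)
    {
        result[arg.predBlock] = ClassifyPred(arg);
    }
    return result;
}
```

Fed the three phi inputs from BB08 (BB07 and BB16 unknown, BB06 non-null), this marks BB06 as a false pred and the other two as ambiguous, which matches the RBO trace: only the BB06 edge is a candidate for jump threading.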

@AndyAyersMS
Member

This looks like a case where phi-based RBO leaves behind an invalid SSA graph that trips up assertion prop. Before RBO we have:

[image: flow graph before RBO]

VN is able to prove that V28.2 can't be null, so phi-based RBO is able to prove that if flow goes from BB06 -> BB08 it must then proceed to BB09. After RBO we have updated the flow but not the SSA graph:

[image: flow graph after RBO]

Note that BB09 and BB10 should now have PHI nodes, but don't, and they are reading the wrong SSA defs.

Assertion prop comes along and is now able to deduce that V28.3 is not zero in BB08, which is not the case.

I don't see an easy fix here yet. Phi-based RBO needs to either properly update SSA (which will be tricky, given that dominators are now wrong and we do not track where to find an SSA def's uses) or avoid making changes in cases where the bypassed PHI value (V28.3, here) is live outside of the jump threading block (BB08, here). Unfortunately we don't have accurate last-use info; copy prop messes this up.

Given this issue and #76507 I am going to disable phi-based RBO for now while I think about whether it can be salvaged somehow.
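
The missing-phi situation described above can be sketched with a toy flow graph. This is only an illustration under stated assumptions: `FlowGraph`, `ReachingDefs`, `NeedsPhi`, and `RedirectEdge` are hypothetical names, the graph tracks a single local (think V28), and only the BB06/BB07/BB16 -> BB08 -> BB09 edges taken from the dump are modeled.

```cpp
#include <algorithm>
#include <map>
#include <set>
#include <vector>

// Hypothetical miniature flow graph tracking a single local (think V28).
struct FlowGraph
{
    std::map<int, std::vector<int>> preds;      // block -> predecessor blocks
    std::map<int, int>              liveOutDef; // block -> SSA number live at block exit
};

// SSA defs of the local that can reach the start of 'block'.
std::set<int> ReachingDefs(const FlowGraph& g, int block)
{
    std::set<int> defs;
    auto predsIt = g.preds.find(block);
    if (predsIt == g.preds.end())
    {
        return defs;
    }
    for (int pred : predsIt->second)
    {
        auto defIt = g.liveOutDef.find(pred);
        if (defIt != g.liveOutDef.end())
        {
            defs.insert(defIt->second);
        }
    }
    return defs;
}

// A block needs a phi when more than one distinct def reaches it.
bool NeedsPhi(const FlowGraph& g, int block)
{
    return ReachingDefs(g, block).size() > 1;
}

// Jump threading: retarget the edge from -> oldTarget to from -> newTarget.
// Only the flow is updated, mirroring the missing SSA update described above.
void RedirectEdge(FlowGraph& g, int from, int oldTarget, int newTarget)
{
    std::vector<int>& oldPreds = g.preds[oldTarget];
    oldPreds.erase(std::remove(oldPreds.begin(), oldPreds.end(), from), oldPreds.end());
    g.preds[newTarget].push_back(from);
}
```

Before the redirect, BB09 is reached only by V28.3 from BB08, so no phi is needed. After `RedirectEdge(g, 6, 8, 9)`, both V28.2 (via BB06) and V28.3 (via BB08) reach BB09, so `NeedsPhi` becomes true even though nothing in the IR was updated to reflect that.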

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Oct 6, 2022
This is exposing our lack of SSA update and leading downstream opts like CSE
and assertion prop to make bad decisions.

Disabling for now until I have time to figure out how to safely enable.

Fixes dotnet#76636, dotnet#76507
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Oct 6, 2022
@BruceForstall
Member Author

> Given this issue and #76507 I am going to disable phi-based RBO for now while I think about whether it can be salvaged somehow.

Presumably we could re-build SSA/VN after RBO (if necessary)? Also presumably that would be too expensive for JITing, but maybe ok for AOT?

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Oct 6, 2022
@BruceForstall
Member Author

Are there any post-phase asserts we could add for the SSA form, especially asserts that might have prevented this issue? E.g., simplistically, can we assert that the PHI arity must match the count of immediate predecessors?

@AndyAyersMS
Member

> Are there any post-phase asserts we could add for the SSA form, especially asserts that might have prevented this issue? E.g., simplistically, can we assert that the PHI arity must match the count of immediate predecessors?

SSA does not give us one phi arg per pred; it has one phi arg per reaching SSA def (see, e.g., SsaBuilder::AddPhiArg and callers). So there is no easy local check. This is something we should consider fixing, though there are some pathological cases where the phi arg list can grow very large (note: it can do this already, so the current "add if not found" logic will be quadratic in the right cases, but presumably the opt circuit breakers prevent us from getting burned too badly here).
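
To make the "one phi arg per reaching SSA def" point concrete, here is a minimal sketch of an "add if not found" insertion. It is not the actual `SsaBuilder::AddPhiArg`; `PhiArg`, `Phi`, and `AddPhiArgIfNew` are hypothetical, and recording the first pred seen for a def is an assumption of this sketch.

```cpp
#include <list>

// Hypothetical stand-ins; not the JIT's phi representation.
struct PhiArg
{
    int ssaNum;    // SSA number of the reaching def
    int predBlock; // a pred through which the def arrives
};

struct Phi
{
    std::list<PhiArg> args;
};

// Add an argument only if this SSA def is not already present.
// Returns true when a new argument was added.
bool AddPhiArgIfNew(Phi& phi, int ssaNum, int predBlock)
{
    // Linear scan per insertion: n distinct defs cost O(n^2) comparisons overall.
    for (const PhiArg& arg : phi.args)
    {
        if (arg.ssaNum == ssaNum)
        {
            return false; // same def reaching via another pred; no new arg
        }
    }
    phi.args.push_front(PhiArg{ssaNum, predBlock}); // order does not matter
    return true;
}
```

With this shape, a def that reaches along two preds contributes a single argument, so the argument count can be smaller than the pred count and a simple arity-versus-preds assert does not hold.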

Even nicer perhaps would be to have the phi arg ordering reflect the pred list ordering. But this might be too painful to maintain if/when we decide to support some form of SSA update.

@BruceForstall
Member Author

BruceForstall commented Oct 6, 2022

> SSA does not give us one phi arg per pred; it has one phi arg per reaching SSA def (see eg SsaBuilder::AddPhiArg and callers).

It seems like there are at least a few things relating preds and PHI args, though. E.g.,

  1. If a block has a single pred, it seems it shouldn't have any PHIs because there is no merging.
  2. A PHI can't have more args than the block has preds: if it did, one of the preds should itself have a PHI for the merge.
  3. A PHI could have fewer args than preds, assuming one arg represents a def that reaches along multiple pred paths.
  4. A PHI should have at least two args.

?
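
If those four conjectures held, a per-phi check might look roughly like the sketch below. This is purely illustrative: `ViolatedRules` is a hypothetical helper, the rule numbers refer to the list above, and whether the JIT's SSA actually satisfies these rules is exactly the open question.

```cpp
#include <cstddef>
#include <vector>

// Check one existing phi against the four conjectured rules.
// Returns the numbers of the rules that the phi violates.
std::vector<int> ViolatedRules(std::size_t predCount, std::size_t phiArgCount)
{
    std::vector<int> violated;
    if (predCount == 1)
    {
        violated.push_back(1); // rule 1: a single-pred block should have no phi at all
    }
    if (phiArgCount > predCount)
    {
        violated.push_back(2); // rule 2: more args than preds
    }
    // Rule 3 permits phiArgCount < predCount, so there is nothing to check.
    if (phiArgCount < 2)
    {
        violated.push_back(4); // rule 4: a phi should merge at least two defs
    }
    return violated;
}
```

For the BB08 phi in the dump (three preds, three args) this reports nothing. Note that a check of this shape only inspects phis that exist, so it would not flag the blocks in this issue that are missing phis after the flow update.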

@BruceForstall
Member Author

> Even nicer perhaps would be to have the phi arg ordering reflect the pred list ordering. But this might be too painful to maintain if/when we decide to support some form of SSA update.

I see AddPhiArg says:

    // The argument order doesn't matter so just insert at the front of the list because
    // it's easier. It's also easier to insert in linear order since the first argument
    // will be first in linear order as well.

If phi args aren't 1-to-1 with preds then it seems like ordering is less interesting?

@AndyAyersMS
Member

If the phi args name their preds (like they sort of do now) then ordering is not important.

Let me think about what sort of SSA checks we can plausibly do. I suspect we'll find that we very quickly do damage, and it will seem even more incautious to keep leveraging SSA the way we do now.

However perhaps we can track enough to allow a more limited form of the phi-disambiguation to be turned on (if we can show that no SSA update is needed). The main thing we need to know is if all the uses of a phi def are in the same block as the phi def.
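
The condition in the last sentence could be expressed along these lines. This is a sketch under a big assumption: it presumes a list of SSA uses is available, which the JIT does not currently track. `SsaUse` and `AllUsesInBlock` are hypothetical names.

```cpp
#include <vector>

// Hypothetical record of one use of an SSA def; the JIT keeps no such list today.
struct SsaUse
{
    int lclNum; // local variable number (e.g. 28 for V28)
    int ssaNum; // SSA number being read (e.g. 3 for V28.3)
    int block;  // block containing the use (e.g. 8 for BB08)
};

// True if every use of the given SSA def lies in 'block'.
// When this holds for the phi def of the jump threading block, bypassing
// that block cannot leave a stale use of the phi def elsewhere.
bool AllUsesInBlock(const std::vector<SsaUse>& uses, int lclNum, int ssaNum, int block)
{
    for (const SsaUse& use : uses)
    {
        if (use.lclNum == lclNum && use.ssaNum == ssaNum && use.block != block)
        {
            return false; // the def is read outside the block
        }
    }
    return true;
}
```

In the failing method, V28.3 is read outside BB08, so a guard like this would have rejected the transformation; it would only admit cases where the phi def is consumed entirely within its own block.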

@ghost ghost locked as resolved and limited conversation to collaborators Nov 7, 2022