Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[libraries-pgo] arm64 failure in System.Tests.DoubleTests.ParsePatterns #71005

Closed
AndyAyersMS opened this issue Jun 20, 2022 · 5 comments · Fixed by #71245
Closed

[libraries-pgo] arm64 failure in System.Tests.DoubleTests.ParsePatterns #71005

AndyAyersMS opened this issue Jun 20, 2022 · 5 comments · Fixed by #71245
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@AndyAyersMS
Copy link
Member

This test has been failing off and on for a while with random PGO (and possibly other modes)

One common failure mode is

set COMPlus_TieredCompilation=1
set COMPlus_ReadyToRun=0
set COMPlus_TC_QuickJitForLoops=1
set COMPlus_TieredPGO=1
set COMPlus_JitRandomEdgeCounts=1

    System.Tests.DoubleTests.ParsePatterns [FAIL]
      Assert.Equal() Failure
                      ↓ (pos 6)
      Expected: 00000062
      Actual:   00000000
                      ↑ (pos 6)
      Stack Trace:
        C:\repos\runtime0\src\libraries\System.Runtime\tests\System\DoubleTests.cs(433,0): at System.Tests.DoubleTests.ParsePatterns()
           at InvokeStub_DoubleTests.ParsePatterns(Object, Object, IntPtr*)
           at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)

Though I have also seen

  Starting:    System.Runtime.Tests (parallel test collections = off, max threads = 8)
    System.Tests.DoubleTests.ParsePatterns [FAIL]
      System.AccessViolationException : Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
      Stack Trace:
        C:\repos\runtime0\src\libraries\System.Private.CoreLib\src\System\BitConverter.cs(678,0): at System.BitConverter.<>c.<ToString>b__43_0(Span`1 dst, ValueTuple`3 state)
        C:\repos\runtime0\src\libraries\System.Runtime\tests\System\DoubleTests.cs(433,0): at System.Tests.DoubleTests.ParsePatterns()
           at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
        C:\repos\runtime0\src\libraries\System.Private.CoreLib\src\System\Reflection\MethodInvoker.cs(69,0): at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)

and gc heap corruption.

Repros fairly readily with just this test enabled (add -parallel none -method System.Tests.DoubleTests.ParsePatterns to xunit args).

Binary search via COMPlus_JitEnablePgoRange indicates the problematic method is

Compiling 7228 System.Tests.DoubleTests::ParsePatterns, IL size = 287, hash=0xb7003f40 Tier1-OSR @0x102
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 20, 2022
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jun 20, 2022
@AndyAyersMS AndyAyersMS self-assigned this Jun 20, 2022
@ghost
Copy link

ghost commented Jun 20, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

This test has been failing off and on for a while with random PGO (and possibly other modes)

One common failure mode is

set COMPlus_TieredCompilation=1
set COMPlus_ReadyToRun=0
set COMPlus_TC_QuickJitForLoops=1
set COMPlus_TieredPGO=1
set COMPlus_JitRandomEdgeCounts=1

    System.Tests.DoubleTests.ParsePatterns [FAIL]
      Assert.Equal() Failure
                      ↓ (pos 6)
      Expected: 00000062
      Actual:   00000000
                      ↑ (pos 6)
      Stack Trace:
        C:\repos\runtime0\src\libraries\System.Runtime\tests\System\DoubleTests.cs(433,0): at System.Tests.DoubleTests.ParsePatterns()
           at InvokeStub_DoubleTests.ParsePatterns(Object, Object, IntPtr*)
           at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)

Though I have also seen

  Starting:    System.Runtime.Tests (parallel test collections = off, max threads = 8)
    System.Tests.DoubleTests.ParsePatterns [FAIL]
      System.AccessViolationException : Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
      Stack Trace:
        C:\repos\runtime0\src\libraries\System.Private.CoreLib\src\System\BitConverter.cs(678,0): at System.BitConverter.<>c.<ToString>b__43_0(Span`1 dst, ValueTuple`3 state)
        C:\repos\runtime0\src\libraries\System.Runtime\tests\System\DoubleTests.cs(433,0): at System.Tests.DoubleTests.ParsePatterns()
           at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
        C:\repos\runtime0\src\libraries\System.Private.CoreLib\src\System\Reflection\MethodInvoker.cs(69,0): at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)

and gc heap corruption.

Repros fairly readily with just this test enabled (add -parallel none -method System.Tests.DoubleTests.ParsePatterns to xunit args).

Binary search via COMPlus_JitEnablePgoRange indicates the problematic method is

Compiling 7228 System.Tests.DoubleTests::ParsePatterns, IL size = 287, hash=0xb7003f40 Tier1-OSR @0x102
Author: AndyAyersMS
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@AndyAyersMS AndyAyersMS added this to the 7.0.0 milestone Jun 20, 2022
@AndyAyersMS AndyAyersMS removed the untriaged New issue has not been triaged by the area owner label Jun 20, 2022
@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Jun 21, 2022

Debugging this, found it easiest to extract the method as a standalone repro

using System;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Runtime.CompilerServices;

class Assert
{
    public static void Equal(string s1, string s2)
    {
        if (!s1.Equals(s2))
        {
            Console.WriteLine($"Assert failed, '{s1}' != '{s2}'");
        }
    }
}

class X {

    internal static string SplitPairs(string input)
    {
        if (!BitConverter.IsLittleEndian)
        {
            return input.Replace("-", "");
        }

        return string.Concat(input.Split('-').Select(pair => Reverse(pair)));
    }

    internal static string Reverse(string s)
    {
        char[] charArray = s.ToCharArray();
        Array.Reverse(charArray);
        return new string(charArray);
    }

    public static void Main()
    {
        string path = Directory.GetCurrentDirectory();
        using (FileStream file = new FileStream(Path.Combine(path, "ibm-fpgen.txt"), FileMode.Open, FileAccess.Read))
        {
            using (var streamReader = new StreamReader(file))
            {
                int count = 0;
                string line = streamReader.ReadLine();
                while (line != null)
                {
                    count++;
                    string[] data = line.Split(' ');
                    string inputHalfBytes = data[0];
                    string inputFloatBytes = data[1];
                    string inputDoubleBytes = data[2];
                    string correctValue = data[3];

                    double doubleValue = double.Parse(correctValue, NumberFormatInfo.InvariantInfo);
                    string doubleBytes = BitConverter.ToString(BitConverter.GetBytes(doubleValue));
                    float floatValue = float.Parse(correctValue, NumberFormatInfo.InvariantInfo);
                    string floatBytes = BitConverter.ToString(BitConverter.GetBytes(floatValue));
                    Half halfValue = Half.Parse(correctValue, NumberFormatInfo.InvariantInfo);
                    string halfBytes = BitConverter.ToString(BitConverter.GetBytes(halfValue));

                    doubleBytes = SplitPairs(doubleBytes);
                    floatBytes = SplitPairs(floatBytes);
                    halfBytes = SplitPairs(halfBytes);

                    if (BitConverter.IsLittleEndian)
                    {
                        doubleBytes = Reverse(doubleBytes);
                        floatBytes = Reverse(floatBytes);
                        halfBytes = Reverse(halfBytes);
                    }

                    Assert.Equal(doubleBytes, inputDoubleBytes);
                    Assert.Equal(floatBytes, inputFloatBytes);
                    Assert.Equal(halfBytes, inputHalfBytes);
                    line = streamReader.ReadLine();
                }

                Console.WriteLine($"Passed {count} tests");
            }
        }
    }
}

Then fix jit so it only puts a patchpoint at offset 0xED, and run with

set COMPlus_TieredPGO=1
set COMPlus_JitRandomEdgeCounts=1
set COMPlus_ReadyToRun=0

and provide the input file that the test needs.

Looks like for some reason the untracked lifetime GC info the runtime sees is different than what the jit thinks it produces. In particular the jit places a GC ref at FP offset 0xE0 and consistently uses it this way in the code:

;  V311 tmp300      [V311,T25] (  3,   36.53)     ref  ->  [fp+E0H]   do-not-enreg[] V99.Item1(offs=0x00) P-DEP "field V99.Item1 (fldOffset=0x0)"
Stack slot id for offset 224 (0xe0) (frame) (untracked) = 7.

        97FEF43B          bl      CORINFO_HELP_NEWARR_1_VC
        FD000808          str     d8, [x0,#16]
        F9400724          ldr     x4, [x25,#8]
        F9006FA0          str     x0, [fp,#216]	// [V310 tmp299]
        B900E3BF          str     wzr, [fp,#224]	// [V311 tmp300]

but at runtime the decoded GC info shows all the untracked slots are 0x20 bytes lower and complains that the 0xC0 slot is not a valid object ref.

;; runtime
Untracked: +Fp+268 +Fp+260 +Fp+100 +Fp+f8 +Fp+d8 +Fp+c0 +Fp+a8 +Fp+88 +Fp+80 +Fp+60 +Fp+58 +Fp+38 +Fp+30 +Fp+278(interior) +Fp+110(interior) +Fp+98(interior) +Fp+70(interior) +Fp+48(interior)

;; jit  w/ notes
Stack slot id for offset 640 (0x280) (frame) (untracked) = 0.
Stack slot id for offset 648 (0x288) (frame) (untracked) = 1.
Stack slot id for offset 664 (0x298) (frame) (byref, untracked) = 2.   (278)
Stack slot id for offset 280 (0x118) (frame) (untracked) = 3.
Stack slot id for offset 288 (0x120) (frame) (untracked) = 4.
Stack slot id for offset 304 (0x130) (frame) (byref, untracked) = 5.   (110)
Stack slot id for offset 248 (0xf8) (frame) (untracked) = 6.
** Stack slot id for offset 224 (0xe0) (frame) (untracked) = 7.   (c0 == 192); e0 == 224 **
Stack slot id for offset 200 (0xc8) (frame) (untracked) = 8.
Stack slot id for offset 160 (0xa0) (frame) (untracked) = 9.
Stack slot id for offset 168 (0xa8) (frame) (untracked) = 10.
Stack slot id for offset 184 (0xb8) (frame) (byref, untracked) = 11.  (98)
Stack slot id for offset 120 (0x78) (frame) (untracked) = 12.
Stack slot id for offset 128 (0x80) (frame) (untracked) = 13.
Stack slot id for offset 144 (0x90) (frame) (byref, untracked) = 14.   (70)
Stack slot id for offset 80 (0x50) (frame) (untracked) = 15.  (30)
Stack slot id for offset 88 (0x58) (frame) (untracked) = 16.
Stack slot id for offset 104 (0x68) (frame) (byref, untracked) = 17. (48)

Other oddities: the jit reports 54 slots but the runtime only reports seeing 50.

So perhaps the GC info is not getting encoded properly or is getting corrupted somehow?

@AndyAyersMS
Copy link
Member Author

Looking at the GC info right after the OSR method is created, it seems match up with what the runtime reports later on. So there seems to be a discrepancy between what the jit thinks it is recording and what actually gets recorded.

@AndyAyersMS
Copy link
Member Author

Ah, I think the issue is that we are double reporting a stack slot

;  V110 tmp99       [V110    ] (  5,   6.09)  struct (16) [fp+C0H]   do-not-enreg[SFA] multireg-arg must-init "Inlining Arg"
;  V332 tmp321      [V332,T150] (  3,   4.35)     ref  ->  [fp+C0H]   do-not-enreg[] V110.Item1(offs=0x00) P-DEP "field V110.Item1 (fldOffset=0x0)"
...
Stack slot id for offset 192 (0xc0) (frame) (untracked) = 7.
...
Stack slot id for offset 192 (0xc0) (frame) = 24.

and this likely leads to the failures.

Seems like we can fix this by checking if an untracked on-frame (dependently) promoted struct has tracked GC fields, but perhaps the intention was that this combination should be impossible. If so perhaps OSR is missing some key bit of logic somewhere...?

@dotnet/jit-contrib does this sound familiar to anyone?

@AndyAyersMS
Copy link
Member Author

Semi-related: #67825 (comment) (also an untracked struct with gc fields).

One possible fix would be in gc encoding. If we have an untracked, on-frame local struct with GC fields, and those fields are dependently promoted and tracked, we should report the tracked promoted fields and not the untracked gc offsets from the raw struct layout.

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Jun 24, 2022
If there is a gc struct local that is dependently promoted, the struct
local may be untracked while the promoted gc fields of the struct are
tracked.

If so, the jit will double report the stack offset for the gc field,
first as an untracked slot, and then as a tracked slot.

Detect this case and report the slot as tracked only.

Closes dotnet#71005.
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jun 24, 2022
AndyAyersMS added a commit that referenced this issue Jun 29, 2022
If there is a gc struct local that is dependently promoted, the struct
local may be untracked while the promoted gc fields of the struct are
tracked.

If so, the jit will double report the stack offset for the gc field,
first as an untracked slot, and then as a tracked slot.

Detect this case and report the slot as tracked only.

Closes #71005.
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jun 29, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Jul 29, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant