Test failure JIT\\HardwareIntrinsics\\General\\Vector256\\Vector256_r\\Vector256_r.cmd #76280

Closed
v-wenyuxu opened this issue Sep 28, 2022 · 68 comments · Fixed by #78537
Labels: arch-x64, area-CodeGen-coreclr, JitStress, Known Build Error, os-windows

@v-wenyuxu

v-wenyuxu commented Sep 28, 2022

Run: runtime-coreclr jitstress 20220926.3

Failed test:

coreclr windows x64 Checked zapdisable @ Windows.10.Amd64.Open

- JIT\\HardwareIntrinsics\\General\\Vector256\\Vector256_r\\Vector256_r.cmd

Error message:

Return code:      1
Raw output file:      C:\h\w\B3BB09D9\w\A3F60922\uploads\Reports\JIT.HardwareIntrinsics\General\Vector256\Vector256_r\Vector256_r.output.txt
Raw output:
BEGIN EXECUTION
"C:\h\w\B3BB09D9\p\corerun.exe" -p "System.Reflection.Metadata.MetadataUpdater.IsSupported=false"  Vector256_r.dll
Beginning test case Abs.Byte at 9/27/2022 7:13:05 AM
Random seed: 20010415; set environment variable CORECLR_SEED to this value to repro

Beginning scenario: RunBasicScenario_UnsafeRead
Beginning scenario: RunReflectionScenario_UnsafeRead
Beginning scenario: RunClsVarScenario
Beginning scenario: RunLclVarScenario_UnsafeRead
Beginning scenario: RunClassLclFldScenario
Beginning scenario: RunClassFldScenario
Beginning scenario: RunStructLclFldScenario
Beginning scenario: RunStructFldScenario

Ending test case at 9/27/2022 7:13:06 AM
Beginning test case Abs.Double at 9/27/2022 7:13:06 AM
Random seed: 20010415; set environment variable CORECLR_SEED to this value to repro

Beginning scenario: RunBasicScenario_UnsafeRead
Beginning scenario: RunReflectionScenario_UnsafeRead
Beginning scenario: RunClsVarScenario
Beginning scenario: RunLclVarScenario_UnsafeRead
Beginning scenario: RunClassLclFldScenario
Beginning scenario: RunClassFldScenario
Beginning scenario: RunStructLclFldScenario
Beginning scenario: RunStructFldScenario

Ending test case at 9/27/2022 7:13:07 AM
Beginning test case Abs.Int16 at 9/27/2022 7:13:07 AM
Random seed: 20010415; set environment variable CORECLR_SEED to this value to repro

Beginning scenario: RunBasicScenario_UnsafeRead
Beginning scenario: RunReflectionScenario_UnsafeRead
Beginning scenario: RunClsVarScenario
Beginning scenario: RunLclVarScenario_UnsafeRead
Beginning scenario: RunClassLclFldScenario
Beginning scenario: RunClassFldScenario
Beginning scenario: RunStructLclFldScenario
Beginning scenario: RunStructFldScenario

Ending test case at 9/27/2022 7:13:07 AM
Beginning test case Abs.Int32 at 9/27/2022 7:13:07 AM
Random seed: 20010415; set environment variable CORECLR_SEED to this value to repro

Beginning scenario: RunBasicScenario_UnsafeRead
Beginning scenario: RunReflectionScenario_UnsafeRead
Beginning scenario: RunClsVarScenario
Beginning scenario: RunLclVarScenario_UnsafeRead
Beginning scenario: RunClassLclFldScenario
Beginning scenario: RunClassFldScenario
Beginning scenario: RunStructLclFldScenario
Beginning scenario: RunStructFldScenario

Ending test case at 9/27/2022 7:13:07 AM
Beginning test case Abs.Int64 at 9/27/2022 7:13:07 AM
Random seed: 20010415; set environment variable CORECLR_SEED to this value to repro

Beginning scenario: RunBasicScenario_UnsafeRead
Beginning scenario: RunReflectionScenario_UnsafeRead
Beginning scenario: RunClsVarScenario
Beginning scenario: RunLclVarScenario_UnsafeRead
Beginning scenario: RunClassLclFldScenario
Beginning scenario: RunClassFldScenario
Beginning scenario: RunStructLclFldScenario
Beginning scenario: RunStructFldScenario

Ending test case at 9/27/2022 7:13:07 AM
Beginning test case Abs.SByte at 9/27/2022 7:13:07 AM
Random seed: 20010415; set environment variable CORECLR_SEED to this value to repro

Beginning scenario: RunBasicScenario_UnsafeRead
Beginning scenario: RunReflectionScenario_UnsafeRead
Beginning scenario: RunClsVarScenario
Beginning scenario: RunLclVarScenario_UnsafeRead
Beginning scenario: RunClassLclFldScenario
Beginning scenario: RunClassFldScenario
Beginning scenario: RunStructLclFldScenario
Beginning scenario: RunStructFldScenario

Ending test case at 9/27/2022 7:13:07 AM
Beginning test case Abs.Single at 9/27/2022 7:13:07 AM
Random seed: 20010415; set environment variable CORECLR_SEED to this value to repro

Beginning scenario: RunBasicScenario_UnsafeRead
Beginning scenario: RunReflectionScenario_UnsafeRead
Beginning scenario: RunClsVarScenario
Beginning scenario: RunLclVarScenario_UnsafeRead
Beginning scenario: RunClassLclFldScenario
Beginnin


Stack trace
   at JIT_HardwareIntrinsics._General_Vector256_Vector256_r_Vector256_r_._General_Vector256_Vector256_r_Vector256_r_cmd()
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)
{ 
    "ErrorMessage":"JIT.HardwareIntrinsics\\General\\Vector256\\Vector256_",
    "BuildRetry": false
} 

Report

| Build | Definition | Test | Pull Request |
|---|---|---|---|
| 78807 | dotnet/runtime | JIT\HardwareIntrinsics\General\Vector256\Vector256_ro\Vector256_ro.cmd | |
| 78797 | dotnet/runtime | JIT\HardwareIntrinsics\General\Vector256\Vector256_ro\Vector256_ro.cmd | #77737 |
| 77376 | dotnet/runtime | JIT.HardwareIntrinsics.General.Vector256.WorkItemExecution | |
| 76912 | dotnet/runtime | JIT\HardwareIntrinsics\General\Vector256\Vector256_ro\Vector256_ro.cmd | #77728 |
| 76381 | dotnet/runtime | JIT\HardwareIntrinsics\General\Vector256\Vector256_ro\Vector256_ro.cmd | #77990 |
| 76377 | dotnet/runtime | JIT\HardwareIntrinsics\General\Vector256\Vector256_ro\Vector256_ro.cmd | #77353 |
| 76043 | dotnet/runtime | JIT\HardwareIntrinsics\General\Vector256\Vector256_ro\Vector256_ro.cmd | #76793 |
| 75886 | dotnet/runtime | JIT\HardwareIntrinsics\General\Vector256\Vector256_ro\Vector256_ro.cmd | |
| 70948 | dotnet/runtime | JIT\HardwareIntrinsics\General\Vector256\Vector256_ro\Vector256_ro.cmd | #77798 |
| 68929 | dotnet/runtime | JIT.HardwareIntrinsics.General.Vector256.WorkItemExecution | |
| 67658 | dotnet/runtime | JIT.HardwareIntrinsics.General.Vector256.WorkItemExecution | |
| 66875 | dotnet/runtime | JIT.HardwareIntrinsics.General.Vector256.WorkItemExecution | |
| 66220 | dotnet/runtime | JIT.HardwareIntrinsics.General.Vector256.WorkItemExecution | #73472 |
| 64828 | dotnet/runtime | JIT.HardwareIntrinsics.General.Vector256.WorkItemExecution | |
| 62054 | dotnet/runtime | JIT.HardwareIntrinsics.General.Vector256.WorkItemExecution | |

Summary

| 24-Hour Hit Count | 7-Day Hit Count | 1-Month Count |
|---|---|---|
| 0 | 0 | 15 |

v-wenyuxu added the os-windows, JitStress, arch-x64, and blocking-clean-ci-optional labels Sep 28, 2022
dotnet-issue-labeler bot added the area-CodeGen-coreclr label Sep 28, 2022
ghost added the untriaged label Sep 28, 2022
@ghost

ghost commented Sep 28, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@BruceForstall
Member

The error is:

      Vector256.ConvertToDouble<Double>(Vector256<Int64>): RunClassFldScenario failed:
       firstOp: (5354736389871458141, 5213547752395406011, 6568981105086642517, 411765540062621435)
        result: (5.354736389871458E+18, 5.213547752395406E+18, 6.568981101446955E+18, 4.1176553693904896E+17)
      
      Beginning scenario: RunStructLclFldScenario
      Beginning scenario: RunStructFldScenario
      ERROR!!!-System.Exception: One or more scenarios did not complete as expected.

with:

set COMPlus_TieredCompilation=0
set COMPlus_ReadyToRun=0
set COMPlus_ZapDisable=1

cc @tannergooding

JulieLeeMSFT removed the untriaged label Sep 28, 2022
JulieLeeMSFT added this to the 8.0.0 milestone Sep 28, 2022
@JulieLeeMSFT
Member

@tannergooding, please check if this needs to be backported to 7.0.

@tannergooding
Member

The lowest 32 bits are being corrupted somehow.

Is:        6.568981101446955E+18 (0x43D6CA6D_60800000)
Should Be: 6.568981105086642E+18 (0x43D6CA6D_60B63C4E)

Is:        4.1176553693904896E+17 (0x4396DB8A_4C000000)
Should Be: 4.1176554006262144E+17 (0x4396DB8A_4EE8B7BC)
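
The comparison above can be reproduced by XOR-ing the actual and expected bit patterns of each lane. The following is a minimal Python sketch that uses only the hex values quoted in this comment; the helper names (`bits_to_double`, `describe`) are made up for illustration and are not part of the test suite.

```python
import struct

def bits_to_double(bits: int) -> float:
    """Reinterpret a 64-bit pattern as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

# (actual, expected) bit patterns copied from the comment above.
LANES = [
    (0x43D6CA6D_60800000, 0x43D6CA6D_60B63C4E),
    (0x4396DB8A_4C000000, 0x4396DB8A_4EE8B7BC),
]

def describe(actual: int, expected: int) -> dict:
    """Summarize where two double bit patterns differ."""
    diff = actual ^ expected
    return {
        "actual": bits_to_double(actual),
        "expected": bits_to_double(expected),
        "high_dword_matches": (actual >> 32) == (expected >> 32),
        # How many low bits are needed to cover every differing bit.
        "differing_low_bits": diff.bit_length(),
    }

for actual, expected in LANES:
    print(describe(actual, expected))
```

For both lanes the high dword matches and every differing bit sits inside the low 32 bits, which is what the "Is" / "Should Be" pairs show.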

@tannergooding
Member

Notably this is only happening on the "upper half" of the Vector256.

@tannergooding
Member

tannergooding commented Sep 29, 2022

I can't actually repro this. It also isn't reproducing in CI anymore as of the latest run.

There notably isn't anything "obvious" in the commit range (of last failing CI run to latest CI run which passes) either: 6c2cfa4...789b420

If this reproduces again, I'll take another look.

@BruceForstall
Member

@tannergooding A similar case failed again:

https://dev.azure.com/dnceng-public/public/_build/results?buildId=37829&view=ms.vss-test-web.build-test-results-tab&runId=754460&resultId=100427&paneView=dotnet-dnceng.dnceng-build-release-tasks.helix-test-information-tab

coreclr windows x64 Checked zapdisable @ Windows.10.Amd64.Open

      Beginning test case ConvertToDouble.Int64 at 10/2/2022 6:44:56 AM
      Random seed: 20010415; set environment variable CORECLR_SEED to this value to repro
      
      Beginning scenario: RunBasicScenario_UnsafeRead
      Beginning scenario: RunReflectionScenario_UnsafeRead
      Beginning scenario: RunClsVarScenario
      Beginning scenario: RunLclVarScenario_UnsafeRead
      Vector256.ConvertToDouble<Double>(Vector256<Int64>): RunLclVarScenario_UnsafeRead failed:
       firstOp: (6447432934267478723, 7446574367515794728, 1268019459039786021, 7143260941042548606)
        result: (6.447432934267479E+18, 7.446574367515794E+18, 1.2680194555442627E+18, 7.143260938076946E+18)
      
      Beginning scenario: RunClassLclFldScenario
      Beginning scenario: RunClassFldScenario
      Beginning scenario: RunStructLclFldScenario
      Beginning scenario: RunStructFldScenario
      ERROR!!!-System.Exception: One or more scenarios did not complete as expected.

@tannergooding
Member

Could you remind me what ZapDisable is testing?

It's very odd this still isn't repro'ing locally. Perhaps there is some determinism issue with where the test runs in relation to the RNG it uses for inputs. It's interesting that it changed to a LclVar, whereas the previous failure was a ClassFld.

@BruceForstall
Member

> Could you remind me what ZapDisable is testing?

From the console log you can see:

set COMPlus_TieredCompilation=0
set COMPlus_ReadyToRun=0
set COMPlus_ZapDisable=1

I'm not sure if ZapDisable does anything anymore, but ReadyToRun=0 means we don't use any pre-compiled images, so everything gets JITed.
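
For a local repro attempt, the CI configuration above can be mirrored by setting the same environment variables before launching the test script. The sketch below is only an illustration: the variable names and the seed come from the logs in this issue, while `build_repro_env`, `run_test`, and any paths passed to them are hypothetical.

```python
import os
import subprocess

def build_repro_env(core_root: str, seed: int = 20010415) -> dict:
    """Copy the current environment and add the settings seen in the CI log."""
    env = dict(os.environ)
    env.update({
        "CORE_ROOT": core_root,            # location of corerun, as in "set CORE_ROOT=..."
        "COMPlus_TieredCompilation": "0",
        "COMPlus_ReadyToRun": "0",         # no pre-compiled images, everything is JITed
        "COMPlus_ZapDisable": "1",
        "CORECLR_SEED": str(seed),         # seed printed by the test harness
    })
    return env

def run_test(test_cmd: str, core_root: str) -> int:
    """Run a test .cmd script (Windows only) and return its exit code."""
    completed = subprocess.run(["cmd", "/c", test_cmd], env=build_repro_env(core_root))
    return completed.returncode

# Example call with hypothetical paths:
#   run_test(r"C:\tests\Vector256_r\Vector256_r.cmd", r"C:\tests\Core_Root")
```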

@tannergooding
Member

Still not able to repro this locally. I have tried 5 different full runs and over 20 different runs of just Vector256_r.

@mangod9
Member

mangod9 commented Oct 4, 2022

Actually looks like the testcase failing for my PR was different:

      Beginning scenario: RunBasicScenario_UnsafeRead
      Beginning scenario: RunReflectionScenario_UnsafeRead
      Beginning scenario: RunClsVarScenario
      Beginning scenario: RunLclVarScenario_UnsafeRead
      Beginning scenario: RunClassLclFldScenario
      Beginning scenario: RunClassFldScenario
      Beginning scenario: RunStructLclFldScenario
      Beginning scenario: RunStructFldScenario
      
      Ending test case at 10/4/2022 4:36:41 AM
      Expected: 100
      Actual: 0
      END EXECUTION - FAILED
      FAILED
      Test Harness Exitcode is : 1
      To run the test:
      > set CORE_ROOT=C:\h\w\AD310950\p
      > C:\h\w\AD310950\w\AD84096D\e\JIT\HardwareIntrinsics\General\Vector256\Vector256_r\Vector256_r.cmd
      Expected: True
      Actual:   False
      Stack Trace:
           at JIT_HardwareIntrinsics._General_Vector256_Vector256_r_Vector256_r_._General_Vector256_Vector256_r_Vector256_r_cmd()
           at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
           at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)
      Output:

@markples
Member

markples commented Oct 4, 2022

Copying from previous failure before logs go away:

      Beginning test case ConditionalSelect.Double at 10/4/2022 1:27:47 AM
      Random seed: 20010415; set environment variable CORECLR_SEED to this value to repro
      
      Beginning scenario: RunBasicScenario_UnsafeRead
      Beginning scenario: RunReflectionScenario_UnsafeRead
      Vector256.ConditionalSelect<Double>(Vector256<Double>, Vector256<Double>, Vector256<Double>): RunReflectionScenario_UnsafeRead failed:
       firstOp: (0.6989466141438795, 0.4866212012649612, 0.6618925643441699, 0.09845733460898387)
      secondOp: (0.22517924393768388, 0.7154067716167339, 0.006026498976175906, 0.27767764510478715)
       thirdOp: (0.7471133040949298, 0.5692147801486844, 0.8094755237966196, 0.4609477219455632)
        result: (0.17140560267558583, 0.6942809413626239, 0.8094755237966196, 0.4609477219455632)
Beginning test case LessThan.UInt64 at 10/4/2022 1:27:56 AM
      Random seed: 20010415; set environment variable CORECLR_SEED to this value to repro
      
      Beginning scenario: RunBasicScenario_UnsafeRead
      Beginning scenario: RunReflectionScenario_UnsafeRead
      Vector256.LessThan<UInt64>(Vector256<UInt64>, Vector256<UInt64>): RunReflectionScenario_UnsafeRead failed:
          left: (3781761112722303931, 6235165959883877819, 5127093180115276564, 15915554120637788906)
         right: (1422405162339237022, 17563831884969782090, 11681625757407455840, 5912794640739462083)
        result: (0, 18446744073709551615, 18446744073709551615, 18446744073709551615)
Beginning test case Xor.Int32 at 10/4/2022 1:28:01 AM
      Random seed: 20010415; set environment variable CORECLR_SEED to this value to repro
      
      Beginning scenario: RunBasicScenario_UnsafeRead
      Beginning scenario: RunReflectionScenario_UnsafeRead
      Vector256.Xor<Int32>(Vector256<Int32>, Vector256<Int32>): RunReflectionScenario_UnsafeRead failed:
          left: (1604413603, 1222379432, 1738335450, 989877695, 884295071, 1181582375, 238131376, 426120629)
         right: (2083039301, 7104645, 2146387439, 1913728278, 1335194097, 826300609, 924496805, 1104768183)
        result: (596230374, 1219522349, 410231093, 1225881769, 1335194097, 826300609, 924496805, 1104768183)

@markples
Member

markples commented Oct 4, 2022

I hit failures in another configuration. This one is ildasm/ilasm roundtripping for an ilasm change, so theoretically it could be the change but it seems unlikely.

This uses

set COMPlus_TieredCompilation=0
set RunningIlasmRoundTrip=1

RunningIlasmRoundTrip is a test script variable that leads to the roundtripping. The runtime does not use it.

https://dev.azure.com/dnceng-public/public/_build/results?buildId=40034&view=logs&j=9d34e523-f5d6-52dc-46f2-0a66cb13e494&t=5aa7bffb-7d91-5080-bb39-aa229a02b9d3

https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-76590-merge-7401007fc42f47e7a1/JIT.HardwareIntrinsics.General.Vector256/1/console.e1969e3c.log?helixlogtype=result

Beginning test case Multiply.Int16 at 10/4/2022 7:17:32 PM
      Random seed: 20010415; set environment variable CORECLR_SEED to this value to repro
      
      Beginning scenario: RunBasicScenario_UnsafeRead
      Beginning scenario: RunReflectionScenario_UnsafeRead
      Vector256.Multiply<Int16>(Vector256<Int16>, Vector256<Int16>): RunReflectionScenario_UnsafeRead failed:
          left: (12349, 2077, 2517, 24831, 15335, 21277, 18310, 24742, 29224, 1501, 30542, 30209, 25839, 20619, 27223, 15463)
         right: (26331, 30859, 23718, 5423, 18949, 5399, 15719, 6382, 2686, 639, 9923, 32176, 9978, 23285, 4897, 26979)
        result: (-28113, -65, -5090, -17967, -3709, -10085, -19222, 27220, 0, 0, 0, 0, 0, 0, 0, 0)
Beginning test case Max.Double at 10/4/2022 7:17:29 PM
      Random seed: 20010415; set environment variable CORECLR_SEED to this value to repro
      
      Beginning scenario: RunBasicScenario_UnsafeRead
      Beginning scenario: RunReflectionScenario_UnsafeRead
      Vector256.Max<Double>(Vector256<Double>, Vector256<Double>): RunReflectionScenario_UnsafeRead failed:
          left: (0.30494365762218073, 0.8294232901322763, 0.3422121491014083, 0.7015113330918883)
         right: (0.4280050426851982, 0.980779036870589, 0.38619966916097315, 0.05648377959452745)
        result: (0.4280050426851982, 0.980779036870589, 0.38619966916097315, 0.05648377959452745)
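
One way to read the two failures above is to ask what the result would be if the upper 128 bits of the first operand had been read as zero. The Python sketch below checks that model against the logged values. It is only an observation about these numbers, not a confirmed diagnosis, and the helper names are invented for illustration.

```python
def zero_upper_half(lanes):
    """Return a copy of a Vector256's lanes with the upper 128 bits cleared."""
    half = len(lanes) // 2
    return list(lanes[:half]) + [0] * half

def wrap_int16(value: int) -> int:
    """Truncate an integer to a signed 16-bit value (two's complement)."""
    return ((value + 0x8000) & 0xFFFF) - 0x8000

# Multiply.Int16 values from the log above.
mul_left = [12349, 2077, 2517, 24831, 15335, 21277, 18310, 24742,
            29224, 1501, 30542, 30209, 25839, 20619, 27223, 15463]
mul_right = [26331, 30859, 23718, 5423, 18949, 5399, 15719, 6382,
             2686, 639, 9923, 32176, 9978, 23285, 4897, 26979]
mul_logged = [-28113, -65, -5090, -17967, -3709, -10085, -19222, 27220,
              0, 0, 0, 0, 0, 0, 0, 0]

# Max.Double values from the log above.
max_left = [0.30494365762218073, 0.8294232901322763,
            0.3422121491014083, 0.7015113330918883]
max_right = [0.4280050426851982, 0.980779036870589,
             0.38619966916097315, 0.05648377959452745]
max_logged = [0.4280050426851982, 0.980779036870589,
              0.38619966916097315, 0.05648377959452745]

# Model: the first operand loses its upper half before the operation runs.
mul_model = [wrap_int16(a * b) for a, b in zip(zero_upper_half(mul_left), mul_right)]
max_model = [max(a, b) for a, b in zip(zero_upper_half(max_left), max_right)]

print("Multiply.Int16 matches model:", mul_model == mul_logged)
print("Max.Double matches model:", max_model == max_logged)
```

Both logged results agree with this model. In the Max.Double case only the last lane exposes the problem, because in the third lane the right operand is the larger value anyway.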

@v-wenyuxu
Author

v-wenyuxu commented Oct 24, 2022

Failed again in: runtime-coreclr jitstress 20221022.1

Failed test:

coreclr windows x64 Checked zapdisable @ Windows.10.Amd64.Open

- JIT\\HardwareIntrinsics\\General\\Vector256\\Vector256_r\\Vector256_r.cmd

Error message:

      Vector256.ConvertToDouble<Double>(Vector256<UInt64>): RunClassLclFldScenario failed:
       firstOp: (4093955593098357454, 8100612776087538354, 2810113380124182167, 3723585571893864373)
        result: (4.093955593098357E+18, 8.100612776087539E+18, 2.810113379975299E+18, 3.723585570157363E+18)

@jkotas
Member

jkotas commented Oct 24, 2022

This seems to be failing only in Vector256.ConvertToDouble and only in JIT stress outer loop runs now.

@jkotas
Member

jkotas commented Oct 26, 2022

Details from the latest hit (build 62054):

Failing test: JIT.HardwareIntrinsics.General.VectorUnaryOpTest__ConvertToDoubleInt64.RunClassLclFldScenario

Expected:
43cc685375ea6cf5 43dc1aca19a96c13 43c37fc39d846fe3 43c9d66aacb3c074

Actual:
43cc685375ea6cf5 43dc1aca19a96c13 43c37fc39d800000 43c9d66aac800000 <- 2x 22 bits in upper half of Vector256 flipped to 0

Environment:

DOTNET_ReadyToRun=0
DOTNET_TieredCompilation=0
PROCESSOR_IDENTIFIER=AMD64 Family 23 Model 49 Stepping 0, AuthenticAMD

@tannergooding
Member

tannergooding commented Oct 26, 2022

Is there any way we can pull this explicit machine from the pool to do manual testing on?

Given we've only seen this for Vector256 and only with the upper half, my presumption is that there is either a bug in the logic that saves and restores the upper halves (either in the JIT or in the thread save/resume logic) -or- it's something like the microcode patch that was called out above.

We could also try to get more info out of HKEY_LOCAL_MACHINE\HARDWARE\DESCRIPTION\System\* (both BIOS and CentralProcessor\0), although it might be masked with it being a VM still.

My current install has Update AGESA version to ComboAM5PI 1.0.0.3 patch A and the corresponding Update Revision in the registry is 0x0A60_1203.
-- Notably the CI machine is Zen 2, so it'd be on the AM4 AGESA versions.
-- CPUID reports I currently have Stepping 2, Revision RPL-B2

@jkotas
Member

jkotas commented Oct 26, 2022

> although it might be masked with it being a VM still

Yes, these details are masked out. Helix runs on Azure VMs, so you cannot even be sure that the test runs on the same physical machine the whole time. The running VM can be migrated to a different physical machine in the middle of the test.

I have asked on the eng system support chat about interactive session on Helix VM.

@jkotas
Member

jkotas commented Oct 28, 2022

@tannergooding You are cced on the Teams discussion with the eng team about getting access to the Helix VM.

@BruceForstall
Member

A recent job failed with newly added FailFast instrumentation:

https://dev.azure.com/dnceng-public/public/_build/results?buildId=66220&view=ms.vss-test-web.build-test-results-tab&runId=1351562&resultId=102648&paneView=debug

Process terminated. Temporary instrumentation to diagnose https://github.com/dotnet/runtime/issues/76280
at System.Environment.FailFast(System.String)
at JIT.HardwareIntrinsics.General.VectorUnaryOpTest__ConvertToDoubleInt64.ValidateResult(Int64[], Double[], System.String)
at JIT.HardwareIntrinsics.General.VectorUnaryOpTest__ConvertToDoubleInt64.ValidateResult(System.Runtime.Intrinsics.Vector256`1<Int64>, Void*, System.String)
at JIT.HardwareIntrinsics.General.VectorUnaryOpTest__ConvertToDoubleInt64.RunClassLclFldScenario()
at JIT.HardwareIntrinsics.General.Program.ConvertToDoubleInt64()
at JIT.HardwareIntrinsics.General.Program.Main(System.String[])

@jkotas
Member

jkotas commented Oct 31, 2022

Here are the corrupted values from last 4 crashes (result local variable from JIT.HardwareIntrinsics.General.VectorUnaryOpTest__ConvertToDoubleUInt64.RunClassLclFldScenario method):

  • Build 66875
00000052`b7dce000  75ea6cf5 43cc6853 19a96c13 43dc1aca
00000052`b7dce010  9d800000 43c37fc3 ac800000 43c9d66a
  • Build 66220
000000f8`07d7df80  75ea6cf5 43cc6853 19a96c13 43dc1aca
000000f8`07d7df90  9d800000 43c37fc3 ac800000 43c9d66a
  • Build 64828
000000d7`1477e580  86000ab9 43d65e78 fbc053b7 43d9d5e2
000000d7`1477e590  fa000000 43b198e8 7e800000 43d8c87d
  • Build 62054
0000005c`b557e4a0  75ea6cf5 43cc6853 19a96c13 43dc1aca
0000005c`b557e4b0  9d800000 43c37fc3 ac800000 43c9d66a

@jkotas
Member

jkotas commented Oct 31, 2022

The consistent pattern is that the low 23 bits of the higher 2 qwords are zeroed out. 23 bits is an unusual number. Where can it come from?
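
To check the pattern mechanically, the dumps can be decoded into qwords and the zeroed low bits counted. This is a minimal Python sketch that assumes the dump lines are little-endian dwords (low dword first), which matches the qword values quoted earlier for build 62054; the helper names are illustrative only.

```python
import struct

# Two of the dumps above, with both lines of each dump joined.
DUMPS = {
    66875: "75ea6cf5 43cc6853 19a96c13 43dc1aca 9d800000 43c37fc3 ac800000 43c9d66a",
    64828: "86000ab9 43d65e78 fbc053b7 43d9d5e2 fa000000 43b198e8 7e800000 43d8c87d",
}

def decode_qwords(dump: str) -> list:
    """Pair up little-endian dwords (low dword first) into 64-bit values."""
    dwords = [int(token, 16) for token in dump.split()]
    return [(high << 32) | low for low, high in zip(dwords[0::2], dwords[1::2])]

def trailing_zero_bits(value: int) -> int:
    """Count the zero bits below the lowest set bit of a non-zero value."""
    return (value & -value).bit_length() - 1

for build, dump in DUMPS.items():
    for lane, qword in enumerate(decode_qwords(dump)):
        as_double = struct.unpack("<d", struct.pack("<Q", qword))[0]
        print(build, lane, f"{qword:016x}", as_double, trailing_zero_bits(qword))
```

In these two dumps the upper two lanes have at least 23 zero low bits, and one lane of build 64828 has 25, so 23 reads as a lower bound rather than an exact width.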

@EgorBo
Member

EgorBo commented Oct 31, 2022

> The consistent pattern is that the low 23 bits of the higher 2 qwords are zeroed out. 23 bits is an unusual number. Where can it come from?

I'm just curious why you edited your comment - the size seems to match single-precision mantissa indeed like you noted?

@jkotas
Member

jkotas commented Oct 31, 2022

I was thinking that it is half of the double mantissa and then immediately realized that my math is wrong (2 * 23 != 52).

Good point about single precision mantissa!

@jkotas
Member

jkotas commented Oct 31, 2022

> @tannergooding You are cced on the Teams discussion with the eng team about getting access to the Helix VM.

There is no good way to get remote access to the exact hardware that this is failing on. Azure uses multiple different processor models for the machine category used by Helix VMs. Creating a Helix VM of the same machine category tends to give you a different processor model (I have tried multiple times).

If we need to gather more information about the machine config, the best way to do that is to add extra logging before the temporary FailFast in Vector256 test.

@tannergooding
Member

My current concern is that if this is a microcode issue, then no amount of logging will provide the required information.

The only real way to validate is likely to get a machine, reliably repro, patch, and then try to repro again.

@jkotas
Member

jkotas commented Oct 31, 2022

> The only real way to validate is likely to get a machine, reliably repro, patch, and then try to repro again.

What would you do if you got a VM that reproduces it semi-reliably? We should be able to do the same in the CI, just the feedback loop would be slower.

(I am running out of ideas on what to do to diagnose this further.)

@tannergooding
Member

I'd likely attach Intel VTune or AMD uProf and collect a system wide trace that includes when the process yields the timeslice.

From what we've seen from the dumps, all the disassembly looks correct and the input values are being corrupted somewhere between when the correct result is computed and the validation happens.

So, my guess is that it's either some state save/restore issue -or- something like what Egor linked above. Given this is effectively only happening in Vector256_r, I'd speculate the combination of "debug" (and therefore frequently spilling/loading) is causing heavy enough Vector256 usage that it triggers the issue Egor had found. In which case we'd patch the machine (install latest Windows/Microsoft Updates to start) and see if it continues reproing.

@jkotas
Member

jkotas commented Oct 31, 2022

To re-iterate, this is what we know:

  • The window where the corruption happens is very small (~12 instructions). You can tell by inspecting local variables on the stack: the result local already holds the corrupted value at the var result = Vector256.ConvertToInt64(test._fld1); line of the test.
  • The corruption is hit by just a handful of test cases. VectorUnaryOpTest__ConvertToDoubleInt64.RunClassLclFldScenario has been the only test case hitting it recently. This sort of determinism suggests that the problem is not caused by asynchronous interrupts.
  • The corruption is only hit on AMD64 Family 23 Model 49 Stepping 0, AuthenticAMD machines.

This is the window where the corruption occurs:

00007ff7`ba53f9a7 c5fc28c8        vmovaps ymm1,ymm0
00007ff7`ba53f9ab c4e375020d2b000000aa vpblendd ymm1,ymm1,ymmword ptr [System_Private_CoreLib!System.Runtime.Intrinsics.Vector256.ConvertToDouble(System.Runtime.Intrinsics.Vector256`1<Int64>)+0x40 (00007ff7`ba53f9e0)],0AAh
00007ff7`ba53f9b5 c5fd73d020      vpsrlq  ymm0,ymm0,20h
00007ff7`ba53f9ba c5fdef053e000000 vpxor   ymm0,ymm0,ymmword ptr [System_Private_CoreLib!System.Runtime.Intrinsics.Vector256.ConvertToDouble(System.Runtime.Intrinsics.Vector256`1<Int64>)+0x60 (00007ff7`ba53fa00)]
00007ff7`ba53f9c2 c5fd5c0556000000 vsubpd  ymm0,ymm0,ymmword ptr [System_Private_CoreLib!System.Runtime.Intrinsics.Vector256.ConvertToDouble(System.Runtime.Intrinsics.Vector256`1<Int64>)+0x80 (00007ff7`ba53fa20)]
00007ff7`ba53f9ca c5fd58c1        vaddpd  ymm0,ymm0,ymm1
00007ff7`ba53f9ce c5fd1101        vmovupd ymmword ptr [rcx],ymm0
00007ff7`ba53f9d2 488bc1          mov     rax,rcx
00007ff7`ba53f9d5 c5f877          vzeroupper
00007ff7`ba53f9d8 c3              ret

00007ff7`ba5409dc c5fd104590      vmovupd ymm0,ymmword ptr [rbp-70h]
00007ff7`ba5409e1 c5fd1145d0      vmovupd ymmword ptr [rbp-30h],ymm0 <- [rbp-30h] value is corrupted
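
The instruction sequence above matches the well-known magic-number technique for converting a 64-bit integer to a double in two 32-bit halves. The Python sketch below emulates one lane of it. The three constants are an assumption (the disassembly only shows their addresses, not their values), and the `low_dword_lost` switch is a hypothetical knob that drops the low 32 input bits. It reproduces the corrupted values logged earlier in this issue, but it does not establish which instruction misbehaved.

```python
import struct

MASK64 = (1 << 64) - 1

# Assumed constants, not visible in the disassembly above.
MAGIC_LOWER = 0x43300000_00000000  # bit pattern of 2**52
MAGIC_UPPER = 0x45300000_80000000  # bit pattern of 2**84 + 2**63
MAGIC_ALL = 0x45300000_80100000    # bit pattern of 2**84 + 2**63 + 2**52

def as_double(bits: int) -> float:
    """Reinterpret a 64-bit pattern as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def convert_lane(value: int, low_dword_lost: bool = False) -> float:
    """Emulate one Int64 lane of the blend/shift/xor/sub/add sequence."""
    bits = value & MASK64
    low = 0 if low_dword_lost else bits & 0xFFFFFFFF
    lower = as_double(MAGIC_LOWER | low)           # vpblendd ..., 0AAh
    upper = as_double((bits >> 32) ^ MAGIC_UPPER)  # vpsrlq 20h, then vpxor
    return (upper - as_double(MAGIC_ALL)) + lower  # vsubpd, then vaddpd

x = 6568981105086642517  # third lane of the first failure in this issue
print(convert_lane(x))                       # normal conversion
print(convert_lane(x, low_dword_lost=True))  # low 32 input bits dropped
```

With the low dword dropped, the two upper lanes of the first failure come out as 6.568981101446955E+18 and 4.1176553693904896E+17, the same values the test logged.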

jkotas added a commit to jkotas/runtime that referenced this issue Nov 18, 2022
ghost added the in-pr label Nov 18, 2022
JulieLeeMSFT removed the needs-further-triage label Nov 18, 2022
jkotas added a commit that referenced this issue Nov 18, 2022
ghost removed the in-pr label Nov 18, 2022
ghost locked as resolved and limited conversation to collaborators Dec 19, 2022