Failures in checked/release asm diffs #76347

BruceForstall · 2022-09-29T00:34:10Z

Pipeline https://dev.azure.com/dnceng-public/public/_build/results?buildId=29308&view=results has been reporting failures that need to be investigated.

ghost · 2022-09-29T00:34:14Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Author:	BruceForstall
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	8.0.0

BruceForstall · 2022-09-29T04:08:18Z

One example:

C:\bugs\spmicollect4>C:\gh\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\superpmi.exe -a -c 155161 C:\gh\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root\clrjit_win_x64_x64.dll C:\gh\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\clrjit_win_x64_x64.dll c:\spmi\mch\eb8352bd-0a13-4b5b-badb-58f9ecc40c44.windows.x64\coreclr_tests.run.windows.x64.checked.mch
Using jit(C:\gh\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root\clrjit_win_x64_x64.dll) with input (c:\spmi\mch\eb8352bd-0a13-4b5b-badb-58f9ecc40c44.windows.x64\coreclr_tests.run.windows.x64.checked.mch)
 indexCount=1 (155161)
Jit startup took 2.277200ms
Jit startup took 8.512700ms
Code Size mismatch: Left=45, Right=39

-----------------------------------------------
Block:   Left
Size:    45
Address: 264133383d0
CodePtr: 26413338754
-----------------------------------------------
264133383d0: 55                         push    rbp
264133383d1: 57                         push    rdi
264133383d2: 48 83 ec 28                sub     rsp, 40
264133383d6: 48 8d 6c 24 30             lea     rbp, [rsp + 48]
264133383db: 83 3d 60 e7 74 34 00       cmp     dword ptr [rip + 880076640], 0
264133383e2: 74 05                      je      5
264133383e4: e8 10 47 ad 93             call    -1817360624
264133383e9: ff 15 c0 96 75 34          call    qword ptr [rip + 880121536]
264133383ef: 89 45 f4                   mov     dword ptr [rbp - 12], eax
264133383f2: 8b 45 f4                   mov     eax, dword ptr [rbp - 12]
264133383f5: 98                         cwde
264133383f6: 48 83 c4 28                add     rsp, 40
264133383fa: 5f                         pop     rdi
264133383fb: 5d                         pop     rbp
264133383fc: c3                         ret
-----------------------------------------------
-----------------------------------------------
Block:   Right
Size:    39
Address: 26413338f30
CodePtr: 26413338d74
-----------------------------------------------
26413338f30: 55                         push    rbp
26413338f31: 57                         push    rdi
26413338f32: 48 83 ec 28                sub     rsp, 40
26413338f36: 48 8d 6c 24 30             lea     rbp, [rsp + 48]
26413338f3b: 83 3d 60 e7 74 34 00       cmp     dword ptr [rip + 880076640], 0
26413338f42: 74 05                      je      5
26413338f44: e8 10 47 ad 93             call    -1817360624
26413338f49: ff 15 c0 96 75 34          call    qword ptr [rip + 880121536]
26413338f4f: 90                         nop
26413338f50: 48 83 c4 28                add     rsp, 40
26413338f54: 5f                         pop     rdi
26413338f55: 5d                         pop     rbp
26413338f56: c3                         ret
-----------------------------------------------
ISSUE: <ASM_DIFF> main method 155161 of size 6 differs
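
To see where the two listings above diverge, one can strip the address column and diff what remains. The following is a minimal sketch, assuming the `address: bytes  mnemonic operands` layout shown in the dump. The helper names (`parse_listing`, `diff_listings`) are hypothetical and are not part of SuperPMI.

```python
import difflib
import re

# Matches lines such as "264133383ef: 89 45 f4      mov  dword ptr [rbp - 12], eax".
LINE_RE = re.compile(r"^\s*[0-9a-f]+:\s+((?:[0-9a-f]{2} )+)\s*(.*)$")

def parse_listing(text):
    """Return (byte_count, instruction_text) pairs, dropping the address column."""
    result = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            byte_count = len(m.group(1).split())
            insn = " ".join(m.group(2).split())  # collapse column padding
            result.append((byte_count, insn))
    return result

def diff_listings(left, right):
    """Return instructions unique to each side and the byte-size delta (left - right)."""
    l, r = parse_listing(left), parse_listing(right)
    matcher = difflib.SequenceMatcher(a=l, b=r, autojunk=False)
    only_left, only_right = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            only_left.extend(l[i1:i2])
            only_right.extend(r[j1:j2])
    delta = sum(n for n, _ in only_left) - sum(n for n, _ in only_right)
    return only_left, only_right, delta

# Excerpts copied from the Left and Right blocks above.
LEFT = """\
264133383e9: ff 15 c0 96 75 34          call    qword ptr [rip + 880121536]
264133383ef: 89 45 f4                   mov     dword ptr [rbp - 12], eax
264133383f2: 8b 45 f4                   mov     eax, dword ptr [rbp - 12]
264133383f5: 98                         cwde
264133383f6: 48 83 c4 28                add     rsp, 40
"""
RIGHT = """\
26413338f49: ff 15 c0 96 75 34          call    qword ptr [rip + 880121536]
26413338f4f: 90                         nop
26413338f50: 48 83 c4 28                add     rsp, 40
"""

only_left, only_right, delta = diff_listings(LEFT, RIGHT)
for size, insn in only_left:
    print("left only :", size, insn)
for size, insn in only_right:
    print("right only:", size, insn)
print("size delta:", delta)
```

On these excerpts the difference comes down to the two `mov` instructions and `cwde` on the left (7 bytes) against a single `nop` on the right (1 byte), which accounts for the 6-byte gap between sizes 45 and 39.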

BruceForstall · 2022-09-29T16:14:36Z

In the above case, the issue is that the MC sets TailcallStress=1 because the test forces that during a run. The Checked compiler uses that and behaves differently (in this case, somewhat oddly differently?). Forcing TailcallStress=0 for the Checked compiler leads to no diffs.

This means we have a general problem with Checked/Release diffs of "run" collections that contain DEBUG-only configuration variables set that could affect behavior. We need to be able to force the DEBUG compiler to not use any such variables.

One very aggressive option would be to skip any MC that has any entry in the GetIntConfigValue or GetStringConfigValue tables.

We could explicitly override / clear variables using the -jitoption force ... and -jit2option force ... arguments, but that would require enumerating everything that might affect DEBUG only codegen, which might be fragile (and there might be command-line length issues).
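
The "very aggressive" option above amounts to a filter over method contexts. A rough sketch follows, assuming each method context has already been decoded into a dictionary with `GetIntConfigValue` and `GetStringConfigValue` tables. That in-memory shape and the function names are assumptions for illustration only; the real MCH format is binary and is read by the SuperPMI tools.

```python
def has_stored_config(mc):
    """True if the method context recorded any JIT config lookups."""
    return bool(mc.get("GetIntConfigValue")) or bool(mc.get("GetStringConfigValue"))

def select_config_free(method_contexts):
    """Return the indices of method contexts that recorded no config values."""
    return [mc["index"] for mc in method_contexts if not has_stored_config(mc)]

# Hypothetical decoded contexts; only index 155161 mirrors the example above.
contexts = [
    {"index": 155161, "GetIntConfigValue": {"TailcallStress": 1}, "GetStringConfigValue": {}},
    {"index": 2, "GetIntConfigValue": {}, "GetStringConfigValue": {}},
    {"index": 3, "GetIntConfigValue": {}, "GetStringConfigValue": {"JitStdOutFile": "out.txt"}},
]

print(select_config_free(contexts))
```

The cost of this approach is coverage: every context that recorded even one config lookup is dropped from the diff, whether or not that value would have changed codegen.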

BruceForstall · 2022-09-29T16:38:08Z

In the coreclr_tests collection, 1546 tests have GetIntConfigValue, 2383 have GetStringConfigValue (some will have both).

The current set of config values used in the tests is:

EnableAVX2
EnableHWIntrinsic
EnableSSE2
JitAggressiveInlining
JitConstCSE
JitDiffableDasm
JitDisasm
JitDoAssertionProp
JitDoRedundantBranchOpts
JitDoSsa
JitDoValueNumber
JitEnableFinallyCloning
JitFuncInfoLogFile
JITInlineDepth
JitNoCSE
JitNoForceFallback
JitNoStructPromotion
JitObjectStackAllocation
JitOptRepeat
JitProfileCasts
JitRandomGuardedDevirtualization
JitRandomOnStackReplacement
JitStdOutFile
JitStress
JitStressModeNames
JitStressModeNamesNot
JitStressModeNamesOnly
JitStressRegs
TailcallStress
TC_OnStackReplacement_InitialCounter

jakobbotsch · 2022-09-29T16:49:33Z

Do we know why this only started failing recently? The Sep 4th run has only x86 failures (that were fixed by #75338). The Sep 10th run has both x86 and x64 failures, so seems like something changed between Sep 4th and Sep 10th.

BruceForstall · 2022-09-29T17:22:28Z

I merged #74961 on Sept. 7, which is when the "run" collection appeared. (And removed the PMI collection the same day: #75211)

BruceForstall · 2022-09-29T20:43:31Z

After ignoring the Config values in the MCs, there are still 33 diffs. One example is JIT.HardwareIntrinsics.General.VectorAs__AsVectorUInt32:RunBasicScenario():this from (I believe) src\tests\JIT\HardwareIntrinsics\General\Vector128_1\AsVector.UInt32.cs, method context 63865 from coreclr_tests.run.windows.x64.checked.mch:

 55                  	push	rbp
 48 81 ec e0 00 00 00	sub	rsp, 224
 c5 f8 77            	vzeroupper
 48 8d ac 24 e0 00 00 00	lea	rbp, [rsp + 224]
 33 c0               	xor	eax, eax
 48 89 85 48 ff ff ff	mov	qword ptr [rbp - 184], rax
 c5 d8 57 e4         	vxorps	xmm4, xmm4, xmm4
 c5 f9 7f a5 50 ff ff ff	vmovdqa	xmmword ptr [rbp - 176], xmm4
 c5 f9 7f a5 60 ff ff ff	vmovdqa	xmmword ptr [rbp - 160], xmm4
 48 b8 70 ff ff ff ff ff ff ff	movabs	rax, -144
 c5 f9 7f 24 28      	vmovdqa	xmmword ptr [rax + rbp], xmm4
 c5 f9 7f 64 05 10   	vmovdqa	xmmword ptr [rbp + rax + 16], xmm4
 c5 f9 7f 64 05 20   	vmovdqa	xmmword ptr [rbp + rax + 32], xmm4
 48 83 c0 30         	add	rax, 48
 75 e9               	jne	-23
 48 89 4d 10         	mov	qword ptr [rbp + 16], rcx
 48 b9 78 86 b9 83 87 02 00 00	movabs	rcx, 2781053814392
 ff 15 e0 51 cf 07   	call	qword ptr [rip + 131027424]
 ff 15 08 f1 e7 07   	call	qword ptr [rip + 132641032]
 c5 f9 6e c0         	vmovd	xmm0, eax
 c4 e2 79 58 c0      	vpbroadcastd	xmm0, xmm0
 c5 f9 29 45 f0      	vmovapd	xmmword ptr [rbp - 16], xmm0
 c5 f9 28 45 f0      	vmovapd	xmm0, xmmword ptr [rbp - 16]
 c5 f8 28 c0         	vmovaps	xmm0, xmm0
 c5 fd 11 45 d0      	vmovupd	ymmword ptr [rbp - 48], ymm0
 48 8b 4d 10         	mov	rcx, qword ptr [rbp + 16]
 48 89 4d 98         	mov	qword ptr [rbp - 104], rcx
 c5 fd 10 45 d0      	vmovupd	ymm0, ymmword ptr [rbp - 48]
 c5 fd 11 45 b0      	vmovupd	ymmword ptr [rbp - 80], ymm0
 c5 f9 28 45 f0      	vmovapd	xmm0, xmmword ptr [rbp - 16]
 c5 f9 29 45 a0      	vmovapd	xmmword ptr [rbp - 96], xmm0
 48 8b 4d 98         	mov	rcx, qword ptr [rbp - 104]
 48 8d 55 b0         	lea	rdx, [rbp - 80]
 4c 8d 45 a0         	lea	r8, [rbp - 96]
 49 b9 78 86 b9 83 87 02 00 00	movabs	r9, 2781053814392
 ff 15 78 10 15 08   	call	qword ptr [rip + 135598200]
 c5 fe 6f 45 d0      	vmovdqu	ymm0, ymmword ptr [rbp - 48]        // Checked
 c5 f9 10 45 d0      	vmovupd	xmm0, xmmword ptr [rbp - 48]        // Release
 c5 f9 29 45 f0      	vmovapd	xmmword ptr [rbp - 16], xmm0
 48 8b 4d 10         	mov	rcx, qword ptr [rbp + 16]
 48 89 8d 48 ff ff ff	mov	qword ptr [rbp - 184], rcx
 c5 f9 28 45 f0      	vmovapd	xmm0, xmmword ptr [rbp - 16]
 c5 f9 29 45 80      	vmovapd	xmmword ptr [rbp - 128], xmm0
 c5 fd 10 45 d0      	vmovupd	ymm0, ymmword ptr [rbp - 48]
 c5 fd 11 85 50 ff ff ff	vmovupd	ymmword ptr [rbp - 176], ymm0
 48 8b 8d 48 ff ff ff	mov	rcx, qword ptr [rbp - 184]
 48 8d 55 80         	lea	rdx, [rbp - 128]
 4c 8d 85 50 ff ff ff	lea	r8, [rbp - 176]
 49 b9 78 86 b9 83 87 02 00 00	movabs	r9, 2781053814392
 ff 15 90 10 15 08   	call	qword ptr [rip + 135598224]
 90                  	nop
 c5 f8 77            	vzeroupper                    // Checked
 48 81 c4 e0 00 00 00	add	rsp, 224
 5d                  	pop	rbp
 c3                  	ret

@tannergooding Any idea what code is generating this? We need to find something that is causing DEBUG and non-DEBUG code to be different.

BruceForstall · 2022-09-29T20:49:20Z

In the Checked build dump, the diff code IR is:

Generating: N072 (???,???) [000061] -----------                            IL_OFFSET void   INLRT @ 0x029[E-] REG NA
Generating: N074 (  3,  2) [000021] -c---------                   t21 =    LCL_VAR   simd32<System.Numerics.Vector`1[System.UInt32]> V02 loc1          NA REG NA
                                                                        /--*  t21    simd32
Generating: N076 (  4,  3) [000022] -----------                   t22 = *  HWINTRINSIC simd16 uint GetLower REG mm0
IN0015:        vmovdqu  ymm0, ymmword ptr[V02 rbp-30H]

tannergooding · 2022-09-29T21:18:49Z

> In the Checked build dump, the diff code IR is:

Are you saying that part of the IR doesn't exist in the release build?

It's, in general, odd that we'd switch like this. I'm not aware of any logic that differs this way, especially not between debug/release.

We have a couple places that can differ between "minOpts" and "optimizationsEnabled", but those are normal/expected.

BruceForstall · 2022-09-29T21:43:00Z

> Are you saying that part of the IR doesn't exist in the release build?

No. It's very hard to see the IR in Release builds since we don't have the dumpers. So I presume some form of this IR does exist, but there's some kind of bug where we are inadvertently doing something in Checked that we shouldn't.

For this particular case, shouldn't the code generated for GetLower be what the Release version generates, namely vmovupd xmm0, xmmword ptr [rbp - 48]? It seems like the Checked version is loading the full 32 bytes into ymm0 instead of loading 16 bytes into xmm0. The code is implementing a Vector<UInt32>.AsVector128() call (with Vector<T> being SIMD32).

tannergooding · 2022-09-29T21:56:46Z

Is this using any COMPlus_EnableIsa=0 flags (like COMPlus_EnableAVX2=0)?

For the "default" scenario (AVX2 enabled), we import this as NI_Vector256_GetLower: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/hwintrinsicxarch.cpp#L731-L736

This will then hit here: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/hwintrinsicxarch.cpp#L2356-L2366, which will generate the HWINTRINSIC simd16 <baseType> GetLower node we're seeing.

This node gets effectively no handling outside the common handling for HWIntrinsics (like VN/CSE) and isn't touched again until lowering (generalized handling) and lsra (special handling): https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/lsraxarch.cpp#L2101-L2122

BruceForstall · 2022-09-29T22:09:40Z

There are no COMPlus variables set in this scenario.

One thing I noticed which looks dangerous (but doesn't seem to be the problem here) is that assertIsContainableHWIntrinsicOp(), inside a #ifdef DEBUG block, calls Lowering::TryGetContainableHWIntrinsicOp(). That function is documented as potentially having side-effects. It is generally illegal for code to have IR side-effects inside an #ifdef DEBUG block since those side-effects won't occur in a Release build. Can you see any way where this could be a problem?
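
One way to catch this class of problem is to snapshot the node before a debug-only check and compare afterwards. The sketch below is a toy model in Python, not JIT code: the dictionary "node" and the two check functions are hypothetical stand-ins for IR and for a DEBUG-only assertion helper.

```python
import copy

def has_side_effects(check, node):
    """Run a debug-only check and report whether it mutated the node."""
    before = copy.deepcopy(node)
    check(node)
    return node != before

def pure_check(node):
    # Only reads the node, so it is safe to compile out in Release.
    return node["oper"] == "HWINTRINSIC"

def mutating_check(node):
    # Validates, but also rewrites the operand list as a by-product.
    # Release builds would never perform this rewrite.
    node["operands"] = [op for op in node["operands"] if op != "CreateScalarUnsafe"]
    return True

node = {"oper": "HWINTRINSIC", "operands": ["CreateScalarUnsafe", "LCL_VAR"]}

print(has_side_effects(pure_check, copy.deepcopy(node)))
print(has_side_effects(mutating_check, copy.deepcopy(node)))
```

A check flagged by such a harness is a candidate for moving its mutation into code that runs in both build flavors, leaving only the read-only validation under the debug guard.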

tannergooding · 2022-09-29T22:23:13Z

> Can you see any way where this could be a problem?

Not off the top of my head. The handling here is about removing a NI_Vector128/256_CreateScalarUnsafe and consuming the underlying scalar op directly...

By the time we hit codegen we shouldn't have any such cases that should mutate. It would be good to fix this (likely by moving the logic into LowerCreateScalarUnsafe instead), but there's no way it could cause GetLower to suddenly read 32 bytes from memory instead of 16.

tannergooding · 2022-09-29T22:25:57Z

Something is flipping either the node type or the computed EA_ATTR to be SIMD32 and causing YMM to be used instead.

BruceForstall · 2022-09-29T23:13:38Z

Btw, this is a Tier-0 compilation, so OptimizationEnabled() is false

BruceForstall · 2022-09-29T23:14:14Z

Note that the Checked version is using YMM whereas the Release version is using XMM.

tannergooding · 2022-09-30T00:50:36Z

Right, which is "too large a read". GetLower returns a TYP_SIMD16 and so should just read 16 bytes.

I'll see if I can repro this locally and if I can determine what's breaking where...

tannergooding · 2022-09-30T16:54:55Z

Ok, so the Release code generation isn't a side effect of GetLower or other HWIntrinsic codegen AFAICT.

I changed the instruction being emitted for the relevant path and it still emits C5F91045D0 vmovupd xmm0, xmmword ptr [rbp-30H] in Release

Notably I did find two places where we were tracking the "wrong" simdSize for NI_Vector256_GetLower and where we could emit "better" codegen. I've put up a PR to fix those here: #76456

This has a side-effect of making checked consistent with release but it isn't actually the root cause. There still remains some actual issue causing Checked/Release to differ.

Maybe there is something special about LCL_VAR simd32 and how it's handled in morph/rationalize?

In checked we get:

    [ 0]  41 (0x029) ldloc.1
    [ 1]  42 (0x02a) call 2B000186
In Compiler::impImportCall: opcode is call, kind=0, callRetType is struct, structSize is 16
Named Intrinsic System.Runtime.Intrinsics.Vector128.AsVector128: Recognized
  Known type Vector128<uint>
  Known type SIMD Vector<uint>
  Known type SIMD Vector<uint>

    [ 1]  47 (0x02f) stloc.0

STMT00004 ( 0x029[E-] ... ??? )
               [000025] -A---------                         *  ASG       simd16 (copy)
               [000023] D------N---                         +--*  LCL_VAR   simd16<System.Runtime.Intrinsics.Vector128`1[System.UInt32]> V01 loc0         
               [000022] -----------                         \--*  HWINTRINSIC simd16 uint GetLower
               [000021] -----------                            \--*  LCL_VAR   simd32<System.Numerics.Vector`1[System.UInt32]> V02 loc1

The only real transform to this happens in rationalize where we rewrite asg(LCL_VAR, X) to STORE_LCL_VAR(X), thus getting:

N001 (  3,  2) [000021] -----------                   t21 =    LCL_VAR   simd32<System.Numerics.Vector`1[System.UInt32]> V02 loc1         
                                                            /--*  t21    simd32 
N002 (  4,  3) [000022] -----------                   t22 = *  HWINTRINSIC simd16 uint GetLower
                                                            /--*  t22    simd16 
N004 (  8,  6) [000025] DA---------                         *  STORE_LCL_VAR simd16<System.Runtime.Intrinsics.Vector128`1[System.UInt32]> V01 loc0

tannergooding · 2022-09-30T17:28:42Z

Found the bug... AsVector128 has an assert which mutates the simdSize
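
The shape of this bug can be reproduced in a few lines. What follows is an illustrative model only, not the JIT's code: `base_type_and_size` is a hypothetical query that writes the operand's size into caller state as an out-parameter style side effect, and the `debug` flag stands in for code that exists only in a DEBUG build.

```python
def base_type_and_size(operand, state):
    """Hypothetical query: returns the operand's base type and ALSO
    overwrites state["simd_size"] with the operand's size."""
    state["simd_size"] = operand["size"]
    return operand["base_type"]

def import_get_lower(operand, debug):
    """Model of importing GetLower; returns the register class codegen would pick."""
    state = {"simd_size": 16}  # GetLower produces a 16-byte (simd16) result
    if debug:
        # Debug-only validation. The check passes, but the call clobbers simd_size.
        assert base_type_and_size(operand, state) == "uint"
    return "ymm" if state["simd_size"] == 32 else "xmm"

vector_uint = {"base_type": "uint", "size": 32}  # Vector<uint> as simd32

print("release:", import_get_lower(vector_uint, debug=False))
print("checked:", import_get_lower(vector_uint, debug=True))
```

In this model the release path keeps the 16-byte size and picks `xmm`, while the checked path picks `ymm` because the validation call overwrote the size with the 32-byte operand size. That mirrors the `vmovupd xmm0` versus `vmovdqu ymm0` difference seen earlier in the thread.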

Recorded SPMI method contexts include configuration environment variables, such as `COMPlus_JITMinOpts`, that are replayed. However, when doing asmdiffs replays to compare a Release to a Checked compiler (non-DEBUG to DEBUG), there may be codegen-altering configuration variables, such as JitStress, that are only read and interpreted by the DEBUG compiler. This leads to asm diffs.

Introduce a `-ignoreStoredConfig` argument to superpmi.exe, and use it in superpmi.py when doing Checked/Release asm diffs, that pretends there are no stored config variables. This assumes that the stored config variables only alter JIT behavior and that the JIT will succeed with or without them.

This is also slightly more than necessary: if there is a config variable that the Release compiler knows about, we won't use that, either. However, we currently have no easy way to distinguish which variables are DEBUG-only and which are available in both DEBUG and non-DEBUG builds.

Contributes to #76347
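
The behavior described for `-ignoreStoredConfig` can be modeled as a lookup that answers every query with the caller's default. This is a minimal sketch under that assumption; the `ReplayConfig` class and its methods are hypothetical and do not reflect how superpmi.exe is actually structured.

```python
class ReplayConfig:
    """Toy model of config lookups during replay of one method context."""

    def __init__(self, stored, ignore_stored_config=False):
        self._stored = dict(stored)
        self._ignore = ignore_stored_config

    def get_int(self, name, default):
        # With the flag set, behave as if nothing had been recorded.
        if self._ignore:
            return default
        return self._stored.get(name, default)

stored = {"TailcallStress": 1, "EnableAVX2": 0}

normal = ReplayConfig(stored)
ignoring = ReplayConfig(stored, ignore_stored_config=True)

print(normal.get_int("TailcallStress", 0), ignoring.get_int("TailcallStress", 0))
print(normal.get_int("EnableAVX2", 1), ignoring.get_int("EnableAVX2", 1))
```

The second line of output illustrates the trade-off noted above: every stored value is dropped, including any that both compilers would have honored.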

BruceForstall · 2022-10-03T15:08:05Z

There are still failures in arm64:

https://dev.azure.com/dnceng-public/public/_build/results?buildId=37989&view=results

BruceForstall · 2022-10-03T23:12:01Z

In at least one case, the issue is that we have a Tier-0 compilation with PGO instrumentation being replayed. The JIT calls allocPgoInstrumentationBySchema(), which returns a pointer on replay. Of course, this pointer can't be the same as during collection, meaning the pointer will be different between different replays. The NearDiffer doesn't know about this pointer, so it doesn't know to ignore its differences (or to map it back to what would be the original value if the original collection address were used).
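
One way to make such a comparison address-independent is to translate any immediate that falls inside the replay-allocated buffer into an offset from that buffer's base before comparing. The sketch below assumes the differ can learn each replay's buffer base and size. The function names are hypothetical and this is not the NearDiffer's implementation.

```python
def normalize(imm, buf_base, buf_size):
    """Map an immediate inside [buf_base, buf_base + buf_size) to a base-relative token."""
    if buf_base <= imm < buf_base + buf_size:
        return ("pgo", imm - buf_base)
    return ("abs", imm)

def immediates_match(imm1, base1, imm2, base2, buf_size):
    """Compare immediates from two replays whose buffers live at different addresses."""
    return normalize(imm1, base1, buf_size) == normalize(imm2, base2, buf_size)

# Two replays allocate the 64-byte instrumentation buffer at different addresses.
BASE1, BASE2, SIZE = 0x1000, 0x9000, 64

print(immediates_match(BASE1 + 8, BASE1, BASE2 + 8, BASE2, SIZE))   # same slot
print(immediates_match(BASE1 + 8, BASE1, BASE2 + 16, BASE2, SIZE))  # different slots
print(immediates_match(42, BASE1, 42, BASE2, SIZE))                 # ordinary constant
```

Comparing offsets instead of raw addresses keeps real differences visible (a different slot in the buffer still mismatches) while hiding the allocation address itself.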

BruceForstall added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Sep 29, 2022

BruceForstall added this to the 8.0.0 milestone Sep 29, 2022

BruceForstall self-assigned this Sep 29, 2022

BruceForstall added the blocking-clean-ci-optional label (Blocking optional rolling runs) Sep 29, 2022

runfoapp bot mentioned this issue Sep 29, 2022

Infrastructure - Status/Health #702

Closed

tannergooding mentioned this issue Sep 30, 2022

Ensure NI_Vector128_AsVector128 (aka Vector128<T> AsVector128(this Vector<T> value)) doesn't have a side-effect in its assert #76460

Merged

ghost added the in-pr label (There is an active PR which will close this issue when it is merged) Sep 30, 2022

BruceForstall mentioned this issue Sep 30, 2022

Fix DEBUG/non-DEBUG SuperPMI asm diffs #76470

Merged

BruceForstall closed this as completed in #76460 Oct 2, 2022

ghost removed the in-pr label (There is an active PR which will close this issue when it is merged) Oct 2, 2022

BruceForstall reopened this Oct 3, 2022

JulieLeeMSFT mentioned this issue Oct 4, 2022

[release/7.0] Ensure NI_Vector128_AsVector128 (aka Vector128<T> AsVector128(this Vector<T> value)) doesn't have a side-effect in its assert #76547

Merged

BruceForstall mentioned this issue Oct 4, 2022

Support Arm64 "constructed" constants in SuperPMI asm diffs #76616

Merged

ghost added the in-pr label (There is an active PR which will close this issue when it is merged) Oct 4, 2022

BruceForstall closed this as completed in #76616 Oct 5, 2022

ghost removed the in-pr label (There is an active PR which will close this issue when it is merged) Oct 5, 2022

ghost locked as resolved and limited conversation to collaborators Nov 4, 2022
