-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Failures in checked/release asm diffs #76347
Comments
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsPipeline https://dev.azure.com/dnceng-public/public/_build/results?buildId=29308&view=results has been reporting failures, that need to be investigated.
|
One example:
|
In the above case, the issue is that the MC sets TailcallStress=1 because the test forces that during a run. The Checked compiler uses that and behaves differently (in this case, somewhat oddly differently?). Forcing TailcallStress=0 for the Checked compiler leads to no diffs. This means we have a general problem with Checked/Release diffs of "run" collections that contain DEBUG-only configuration variables set that could affect behavior. We need to be able to force the DEBUG compiler to not use any such variables. One very aggressive option would be to skip any MC that has any entry in the GetIntConfigValue or GetStringConfigValue tables. We could explicitly override / clear variables using the |
In the coreclr_tests collection, 1546 tests have GetIntConfigValue, 2383 have GetStringConfigValue (some will have both). The current set of config values used in the tests is:
|
Do we know why this only started failing recently? The Sep 4th run has only x86 failures (that were fixed by #75338). The Sep 10th run has both x86 and x64 failures, so seems like something changed between Sep 4th and Sep 10th. |
After ignoring the Config values in the MCs, there are still 33 diffs. One example is
@tannergooding Any idea what code is generating this? We need to find something that is causing DEBUG and non-DEBUG code to be different. |
In the Checked build dump, the diff code IR is:
|
Are you saying that part of the IR doesn't exist in the release build? It's, in general, odd that we'd switch like this. I'm not aware of any logic that differs this way, especially not between debug/release. We have a couple places that can differ between "minOpts" and "optimizationsEnabled", but those are normal/expected. |
No. It's very hard to see the IR in Release builds since we don't have the dumpers. So I presume some form of this IR does exist, but there's some kind of bug where we are inadvertently doing something in Checked that we shouldn't. For this particular case, shouldn't the code generated for GetLower be what the Release version generates, namely |
Is this using any For the "default" scenario (AVX2 enabled) We import this as This will then hit here: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/hwintrinsicxarch.cpp#L2356-L2366, which will generate the This node gets effectively no handling outside the common handling for HWIntrinsics (like VN/CSE) and isn't touched again until lowering (generalized handling) and lsra (special handling): https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/lsraxarch.cpp#L2101-L2122 |
There are no COMPlus variables set in this scenario. One thing I noticed which looks dangerous (but doesn't seem to be the problem here) is that |
Not off the top of my head. The handling here is about removing a By the time we hit codegen we shouldn't have any such cases that should mutate. It would be good to fix this (likely move the logic to part of |
Something is flipping either the node type or the computed |
Btw, this is a Tier-0 compilation, so |
Note that that Checked version is using YMM whereas the Release version is using XMM. |
Right, which is "too large a read". I'll see if I can repro this locally and if I can determine what's breaking where... |
Ok, so the I changed the instruction being emitted for the relevant path and it still emits Notably I did find two places where we were tracking the "wrong" This has a side-effect of making Maybe there is something special about In checked we get:
The only real transform to this happens in
|
Found the bug... |
Recorded SPMI method contexts include configuration environment variables such as `COMPlus_JITMinOpts` that are replayed. However, when doing asmdiffs replays to compare a Release to a Checked compiler (non-DEBUG to DEBUG), there may be codegen-altering configuration variables such as JitStress that are only read and interpreted by the DEBUG compiler. This leads to asm diffs. Introduce a `-ignoreStoredConfig` argument to superpmi.exe, and use it in superpmi.py when doing Checked/Release asm diffs, that pretends there are no stored config variables. This assumes that the stored config variables only alter JIT behavior but that they JIT will succeed with or without them. This is also slightly more than necessary: if there is a config variable that the Release compiler knows about, we won't use that, either. However, we have no easy way (currently) to distinguish which variables are DEBUG and which are both DEBUG and non-DEBUG available. Contributes to dotnet#76347
Recorded SPMI method contexts include configuration environment variables such as `COMPlus_JITMinOpts` that are replayed. However, when doing asmdiffs replays to compare a Release to a Checked compiler (non-DEBUG to DEBUG), there may be codegen-altering configuration variables such as JitStress that are only read and interpreted by the DEBUG compiler. This leads to asm diffs. Introduce a `-ignoreStoredConfig` argument to superpmi.exe, and use it in superpmi.py when doing Checked/Release asm diffs, that pretends there are no stored config variables. This assumes that the stored config variables only alter JIT behavior but that they JIT will succeed with or without them. This is also slightly more than necessary: if there is a config variable that the Release compiler knows about, we won't use that, either. However, we have no easy way (currently) to distinguish which variables are DEBUG and which are both DEBUG and non-DEBUG available. Contributes to #76347
There are still failures in arm64: https://dev.azure.com/dnceng-public/public/_build/results?buildId=37989&view=results |
In at least one case, the issue is that we have a tier-0 with/PGO instrumentation being replayed. The JIT calls |
Pipeline https://dev.azure.com/dnceng-public/public/_build/results?buildId=29308&view=results has been reporting failures, that need to be investigated.
The text was updated successfully, but these errors were encountered: