Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPMI with JitNoInline=1 asserts in instrumentation code #74718

Closed
SingleAccretion opened this issue Aug 27, 2022 · 7 comments · Fixed by #78594
Closed

SPMI with JitNoInline=1 asserts in instrumentation code #74718

SingleAccretion opened this issue Aug 27, 2022 · 7 comments · Fixed by #78594
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI blocking-clean-ci-optional Blocking optional rolling runs
Milestone

Comments

@SingleAccretion
Copy link
Contributor

SingleAccretion commented Aug 27, 2022

Reproduction: run an SPMI replay of the aspnet collection with -jitoption JitNoInline=1.

Expected result: clean replay.

Actual result:

ISSUE: <ASSERT> #1 C:\Users\Accretion\source\dotnet\runtime\src\coreclr\jit\fgprofile.cpp (1709)
Assertion failed 'reinterpret_cast<uint8_t*>(h32->HandleTable) == &m_profileMemory[tableEntry.Offset]'
in 'LoggingConfiguration:Close():this' during 'Profile instrumentation'
(IL size 112; hash 0xe34aa6a7; Instrumented Tier1-OSR)

Preceded by a number of ERROR: AllocPgoInstrumentationBySchema mismatch.

JitNoInline can be a nice setting to verify changes "modulo inlining", it would be nice to keep it assert-free.

@SingleAccretion SingleAccretion added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 27, 2022
@SingleAccretion SingleAccretion added this to the 8.0.0 milestone Aug 27, 2022
@ghost
Copy link

ghost commented Aug 27, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Reproduction: run an SPMI replay of the aspnet collection with -jitoption JitNoInline=1.

Expected result: clean replay.

Actual result:

ISSUE: <ASSERT> #1 C:\Users\Accretion\source\dotnet\runtime\src\coreclr\jit\fgprofile.cpp (1709)
Assertion failed 'reinterpret_cast<uint8_t*>(h32->HandleTable) == &m_profileMemory[tableEntry.Offset]'
in 'LoggingConfiguration:Close():this' during 'Profile instrumentation'
(IL size 112; hash 0xe34aa6a7; Instrumented Tier1-OSR)

Preceded by a number of ERROR: AllocPgoInstrumentationBySchema mismatch.

JitNoInline can be a nice setting to verify changes "modulo inlining", it would be nice to keep assert-free.

Author: SingleAccretion
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: 8.0.0

@EgorBo EgorBo self-assigned this Aug 29, 2022
@jakobbotsch
Copy link
Member

Different inlining decisions in general will cause a different PGO schema to have to be allocated for optimized code with instrumentation.

This seems like a bug in SPMI. allocPgoInstrumentationBySchema should be treated as a compilation result (like ICorJitInfo::allocMem, ICorJitInfo::reserveUnwindInfo, etc.), not as something that must replay correctly.

@BruceForstall
Copy link
Member

BruceForstall commented Nov 19, 2022

@jakobbotsch We're getting checked/release asm diffs on x64/arm64 with:

[23:59:03] ERROR: AllocPgoInstrumentationBySchema mismatch: countSchemaItems record 33, replay 31
[23:59:03] ERROR: AllocPgoInstrumentationBySchema mismatch: [29].InstrumentationKind record 177, replay 65
[23:59:03] ERROR: AllocPgoInstrumentationBySchema mismatch: [29].ILOffset record 649, replay 656
[23:59:03] ERROR: AllocPgoInstrumentationBySchema mismatch: [29].Other record -2147483648, replay 0
[23:59:03] ERROR: AllocPgoInstrumentationBySchema mismatch: [30].InstrumentationKind record 195, replay 65
[23:59:03] ERROR: AllocPgoInstrumentationBySchema mismatch: [30].ILOffset record 649, replay 666
[23:59:03] ERROR: AllocPgoInstrumentationBySchema mismatch: [30].Count record 8, replay 1
[23:59:03] ERROR: AllocPgoInstrumentationBySchema mismatch: [30].Other record -2147483648, replay 0
... many more...

https://dev.azure.com/dnceng-public/public/_build/results?buildId=88241&view=results

(if this is a separate problem, we can split this to its own issue)

@BruceForstall BruceForstall added the blocking-clean-ci-optional Blocking optional rolling runs label Nov 19, 2022
@jakobbotsch
Copy link
Member

@BruceForstall are these actually diffs or "just" failures?

I did see these a couple of weeks ago, but I assumed it was due to recent inliner changes between the time of the collection and the time when superpmi-asmdiffs-checked-release runs (#77600, #77845, #78386). So I assumed it would stabilize again.

However now that I think about it, that should have also caused superpmi-replay to fail. So there may be something related to inliner decisions in checked vs release builds. I will prioritize looking into that.

@jakobbotsch
Copy link
Member

jakobbotsch commented Nov 19, 2022

Also, as Andy pointed out recently, we wouldn't expect inlining to even alter the schema currently, so something is definitely strange here.

@jakobbotsch
Copy link
Member

However now that I think about it, that should have also caused superpmi-replay to fail. So there may be something related to inliner decisions in checked vs release builds. I will prioritize looking into that.

Actually, it seems like -ignoreStoredConfig is what causes the schema allocation to be different. It reproduces with a simple replay when you pass that too.

@jakobbotsch
Copy link
Member

The difference is that the particular context sets DOTNET_JitProfileCasts=1, so it does make sense that the schema is different. So we probably just need to implement actual schema layout in SPMI.

jakobbotsch added a commit to jakobbotsch/runtime that referenced this issue Nov 19, 2022
Fallback to doing actual layout of the schema data when the schema does
not match the recorded schema. This ensures replays are still consistent
with the recording in cases where the JIT and environment variables
match, but that we can still succeed a replay on changes to these.

Fix dotnet#74718
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Nov 19, 2022
jakobbotsch added a commit that referenced this issue Nov 21, 2022
Fallback to doing actual layout of the schema data when the schema does
not match the recorded schema. This ensures replays are still consistent
with the recording in cases where the JIT and environment variables
match, but that we can still succeed a replay on changes to these.

Fix #74718
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Nov 21, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Dec 21, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI blocking-clean-ci-optional Blocking optional rolling runs
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants