Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Enable edge based profiles for all scenarios #80481

Conversation

AndyAyersMS
Copy link
Member

Enable edge based profiles for OSR, partial compilation, and optimized plus instrumented cases.

For OSR this requires deferring flow graph modifications until after we have built the initial probe list, so that the initial list reflects the entirety of the method. This set of candidate edge probes is thus the same no matter how the method is compiled. A given compile may schematize a subset of these probes and materialize a subset of what gets schematized; this is tolerated by the PGO mechanism provided that the initial instrumented jitting produces a schema which is a superset of the schema produced by any subsequent instrumented rejitting. This is normally the case.

Partial compilation may still need some work to ensure full schematization but it is currently off by default. Will address this subsequently.

For optimized compiles we give the EfficientEdgeCountInstrumentor the same kind of probe relocation abilities that we have in the BlockCountInstrumentor. In particular we need to move probes that might appear in return blocks that follow implicit tail call blocks, since those return blocks must remain empty.

The details on how we do this are a bit different but the idea is the same: we create duplicate copies of any probe that was going to appear in the return block and instead instrument each pred. If the pred reached the return via a critical edge, we split the edge and put the probe there. This analysis relies on cheap preds, so to ensure we can use them we move all the critial edge splitting so it happens before we need the cheap pred lists.

The ability to do block profiling is retained but will no longer be used without special config settings.

There were also a few bug fixes in the spanning tree visitor. It must visit a superset of the blocks we end up importing and was missing visits in some cases.

This should improve jit time and code quality for instrumented code.

Fixes #47942.
Fixes #66101.
Contributes to #74873.

Enable edge based profiles for OSR, partial compilation, and optimized plus
instrumented cases.

For OSR this requires deferring flow graph modifications until after we have
built the initial probe list, so that the initial list reflects the entirety
of the method. This set of candidate edge probes is thus the same no matter how
the method is compiled. A given compile may schematize a subset of these probes
and materialize a subset of what gets schematized; this is tolerated by the PGO
mechanism provided that the initial instrumented jitting produces a schema which
is a superset of the schema produced by any subsequent instrumented rejitting.
This is normally the case.

Partial compilation may still need some work to ensure full schematization but
it is currently off by default. Will address this subsequently.

For optimized compiles we give the EfficientEdgeCountInstrumentor the same kind
of probe relocation abilities that we have in the BlockCountInstrumentor. In
particular we need to move probes that might appear in return blocks that follow
implicit tail call blocks, since those return blocks must remain empty.

The details on how we do this are a bit different but the idea is the same: we
create duplicate copies of any probe that was going to appear in the return block
and instead instrument each pred. If the pred reached the return via a critical
edge, we split the edge and put the probe there. This analysis relies on cheap
preds, so to ensure we can use them we move all the critial edge splitting so it
happens before we need the cheap pred lists.

The ability to do block profiling is retained but will no longer be used without
special config settings.

There were also a few bug fixes in the spanning tree visitor. It must visit a
superset of the blocks we end up importing and was missing visits in some cases.

This should improve jit time and code quality for instrumented code.

Fixes dotnet#47942.
Fixes dotnet#66101.
Contributes to dotnet#74873.
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 11, 2023
@ghost ghost assigned AndyAyersMS Jan 11, 2023
@ghost
Copy link

ghost commented Jan 11, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

Enable edge based profiles for OSR, partial compilation, and optimized plus instrumented cases.

For OSR this requires deferring flow graph modifications until after we have built the initial probe list, so that the initial list reflects the entirety of the method. This set of candidate edge probes is thus the same no matter how the method is compiled. A given compile may schematize a subset of these probes and materialize a subset of what gets schematized; this is tolerated by the PGO mechanism provided that the initial instrumented jitting produces a schema which is a superset of the schema produced by any subsequent instrumented rejitting. This is normally the case.

Partial compilation may still need some work to ensure full schematization but it is currently off by default. Will address this subsequently.

For optimized compiles we give the EfficientEdgeCountInstrumentor the same kind of probe relocation abilities that we have in the BlockCountInstrumentor. In particular we need to move probes that might appear in return blocks that follow implicit tail call blocks, since those return blocks must remain empty.

The details on how we do this are a bit different but the idea is the same: we create duplicate copies of any probe that was going to appear in the return block and instead instrument each pred. If the pred reached the return via a critical edge, we split the edge and put the probe there. This analysis relies on cheap preds, so to ensure we can use them we move all the critial edge splitting so it happens before we need the cheap pred lists.

The ability to do block profiling is retained but will no longer be used without special config settings.

There were also a few bug fixes in the spanning tree visitor. It must visit a superset of the blocks we end up importing and was missing visits in some cases.

This should improve jit time and code quality for instrumented code.

Fixes #47942.
Fixes #66101.
Contributes to #74873.

Author: AndyAyersMS
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@AndyAyersMS
Copy link
Member Author

@EgorBo PTAL
cc @dotnet/jit-contrib

Expect sizeable diffs in ASP.NET.

@EgorBo
Copy link
Member

EgorBo commented Jan 11, 2023

Nice diffs for the aspnet collection! Btw, perhaps it makes sense to enable Dynamic PGO for coreclr_tests.run collection

I'll take a look tomorrow if you don't mind, I'm OOF today

@AndyAyersMS
Copy link
Member Author

/azp run runtime-coreclr pgo, runtime-coreclr libraries-pgo, runtime-coreclr pgostress

@azure-pipelines
Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@AndyAyersMS
Copy link
Member Author

I'll take a look tomorrow if you don't mind, I'm OOF today

Sure ... might be easier to ignore whitespace when you look at diffs as I moved / re-indented some of the block profiling code.

@AndyAyersMS
Copy link
Member Author

Extended pgo testing is seeing some failures:

  • Assert failure(PID 4392 [0x00001128], Thread: 8428 [0x20ec]): Assertion failed 'curr->KindIs(BBJ_COND, BBJ_SWITCH, BBJ_ALWAYS)' in 'System.StubHelpers.CSTRMarshaler:ConvertToNative(int,System.String,long):long' during 'Profile instrumentation' (IL size 194; hash 0xc31c0dcd; Instrumented Tier1)

    File: D:\a_work\1\s\src\coreclr\jit\fgbasic.cpp Line: 4374

@EgorBo
Copy link
Member

EgorBo commented Jan 11, 2023

@AndyAyersMS Instrumented Tier1 is when we re-jit R2R in order to instrument it. That tier was using block instrumentation because of optimizations enabled.
When we re-jit normal Tier0 to instrumentation we use edge-based instrumentation (because we don't use optimizations).

So, for easier testing, we can always use optimizations in the instrumentation tier even if we rejit just IL-only tier0. (or it can be stress-mode based).

@AndyAyersMS
Copy link
Member Author

So, for easier testing, we can always use optimizations in the instrumentation tier even if we rejit just IL-only tier0. (or it can be stress-mode based)

Probably a good stress mode to add, yeah.

@AndyAyersMS
Copy link
Member Author

Recent failures all looked like infra issues, retrying.

@AndyAyersMS
Copy link
Member Author

@EgorBo ping

Copy link
Member

@EgorBo EgorBo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Played with your branch locally and it looks great!

@AndyAyersMS AndyAyersMS merged commit eb76787 into dotnet:main Jan 13, 2023
@AndyAyersMS
Copy link
Member Author

TE improvements (instrumented Tier0 code perf). Note we're still quite a ways below Tier0 (which is quite a ways below optimized).

image

image

@EgorBo
Copy link
Member

EgorBo commented Jan 17, 2023

Wow, nice wins! block counters seem to be expensive

@ghost ghost locked as resolved and limited conversation to collaborators Feb 17, 2023
@jeffhandley jeffhandley added the blog-candidate Completed PRs that are candidate topics for blog post coverage label Mar 14, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI blog-candidate Completed PRs that are candidate topics for blog post coverage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix interaction of PGO and partial compilation JIT: resolve issues with OSR and PGO
3 participants