-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: add OSR patchpoint strategy, inhibit tail duplication #66208
Conversation
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsTwo changes for OSR:
Adaptive placement may end up working out better overall, but
|
@BruceForstall PTAL |
src/coreclr/jit/importer.cpp
Outdated
// | ||
assert(!block->hasHndIndex()); | ||
const int patchpointStrategy = JitConfig.TC_PatchpointStrategy(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be clearer to have enum PatchpointStrategy { BackedgeSource, BackedgeTarget, Adaptive };
and return enum value from TC_PatchpointStrategy()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The config system is kind of primitive, so probably not.
Looks like this will need some tweaking. We may not be at stack empty point at the top of a source block and so might miss out setting some needed patchpoints. IIRC ecma requires that IL be stack empty at backedge targets, so the issue hasn't come up before. Also we may see backedge sources in handler regions; can't put patchpoints there. |
Two changes for OSR: * add new strategies for placing patchpoints -- either at backedge sources (instead of targets) or adaptive. depending on number of backedges. Change default to adaptive, since this works better with the flow we see from C# `for` loops. * inhibit tail duplication for OSR as it may end up interfering with loop recognition. We may not be able to place patchpoints at sources, for various reasons; if so we fall back to placing them at targets.
734a821
to
d0228c8
Compare
Ok, should work a bit better now. |
@@ -551,6 +551,8 @@ enum BasicBlockFlags : unsigned __int64 | |||
BBF_HAS_ALIGN = MAKE_BBFLAG(39), // BB ends with 'align' instruction | |||
BBF_TAILCALL_SUCCESSOR = MAKE_BBFLAG(40), // BB has pred that has potential tail call | |||
|
|||
BBF_BACKWARD_JUMP_SOURCE = MAKE_BBFLAG(41), // Block is a source of a backward jump |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unfortunate that we have/need bits for this (and BBF_BACKWARD_JUMP_TARGET). Lots of code just compares bbNum, but that depends on bbNum being up-to-date; presumably you can't depend on that? Oh, I guess in addition, you don't want to depend on the preds list being up-to-date and walking the preds list? These bits can obviously themselves get out of date if the flow graph is reordered. Also, do we properly update these bits when, say, we merge blocks and delete the one with the bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The def/use happens very early, in the flow graph build and importer, before anything else can mess with the flow graph. I could look at source/target nums instead, but that would require enumeration of succs (for sources) and preds (for targets) for every block which would be more costly, and I'd have to build cheap preds.
Nothing later depends on these flag bits. I could scrub them out (at least in debug) if you think leaving them around and potentially gone stale is an attractive nuisance.
Or we could figure out how to have limited-lifetime flags that automatically expire at some point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be useful if we knew when any of the block bits were expected to be valid, either as documentation here, or better, with some kind of checking. Scrubbing them might not be possible (they're only 0/1 already, so scrubbing to 0 doesn't help), unless we add a "valid/invalid" bit for each one, with a debug accessor/checker. Certainly not required with this change (unless you wanted to add comments describing their lifetime).
src/coreclr/jit/importer.cpp
Outdated
|
||
if (succBlock->bbNum <= block->bbNum) | ||
{ | ||
// The succBlock had better agree it's as target. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as => a ?
Most failures look like some kind of CI outage or configuration change
Test failure almost certainly unrelated as OSR is not enabled innerloop except in a handful of runtime tests. |
Co-authored-by: Bruce Forstall <brucefo@microsoft.com>
Sigh, using github to edit code has its downsides. Will fix. |
/azp run runtime-jit-experimental |
Azure Pipelines successfully started running 1 pipeline(s). |
OSR stress testing revealed a few issues:
Fixes forthcoming. |
/azp run runtime-jit-experimental |
Azure Pipelines successfully started running 1 pipeline(s). |
Several of bindhandle tests failing in jit-experimental. Can't repro locally so far
There is a concurrent run in main going on right now, will see if it hits those as well. |
The bindhandle failures seem to be somewhat widespread, we hit them in #65682 as well (#65682 (comment)). |
Yeah -- the jit-experimental run on main also seeing those failures. |
Looks like both this change and main see 146 failures in jit-experimental. Spot checked a bunch and all matched, so assuming they exactly overlap. |
OSX runs seem to be consistently timing out. Nothing here is arch/os specific, and it's almost all off by default. Will ignore. |
Two changes for OSR:
backedge sources (instead of targets) or adaptive. depending
on number of backedges. Change default to adaptive, since this
works better with the flow we see from C#
for
loops.with loop recognition.
We may not be able to place patchpoints at sources, for various reasons;
if so we fall back to placing them at targets.