-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: build basic block pred lists before morph #1309
JIT: build basic block pred lists before morph #1309
Conversation
Build basic block pred lists before morph, instead of after, and add an early flow optimization pass. Fix up a few places where ref counts or pred lists were not properly maintained in morph. The early flow opt pass enhances the local assertion prop run in morph (by fusing blocks), allows the jit to avoid morphing some unreachable blocks (thus saving a bit of TP), and lays the groundwork for more aggressive early branch folding that would be useful eg dotnet#27935).
Still need to look at TP impact and drill into some diffs. Initial jit-diffs looks promising, though.
|
Cool, I was wondering recently if predecessors can't be computed earlier. With predecessors available perhaps we can make What's the cause of the huge "tracing" assembly diff? Struct copies eliminated by assertion prop I guess... |
Preliminary look at diffs (just looking at asm, so guesswork here, will take a while to root cause these, especially in the larger methods): Good cases
Bad cases
|
Looking deeper at some of the code size diffs. Improvements
|
Still have an arm assert to chase down -- can't get it to repro with altjits. |
@dotnet/jit-contrib PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm cool change!
Throughput (pin icount on release crossgen spc, median of 5 runs each) is basically a wash....
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice change! Just a few little things
// the first block is not scratch and is targeted by a | ||
// branch. | ||
assert(fgFirstBB->bbRefs >= 1); | ||
fgFirstBB->bbRefs--; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you're here, it works be nice to update the header, including when and under what pre-conditions it will be called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
// Don't build address modes out of non-foldable constants | ||
if (!op2->AsIntConCommon()->ImmedValCanBeFolded(compiler, addr->OperGet())) | ||
{ | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is a necessary fix, otherwise we fold handle + const into an LEA on x86, and lose track of a reloc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see; so it was an existing issue that was exposed by this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sort of ... looking at this again perhaps I should fix it differently.
VN-based constant prop won't propagate handles for 32 bit targets when opts.compReloc
is true. But assertion-based constant prop doesn't have this restriction. And so with this change, the early flow opts enable an assertion-based constant prop of a handle, and on x86 constant handle is fodder for LEA formation.
So maybe I should update optConstantAssertionProp
with similar restrictions as vn based const prop and revert the codegen change (or turn it into an assert).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I will split this part of as a separate change, which should be zero-diff and will allow bit more refactoring than I'd do if I tried folding it in here. Should have that PR up soon.
|
||
if (fldObj != nullptr) | ||
{ | ||
return comp->fgAddrCouldBeNull(fldObj); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was presumably needed because this is now called before morph has removed these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
|
||
// Run an early flow graph simplification pass | ||
if (opts.OptimizationEnabled()) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize this was just cut-and-pasted, but it would be nice to update the comment style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
FWIW there are a lot of old-style comments throughout fgMorph
. And I'm tempted to "inline" fgMorph
into compCompile
so that most of the phases in the jit are captured in one method. Maybe I'll do this bigger change as a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good.
…relocs This gives local constant prop the same behavior as value-number-based constant prop; both now block handle propagation when the jit must generate relocs (R2R), to ensure that handle references appear only in simple instruction forms like movs. Prerequisite to dotnet#1309, which can enable such propagation.
…relocs (#1593) This gives local constant prop the same behavior as value-number-based constant prop; both now block handle propagation when the jit must generate relocs (R2R), to ensure that handle references appear only in simple instruction forms like movs. Prerequisite to #1309, which can enable such propagation.
Still chasing down one last issue related to tail call validation that came up in offline testing. |
You should probably trigger outerloop jitstress jobs when doing final testing. I'm not sure what shape those jobs are in, though: as far as I can tell, "runtime-coreclr jitstress" has never actually succeeded: https://dev.azure.com/dnceng/public/_build?definitionId=658&_a=summary. You might want to try a Pri-1 normal / JitStress=2 run locally, to be safe. |
Are local tests broken? I can't get any incantation of src\coreclr\tests\runtest.cmd working; it is looking for "dotnet.cmd" and other stuff that now seems to be gone. |
@AndyAyersMS Are you up-to-date? Works for me. Note that dotnet.cmd has moved to the repo root. |
Not in that repo, no. I would prefer not to rebase just yet. |
Ran jitstress=2 locally on the pri-1 tests, and there were 39 failures. I don't know the baseline number though the rolling runs seem to have maybe 30 failures. At least one failure mode looks related so I'll investigate further:
|
Building tests via build.cmd rather than via build-test.cmd seems to go better. I believe that last commit fixes all the unique JitStress2 pri1 failures, though on rerun I got some new failures that appear to be unrelated (missing mscorrc). So re-rerunning.... Baseline run has 28 failing tests. Probably worth digging in and trying to get those cleaned up. |
Re-run of this change also has 28 failures... |
@BruceForstall any other suggestions for testing? |
Normally, I'd suggest kicking off the AzDO jitstress pipeline so you get all-platform coverage. I don't know what the state of these are currently. |
Looks like there are ~950 or so baseline failures.... /azp run runtime-coreclr jitstress |
Got a partial jitstress run through. Doesn't look like there are any unique failures, though it is hard to be 100% sure. |
@BruceForstall waiting on you to sign off or suggest other tests to run. |
Will look soon |
return P1(o); | ||
} | ||
|
||
// F3 needs to be too big to inline without being marked noinline, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one of those tests that seems like it could be easily not testing what we want it to test with a few changes/improvements to JIT capabilities and heuristics. And we wouldn't know (or be notified about) when that happens. I wonder if there would be some way to specify for a test/function "I expect this to tail call", or "I expect this to be inlined", for example, and if it doesn't happen, the test fails (e.g., JIT assert).
@@ -2729,13 +2729,31 @@ inline unsigned Compiler::fgThrowHlpBlkStkLevel(BasicBlock* block) | |||
*/ | |||
inline void Compiler::fgConvertBBToThrowBB(BasicBlock* block) | |||
{ | |||
// If we're converting a BBJ_CALLFINALLY block to a BBJ_THROW block, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment not related to your change: the size of some of these "inline" functions has gotten ridiculous. Seems like we should get rid of the ".hpp" file entirely and assume a good compiler will inline appropriately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, was going to move this one in particular, but perhaps they all should go.
Perhaps part of some broader breakup of the bigger files into related sets of functionality or something.
// | ||
// We maintain the invariant that a scratch BB ends with BBJ_NONE or | ||
// BBJ_ALWAYS, so that when adding independent bits of initialization, | ||
// callers can generally append to the fgFirstBB block without worring |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: worring => worrying
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't see these before merging; will fix the typos in some future change.
@@ -453,6 +461,15 @@ void Compiler::fgEnsureFirstBBisScratch() | |||
block->inheritWeight(fgFirstBB); | |||
} | |||
|
|||
// The first block has an implicit ref count which we must | |||
// remove. Note the ref count could be greater that one, if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
grammar-o: that => than
void Compiler::fgEnsureFirstBBisScratch() | ||
{ | ||
// This method does not update predecessor lists and so must only be called before they are computed. | ||
assert(!fgComputePredsDone); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this function now be called both before and after preds are computed? Or only after? Doesn't preds computation set bbRefs, and you assert on that, so it has to be after? In which case, could you assert(fgComputePredsDone)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be called both before and after.
bbRefs
gets set early on, before preds, and is mostly right, for most blocks.
Assertion prop uses `O2K_CONST_INT` for handles, even on 64 bit targets. Make suitable adjustments to the handle propagation blocking logic. Problem was introduced in dotnet#1593 and exposed by dotnet#1309. Fixes dotnet#1777
This gives us a single method that controls most of the jit's phase behavior. Largely a manual inline, though I updated some comments and changed one assert. Follow up from dotnet#1309; also see dotnet#2109.
Build basic block pred lists before morph, instead of after, and add an early
flow optimization pass. Fix up a few places where ref counts or pred lists
were not properly maintained in morph.
The early flow opt pass enhances the local assertion prop run in morph (by
fusing blocks), allows the jit to avoid morphing some unreachable blocks
(thus saving a bit of TP), and lays the groundwork for more aggressive early
branch folding that would be useful eg #27935).