JIT: Visit switch successors in increasing likelihood order for RPO-based layout #101935
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Here's the diff summary on win-x64, since the RPO-based layout is disabled by default. Diffs aren't that big, and they aren't overwhelmingly in one direction...

Diffs are based on 2,534,676 contexts (987,849 MinOpts, 1,546,827 FullOpts).

MISSED contexts: 2,922 (0.12%)

Overall (-8,298 bytes)
FullOpts (-8,298 bytes)
cc @dotnet/jit-contrib, @AndyAyersMS PTAL. If you don't think it's worth churning the RPO layout implementation while running an experiment in the perf lab (especially since the diffs don't look like particularly large wins), I'm happy to undo those changes and just keep the block visitor cleanup work.
@jakobbotsch this cleans up the
```diff
@@ -2497,7 +2501,7 @@ class AllSuccessorEnumerator

 public:
     // Constructs an enumerator of all `block`'s successors.
-    AllSuccessorEnumerator(Compiler* comp, BasicBlock* block, const bool useProfile = false);
+    AllSuccessorEnumerator(Compiler* const comp, BasicBlock* const block);
```
Nit: marking parameters as `const` in declarations is not very beneficial (it is a property of the definition, not of the declaration).
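To illustrate the nit: in C++, a top-level `const` on a by-value parameter (including a `const` pointer such as `Compiler* const comp`) is dropped from the function's type, so it tells callers nothing. It only matters in the definition, where it stops the body from reassigning its local copy. `Twice` below is a hypothetical function used only for this illustration.

```cpp
#include <cassert>
#include <type_traits>

// Top-level const on a by-value parameter is dropped from the function type,
// so these pairs describe exactly the same signature.
static_assert(std::is_same<void(int), void(const int)>::value, "top-level const is ignored");
static_assert(std::is_same<void(int*), void(int* const)>::value, "also for const pointers");

// const on the pointee is NOT top-level, so it does change the type.
static_assert(!std::is_same<void(int*), void(const int*)>::value, "pointee const matters");

// Hypothetical example: the declaration leaves the parameter non-const...
int Twice(int value);

// ...and the definition marks it const. Both name the same function; here the
// const only prevents the body from modifying its local copy of `value`.
int Twice(const int value)
{
    return value * 2;
}
```

Declarations such as `void f(const int x);` are what clang-tidy's `readability-avoid-const-params-in-decls` check reports.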
We have a number of places where we have these `const`s. IMO we should enable this rule of clang-tidy: https://clang.llvm.org/extra/clang-tidy/checks/readability/avoid-const-params-in-decls.html
> We have a number of places where we have these `const`s. IMO we should enable this rule of clang-tidy:

That seems reasonable. I can open a PR for that.
```cpp
Compiler::SwitchUniqueSuccSet sd = comp->GetDescriptorForSwitch(this);
jitstd::sort(sd.nonDuplicates, (sd.nonDuplicates + sd.numDistinctSuccs),
             [](FlowEdge* const lhs, FlowEdge* const rhs) {
    return (lhs->getLikelihood() * lhs->getDupCount()) < (rhs->getLikelihood() * rhs->getDupCount());
});
```
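Why increasing order? In a DFS, the last successor visited is appended to the post-order just before the block itself, so after reversing, it lands immediately after the block (if it wasn't already reached another way). Visiting the least likely successor first therefore tends to make the most likely one the block's layout successor. The sketch below models this with a hypothetical `Edge` struct (not the JIT's `FlowEdge`) whose ordering weight mirrors the `likelihood * dupCount` comparator in the snippet above.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for a switch successor edge (not the JIT's FlowEdge).
struct Edge
{
    int      target;     // successor block number
    double   likelihood; // likelihood stored on the edge
    unsigned dupCount;   // number of switch cases that jump to `target`
};

// Ordering weight, mirroring the comparator in the review snippet.
double Weight(const Edge& e)
{
    assert(e.dupCount > 0); // every unique successor is reached by at least one case
    return e.likelihood * e.dupCount;
}

// Sorts the edges so the least likely successor comes first.
void SortByIncreasingLikelihood(std::vector<Edge>& edges)
{
    std::sort(edges.begin(), edges.end(),
              [](const Edge& lhs, const Edge& rhs) { return Weight(lhs) < Weight(rhs); });
}

// Recursive DFS returning blocks in reverse post-order (RPO).
// succs[b] lists b's successors in the order the DFS visits them.
std::vector<int> ReversePostOrder(const std::vector<std::vector<int>>& succs, int entry)
{
    std::vector<bool> visited(succs.size(), false);
    std::vector<int>  postOrder;

    std::function<void(int)> visit = [&](int block) {
        visited[block] = true;
        for (int succ : succs[block])
        {
            if (!visited[succ])
            {
                visit(succ);
            }
        }
        postOrder.push_back(block); // emitted only after all its successors
    };

    visit(entry);
    std::reverse(postOrder.begin(), postOrder.end());
    return postOrder;
}
```

For a switch in block 0 with edges to block 1 (0.5 × 1), block 2 (0.1 × 1), and block 3 (0.2 × 2), the sorted visit order is 2, 3, 1 and the resulting RPO is 0, 1, 3, 2: the likeliest target directly follows the switch.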
It seems odd for the visitor function to be mutating the descriptor.
It would make more sense to me if this sorting step was done ahead of time as part of the block layout phase.
> It would make more sense to me if this sorting step was done ahead of time as part of the block layout phase.

Are you thinking of something like a pass over the block list that sorts all the switch descriptors before computing the DFS? If so, considering the diffs don't look all that significant, I'm tempted to just visit switch successors in their unsorted order to avoid cluttering the visitor APIs or adding another loop to the layout algorithm that won't do anything most of the time.
Something like that, or if that is undesirable, then some map created lazily in the context of the block layout phase (i.e. when we visit the switch block). I guess the latter is not much different from just allocating new memory to sort these successors into.

But at that point it makes more sense to me to change `AllSuccessorEnumerator` slightly: in reality it's just a vector with a small-vector optimization for `N=4`. If you phrase `fgRunDfs` in terms of a callback that returns this small vector of successors, then the memory where we can do the sorting without mutating existing data structures will already be available.
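A minimal sketch of that suggestion, under simplifying assumptions: `SmallVector`, `DfsVisit`, and `RunDfsPostOrder` are hypothetical names (not the JIT's `AllSuccessorEnumerator` or `fgRunDfs`), blocks are plain integers, and the callback looks up likelihoods in a caller-supplied table. The DFS owns a per-block scratch buffer with four inline slots; the callback fills it and may reorder it freely, so no shared successor data is mutated.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical small vector: up to N elements live inline; once more are pushed,
// everything moves into a heap-allocated std::vector. Storage stays contiguous,
// so callers can std::sort the [begin(), end()) range in place.
template <typename T, std::size_t N>
class SmallVector
{
    T              m_inline[N] = {};
    std::vector<T> m_heap;
    std::size_t    m_size = 0;

public:
    void Push(const T& value)
    {
        if (m_size < N)
        {
            m_inline[m_size++] = value;
            return;
        }
        if (m_size == N)
        {
            m_heap.assign(m_inline, m_inline + N); // spill the inline elements
        }
        m_heap.push_back(value);
        m_size++;
    }

    std::size_t Size() const { return m_size; }
    bool        IsInline() const { return m_size <= N; }
    T*          begin() { return IsInline() ? m_inline : m_heap.data(); }
    T*          end() { return begin() + m_size; }
};

using SuccBuffer = SmallVector<int, 4>;

// Asks the callback for `block`'s successors, then visits them in whatever
// order the callback left in the buffer.
template <typename GetSuccsFn>
void DfsVisit(int block, std::vector<bool>& visited, std::vector<int>& postOrder, GetSuccsFn& getSuccs)
{
    assert(block >= 0 && static_cast<std::size_t>(block) < visited.size());
    visited[block] = true;

    SuccBuffer succs; // per-block scratch memory owned by the DFS
    getSuccs(block, succs);
    for (int succ : succs)
    {
        if (!visited[succ])
        {
            DfsVisit(succ, visited, postOrder, getSuccs);
        }
    }
    postOrder.push_back(block);
}

// Block-kind-specific logic (including any likelihood sorting) lives in `getSuccs`.
template <typename GetSuccsFn>
std::vector<int> RunDfsPostOrder(std::size_t numBlocks, int entry, GetSuccsFn getSuccs)
{
    std::vector<bool> visited(numBlocks, false);
    std::vector<int>  postOrder;
    DfsVisit(entry, visited, postOrder, getSuccs);
    return postOrder;
}
```

The trade-off is a scratch buffer per DFS frame; the inline storage keeps it off the heap for blocks with at most four successors, and only larger switches pay for an allocation.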
> If so, considering the diffs don't look all that significant, I'm tempted to just visit switch successors in their unsorted order to avoid cluttering the visitor APIs

I would suggest this, unless you have some information that the order of successors matters for switches.
> If you phrase `fgRunDfs` in terms of a callback that returns this small vector of successors, then the memory where we can do the sorting without mutating existing data structures will already be available.

I tried something like this just now, but it ends up touching quite a bit of code if we want to keep the callback agnostic to the block kind (which seems desirable). I was thinking we'd change `AllSuccessorEnumerator`'s underlying array to hold `FlowEdge` pointers instead of `BasicBlock` pointers to the block's successors, so that the callback can easily sort the array by edge likelihood; otherwise, we'd need to look up each successor's edge with `fgGetPredForBlock`, which seems needlessly expensive. This means adjusting `BasicBlock::VisitAllSuccs` to take a callback that operates on `FlowEdge*` instead of `BasicBlock*`, but this doesn't work well in `VisitEHSuccs`, as `EHblkDsc` points to EH successors using `BasicBlock` pointers instead of `FlowEdge` pointers. We don't create edges to those successors, so we can't switch that data structure over to using edges.
> I would suggest this, unless you have some information that the order of successors matters for switches.

I think it makes the most sense to leave switch successors alone for now, but still factor the layout-specific ordering code out of `VisitAllSuccs`. If we only special-case `BBJ_COND` blocks for now, do we want a callback-based approach in `fgRunDfs` that can easily be extended later? (I presume this callback would just manually compare the likelihoods of a conditional block's true/false targets.) Or are we OK with having a separate visitor method, `VisitAllSuccsInLikelihoodOrder`, that handles the `BBJ_COND` case while setting up the array in `AllSuccessorEnumerator`?
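A simplified sketch of the second option, using a hypothetical block model (`Kind`, `Block`) rather than the JIT's `BasicBlock`: the default visitor keeps its profile-agnostic order, and a separate likelihood-aware visitor special-cases only conditional blocks, visiting the less likely target first so the more likely one tends to follow the block in reverse post-order. The function names echo the thread, but the bodies are illustrative assumptions, not the PR's implementation.

```cpp
#include <cassert>

// Hypothetical, heavily simplified block model (not the JIT's BasicBlock).
enum class Kind
{
    Always, // one unconditional successor (trueTarget)
    Cond,   // two successors: trueTarget and falseTarget
    Return  // no successors
};

struct Block
{
    Kind   kind;
    int    trueTarget      = -1;
    int    falseTarget     = -1;
    double trueLikelihood  = 0.0; // Cond only
    double falseLikelihood = 0.0; // Cond only
};

// Default visiting order, ignoring profile data.
template <typename Fn>
void VisitAllSuccs(const Block& b, Fn visit)
{
    switch (b.kind)
    {
        case Kind::Always:
            visit(b.trueTarget);
            break;
        case Kind::Cond:
            visit(b.trueTarget);
            visit(b.falseTarget);
            break;
        case Kind::Return:
            break;
    }
}

// Likelihood-aware variant: only conditional blocks are treated specially.
// The less likely target is visited first; ties keep the default order.
template <typename Fn>
void VisitAllSuccsInLikelihoodOrder(const Block& b, Fn visit)
{
    if (b.kind != Kind::Cond)
    {
        VisitAllSuccs(b, visit);
        return;
    }

    assert(b.trueLikelihood >= 0.0 && b.falseLikelihood >= 0.0);
    if (b.trueLikelihood > b.falseLikelihood)
    {
        visit(b.falseTarget);
        visit(b.trueTarget);
    }
    else
    {
        VisitAllSuccs(b, visit);
    }
}
```

Keeping the profile-aware ordering in its own function leaves the default visitor untouched for phases that don't care about likelihoods, at the cost of one more entry point in the visitor API.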
Follow-up to #101473. Also cleans up the `BasicBlock` successor visitor API surface a bit by separating the logic for visiting successors in increasing likelihood order into `BasicBlock::VisitAllSuccsInLikelihoodOrder`.