-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce TP for targets with more than 64 Registers Part 1 #112704
Conversation
Tagging subscribers to this area: @hoyosjs |
ae81451
to
58b7c1a
Compare
4acef83
to
f726c2e
Compare
32047d1
to
7ec2042
Compare
ac9e948
to
a32c3cd
Compare
… SingleTypeRegSet instead.
/azp run runtime-coreclr superpmi-replay |
Azure Pipelines successfully started running 1 pipeline(s). |
src/coreclr/jit/lsra.cpp
Outdated
{ | ||
regNumber killedReg = genFirstRegNumFromMaskAndToggle(killedRegs); | ||
regNumber killedReg = (regNumber)(BitOperations::BitScanForward(killedRegs) + regBase); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider having an equivalent method of genFirstRegNumFromMaskAndToggle()
that takes SingleTypeRegSet
as parameter and operate on it. You can also pass lowBase
and highBase
, if required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. I should not have added regBase
where I did either. The following xor is slightly incorrect.
Should be
regNumber killedReg = (regNumber)(genFirstRegNumFromMaskAndToggle(killedRegs) + regBase); RegRecord* regRecord = getRegisterRecord(killedReg);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made this change as well. I'm done making changes now
src/coreclr/jit/lsra.cpp
Outdated
// regBase - `0` or `64` based on the `killedRegs` being processed | ||
// | ||
void LinearScan::freeKilledRegs(RefPosition* killRefPosition, | ||
regMaskSmall killedRegs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why using regMaskSmall
instead of SingleTypeRegSet
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
/azp run runtime-coreclr superpmi-replay |
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
cc @dotnet/jit-contrib |
/ba-g infra related changes |
Theory:
My profiling shows among others 3 methods whose tpdiff spikes when we have more than 64 registers –
processKills()
,freeRegisters()
,processBlockStartLocations()
. These 3 iterate over aregMaskTP
and modifies it. The idea here is to work on 2regMaskSmall
s separately instead ofregMaskTP
Expectation:
Targets with less than 64 registers will not see any impact. Targets with more than 64 registers(currently only arm64) will see tpdiff improvement.
Relevant issue:
#113670
Result
Using arm64 as proxy for target with more than 64 registers, from CI
linux arm64
Overall (-0.87% to -0.61%)
MinOpts (-1.39% to -0.88%)
FullOpts (-0.72% to -0.60%)
Local tests done:
I added some temporary changes locally so that x64 has more than 64 registers and ran superpmi tpdiff with/without the optimization to simulate impact with more than 64 registers
The results from local testing is below
Overall (-0.66% to -0.47%)
MinOpts (-1.14% to -0.65%)
FullOpts (-0.62% to -0.43%)