Reduce TP for targets with more than 64 Registers Part 1 #112704

Merged
merged 5 commits into from
Mar 20, 2025

Conversation

DeepakRajendrakumaran
Contributor

@DeepakRajendrakumaran DeepakRajendrakumaran commented Feb 19, 2025

Theory:

My profiling shows, among others, three methods whose tpdiff spikes when we have more than 64 registers: processKills(), freeRegisters(), and processBlockStartLocations(). All three iterate over a regMaskTP and modify it. The idea here is to work on two regMaskSmall values separately instead of on the regMaskTP as a whole.
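To illustrate the idea, here is a minimal, self-contained sketch and not the JIT's actual code. `WideMask`, `lowestSetBit`, `collectHalf`, and `collectRegs` are hypothetical names, and the two 64-bit fields only stand in for the two halves of a wide register mask. Each half is walked with plain 64-bit operations, and a base of `0` or `64` turns a bit index into a register number.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a register mask wider than 64 bits.
struct WideMask
{
    uint64_t low;  // registers 0..63
    uint64_t high; // registers 64..127
};

// Index of the lowest set bit; a portable loop instead of a compiler intrinsic.
inline unsigned lowestSetBit(uint64_t mask)
{
    assert(mask != 0);
    unsigned index = 0;
    while ((mask & 1) == 0)
    {
        mask >>= 1;
        index++;
    }
    return index;
}

// Append the register numbers found in one 64-bit half, offset by regBase (0 or 64).
inline void collectHalf(uint64_t half, unsigned regBase, std::vector<unsigned>& regs)
{
    while (half != 0)
    {
        regs.push_back(lowestSetBit(half) + regBase);
        half &= half - 1; // clear the lowest set bit
    }
}

// Process the two halves one after the other instead of iterating the wide mask.
inline std::vector<unsigned> collectRegs(const WideMask& mask)
{
    std::vector<unsigned> regs;
    collectHalf(mask.low, 0, regs);
    collectHalf(mask.high, 64, regs);
    return regs;
}
```

In this sketch the inner loop only ever touches a single 64-bit value, which is the property the change is after.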

Expectation:

Targets with fewer than 64 registers will not see any impact. Targets with more than 64 registers (currently only arm64) will see a tpdiff improvement.

Relevant issue:

#113670

Result

Using arm64 as a proxy for targets with more than 64 registers, the results from CI are:

linux arm64

Overall (-0.87% to -0.61%)

| Collection | PDIFF |
|---|---|
| benchmarks.run.linux.arm64.checked.mch | -0.71% |
| benchmarks.run_pgo.linux.arm64.checked.mch | -0.75% |
| benchmarks.run_tiered.linux.arm64.checked.mch | -0.87% |
| coreclr_tests.run.linux.arm64.checked.mch | -0.78% |
| libraries.crossgen2.linux.arm64.checked.mch | -0.68% |
| libraries.pmi.linux.arm64.checked.mch | -0.67% |
| libraries_tests.run.linux.arm64.Release.mch | -0.80% |
| libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | -0.61% |
| realworld.run.linux.arm64.checked.mch | -0.64% |
| smoke_tests.nativeaot.linux.arm64.checked.mch | -0.72% |

MinOpts (-1.39% to -0.88%)

| Collection | PDIFF |
|---|---|
| benchmarks.run.linux.arm64.checked.mch | -0.99% |
| benchmarks.run_pgo.linux.arm64.checked.mch | -1.13% |
| benchmarks.run_tiered.linux.arm64.checked.mch | -1.12% |
| coreclr_tests.run.linux.arm64.checked.mch | -0.88% |
| libraries.crossgen2.linux.arm64.checked.mch | -0.89% |
| libraries.pmi.linux.arm64.checked.mch | -0.98% |
| libraries_tests.run.linux.arm64.Release.mch | -1.12% |
| libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | -0.89% |
| realworld.run.linux.arm64.checked.mch | -1.25% |
| smoke_tests.nativeaot.linux.arm64.checked.mch | -1.39% |

FullOpts (-0.72% to -0.60%)

| Collection | PDIFF |
|---|---|
| benchmarks.run.linux.arm64.checked.mch | -0.70% |
| benchmarks.run_pgo.linux.arm64.checked.mch | -0.71% |
| benchmarks.run_tiered.linux.arm64.checked.mch | -0.63% |
| coreclr_tests.run.linux.arm64.checked.mch | -0.71% |
| libraries.crossgen2.linux.arm64.checked.mch | -0.68% |
| libraries.pmi.linux.arm64.checked.mch | -0.67% |
| libraries_tests.run.linux.arm64.Release.mch | -0.70% |
| libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | -0.60% |
| realworld.run.linux.arm64.checked.mch | -0.63% |
| smoke_tests.nativeaot.linux.arm64.checked.mch | -0.72% |

Local tests done:

I added some temporary changes locally so that x64 has more than 64 registers, then ran superpmi tpdiff with and without the optimization to simulate the impact on a target with more than 64 registers.

The results from local testing are below.

Overall (-0.66% to -0.47%)

| Collection | PDIFF |
|---|---|
| aspnet.run.windows.x64.checked.mch | -0.66% |
| benchmarks.run.windows.x64.checked.mch | -0.51% |
| benchmarks.run_pgo.windows.x64.checked.mch | -0.64% |
| benchmarks.run_pgo_optrepeat.windows.x64.checked.mch | -0.51% |
| benchmarks.run_tiered.windows.x64.checked.mch | -0.59% |
| coreclr_tests.run.windows.x64.checked.mch | -0.63% |
| libraries.crossgen2.windows.x64.checked.mch | -0.49% |
| libraries.pmi.windows.x64.checked.mch | -0.57% |
| libraries_tests.run.windows.x64.Release.mch | -0.66% |
| libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | -0.54% |
| realworld.run.windows.x64.checked.mch | -0.54% |
| smoke_tests.nativeaot.windows.x64.checked.mch | -0.47% |

MinOpts (-1.14% to -0.65%)

| Collection | PDIFF |
|---|---|
| aspnet.run.windows.x64.checked.mch | -0.87% |
| benchmarks.run.windows.x64.checked.mch | -0.75% |
| benchmarks.run_pgo.windows.x64.checked.mch | -0.86% |
| benchmarks.run_pgo_optrepeat.windows.x64.checked.mch | -0.75% |
| benchmarks.run_tiered.windows.x64.checked.mch | -0.82% |
| coreclr_tests.run.windows.x64.checked.mch | -0.69% |
| libraries.crossgen2.windows.x64.checked.mch | -0.65% |
| libraries.pmi.windows.x64.checked.mch | -0.71% |
| libraries_tests.run.windows.x64.Release.mch | -0.92% |
| libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | -0.75% |
| realworld.run.windows.x64.checked.mch | -1.14% |
| smoke_tests.nativeaot.windows.x64.checked.mch | -0.93% |

FullOpts (-0.62% to -0.43%)

| Collection | PDIFF |
|---|---|
| aspnet.run.windows.x64.checked.mch | -0.62% |
| benchmarks.run.windows.x64.checked.mch | -0.51% |
| benchmarks.run_pgo.windows.x64.checked.mch | -0.60% |
| benchmarks.run_pgo_optrepeat.windows.x64.checked.mch | -0.51% |
| benchmarks.run_tiered.windows.x64.checked.mch | -0.43% |
| coreclr_tests.run.windows.x64.checked.mch | -0.59% |
| libraries.crossgen2.windows.x64.checked.mch | -0.49% |
| libraries.pmi.windows.x64.checked.mch | -0.57% |
| libraries_tests.run.windows.x64.Release.mch | -0.57% |
| libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | -0.53% |
| realworld.run.windows.x64.checked.mch | -0.53% |
| smoke_tests.nativeaot.windows.x64.checked.mch | -0.47% |

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Feb 19, 2025
Contributor

Tagging subscribers to this area: @hoyosjs
See info in area-owners.md if you want to be subscribed.

@DeepakRajendrakumaran DeepakRajendrakumaran changed the title Testing out r2r tests TP diff check Mar 4, 2025
@DeepakRajendrakumaran DeepakRajendrakumaran force-pushed the 112329 branch 3 times, most recently from ae81451 to 58b7c1a Compare March 11, 2025 00:16
@DeepakRajendrakumaran DeepakRajendrakumaran force-pushed the 112329 branch 2 times, most recently from 4acef83 to f726c2e Compare March 17, 2025 23:39
@DeepakRajendrakumaran DeepakRajendrakumaran changed the title TP diff check Reduce TP for targets with more than 64 Registers Mar 17, 2025
@DeepakRajendrakumaran DeepakRajendrakumaran force-pushed the 112329 branch 2 times, most recently from 32047d1 to 7ec2042 Compare March 18, 2025 00:50
@DeepakRajendrakumaran DeepakRajendrakumaran changed the title Reduce TP for targets with more than 64 Registers Draft : https://github.com/dotnet/runtime/pull/112704/filesReduce TP for targets with more than 64 Registers Mar 18, 2025
@DeepakRajendrakumaran DeepakRajendrakumaran changed the title Draft : https://github.com/dotnet/runtime/pull/112704/filesReduce TP for targets with more than 64 Registers Draft : Reduce TP for targets with more than 64 Registers Mar 18, 2025
@DeepakRajendrakumaran DeepakRajendrakumaran force-pushed the 112329 branch 2 times, most recently from ac9e948 to a32c3cd Compare March 18, 2025 21:38
@DeepakRajendrakumaran
Contributor Author

@DeepakRajendrakumaran DeepakRajendrakumaran marked this pull request as ready for review March 18, 2025 22:56
@DeepakRajendrakumaran DeepakRajendrakumaran changed the title Draft : Reduce TP for targets with more than 64 Registers Reduce TP for targets with more than 64 Registers Mar 18, 2025
@DeepakRajendrakumaran DeepakRajendrakumaran changed the title Reduce TP for targets with more than 64 Registers Reduce TP for targets with more than 64 Registers Part 1 Mar 18, 2025
@kunalspathak
Member

/azp run runtime-coreclr superpmi-replay


Azure Pipelines successfully started running 1 pipeline(s).

  {
-     regNumber killedReg = genFirstRegNumFromMaskAndToggle(killedRegs);
+     regNumber killedReg = (regNumber)(BitOperations::BitScanForward(killedRegs) + regBase);
Member


Consider adding an equivalent of genFirstRegNumFromMaskAndToggle() that takes a SingleTypeRegSet as its parameter and operates on it. You can also pass lowBase and highBase, if required.

Contributor Author


You are right. I should not have added regBase where I did either. The following xor is slightly incorrect.

Should be

regNumber killedReg = (regNumber)(genFirstRegNumFromMaskAndToggle(killedRegs) + regBase);
RegRecord* regRecord = getRegisterRecord(killedReg);
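For readers following the thread, here is a rough standalone sketch of the pattern under discussion, not the JIT's implementation. `RegSet64`, `firstRegFromMaskAndToggle`, and `nextKilledReg` are hypothetical names, and the 64-bit alias only approximates a single-type register set. The point it tries to show is ordering: the bit is toggled using its index within the 64-bit half, and the base is added only afterwards to form the register number.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 64-bit single-type register set.
using RegSet64 = uint64_t;

// Returns the index of the lowest set bit in `mask` and clears that bit.
inline unsigned firstRegFromMaskAndToggle(RegSet64& mask)
{
    assert(mask != 0);
    unsigned index = 0;
    while (((mask >> index) & 1) == 0)
    {
        index++;
    }
    // Toggle with the un-offset index; it is always below 64 here.
    mask ^= (RegSet64{1} << index);
    return index;
}

// Applies regBase (0 or 64) after the toggle to get the register number.
inline unsigned nextKilledReg(RegSet64& killedRegs, unsigned regBase)
{
    return firstRegFromMaskAndToggle(killedRegs) + regBase;
}
```

If the base were added before the toggle, the shift in this sketch would use an index of 64 or more for the high half, which is undefined for a 64-bit value.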

Contributor Author


Made this change as well. I'm done making changes now

// regBase - `0` or `64` based on the `killedRegs` being processed
//
void LinearScan::freeKilledRegs(RefPosition* killRefPosition,
                                regMaskSmall killedRegs,
Member


Why use regMaskSmall instead of SingleTypeRegSet?

Contributor Author


Changed

@kunalspathak
Member

/azp run runtime-coreclr superpmi-replay


Azure Pipelines successfully started running 1 pipeline(s).

Member

@kunalspathak kunalspathak left a comment


LGTM

@BruceForstall
Member

cc @dotnet/jit-contrib

@kunalspathak
Member

/ba-g infra related changes

@kunalspathak kunalspathak merged commit bb5c6c3 into dotnet:main Mar 20, 2025
110 of 113 checks passed
Labels
area-Infrastructure-coreclr community-contribution Indicates that the PR has been added by a community member
Projects
Status: Done

3 participants