[JIT] X64 - More replacement sequences for integer multiplication by a constant #77137
… Made SuperFileCheck anchors more likely to match.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Issue Details
Description
Refactors some of the codegen multiply optimizations by moving them to lowering.
Acceptance Criteria
|
Can you please run fuzz pipelines yourself once you're done with changes? |
/azp run Fuzzlyn |
Azure Pipelines successfully started running 1 pipeline(s). |
Looks like Fuzzlyn x64 passed. I need to update one of the disasm checks as I forgot a minor thing. |
What about:
- mov edx, eax
- shl edx, 1
+ lea edx, [2*rax] |
@tannergooding We already handle that case for
|
@dotnet/jit-contrib This is ready again - I fixed the tests and one of the earlier commits did pass fuzzlyn. |
Have you created a BDN microbenchmark to show the performance before/after your optimizations? |
Have not, but will do that now. |
@BruceForstall I've provided microbenchmark results in the description of this PR. |
Your SuperPMI jobs are failing with:
This has been fixed. Maybe you just need to rebase/re-push to trigger updated CI testing? |
@TIHan Note that the formatting job failed. If you don't already, try to get into the habit of running "jit-format -f" before pushing a PR change to GitHub. Also, it looks like the unix-x64 superpmi-diffs job failed for unknown reasons. |
@dotnet/jit-contrib @BruceForstall looks like CI is passing - is there anything else that I should do in the PR? |
Description
Resolves this issue: #75119. We only plan to emit replacement sequences of three instructions or fewer.
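To make "replacement sequence" concrete, here is a minimal sketch of the arithmetic identities such sequences rely on. The helper names (MulBy2, MulBy9, MulBy7) and the chosen constants are hypothetical, picked only for illustration; they are not taken from the PR and are not necessarily the sequences the JIT emits. The instruction shapes in the comments are likewise just one plausible encoding.

```cpp
#include <cstdint>

// Hypothetical helpers (not from the PR): each one mirrors a short
// shift/add/subtract sequence that computes the same value as one imul.

// x * 2: a single shift, or equivalently `lea reg, [2*rax]`.
constexpr std::uint32_t MulBy2(std::uint32_t x) { return x << 1; }

// x * 9: x + 8*x, the shape of a single `lea reg, [rax+8*rax]`.
constexpr std::uint32_t MulBy9(std::uint32_t x) { return x + (x << 3); }

// x * 7: 8*x - x, for example mov + shl + sub (three instructions).
constexpr std::uint32_t MulBy7(std::uint32_t x) { return (x << 3) - x; }

// Compile-time checks that each sequence matches plain multiplication,
// including a case that wraps around modulo 2^32.
static_assert(MulBy2(21u) == 21u * 2u, "x*2 via shift");
static_assert(MulBy9(5u) == 5u * 9u, "x*9 via x + 8x");
static_assert(MulBy7(6u) == 6u * 7u, "x*7 via 8x - x");
static_assert(MulBy7(0x80000001u) == 0x80000001u * 7u, "wraparound case");
```

Note that MulBy9 and MulBy7 read x twice, so the operand has to be available in a register or local for the rewritten sequence to be valid.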
This PR does these two optimizations:
This PR also lifts the restriction that only allowed GT_LCL_VAR to be the first operand for "multiply by constant"; it now creates a temp local if it needs to, which ensures that we can take advantage of these optimizations.
I wanted to keep the "multiply by constant" optimizations in one place, and that place is lowering. Doing this required disabling all "multiply by constant" optimizations in Tier-0, which I think is reasonable. This means Tier-0 will always emit imul, except for the cases that can emit lea.
We only do the "multiply by constant" -> lea transformation in codegen.
Notes:
Other replacement sequences were considered, but they have a higher total latency than a single imul on modern CPUs. These cases are described and tested in the IntMultiply disasm tests.
Microbenchmark Results
Before:
After:
Acceptance Criteria