Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Optimize "X & 1 == 0" to "X & 1" #61412

Closed
EgorBo opened this issue Nov 10, 2021 · 21 comments · Fixed by #74806
Closed

JIT: Optimize "X & 1 == 0" to "X & 1" #61412

EgorBo opened this issue Nov 10, 2021 · 21 comments · Fixed by #74806
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI good first issue Issue should be easy to implement, good for first-time contributors help wanted [up-for-grabs] Good issue for external contributors tenet-performance Performance related issue
Milestone

Comments

@EgorBo
Copy link
Member

EgorBo commented Nov 10, 2021

static bool Test(int x) => (x & 1) == 0;

Just a good first issue for anyone interested in contributing to CLR JIT.

Current codegen:

test     cl, 1
sete     al
movzx    rax, al
ret

Expected codegen:

mov      eax, ecx
and      eax, 1
ret

I noticed it when I was trying to implement a faster IsNegative for floats:

static bool IsNegative(double x) => 
    (Sse2.MoveMask(Vector128.CreateScalarUnsafe(x)) & 1) != 0;
@EgorBo EgorBo added the tenet-performance Performance related issue label Nov 10, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Nov 10, 2021
@ghost
Copy link

ghost commented Nov 10, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details
static bool Test(int x) => (x & 1) == 0;

Just a good first issue for anyone interested in contributing to CLR JIT.

Current codegen:

test     cl, 1
sete     al
movzx    rax, al
ret

Expected codegen:

mov      eax, ecx
and      eax, 1
ret

I noticed it when I was trying to implement a faster IsNegative for floats:

static bool IsNegative(double x) => 
    (Sse2.MoveMask(Vector128.CreateScalarUnsafe(x)) & 1) != 0;
Author: EgorBo
Assignees: -
Labels:

tenet-performance, area-CodeGen-coreclr, untriaged

Milestone: -

@EgorBo EgorBo added good first issue Issue should be easy to implement, good for first-time contributors help wanted [up-for-grabs] Good issue for external contributors labels Nov 10, 2021
@EgorBo EgorBo added this to the Future milestone Nov 10, 2021
@EgorBo EgorBo removed the untriaged New issue has not been triaged by the area owner label Nov 10, 2021
@SkiFoD
Copy link
Contributor

SkiFoD commented Nov 10, 2021

@EgorBo Hey, I'm interested to work with it, but I'll need your help because I'm not experienced in building and debugging the CLR.

@EgorBo
Copy link
Member Author

EgorBo commented Nov 10, 2021

@EgorBo Hey, I'm interested to work with it, but I'll need your help because I'm not experienced in building and debugging the CLR.

Cool! Then let me experiment on you to understand how good and first-time-contributor friendly our docs are 🙂
Here are those docs: https://github.com/dotnet/runtime/blob/main/docs/workflow/debugging/coreclr/debugging.md
try to follow them and your task is to eventually be able to place a breakpoint, say, in Compiler::fgMorphTree for a simple hello world app you created locally. Feel free to ask any questions.

@hopperpl
Copy link

oh, I'm interested in contributing as well, I found a lot of minor code generation issues that I'd like to improve...

here is an example:

M02_L00:
       7FFB7916B0D8 test      edx,10000
       7FFB7916B0DE je        short M02_L01
       7FFB7916B0E0 lea       rdx,[rsi+20]
       7FFB7916B0E4 and       dword ptr [rdx],0FF81FFFF
M02_L01:
       7FFB7916B0EA mov       edx,[rdi+18]
       7FFB7916B0ED test      edx,1000000
       7FFB7916B0F3 je        short M02_L02
       7FFB7916B0F5 lea       rdx,[rsi+20]
       7FFB7916B0F9 and       dword ptr [rdx],0C3FFFFFF
M02_L02:
       7FFB7916B0FF mov       edx,[rdi+18]
       7FFB7916B102 test      edx,2000000
       7FFB7916B108 je        short M02_L03
       7FFB7916B10A lea       rdx,[rsi+20]
       7FFB7916B10E or        dword ptr [rdx],20000000
M02_L03:

lea instructions are not necessary; rdi+18h should not be read multiple times, etc. I also want more "cmov" instructions instead of chains of jcc blocks with the ? : ternary operator when all are constants. So yeah, I have a lot of asm optimization ideas on my list.

Anyway... I know how to debug the JIT, but is there some documentation on how to extract the JIT generated code? DotnetBenchmark does somewhat of a job, but I don't know how to control the disassembly output. Visual Studio surely also allows copy & paste from the disassembly window, but I was more thinking of some form of automation first to find areas of "bad code" gen. Is there some documentation on how to do this? How did you extract the code snippet above?

@SingleAccretion
Copy link
Contributor

SingleAccretion commented Nov 10, 2021

Anyway... I know how to debug the JIT, but is there some documentation on how to extract the JIT generated code?

@hopperpl usually this is done by using the various environment variables, e. g. for getting disassembly it would be DOTNET_JitDisasm=Class::Method.

but I was more thinking of some form of automation first to find areas of "bad code" gen

I am not aware of such automation. There is an assembly scanner tool: https://github.com/dotnet/jitutils/tree/main/src/AnalyzeAsm, but in general the addition of peepholes in the Jit is done in an ad-hoc manner (with exceptions of course).

@tkekalainen
Copy link
Contributor

tkekalainen commented Nov 11, 2021

Maybe I'm missing something, but isn't this optimization wrong?
(x & 1) == 0 returns 1/true if the first (least significant) bit is zero, and 0/false if the first bit is a one. x & 1 returns 1 if the first bit is a one, and 0 if the first bit is a zero.
Doesn't x & 1 equal (x & 1) == 1, not (x & 1) == 0? (Of course at low level, not in C#)

Here's what Clang generates:
https://godbolt.org/z/Kfoaa644a

@SkiFoD
Copy link
Contributor

SkiFoD commented Nov 11, 2021

@EgorBo Thank you. The guide was helpful even though I needed to do some extra research to make it work :) I managed to set a breakpoint in Compiler::fgMorphTree and even used DOTNET_JitDisasm=Class::Method, so the experiment was successful. The JIT code is very new to me, so It would be cool if you gave me a hint where is the best point to start at.

@EgorBo
Copy link
Member Author

EgorBo commented Nov 14, 2021

I needed to do some extra research to make it work

Could you please share more details what exactly was needed so we can improve the doc?

It would be cool if you gave me a hint where is the best point to start at.

you can take a look at the existing optimizations in morph.cpp, e.g. this one #52524 (it's for (x compre y) & 1 => x compare y.

Doesn't x & 1 equal (x & 1) == 1, not (x & 1) == 0?

Yes, thanks, it was a typo

lea instructions are not necessary; rdi+18h should not be read multiple times, etc. I also want more "cmov" instructions instead of chains of jcc blocks with the ? : ternary operator when all are constants. So yeah, I have a lot of asm optimization ideas on my list.

The best way is to file an issue for a specific pattern where we can discuss it and give you pointers where to look at. E.g. 'cmov' related optimizations aren't as easy as they sound.

For the assembly code you can try my VS2022 add-in Disasmo: https://github.com/EgorBo/Disasmo

@SkiFoD
Copy link
Contributor

SkiFoD commented Nov 15, 2021

Could you please share more details what exactly was needed so we can improve the doc?
I came across with errors during building of the INSTALL project after doing some researches I found a solution. The coreclr solution needed clr+libs had been built, so basically it was my fault not the doc's :)

@hopperpl
Copy link

this was my experience...

I was not able to build the runtime as instructed here https://github.com/dotnet/runtime/tree/main/docs/workflow/building/coreclr

I'm not sure if it is related to #60061 or not. I think I received a different error about an invalid parameter (cmake). It referred me to log files but there were no files in the log directory. This "Send-VsDevShellTelemetry" is quite noisy and confused me a lot until I noticed that this was not the cause of the error, it just pulled all my attention.

Then I switched to -msbuild instead as indirectly suggested by the doc. This worked without a problem.

Regarding the "How to Debug" instruction, this worked like described.

A hint about the needed storage size for the build would be nice. I need 2.5 GB for the source and then 24 GB on top for the build. About 8 GB per configuration (Debug, Checked, Release)

SkiFoD added a commit to SkiFoD/runtime that referenced this issue Mar 15, 2022
SkiFoD added a commit to SkiFoD/runtime that referenced this issue Mar 15, 2022
SkiFoD added a commit to SkiFoD/runtime that referenced this issue Mar 15, 2022
SkiFoD added a commit to SkiFoD/runtime that referenced this issue Mar 15, 2022
SkiFoD added a commit to SkiFoD/runtime that referenced this issue Mar 15, 2022
JulieLeeMSFT pushed a commit that referenced this issue Mar 21, 2022
* Add optimization "X & 1 == 1" to "X & 1" (#61412)

* Moved the optimization to the morph phase (#61412)

* Done in post-order (#61412)

* Moved the optimization into fgOptimizeEqualityComparisonWithConst (#61412)

* Some corrections due the comments (#61412)

* Fix of the picture (#61412)

* Add optNarrowTree use (#61412)

* Change narrowing to the type check (#61412)

* Fix regressions (#61412)

* Moved the optimization to the lowering phase (#61412)

* Reverted Morph changes (#61412)

* Moved the optimization into OptimizeConstCompare method (#61412)

* Add GT_EQ check(#61412)
radekdoulik pushed a commit to radekdoulik/runtime that referenced this issue Mar 30, 2022
* Add optimization "X & 1 == 1" to "X & 1" (dotnet#61412)

* Moved the optimization to the morph phase (dotnet#61412)

* Done in post-order (dotnet#61412)

* Moved the optimization into fgOptimizeEqualityComparisonWithConst (dotnet#61412)

* Some corrections due the comments (dotnet#61412)

* Fix of the picture (dotnet#61412)

* Add optNarrowTree use (dotnet#61412)

* Change narrowing to the type check (dotnet#61412)

* Fix regressions (dotnet#61412)

* Moved the optimization to the lowering phase (dotnet#61412)

* Reverted Morph changes (dotnet#61412)

* Moved the optimization into OptimizeConstCompare method (dotnet#61412)

* Add GT_EQ check(dotnet#61412)
@En3Tho
Copy link
Contributor

En3Tho commented Aug 26, 2022

This can be closed @EgorBo

@stephentoub
Copy link
Member

Was this fully addressed?

The PR that was cited as closing this addressed the case:

static bool Equal1(int x) => (x & 1) == 1;

turning this on .NET 6:

       test      dl,1
       setne     al
       movzx     eax,al
       ret

into this on .NET 7:

       mov       eax,edx
       and       eax,1
       ret

but for the original example:

static bool Equal0(int x) => (x & 1) == 0;

on .NET 6 I get:

       test      dl,1
       sete      al
       movzx     eax,al
       ret

and on .NET 7 I get:

       xor       eax,eax
       test      dl,1
       sete      al
       ret

Wouldn't we expect it to be more like:

       mov       eax,edx
       not       eax
       and       eax,1
       ret

?

The PR also didn't seem to address the case of != 0 and != 1?

@En3Tho
Copy link
Contributor

En3Tho commented Aug 28, 2022

Sorry, this might have been a disinformation from me then. I've been looking at help-wanted issues and some of them were still active but with linked pull requests already merged like this one.

Should we add a new tag like pr-merged-on-trial so new contributors could see that issue is sorta resolved but needs confirmation. Now you have to check it manually.

Should we reopen this or open a new separate issue?

@SkiFoD are you willing to work on this some more?

@SingleAccretion
Copy link
Contributor

Let's reopen this; the case of (x & 1) != 0 has not been addressed.

@SkiFoD
Copy link
Contributor

SkiFoD commented Aug 28, 2022

@En3Tho Hey, if you want to continue working on the issue I don't mind :)

@En3Tho
Copy link
Contributor

En3Tho commented Aug 28, 2022

I will try. Thanks.

En3Tho added a commit to En3Tho/runtime that referenced this issue Aug 30, 2022
@dubiousconst282
Copy link
Contributor

This would be covered by my PR #73120, I unfortunately missed this issue until now. Should I revert it and only cover #72986?

@En3Tho
Copy link
Contributor

En3Tho commented Aug 30, 2022

I guess if your change is a broader one then we should go with it? Also, you've already received reviews on it.

Mine covers only x & 1 =/!= 1/0 and it's a new one.

@dubiousconst282
Copy link
Contributor

Yeah, I think that makes sense.

On the other hand, my PR doesn't handle comparisons to zero ((X & 1) == 0 => ((NOT X) & 1)), so there's still room for improvement if you're interested.

@En3Tho
Copy link
Contributor

En3Tho commented Aug 30, 2022

Yeah my pr handles those cases. I guess we should let team decide what to merge. Maybe you can incorporate my changes somehow (if they are acceptable), dunno.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Sep 23, 2022
@BruceForstall BruceForstall assigned TIHan and unassigned TIHan Sep 28, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Oct 11, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Nov 10, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI good first issue Issue should be easy to implement, good for first-time contributors help wanted [up-for-grabs] Good issue for external contributors tenet-performance Performance related issue
Projects
None yet
10 participants