Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RyuJIT: should always decompose GT_MOD into a - (a / b) * b for CSE #32615

Open
EgorBo opened this issue Feb 20, 2020 · 0 comments
Open

RyuJIT: should always decompose GT_MOD into a - (a / b) * b for CSE #32615

EgorBo opened this issue Feb 20, 2020 · 0 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization
Milestone

Comments

@EgorBo
Copy link
Member

EgorBo commented Feb 20, 2020

A % B can be transformed into A - ((A / B) * B) and it's always done for ARM because it doesn't have the remainder instruction.
Also for some reason it's done on x64 only when B is a const and a power of two and I suggest we always use what is used for ARM.

It gives VN/CSE more opportunities to find what to optimize, e.g.

int x = a / b;
int y = a % b;

can be decomposed into:

int x = a / b;
int y = a - ((a / b) * b);

then VN/CSE optimizes it to:

int x = a / b;
int y = a - x * b;

So the existing hack in Math.DivRem won't be needed, this pattern when we need both / and % is quite popular, e.g.: MemoryExtensions:Overlaps
or here is a jit-diff:

Top method improvements (percentages):
         -15 (-23.08% of base) : Microsoft.CodeAnalysis.dasm - Microsoft.CodeAnalysis.XmlCharType:SplitSurrogateChar(int,byref,byref)
         -15 (-23.08% of base) : System.Private.Xml.dasm - System.Xml.XmlCharType:SplitSurrogateChar(int,byref,byref)
         -13 (-21.31% of base) : System.Runtime.Extensions.dasm - System.Net.WebUtility:ConvertSmpToUtf16(int,byref,byref)
          -9 (-14.52% of base) : System.Collections.dasm - System.Collections.Generic.BitHelper:MarkBit(int):this
         -14 (-10.29% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Overlaps(System.ReadOnlySpan`1[Int32],System.ReadOnlySpan`1[Int32],byref):bool
         -14 (-10.29% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Overlaps(System.ReadOnlySpan`1[Double],System.ReadOnlySpan`1[Double],byref):bool
         -14 (-10.29% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Overlaps(System.ReadOnlySpan`1[Vector`1],System.ReadOnlySpan`1[Vector`1],byref):bool
         -14 (-10.29% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Overlaps(System.ReadOnlySpan`1[Int64],System.ReadOnlySpan`1[Int64],byref):bool
         -14 (-10.22% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Overlaps(System.ReadOnlySpan`1[__Canon],System.ReadOnlySpan`1[__Canon],byref):bool
          -7 (-9.33% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.CompressedSparseTensor`1[Double][System.Double]:SetValue(int,double):this
         -11 (-8.80% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Overlaps(System.ReadOnlySpan`1[Int16],System.ReadOnlySpan`1[Int16],byref):bool
          -6 (-8.45% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.CompressedSparseTensor`1[__Canon][System.__Canon]:SetValue(int,System.__Canon):this
          -6 (-8.45% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.CompressedSparseTensor`1[Int32][System.Int32]:SetValue(int,int):this
          -6 (-8.45% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.CompressedSparseTensor`1[Int64][System.Int64]:SetValue(int,long):this
          -6 (-8.33% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.CompressedSparseTensor`1[Byte][System.Byte]:SetValue(int,ubyte):this

the only thing, lowering should compose it back to GT_MOD on x64 if CSE didn't find anything (when op2 is not a constant).

I think this optimization should be way easier to implement than the extraction of both values from idiv (#5213)

@dotnet/jit-contrib

category:cq
theme:optimization
skill-level:intermediate
cost:medium
impact:small

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Feb 20, 2020
@BruceForstall BruceForstall changed the title RuyJIT: should always decompose GT_MOD into a - (a / b) * b for CSE RyuJIT: should always decompose GT_MOD into a - (a / b) * b for CSE Mar 12, 2020
@BruceForstall BruceForstall added optimization and removed untriaged New issue has not been triaged by the area owner labels Mar 12, 2020
@BruceForstall BruceForstall added this to the Future milestone Mar 12, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@BruceForstall BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Nov 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization
Projects
None yet
Development

No branches or pull requests

3 participants