Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: inconsistent CQ for div/mod by power of 2 idioms #11442

Open
AndyAyersMS opened this issue Nov 10, 2018 · 6 comments
Open

JIT: inconsistent CQ for div/mod by power of 2 idioms #11442

AndyAyersMS opened this issue Nov 10, 2018 · 6 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Milestone

Comments

@AndyAyersMS
Copy link
Member

AndyAyersMS commented Nov 10, 2018

From dotnet/corefx#33367:

Method Mean Error StdDev Ratio
DivAndMod 19.96 us 0.0407 us 0.0340 us 1.00
Div32Rem 16.27 us 0.0267 us 0.0237 us 0.81

Could be that these aren't exactly equivalent for negative values. But here i is non-negative.

public class BitArrayPerf
{
    private const int N = 10_000;
    private readonly int[] data;

    public BitArrayPerf()
    {
        data = new int[N];
        var rand = new Random(42);
        for (int i = 0; i < N; i++)
        {
            data[i] = rand.Next();
        }
    }

    [Benchmark(Baseline = true)]
    public int DivAndMod()
    {
        for (int i = 0; i < N; i++)
        {
            data[i / 32] |= (1 << (i % 32));
        }
        return data[N - 1];
    }

    [Benchmark]
    public int Div32Rem()
    {
        for (int i = 0; i < N; i++)
        {
            int quotient = Div32Rem(i, out int remainder);
            data[quotient] |= 1 << remainder;
        }
        return data[N - 1];
    }

    private static int Div32Rem(int number, out int remainder)
    {
        int quotient = number >> 5;   // Divide by 32.
        remainder = number - (quotient << 5);   // Multiply by 32.
        return quotient;
    }
}

category:cq
theme:basic-cq
skill-level:intermediate
cost:small
impact:small

@gfoidl
Copy link
Member

gfoidl commented Nov 11, 2018

Although for the modulo a bit-operation (& (b - 1)) should be used.

private static int Div32Rem(int number, out int remainder)
{
    int quotient = number >> 5;     // Divide by 32.
    remainder = number & 31;        // Modulo 32
    return quotient;
}

as it's faster.


PS: I assume you are absolutely aware of this, but just that this is noted 😉

@mikedn
Copy link
Contributor

mikedn commented Nov 11, 2018

Although for the modulo a bit-operation (& (b - 1)) should be used.

Yep, the JIT already does that for unsigned modulo.

And it's worth mentioning that it's faster:

Method Mean Error StdDev Scaled
DivAndMod 21.02 us 0.2705 us 0.2530 us 1.00
Div32Rem 16.82 us 0.1139 us 0.1009 us 0.80
Div32Rem_2 15.38 us 0.0987 us 0.0923 us 0.73

Not only that it's one operation less but it's also independent from the quotient computation so it can be done in parallel.

@erozenfeld
Copy link
Member

@dotnet/jit-contrib

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@BruceForstall BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Nov 25, 2020
@TIHan TIHan self-assigned this Oct 31, 2022
@danmoseley
Copy link
Member

danmoseley commented Dec 22, 2022

Yep, the JIT already does that for unsigned modulo.

I may be missing something, but does that mean that these should produce the same code?

 public static uint X(uint i)
    {
        return (uint)(i % ((long)uint.MaxValue + 1));
    }
    
    public static uint X2(uint i)
    {
        return (uint)(i % 0x1_0000_0000L);
    }    
    
    public static uint Y(uint i)
    {
        return (uint)i & uint.MaxValue;
    }

I get

XX.X(UInt32)
    L0000: push ebp
    L0001: mov ebp, esp
    L0003: sub esp, 8
    L0006: push 0
    L0008: push ecx
    L0009: push 1
    L000b: push 0
    L000d: call 0x72042250
    L0012: mov [ebp-8], eax
    L0015: mov [ebp-4], edx
    L0018: mov eax, [ebp-8]
    L001b: mov esp, ebp
    L001d: pop ebp
    L001e: ret

XX.X2(UInt32)
    L0000: push ebp
    L0001: mov ebp, esp
    L0003: sub esp, 8
    L0006: push 0
    L0008: push ecx
    L0009: push 1
    L000b: push 0
    L000d: call 0x72042250
    L0012: mov [ebp-8], eax
    L0015: mov [ebp-4], edx
    L0018: mov eax, [ebp-8]
    L001b: mov esp, ebp
    L001d: pop ebp
    L001e: ret

XX.Y(UInt32)
    L0000: mov eax, ecx
    L0002: and eax, 0xffffffff
    L0005: ret

Is that covered by this issue?

edit: I guess 32 bit only.
https://sharplab.io/#v2:C4LghgzgtgPgAgJgAQA0VIN4FgBQT9K4GE65wCMAbEgK4CWAdsKgBT1NJ0CURB2exAnADsSNo2BcWdJAFIxLADYB7BgHMu7YADoAsmAAeANTCKaAUyQBqJOS5cA3L3wBfZyUEVqW1AnEdud35BIVF/SWk5JAAGA3IAfWik6MTkgBlHdxdid3ckL1oJJABNcM4eAXxgkPywrS4ZADJCpj1DEzNzJ0qkNx7cFyA===
https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAJgAIANS8gbwFgAocl8p1txp4gRiXICuASwB2GKgAphY8kICU7Vg2YdWxAOzkpojHIlDyAUi0SANhBEBzOdIwA6ALLYEANWymBMcgGpyPOXIA3IosAL4hnKq8/LZUpNoy8hHKqmqaCbr6RuQADAg8APo5xTlFJQAyQRGhHBER5NGCOuQAmhmyCiosKakN6bZyBgBkTWKOzm4eMMFd5OGzTKFAA==

maybe it's low priority then.

@TIHan
Copy link
Contributor

TIHan commented Jan 11, 2023

Conerning this example:

    public int DivAndMod()
    {
        for (int i = 0; i < N; i++)
        {
            data[i / 32] |= (1 << (i % 32));
        }
        return data[N - 1];
    }

This could actually do the i & (32 - 1) transformation in the JIT if we verify that i is never negative.
The only way for this to work is if you know the range which we do for N since it is a const of 10_000.

@TIHan TIHan modified the milestones: Future, 8.0.0 Jan 17, 2023
@TIHan
Copy link
Contributor

TIHan commented Jan 28, 2023

Related: #12218

@TIHan TIHan removed their assignment Jan 31, 2023
@TIHan TIHan modified the milestones: 8.0.0, Future Jun 9, 2023
@BruceForstall BruceForstall added the tenet-performance Performance related issue label Oct 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

8 participants