Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JIT] Codegen for combined mod/div with power of 2 constants. #8739

Closed
redknightlois opened this issue Aug 15, 2017 · 6 comments
Closed

[JIT] Codegen for combined mod/div with power of 2 constants. #8739

redknightlois opened this issue Aug 15, 2017 · 6 comments

Comments

@redknightlois
Copy link

Power of 2 constants with divide and modulus missing optimization.

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static int GetNumberOfOverflowPages(long overflowSize)
        {
            overflowSize += Constants.Tree.PageHeaderSize;
            return (int)(overflowSize / Constants.Storage.PageSize) + (overflowSize % Constants.Storage.PageSize == 0 ? 0 : 1);
        }

This code generates:

00007FFA13645D12 48 63 CB             movsxd      rcx,ebx  
00007FFA13645D15 48 83 C1 40          add         rcx,40h  
00007FFA13645D19 48 8B D1             mov         rdx,rcx  
00007FFA13645D1C 4C 8B C2             mov         r8,rdx  
00007FFA13645D1F 49 C1 F8 3F          sar         r8,3Fh  
00007FFA13645D23 49 81 E0 FF 1F 00 00 and         r8,1FFFh  
00007FFA13645D2A 4C 03 C2             add         r8,rdx  
00007FFA13645D2D 49 C1 F8 0D          sar         r8,0Dh  
00007FFA13645D31 41 8B D0             mov         edx,r8d  
00007FFA13645D34 4C 8B C1             mov         r8,rcx  
00007FFA13645D37 49 C1 F8 3F          sar         r8,3Fh  
00007FFA13645D3B 49 81 E0 FF 1F 00 00 and         r8,1FFFh  
00007FFA13645D42 4C 03 C1             add         r8,rcx  
00007FFA13645D45 49 81 E0 00 E0 FF FF and         r8,0FFFFFFFFFFFFE000h  
00007FFA13645D4C 49 2B C8             sub         rcx,r8  

However you can actually rewrite that (knowing that the constant is power of 2) into:

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static int GetNumberOfOverflowPages(long overflowSize)
        {            
            overflowSize += Constants.Tree.PageHeaderSize;
            return (int)(overflowSize >> Constants.Storage.PageSizeShift) + ((overflowSize & Constants.Storage.PageSizeMask) == 0 ? 0 : 1);
        }

Which will output:

00007FFA13635FF2 48 63 CB             movsxd      rcx,ebx  
00007FFA13635FF5 48 83 C1 40          add         rcx,40h  
00007FFA13635FF9 48 8B D1             mov         rdx,rcx  
00007FFA13635FFC 48 C1 FA 0D          sar         rdx,0Dh  
00007FFA13636000 F7 C1 FF 1F 00 00    test        ecx,1FFFh  
00007FFA13636006 74 07                je          00007FFA1363600F  
00007FFA13636008 B9 01 00 00 00       mov         ecx,1  
00007FFA1363600D EB 02                jmp         00007FFA13636011  
00007FFA1363600F 33 C9                xor         ecx,ecx  
00007FFA13636011 44 8D 34 0A          lea         r14d,[rdx+rcx] 

That is half the bytes.

@omariom
Copy link
Contributor

omariom commented Aug 15, 2017

what if you change overflowSize to ulong?

@mikedn
Copy link
Contributor

mikedn commented Aug 15, 2017

what if you change overflowSize to ulong?

That should generate code that's similar to the code generated by the "fixed" example without requiring PageSizeShift and PageSizeMask.

Some clarifications:

Signed division by power of 2 generates code like:

       8BC1                 mov      eax, ecx
       8BD0                 mov      edx, eax
       C1FA1F               sar      edx, 31
       81E2FF1F0000         and      edx, 0x1FFF
       03D0                 add      edx, eax
       8BC2                 mov      eax, edx
       C1F80D               sar      eax, 13

Signed modulo power of 2 generates code like:

       8BC1                 mov      eax, ecx
       8BD0                 mov      edx, eax
       C1FA1F               sar      edx, 31
       81E2FF1F0000         and      edx, 0x1FFF
       03D0                 add      edx, eax
       81E200E0FFFF         and      edx, -0x2000
       2BC2                 sub      eax, edx

In case anyone wonders why the generated code doesn't reduce to a single shift/and - that's due to the way truncation is done. Integer division truncates toward 0 so (-3 / 2) is -1. Right shift truncates toward 0 as well but due to the way negative numbers are represented it really truncates toward negative infinity so (-3 >> 1) is -2. The generated code compensates for this by adjusting the dividend when it is negative so it computes ((-3 + 1) >> 1).

The adjustment happens to be similar for both division and modulus so there is some redundant code if we have both x / 4 and x % 4. It may be possible to avoid the redundant code by morphing x % 4 to x - 4 * (x / 4) and hope that CSE picks up the redundant x / 4. The JIT already does this for division by non power of 2 constants.

Still, using unsigned division (when the dividend is known to be positive) results in better code:

; unsigned division
       8BC1                 mov      eax, ecx
       C1E80D               shr      eax, 13
; unsigned modulo
       8BC1                 mov      eax, ecx
       25FF1F0000           and      eax, 0x1FFF

It would be nice if the JIT could detect case when the dividend is known to be positive and replace signed division with unsigned division but it's not trivial. And in the absence of interprocedural optimization it's probably not very useful.

@redknightlois
Copy link
Author

Closing confirmed that using unsigned long will recognize it and generate the proper code.

@mikedn
Copy link
Contributor

mikedn commented Aug 16, 2017

You could leave this open, it may be useful to take a closer look at what the JIT can do to improve this.

@redknightlois
Copy link
Author

As you pointed out without a proper way to reason about the sign of the value the optimization is generally not safe to be done on long; so it doesn't make sense to keep it open. But better, lets ask people in the know about what to do about it. cc @AndyAyersMS @CarolEidt @stephentoub

@mikedn
Copy link
Contributor

mikedn commented Aug 16, 2017

As you pointed out without a proper way to reason about the sign of the value the optimization is generally not safe to be done on long;

Yes, in general it's rather expensive to prove that values are positive (or within a given range). But in some case this can be done relatively cheaply by piggy-backing on range check elimination:

for (int i = 0; i < a.Length; i++) {
    a[i] = i / 2; // JIT already knows that i is within array bounds and thus positive
}

The big question is if it's worth doing this or not.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 21, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants