Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Conversation

@JosephTremoulet
Copy link

Modify CSE to identify compares which are functions of CSE candidate array
length nodes; when those length nodes get CSEd and consequently have their
value numbers updated, also update the value number on the compare
consuming the array length; this way the assertions generated in assertion
prop for the different compares on the CSEd array will use the same value
numbers, and range check elimination becomes more effective.

Resolves #5371

@JosephTremoulet
Copy link
Author

@briansull @dotnet/jit-contrib PTAL.

This gets the motivating case from #5371.

All diffs are bounds checks removed from loops from which we'd already hoisted the array length, as in this example (System.Dictionary.Clear):

--- ".\\DictionaryClear-before.asm"	2017-02-15 11:21:33.158935800 -0800
+++ ".\\DictionaryClear-after.asm"	2017-02-15 11:22:02.755138400 -0800
@@ -1,54 +1,49 @@
 ; Assembly listing for method Dictionary`2:Clear():this
 ; Emitting BLENDED_CODE for X64 CPU with SSE2
 ; optimized code
 ; rsp based frame
 ; fully interruptible
 ; Final local variable assignments
 ;
 ;  V00 this         [V00,T03] ( 23,  16.5)     ref  ->  rsi         this
-;  V01 loc0         [V01,T01] ( 13,  41.5)     int  ->  rcx        
+;  V01 loc0         [V01,T01] ( 12,  40.5)     int  ->  rcx        
 ;  V02 tmp0         [V02,T00] (  6,  48  )     ref  ->  rax        
 ;  V03 OutArgs      [V03    ] (  1,   1  )  lclBlk (32) [rsp+0x00]  
-;  V04 cse0         [V04,T02] ( 15,  25  )     int  ->  rdx        
+;  V04 cse0         [V04,T02] ( 14,  24  )     int  ->  rdx        
 ;  V05 cse1         [V05,T04] ( 11,  12.5)     ref  ->   r8        
 ;
 ; Lcl frame size = 32
 G_M52189_IG01:
        push     rsi
        sub      rsp, 32
        mov      rsi, rcx
 G_M52189_IG02:
        cmp      dword ptr [rsi+56], 0
        jle      SHORT G_M52189_IG05
        xor      ecx, ecx
        mov      r8, gword ptr [rsi+8]
        mov      edx, dword ptr [r8+8]
        test     edx, edx
        jle      SHORT G_M52189_IG04
 G_M52189_IG03:
        mov      rax, r8
-       cmp      ecx, edx
-       jae      SHORT G_M52189_IG06
        movsxd   r9, ecx
        mov      dword ptr [rax+4*r9+16], -1
        inc      ecx
        cmp      edx, ecx
        jg       SHORT G_M52189_IG03
 G_M52189_IG04:
        mov      rcx, gword ptr [rsi+16]
        mov      r8d, dword ptr [rsi+56]
        xor      edx, edx
        call     Array:Clear(ref,int,int)
        mov      dword ptr [rsi+64], -1
        xor      eax, eax
        mov      dword ptr [rsi+56], eax
        mov      dword ptr [rsi+68], eax
        inc      dword ptr [rsi+60]
 G_M52189_IG05:
        add      rsp, 32
        pop      rsi
        ret      
-G_M52189_IG06:
-       call     CORINFO_HELP_RNGCHKFAIL
-       int3     
-; Total bytes of code 98, prolog size 8 for method Dictionary`2:Clear():this
+; Total bytes of code 88, prolog size 8 for method Dictionary`2:Clear():this

SPMI diffs report 4.06% codesize improvement across 1777 affected methods.

jit-diff results (checked jit against release full diff set):

Summary:
(Note: Lower is better)

Total bytes of diff: -1265 (-0.01 % of base)
    diff is an improvement.

Total byte diff includes 0 bytes from reconciling methods
        Base had    0 unique methods,        0 unique bytes
        Diff had    0 unique methods,        0 unique bytes

Top file improvements by size (bytes):
       -1114 : System.Private.CoreLib.dasm (-0.03 % of base)
         -43 : Microsoft.CSharp.dasm (-0.01 % of base)
         -27 : System.Text.RegularExpressions.dasm (-0.03 % of base)
         -21 : System.Security.Principal.Windows.dasm (-0.06 % of base)
         -16 : System.Security.Cryptography.Cng.dasm (-0.03 % of base)

9 total files with size differences (9 improved, 0 regressed).

Top method improvements by size (bytes):
        -406 : System.Private.CoreLib.dasm - Dictionary`2:OnDeserialization(ref):this (29 methods)
        -290 : System.Private.CoreLib.dasm - Dictionary`2:Clear():this (29 methods)
        -290 : System.Private.CoreLib.dasm - Dictionary`2:Initialize(int):this (29 methods)
         -18 : System.Text.RegularExpressions.dasm - Match:Reset(ref,ref,int,int,int):this
         -15 : System.Private.CoreLib.dasm - GregorianCalendarHelper:GetEra(struct):int:this

31 total methods with size differences (31 improved, 0 regressed).

Throughput results within noise levels. cqNgenTP lab run reports:

Benchmark delta (positive => improvement)
mscorlib-amd64 0.77%
System.Data.Entity 3.40%
System.Windows.Forms -0.71%
System.Design -0.89%
System.Web-amd64 -1.27%
System.ServiceModel -1.17%
geomean 0.009%

Modify CSE to identify compares which are functions of CSE candidate array
length nodes; when those length nodes get CSEd and consequently have their
value numbers updated, also update the value number on the compare
consuming the array length; this way the assertions generated in assertion
prop for the different compares on the CSEd array will use the same value
numbers, and range check elimination becomes more effective.

Resolves #5371
// Arguments:
// compare - The compare node to check

void Compiler::updateCseArrLenMap(GenTreePtr compare)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be named optCSEUpdateArrLenMap or something similar?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I named the field cseArrLenMap. I suppose I could rename it to optCseArrLenMap and this method as you suggest, but I don't really understand what the benefit is... as a general rule, are we continuing this prefix naming stuff with new code?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of such prefixes but considering that Compiler has a zillion members maybe it's better to prefix stuff. The whole thing looks like a ton of C code dumped into a C++ class...

VNFuncApp cmpVNFuncApp;

if (!vnStore->GetVNFunc(compareVN, &cmpVNFuncApp) ||
(cmpVNFuncApp.m_func != GetVNFuncForOper(compare->OperGet(), (compare->gtFlags & GTF_UNSIGNED) != 0)))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(compare->gtFlags & GTF_UNSIGNED) != 0)

Side note: maybe we should add bool gtUnsigned() to GenTree, this stuff is nightmarish. Too bad we don't have unsigned relops.

@mikedn
Copy link

mikedn commented Feb 15, 2017

I'm not familiar enough with JIT VN to comment much on this but it looks like a rather involved change for little benefit. I wonder if it's truly worth it...

// re-numbered with the array length to improve range check elimination

// Given a compare, look for a cse candidate arrlen feeding it and add a map entry if found.
void updateCseArrLenMap(GenTreePtr compare);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the other CSE related methods defined in compiler.h do start with an opt prefix, so it probably best to continue to do so here.

@briansull
Copy link

LGTM

This lines up better with other names in code that optimizes common sub-
expressions.
@JosephTremoulet
Copy link
Author

Updated with the suggested renames and factoring out signedness testing into helper IsUnsigned() (looking through gentree.h, it looked to me like IsFoo() was a more common pattern for naming flag test helpers than gtFoo()).

it looks like a rather involved change for little benefit. I wonder if it's truly worth it...

I'm inclined to check it in; the paranoia checks make the code a bit verbose, but that's mostly confined to a helper, and doesn't seem conceptually very complex. Throughput-wise, it fires sparsely enough (statements with comparisons and CSE candidate array lengths) to not seem worrisome, and loops walking through arrays are right at the top of our list of things that are worth spending compile-time to optimize...

@JosephTremoulet JosephTremoulet merged commit 5c7dcce into dotnet:master Feb 17, 2017
@JosephTremoulet JosephTremoulet deleted the UpdateCompareVN branch February 17, 2017 19:40
@mikedn
Copy link

mikedn commented Feb 17, 2017

and loops walking through arrays are right at the top of our list of things that are worth spending compile-time to optimize...

Of course but sometimes I'm getting the impression that range checks are unfairly targeted. But I've seen cases where the lack of induction variable widening optimizations can be quite costly. Or the lack of proper BB layout in loops. And sometimes range checks can be quite bad but not because of the cost of the actual instructions they generate but because of the impact they have on the surrounding IR/code (e.g. they introduce additional lclvars).

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants