-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Update array length compare value numbers on CSE #9615
Update array length compare value numbers on CSE #9615
Conversation
|
@briansull @dotnet/jit-contrib PTAL. This gets the motivating case from #5371. All diffs are bounds checks removed from loops from which we'd already hoisted the array length, as in this example (System.Dictionary.Clear): --- ".\\DictionaryClear-before.asm" 2017-02-15 11:21:33.158935800 -0800
+++ ".\\DictionaryClear-after.asm" 2017-02-15 11:22:02.755138400 -0800
@@ -1,54 +1,49 @@
; Assembly listing for method Dictionary`2:Clear():this
; Emitting BLENDED_CODE for X64 CPU with SSE2
; optimized code
; rsp based frame
; fully interruptible
; Final local variable assignments
;
; V00 this [V00,T03] ( 23, 16.5) ref -> rsi this
-; V01 loc0 [V01,T01] ( 13, 41.5) int -> rcx
+; V01 loc0 [V01,T01] ( 12, 40.5) int -> rcx
; V02 tmp0 [V02,T00] ( 6, 48 ) ref -> rax
; V03 OutArgs [V03 ] ( 1, 1 ) lclBlk (32) [rsp+0x00]
-; V04 cse0 [V04,T02] ( 15, 25 ) int -> rdx
+; V04 cse0 [V04,T02] ( 14, 24 ) int -> rdx
; V05 cse1 [V05,T04] ( 11, 12.5) ref -> r8
;
; Lcl frame size = 32
G_M52189_IG01:
push rsi
sub rsp, 32
mov rsi, rcx
G_M52189_IG02:
cmp dword ptr [rsi+56], 0
jle SHORT G_M52189_IG05
xor ecx, ecx
mov r8, gword ptr [rsi+8]
mov edx, dword ptr [r8+8]
test edx, edx
jle SHORT G_M52189_IG04
G_M52189_IG03:
mov rax, r8
- cmp ecx, edx
- jae SHORT G_M52189_IG06
movsxd r9, ecx
mov dword ptr [rax+4*r9+16], -1
inc ecx
cmp edx, ecx
jg SHORT G_M52189_IG03
G_M52189_IG04:
mov rcx, gword ptr [rsi+16]
mov r8d, dword ptr [rsi+56]
xor edx, edx
call Array:Clear(ref,int,int)
mov dword ptr [rsi+64], -1
xor eax, eax
mov dword ptr [rsi+56], eax
mov dword ptr [rsi+68], eax
inc dword ptr [rsi+60]
G_M52189_IG05:
add rsp, 32
pop rsi
ret
-G_M52189_IG06:
- call CORINFO_HELP_RNGCHKFAIL
- int3
-; Total bytes of code 98, prolog size 8 for method Dictionary`2:Clear():this
+; Total bytes of code 88, prolog size 8 for method Dictionary`2:Clear():thisSPMI diffs report 4.06% codesize improvement across 1777 affected methods. jit-diff results (checked jit against release full diff set): Throughput results within noise levels. cqNgenTP lab run reports:
|
Modify CSE to identify compares which are functions of CSE candidate array length nodes; when those length nodes get CSEd and consequently have their value numbers updated, also update the value number on the compare consuming the array length; this way the assertions generated in assertion prop for the different compares on the CSEd array will use the same value numbers, and range check elimination becomes more effective. Resolves #5371
10e0f1c to
04e0c4c
Compare
src/jit/optcse.cpp
Outdated
| // Arguments: | ||
| // compare - The compare node to check | ||
|
|
||
| void Compiler::updateCseArrLenMap(GenTreePtr compare) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be named optCSEUpdateArrLenMap or something similar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I named the field cseArrLenMap. I suppose I could rename it to optCseArrLenMap and this method as you suggest, but I don't really understand what the benefit is... as a general rule, are we continuing this prefix naming stuff with new code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of such prefixes but considering that Compiler has a zillion members maybe it's better to prefix stuff. The whole thing looks like a ton of C code dumped into a C++ class...
src/jit/optcse.cpp
Outdated
| VNFuncApp cmpVNFuncApp; | ||
|
|
||
| if (!vnStore->GetVNFunc(compareVN, &cmpVNFuncApp) || | ||
| (cmpVNFuncApp.m_func != GetVNFuncForOper(compare->OperGet(), (compare->gtFlags & GTF_UNSIGNED) != 0))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(compare->gtFlags & GTF_UNSIGNED) != 0)
Side note: maybe we should add bool gtUnsigned() to GenTree, this stuff is nightmarish. Too bad we don't have unsigned relops.
|
I'm not familiar enough with JIT VN to comment much on this but it looks like a rather involved change for little benefit. I wonder if it's truly worth it... |
src/jit/compiler.h
Outdated
| // re-numbered with the array length to improve range check elimination | ||
|
|
||
| // Given a compare, look for a cse candidate arrlen feeding it and add a map entry if found. | ||
| void updateCseArrLenMap(GenTreePtr compare); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the other CSE related methods defined in compiler.h do start with an opt prefix, so it probably best to continue to do so here.
|
LGTM |
This lines up better with other names in code that optimizes common sub- expressions.
|
Updated with the suggested renames and factoring out signedness testing into helper
I'm inclined to check it in; the paranoia checks make the code a bit verbose, but that's mostly confined to a helper, and doesn't seem conceptually very complex. Throughput-wise, it fires sparsely enough (statements with comparisons and CSE candidate array lengths) to not seem worrisome, and loops walking through arrays are right at the top of our list of things that are worth spending compile-time to optimize... |
Of course but sometimes I'm getting the impression that range checks are unfairly targeted. But I've seen cases where the lack of induction variable widening optimizations can be quite costly. Or the lack of proper BB layout in loops. And sometimes range checks can be quite bad but not because of the cost of the actual instructions they generate but because of the impact they have on the surrounding IR/code (e.g. they introduce additional lclvars). |
…pareVN Update array length compare value numbers on CSE Commit migrated from dotnet/coreclr@5c7dcce
Modify CSE to identify compares which are functions of CSE candidate array
length nodes; when those length nodes get CSEd and consequently have their
value numbers updated, also update the value number on the compare
consuming the array length; this way the assertions generated in assertion
prop for the different compares on the CSEd array will use the same value
numbers, and range check elimination becomes more effective.
Resolves #5371