Improvements for object stack allocation. #21950
x64 PMI diffs with #21944 applied to both base and diff and object stack allocation enabled in both base and diff:
No diffs with object stack allocation disabled.
Some of the regressions are due to stack offset changes after moving allocations from the heap to the stack. For example, on x64
src/jit/jitconfigvalues.h
@@ -349,7 +349,7 @@ CONFIG_STRING(JitInlineReplayFile, W("JitInlineReplayFile"))
#endif // defined(DEBUG) || defined(INLINE_DATA)

CONFIG_INTEGER(JitInlinePolicyModel, W("JitInlinePolicyModel"), 0)
- CONFIG_INTEGER(JitObjectStackAllocation, W("JitObjectStackAllocation"), 0)
+ CONFIG_INTEGER(JitObjectStackAllocation, W("JitObjectStackAllocation"), 1)
I will not merge this change. I'd like to run some CI testing with object stack allocation enabled. I'll revert this change before merging.
@AndyAyersMS @echesakovMSFT @dotnet/jit-contrib PTAL
Example of a good diff (
G_M19268_IG01:
- push rdi
- push rsi
- sub rsp, 40
+ sub rsp, 24
+ xor rax, rax
+ mov qword ptr [rsp+10H], rax
G_M19268_IG02:
- mov rsi, gword ptr [rcx+8]
+ mov rax, gword ptr [rcx+8]
- mov ecx, dword ptr [rsi]
- mov rcx, 0xD1FFAB1E
- call CORINFO_HELP_NEWSFAST
- mov rdi, rax
- lea rcx, bword ptr [rdi+8]
- mov rdx, rsi
- call CORINFO_HELP_ASSIGN_REF
- mov rax, gword ptr [rdi+8]
+ mov rdx, rax
+ mov edx, dword ptr [rdx]
+ xor rdx, rdx
+ lea rcx, bword ptr [rsp+08H]
+ mov qword ptr [rcx], rdx
mov eax, dword ptr [rax+96]
G_M19268_IG03:
- add rsp, 40
+ add rsp, 24
- pop rsi
- pop rdi
-; Total bytes of code 56, prolog size 6 for method BroadcastBlock`1:get_ValueForDebugger():int:this
+; Total bytes of code 38, prolog size 11 for method BroadcastBlock`1:get_ValueForDebugger():int:this
Overall this looks good -- nice to see it rounding into shape.
Left you two small notes.
src/jit/objectalloc.cpp
{
    tree->ChangeType(newType);
}
lclVarDsc->lvType = newType;
Suggest you sink this update out of both then/else and add a JITDUMP message describing the change.
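A minimal sketch of what that suggestion could look like, not the actual objectalloc.cpp code. The types below are stand-ins that only borrow names from the snippet above, `RetypeLocal` is a hypothetical helper, the way `newType` is chosen is an assumption, and `printf` stands in for the JIT's JITDUMP macro.

```cpp
#include <cassert>
#include <cstdio>

// Stand-ins for the JIT's types; only the names mirror the snippet above.
enum VarType { TYP_REF, TYP_BYREF, TYP_I_IMPL };

struct LclVarDsc
{
    VarType lvType;
};

struct Tree
{
    VarType gtType;
    void ChangeType(VarType newType) { gtType = newType; }
};

static const char* TypeName(VarType type)
{
    switch (type)
    {
        case TYP_REF:    return "ref";
        case TYP_BYREF:  return "byref";
        case TYP_I_IMPL: return "i_impl";
    }
    return "unknown";
}

// Hypothetical helper: returns true if the local's type was changed.
bool RetypeLocal(Tree* tree, LclVarDsc* lclVarDsc, unsigned lclNum, bool definitelyStackPointing)
{
    // Only the choice of the new type differs between the two branches
    // (an assumption about the shape of the original then/else).
    VarType newType;
    if (definitelyStackPointing)
    {
        newType = TYP_I_IMPL;
    }
    else
    {
        newType = TYP_BYREF;
    }

    if (tree->gtType != newType)
    {
        tree->ChangeType(newType);
    }

    if (lclVarDsc->lvType == newType)
    {
        return false;
    }

    // The shared update is sunk below the branches and logged once.
    std::printf("Changing the type of V%02u from %s to %s\n", lclNum,
                TypeName(lclVarDsc->lvType), TypeName(newType));
    lclVarDsc->lvType = newType;
    return true;
}
```

With the update in one place, the dump message cannot drift between the two branches.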
@@ -321,6 +392,8 @@ bool ObjectAllocator::MorphAllocObjNodes()

const unsigned int stackLclNum = MorphAllocObjNodeIntoStackAlloc(asAllocObj, block, stmt);
m_HeapLocalToStackLocalMap.AddOrUpdate(lclNum, stackLclNum);
MarkLclVarAsDefinitelyStackPointing(lclNum);
MarkLclVarAsPossiblyStackPointing(lclNum);
Can you add a note here and/or elsewhere that the possibly stack pointing set is kept as a superset of the definitely stack pointing set (and if a local is in both, it is definitely stack pointing)?
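The invariant described in this comment can be sketched as follows. This is an illustration only: `StackPointingSets` and its methods are hypothetical names, `std::set` is used for brevity rather than whatever set representation the JIT uses, and folding the "possibly" insertion into the "definitely" marker is an assumption (the diff above makes two separate calls).

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for the ObjectAllocator bookkeeping.
class StackPointingSets
{
public:
    // The local may point to a stack-allocated object or to the heap.
    void MarkPossiblyStackPointing(unsigned lclNum)
    {
        m_possibly.insert(lclNum);
    }

    // The local always points to a stack-allocated object or is null.
    // It is added to both sets so that "possibly" stays a superset.
    void MarkDefinitelyStackPointing(unsigned lclNum)
    {
        m_possibly.insert(lclNum);
        m_definitely.insert(lclNum);
    }

    bool MayPointToStack(unsigned lclNum) const
    {
        return m_possibly.count(lclNum) != 0;
    }

    // A local that is in both sets is definitely stack pointing.
    bool DefinitelyPointsToStack(unsigned lclNum) const
    {
        return m_definitely.count(lclNum) != 0;
    }

    // Checks that every "definitely" local is also a "possibly" local.
    bool InvariantHolds() const
    {
        for (unsigned lclNum : m_definitely)
        {
            if (m_possibly.count(lclNum) == 0)
            {
                return false;
            }
        }
        return true;
    }

private:
    std::set<unsigned> m_possibly;
    std::set<unsigned> m_definitely;
};
```

Keeping the superset relation means a single membership test on the "possibly" set is enough to decide whether a local needs any special treatment at all.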
MarkLclVarAsPossiblyStackPointing(lclNum);

// Check if this pointer always points to the stack.
if (lclVarDsc->lvSingleDef == 1)
We should really get around to writing a checker for the early meaning of lvSingleDef someday. I think the upstream phases maintain it, but ...
45c257d to a377055
@AndyAyersMS I addressed your feedback and added a couple of small fixes for issues found in x86 testing. PTAL.
@dotnet-bot test Tizen armel Cross Checked Innerloop Build and Test
This change enables object stack allocation for more cases:
1. Objects with gc fields can now be stack-allocated.
2. Object stack allocation is enabled for x86.
ObjectAllocator updates the types of trees containing references
to possibly-stack-allocated objects to TYP_BYREF or TYP_I_IMPL as appropriate.
That allows us to remove the hacks in gcencode.cpp and refine reporting of pointers:
the pointer is not reported when we can prove that it always points to a stack-allocated object or is null (typed as TYP_I_IMPL);
the pointer is reported as an interior pointer when it may point to either a stack-allocated object or a heap-allocated object (typed as TYP_BYREF);
the pointer is reported as a normal pointer when it points to a heap-allocated object (typed as TYP_REF).
ObjectAllocator also adds flags to indirections:
GTF_IND_TGTANYWHERE when the indirection may be the heap or the stack
(that results in checked write barriers used for writes)
or the new GTF_IND_TGT_NOT_HEAP when the indirection is null or stack memory
(that results in no barrier used for writes).
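The three-way classification in this commit message can be summarized as a small lookup. This is a sketch under assumptions, not JIT code: the enums and `Classify` are hypothetical, and the "Regular" barrier for heap-only pointers is an assumption, since the message only states the checked and no-barrier cases.

```cpp
#include <cassert>

// Hypothetical enums; the JIT itself uses TYP_REF / TYP_BYREF / TYP_I_IMPL
// and the GTF_IND_* flags named in the commit message.
enum class PointsTo { HeapOnly, HeapOrStack, StackOrNull };
enum class PtrType { Ref, Byref, IImpl };
enum class GcReport { Normal, Interior, NotReported };
enum class WriteBarrier { Regular, Checked, None };

struct PointerTreatment
{
    PtrType      type;    // type given to trees that hold the pointer
    GcReport     report;  // how the pointer is reported to the GC
    WriteBarrier barrier; // barrier used for writes through the pointer
};

PointerTreatment Classify(PointsTo where)
{
    switch (where)
    {
        case PointsTo::StackOrNull:
            // Provably stack or null: TYP_I_IMPL, not reported,
            // GTF_IND_TGT_NOT_HEAP, no write barrier.
            return {PtrType::IImpl, GcReport::NotReported, WriteBarrier::None};
        case PointsTo::HeapOrStack:
            // May be either: TYP_BYREF, reported as interior,
            // GTF_IND_TGTANYWHERE, checked write barrier.
            return {PtrType::Byref, GcReport::Interior, WriteBarrier::Checked};
        case PointsTo::HeapOnly:
            break;
    }
    // Heap only: TYP_REF, reported normally. The regular barrier here is an
    // assumption; the commit message does not spell it out.
    return {PtrType::Ref, GcReport::Normal, WriteBarrier::Regular};
}
```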
a377055 to 01731f7
@dotnet-bot test Tizen armel Cross Checked Innerloop Build and Test
The Tizen leg is broken (and now, removed) so you might as well ignore it.
{
object o = (f1 == 0) ? (object)new SimpleClassB(f1, f2) : (object)new SimpleClassA(f1, f2);
- return (o is SimpleClassB) || !(o is SimpleClassA) ? 0 : 1;
+ GC.Collect();
+ return !(o is SimpleClassA) ? 0 : 1;
I bet there is a reason why you preferred this over (o is SimpleClassA) ? 1 : 0
It's the result of refactoring the previous version of the test. I didn't bother simplifying this.
@erozenfeld
We have support for object stack allocation merged but off by default. It can be turned on by setting
dotnet/runtime#11192 is a tracking issue for object stack allocation.
@erozenfeld any background to off being the default? Even the simplest version will definitely help F# code which creates a lot more superfluous tuples, optionals and such.
@NinoFloris if you can point us at specific examples that would be helpful. More broadly, we are looking for some F# performance tests -- for example, to help us better evaluate some of the tradeoffs in dotnet/runtime#341.
@NinoFloris The escape analysis is not free in terms of jit speed and we can't justify having it on if it almost never results in actual object stack allocation. And yes, if you have great F# examples where this analysis is sufficient and stack allocation candidates are not passed as arguments, please send them our way.
I have linked to some cases, generally the F# compiler can automatically inline local functions or functions explicitly marked I'm also hoping to see if I can (and OK'ed to) make some changes in that area in the compiler instead.
Thank you @NinoFloris ! I'll take a look at these issues sometime next week.
@erozenfeld
.NET AOT compilers (crossgen and crossgen2) use the jit to compile individual methods. Neither one currently does whole-program optimizations. We have some long-term plans to look into adding whole-program optimizations to crossgen2 and did some prototyping. Escape analysis is one of the optimizations we will consider. The way I envision this is crossgen2 will do compilation in bottom-up call graph order (callees before callers) and will record escape info for method parameters. That info can be used when compiling callers to determine which args can escape.
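The bottom-up idea described above can be sketched as a post-order walk over a call graph that records a per-parameter escape summary. This is an illustration under assumptions, not crossgen2 code: `Method`, `ArgPass`, `Summarize` and `ComputeEscapeSummaries` are hypothetical names, and treating unknown or recursive callees as escaping is a conservative choice made for the sketch.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// A parameter of the current method passed as argument `calleeParam` of `callee`.
struct ArgPass
{
    std::string callee;
    unsigned    calleeParam;
};

struct Method
{
    unsigned                                 paramCount;
    std::set<unsigned>                       directEscapes; // stored to the heap, returned, etc.
    std::map<unsigned, std::vector<ArgPass>> passes;        // param -> calls it flows into
};

using Program   = std::map<std::string, Method>;
using Summaries = std::map<std::string, std::vector<bool>>; // true = parameter may escape

static void Summarize(const Program& program, const std::string& name,
                      std::set<std::string>& inProgress, Summaries& summaries)
{
    if (summaries.count(name) != 0 || inProgress.count(name) != 0)
    {
        return; // already summarized, or part of a recursive cycle
    }
    auto it = program.find(name);
    if (it == program.end())
    {
        return; // unknown method: it gets no summary
    }
    inProgress.insert(name);
    const Method& method = it->second;

    // Callees before callers.
    for (const auto& entry : method.passes)
    {
        for (const ArgPass& pass : entry.second)
        {
            Summarize(program, pass.callee, inProgress, summaries);
        }
    }

    std::vector<bool> escapes(method.paramCount, false);
    for (unsigned i = 0; i < method.paramCount; i++)
    {
        bool escaping = method.directEscapes.count(i) != 0;
        auto passIt   = method.passes.find(i);
        if (passIt != method.passes.end())
        {
            for (const ArgPass& pass : passIt->second)
            {
                auto summary = summaries.find(pass.callee);
                // No usable summary (unknown or recursive callee): assume escape.
                if (summary == summaries.end() || pass.calleeParam >= summary->second.size() ||
                    summary->second[pass.calleeParam])
                {
                    escaping = true;
                }
            }
        }
        escapes[i] = escaping;
    }
    summaries[name] = escapes;
    inProgress.erase(name);
}

Summaries ComputeEscapeSummaries(const Program& program)
{
    Summaries             summaries;
    std::set<std::string> inProgress;
    for (const auto& entry : program)
    {
        Summarize(program, entry.first, inProgress, summaries);
    }
    return summaries;
}
```

A caller that passes a stack allocation candidate to a parameter whose summary says "does not escape" could then keep the candidate on the stack, which is exactly the case the per-method analysis cannot handle today.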
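This change enables object stack allocation for more cases:
1. Objects with gc fields can now be stack-allocated.
2. Object stack allocation is enabled for x86.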
ObjectAllocator updates the types of trees containing references
to possibly-stack-allocated objects to TYP_BYREF or TYP_I_IMPL as appropriate.
That allows us to remove the hacks in gcencode.cpp and refine reporting of pointers:
the pointer is not reported when we can prove that it always points to a stack-allocated object or is null (typed as TYP_I_IMPL);
the pointer is reported as an interior pointer when it may point to either a stack-allocated object or a heap-allocated object (typed as TYP_BYREF);
the pointer is reported as a normal pointer when it points to a heap-allocated object (typed as TYP_REF).
ObjectAllocator also adds flags to indirections:
GTF_IND_TGTANYWHERE when the indirection may be the heap or the stack
(that results in checked write barriers used for writes)
or the new GTF_IND_TGT_NOT_HEAP when the indirection is null or stack memory
(that results in no barrier used for writes).