-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Conversation
a4a2204
to
512b6ca
Compare
x64:
x86:
arm64:
arm32:
Not sure what Microsoft.Diagnostics.Tracing.TraceEvent is doing. Presumably it has a secret plan to take over the world by copying itself tens of thousands of times :) |
c3a516f
to
6377fdd
Compare
6377fdd
to
f735ccc
Compare
4461b9b
to
db5d596
Compare
448d186
to
be19099
Compare
81dde33
to
49dd036
Compare
2985116
to
1b99f2d
Compare
801a7d6
to
04544c7
Compare
3cf090a
to
698b556
Compare
4b694f9
to
de44e29
Compare
e440d2f
to
c47445e
Compare
14604e7
to
44466c2
Compare
2d91c84
to
65e840a
Compare
65e840a
to
e00ac4f
Compare
Whew, after a bunch of cleanup PRs I think this is done now. There are still CQ improvements that could be done but that's all I have at the moment. |
Could you please update the header with was is left in this PR? |
Right, it was still mentioning stuff done in other PRs or stuff that wasn't done at the time of its writing but that now is. |
/azp run coreclr-outerloop |
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with 1 question.
} | ||
#endif | ||
|
||
if (!IsSafeToContainMem(blkNode, addr)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the description of IsSafeToContainMem
it is not obvious that it allows non-immediate children, but from the code it is obvious that it does so calls like ContainBlockStoreAddress(blkNode, size, src->AsIndir()->Addr());
are correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the parent/child naming doesn't make a lot of sense for IsSafeToContainMem
. This more or less checks if it's safe to move a "childNode" forward to the "parentNode" point. The parent/child appeared because it was used for containment.
return; | ||
} | ||
|
||
if (!addr->OperIs(GT_ADD) || addr->gtOverflow() || !addr->AsOp()->gtGetOp2()->OperIs(GT_CNS_INT)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why TryCreateAddrMode
can't be used here as it happens on XARCH?
I see that it does different checks, but it is unclear if this case is unique and other calls to TryCreateAddrMode
on arm are correct or all other calls to ``TryCreateAddrModeneed to be changed to check
if ((size >= 2 * REGSIZE_BYTES) && (offset % REGSIZE_BYTES != 0))` for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The address mode situation on ARM is pretty messy - TryCreateAddrMode
(well, genCreateAddrMode
really), was written for x86 and despite having tons of ifdefs in it (and I've rarely seen such an abuse of ifdefs) still generates address modes that aren't really valid on ARM and relies on the emitter to sort things out - for example:
Line 11823 in 5a381e7
if (addr->isContained()) |
And the emitter sorts things out only in certain cases (if you use emitInsLoadStoreOp
and that only works with GT_IND
nodes, does not support LDP
/STP
, requires temporary registers) so it's not suitable for block init/copy purposes.
So I preferred to avoid TryCreateAddrMode
. It may be possible to use it to create an address mode node and then do extra checks before containing it. But there doesn't seem to be any point in doing that - it's a lot of extra code that needs to run only to produce a LEA
that may not be useful and the original ADD
should have been left alone.
This improved block op codegen by better containment of the source/destination address in the unrolled case:
Enable block init unroll on ARM32 (it was already enabled on ARM64 and the differences between the 2 architectures are trivial).(Done in Enable block init unroll on ARM32 #27450)Fix block copy local address containment on ARM. Block store lowering didn't mark local addresses as contained yet the codegen did containment on its own by ignoring the address computed in a register and instead emitting load/stores using the frame pointer register directly.(Done in Fix block store local address containment on ARM #27338)Examples:
With this change the following code is generated instead:
This also applies to array elements and local variables. For example, initializing a struct local var may currently generate:
After this change the generated code is: