-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Create RefPositions without TreeNodeInfo #16517
Conversation
This results in the following throughput improvements (crossgen of SPC.dll) as measured by pin instruction count:
|
@dotnet/jit-contrib PTAL (for preliminary feedback) |
src/jit/lsraxarch.cpp
Outdated
} | ||
#endif | ||
break; | ||
RefPosition* def = BuildDef(tree); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you intend to use the def
variable for anything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No; at one point I had thought I would need to keep the defs around, but realized it was not necessary.
src/jit/lsraxarch.cpp
Outdated
assert(info->dstCount == 1); | ||
srcCount = 0; | ||
assert(dstCount == 1); | ||
dstCandidates = RBM_NONE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Placing this in a "else" ifdef would be better IMO
src/jit/lsraxarch.cpp
Outdated
// to special case them. | ||
// These tree nodes will have their op1 marked as isDelayFree=true. | ||
// That is, op1's reg remains in use until the subsequent instruction. | ||
GenTree* addr = tree->gtOp.gtOp1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest using functions like gtGetOp1
and gtGetOp2
consistently
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do; I was programmed to avoid it back before we added gtGetOp2IfPresent()
, and I guess I'm a creature of habit.
dstCount = 1; | ||
if (!data->isContained()) | ||
{ | ||
RefPosition* dataUse = dataUse = BuildUse(data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's something odd with this line, dataUse appears twice. And it's not used anywhere anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mistakenly was initially setting delay free on the data, not addr, which was a mistake - no clue about the double assignment, though.
src/jit/lsraxarch.cpp
Outdated
for (; sourceInfo != nullptr; sourceInfo = sourceInfo->Next()) | ||
srcCandidates = allRegs(TYP_INT) & ~RBM_RCX; | ||
dstCandidates = allRegs(TYP_INT) & ~RBM_RCX; | ||
if (tree->IsReverseOp()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverse op? Wasn't this removed from LIR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I feel a little silly - I had either forgotten or had missed that this happened. There's code in LSRA that attempts to handle it, but I see that the Rationalizer is clearing it on all nodes, and CheckLIR
is validating it. I added an issue #16528
src/jit/lsraxarch.cpp
Outdated
|
||
assert(!call->isContained()); | ||
info->srcCount = 0; | ||
srcCount = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Redundant
src/jit/lsraxarch.cpp
Outdated
} | ||
if (!tree->isContained()) | ||
{ | ||
info->srcCount = srcCount; | ||
RefPosition* def = BuildDef(tree, dstCandidates); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused variable?
src/jit/lsraxarch.cpp
Outdated
bool isUnsignedMultiply = ((tree->gtFlags & GTF_UNSIGNED) != 0); | ||
bool requiresOverflowCheck = tree->gtOverflowEx(); | ||
// Only non-floating point mul has special requirements | ||
if (!varTypeIsFloating(tree->TypeGet())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make it consistent with BuildDivMod that does if (is float) { build simple; return; }
? Or move the float check inside the switch case and rename BuildMul and BuildDivMod to BuildIntegerMul and BuildIntegerDivMod?
Note that FP add/sub/mul/div use the same instruction format but they're handled more or less differently throughout lower/ra/codegen. I have a PR to clean this up but I was waiting for your lsra refactoring to finish it.
src/jit/lsrabuild.cpp
Outdated
reinterpret_cast<LocationInfoListNode*>(compiler->compGetMem(preallocateSize, CMK_LSRA)); | ||
size_t preallocateSize = sizeof(RefInfoListNode) * preallocate; | ||
RefInfoListNode* preallocatedNodes = | ||
reinterpret_cast<RefInfoListNode*>(compiler->compGetMem(preallocateSize, CMK_LSRA)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Existing code but note that casting from void*
to X*
is normally done via static_cast
. reinterpret_cast
is usually reserved for "weird" stuff such as casting int*
to float*
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made the change
@mikedn - thanks so much for the quick and thorough review; it's really appreciated! Clearly I need to fix some bugs ;-) and will incorporate your feedback in the next round. |
Far from thorough, I've only took a quick look in the morning and then went to work. Maybe I'll take another look this evening :) |
src/jit/lsrabuild.cpp
Outdated
} | ||
|
||
//------------------------------------------------------------------------ | ||
// getKillSetForMul: Determine the liveness kill set for a mod or div node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong function name in comment. Applies to subsequent functions as well.
case GT_MUL_LONG: | ||
#endif | ||
killMask = RBM_RAX | RBM_RDX; | ||
killMask = getKillSetForMul(tree->AsOp()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not an issue with this particular line per se but this whole function seemed a bit strange to me when I looked at it in the past. I would have expected BuildNode
/TreeNodeInfoInit
to somehow deal with this instead of having this whole switch here, it looks like a special case.
Looking now through this function's uses I see that one is likely redundant: buildUpperVectorSaveRefPositions
is only called from BuildDefsWithKills
and this one has a killMask that's supposed to be the same as the one returned by getKillSetForNode
due to an assert.
And then there's the said assert in BuildDefsWithKills
. Does this function actually needs the killMask
parameter?
Apparently the only "real" use of getKillSetForNode
is in some stress related code in buildRefPositionsForNode
. Maybe there's should be another way to communicate the kill mask from build node (e.g. via a class member similar to pendingDelayFree
)?
This may then allow moving functions like getKillSetForMul
to where they really belong - lsraxarch.cpp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's necessary to maintain the kill mask as a class member. I'm making the general getKillSetForNode
DEBUG
-only; it will only be used in the assert and in the stress modes where we constrain nodes.
src/jit/lsrabuild.cpp
Outdated
#ifdef _TARGET_XARCH_ | ||
if (tgtPrefUse != nullptr) | ||
{ | ||
defRefPosition->getInterval()->assignRelatedIntervalIfUnassigned(tgtPrefUse->getInterval()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't defRefPosition->getInterval()
same as interval
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is.
src/jit/lsrabuild.cpp
Outdated
RefPosition* useRefPos = newRefPosition(interval, currentLoc, RefTypeUse, operand, candidates, multiRegIdx); | ||
if (regOptional) | ||
{ | ||
useRefPos->setAllocateIfProfitable(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be better performance wise to always call setAllocateIfProfitable, always setting that bit might very well be cheaper than a branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
// | ||
void LinearScan::BuildSimple(GenTree* tree) | ||
int LinearScan::BuildSimple(GenTree* tree) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this function really useful? For example, I look at xarch build code and I see that it handles GT_CNS_INT, GT_CND_DBL, GT_CNS_LNG (huh? looks like this one shouldn't reach LSRA due to decomposition) so attempting to handle GTK_CONST in BuildSimple is probably useless.
In general, BuildNode
's switch already handles a ton of opers, I might be better (from a performance point of view and even for clarity) to add whatever opers are missing to those BuildNode
switches and make default case unreached.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds reasonable, but probably not something I want to undertake with this PR.
src/jit/lsra.h
Outdated
{ | ||
#ifdef _TARGET_XARCH_ | ||
RefPosition* tgtPrefUse = nullptr; | ||
if (node->OperIsBinary() && isRMWRegOper(node)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to BuildSimple
, it's not clear if this kind of special casing in a what's supposed to be general purpose function is useful.
It may be better to simply call BuildRMWUses
directly from BuildNode
's switch relevant cases (e.g. case GT_XOR
) rather than calling BuildBinaryUses
from many places and having to decide again what to do.
src/jit/lsraxarch.cpp
Outdated
int dstCount = 0; | ||
regMaskTP dstCandidates = RBM_NONE; | ||
regMaskTP killMask = RBM_NONE; | ||
pendingDelayFree = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These member assignments should be visually separated from the local variable above, I'd move them above locals and add a blank line. Or, perhaps they should not even be in BuildNode
but in its caller.
Also, the need for these data members is a bit unfortunate. It's kind of difficult to track down where they are assigned values and where they are used, the delay free ones in particular.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that they are somewhat unfortunate; I tried to come up with an approach that didn't require these data members on LinearScan
, but could only come up with models that involved excessive passing around of state.
I'd prefer to keep them in BuildNode
, however, as that's the main entry point for all the building functionality. I'll separate them and document them more clearly.
assert(!tree->IsUnusedValue() || (dstCount != 0)); | ||
assert(dstCount == tree->GetRegisterDstCount()); | ||
INDEBUG(dumpNodeInfo(tree, dstCandidates, srcCount, dstCount)); | ||
return srcCount; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the return value of this function is used only in an assert. I hope that assert is worth all the added complication :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - it's something I think should probably be removed, but I found it useful for debugging and comparison as I was making the transition from TreeNodeInfo
src/jit/lsraxarch.cpp
Outdated
isLocalDefUse = true; | ||
tree->SetUnusedValue(); | ||
} | ||
RefPosition* def = BuildDef(tree); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Vertical alignment of all assignments is the worst formatting rule the JIT code uses...
src/jit/lsraxarch.cpp
Outdated
#endif // _TARGET_X86_ | ||
BuildDef(tree, dstCandidates); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or just pass RBM_BYTE_REGS
directly? It's RBM_ALLINT
on x64 so it should do the right thing.
src/jit/lsraxarch.cpp
Outdated
int srcCount = 0; | ||
int internalCount = 0; | ||
const int MaxInternalCount = 4; | ||
RefPosition* internalDefs[MaxInternalCount]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not used it seems. Same for internalCount above. Strange that they have the same name as 2 LinearScan data members.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes; I had originally added these to each method needing internal temps, but it was quite a bit messier than handling it pseudo-globally. I'll delete these.
src/jit/lsraxarch.cpp
Outdated
bool hasMultiRegRetVal = false; | ||
ReturnTypeDesc* retTypeDesc = nullptr; | ||
RefPosition* internalDefs[MAX_ARG_REG_COUNT]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is populated in the code below but then it doesn't appear to be used.
src/jit/lsraxarch.cpp
Outdated
{ | ||
return; | ||
} | ||
int srcCount = BuildRMWUses(tree->AsOp()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, calling BuildRMWUses
without calling isRMWRegOper
that has special handling for GT_MUL
?
Hmm, I left a few comments related to RMW but I'm not sure I'm seeing the big picture, there may be a few issues that need to be considered:
Now, neither issue is directly related to this PR but you may want to consider the impact they have on your refactoring. It may be that |
// Example1: GT_EQ(int, op1 of type ubyte, op2 of type ubyte) - in this case codegen uses | ||
// ubyte as the result of comparison and if the result needs to be materialized into a reg | ||
// simply zero extend it to TYP_INT size. Here is an example of generated code: | ||
// cmp dl, byte ptr[addr mode] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The setcc
instruction is missing from the example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I'm not sure why "in this case codegen uses ubyte as the result of comparison and if the result needs to be materialized into a reg simply zero extend it to TYP_INT
size" appears here, this comment is more suitable where dstCandidates
is set.
// Example2: GT_EQ(int, op1 of type ubyte, op2 is GT_CNS_INT) - in this case codegen uses | ||
// ubyte as the result of the comparison and if the result needs to be materialized into a reg | ||
// simply zero extend it to TYP_INT size. | ||
else if (varTypeIsByte(op1) && op2->IsCnsIntOrI()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this case added? Codegen should not attempt to generate a byte
instruction unless both operands are byte
. If the constant isn't a byte
then an int
compare instruction should be generated even if the first operand is a byte
, it's supposed to have been extended to int
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These cases where copied from LinearScan::ExcludeNonByteableRegisters()
, previously in lsraxarch.cpp. It may be that these can be improved/eliminated, but I am aiming for zero-diffs for now.
src/jit/lsraxarch.cpp
Outdated
assert(!sourceLo->isContained() && !sourceHi->isContained()); | ||
RefPosition* sourceLoUse = BuildUse(sourceLo, srcCandidates); | ||
RefPosition* sourceHiUse = BuildUse(sourceHi, srcCandidates); | ||
|
||
if (tree->OperGet() == GT_LSH_HI) | ||
if (!tree->isContained()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, when is a shift node contained?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A shift node can be contained under a STOREIND
when it is part of a memory RMW op. In most cases this is just handled in BuildIndir
, but shift ops have special register requirements.
src/jit/lsraxarch.cpp
Outdated
unsigned size = blkNode->gtBlkSize; | ||
GenTree* source = blkNode->Data(); | ||
int srcCount = 0; | ||
int internalCount = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this have been removed?
pgosupport.cmake
Outdated
@@ -30,8 +30,8 @@ function(add_pgo TargetName) | |||
# If we don't have profile data availble, gracefully fall back to a non-PGO opt build | |||
if(EXISTS ${ProfilePath}) | |||
if(WIN32) | |||
# set_property(TARGET ${TargetName} APPEND_STRING PROPERTY LINK_FLAGS_RELEASE " /LTCG /USEPROFILE:PGD=${ProfilePath}") | |||
# set_property(TARGET ${TargetName} APPEND_STRING PROPERTY LINK_FLAGS_RELWITHDEBINFO " /LTCG /USEPROFILE:PGD=${ProfilePath}") | |||
set_property(TARGET ${TargetName} APPEND_STRING PROPERTY LINK_FLAGS_RELEASE " /LTCG /USEPROFILE:PGD=${ProfilePath}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably you wanted to build without PGO and accidentally checked in the commented out lines? There's a build.cmd if you need that - -nopgooptimize
.
e1d63a4
to
238a4dc
Compare
@dotnet/jit-contrib ping. |
This was fixed recently @dotnet-bot test Windows_NT x64_arm64_altjit Checked jitstressregs0x1000 |
The 'x64_arm64_altjit Checked jitstressregs3' failures also occur in master. |
@dotnet-bot test Windows_NT x64_arm64_altjit Checked jitstressregs3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couple minor things noticed so far
src/jit/lsra.h
Outdated
@@ -82,118 +82,117 @@ inline regMaskTP calleeSaveRegs(RegisterType rt) | |||
//------------------------------------------------------------------------ | |||
// LocationInfo: Captures the necessary information for a node that is "in-flight" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change type name in comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
src/jit/lsrabuild.cpp
Outdated
} | ||
prevListNode = listNode; | ||
} | ||
assert(!"GetRefPosition didn't find the node"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert has wrong function name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
src/jit/lsrabuild.cpp
Outdated
} | ||
prevListNode = listNode; | ||
} | ||
assert(!"GetRefPosition didn't find the node"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert has wrong function name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
// | ||
// Return Value: a register mask of the registers killed | ||
// | ||
regMaskTP LinearScan::getKillSetForProfilerHook() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this should be named getKillSetForProfilerHookTailcall
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except that the node name is GT_PROF_HOOK
, so this seems like the better name to me.
src/jit/gentree.h
Outdated
static bool OperIsMul(genTreeOps gtOper) | ||
{ | ||
return (gtOper == GT_MUL) || (gtOper == GT_MULHI) | ||
#if !defined(_TARGET_64BIT_) && !defined(LEGACY_BACKEND) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be a conflict with @BruceForstall change to remove LEGACY_BACKEND
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there are probably more conflicts as well; that's how it goes ;-)
src/jit/hwintrinsicArm64.cpp
Outdated
|
||
if (op2 != nullptr) | ||
{ | ||
return 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may want to add:
assert(!op1->OperIsList());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the scenarios where op1->OperIsList()
and op2 != nullptr
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, if op1
is a list, op2
must be null.
src/jit/hwintrinsicArm64.cpp
Outdated
return 2; | ||
} | ||
|
||
if (op1 != nullptr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This would be easier to read (less nesting) if we did:
if (op1 == nullptr)
{
return 0;
}
if (op1->OperIsList())
{
// logic
}
else
{
return 1;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to account for op2
, but will restructure.
src/jit/lsraxarch.cpp
Outdated
@@ -2263,9 +2290,8 @@ void LinearScan::BuildSIMD(GenTreeSIMD* simdTree) | |||
// Return Value: | |||
// None. | |||
|
|||
void LinearScan::BuildHWIntrinsic(GenTreeHWIntrinsic* intrinsicTree) | |||
int LinearScan::BuildHWIntrinsic(GenTreeHWIntrinsic* intrinsicTree) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cleaned up this method here: #18078
It might be worthwhile merging the PRs together (some of the changes were fixing "correctness" issues I found, and the other changes are required to support the FMA instruction set, where any of the three operands can be contained).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes I did to support FMA are here: https://github.com/tannergooding/coreclr/commit/e402548087f2bf9554e83052911561c016c60346#diff-498fd4859cb147c9bf082f5ebf32ca8fR2507
(The PR isn't up as it requires the CoreFX change adding back the unimplemented APIs to the ref assembly).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worthwhile merging the PRs together
I think it might be easier to serialize them.
src/jit/hwintrinsicArm64.cpp
Outdated
} | ||
|
||
assert(numArgs > 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be: assert(numArgs >= 3)
or are there cases where we (inefficiently) use a list for less than 2 args?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more comments.
src/jit/lsrabuild.cpp
Outdated
{ | ||
info->srcCount = appendBinaryLocationInfoToList(tree->AsOp()); | ||
assert((kind & GTK_SMPOP) != 0); | ||
int srcCount = BuildBinaryUses(tree->AsOp()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This definition hides the enclosing one; that seems wrong, or at least confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. (This default is rarely used, and @mikedn recommended we eliminate it, which is probably a good idea, but perhaps for another day.)
} | ||
#endif // DEBUG | ||
int newDefListCount = defList.Count(); | ||
int produce = newDefListCount - oldDefListCount; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the only use (and assert using) produce is for amd64? Should you define this below, then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assert was pre-existing, though before there were other uses of produce
. However, I'd like to strengthen this assert to apply more broadly, but it requires some cleanup of the various nodes that produce multiple results. I've added a note to issue #13183 that deals with multi-reg nodes to strengthen this assert as well.
{ | ||
internalCandidates = allRegs(TYP_INT); | ||
} | ||
printf(" +<TreeNodeInfo %d=%d %di %df", dstCount, srcCount, internalIntCount, internalFloatCount); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are currently still many references in comments to the TreeNodeInfo struct -- and the type itself is still there. But the type is unused. Can you update the comments, and remove the references and type?
src/jit/lsraarm.cpp
Outdated
@@ -166,73 +184,99 @@ void LinearScan::BuildShiftLongCarry(GenTree* tree) | |||
// requirements needed by LSRA to build the Interval Table (source, | |||
// destination and internal [temp] register counts). | |||
// | |||
void LinearScan::BuildNode(GenTree* tree) | |||
int LinearScan::BuildNode(GenTree* tree) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return type isn't described in the comments above
} | ||
|
||
//------------------------------------------------------------------------ | ||
// GetOperandInfo: Get the source registers for an operand that might be contained. | ||
// BuildDef: Build one or more RefTypeDef RefPositions for the given node, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name in comment doesn't match function name
} | ||
|
||
//------------------------------------------------------------------------ | ||
// GetOperandInfo: Get the source registers for an operand that might be contained. | ||
// BuildDef: Build one or more RefTypeDef RefPositions for the given node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name in comment doesn't match function name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I've only had a few minor comments, but overall it looks good to me.
@dotnet-bot test Windows_NT x64 Formatting |
This is no longer used after dotnet#16517
@CarolEidt We need to update ryujit-overview.md that still refers to |
@erozenfeld - thanks for pointing that out. I'm going to open an issue and assign it to myself so that it doesn't get lost. |
This is no longer used after dotnet/coreclr#16517 Commit migrated from dotnet/coreclr@5d01d2d
This is the next phase of building RefPositions incrementally.
The big pictures is that, instead of creating
TreeNodeInfo
with the register requirements for each node, theBuild
methods inLinearScan
build theRefPosition
s directly, putting the defs in aDefList
for when the consuming node builds the corresponding uses.There are zero diffs for crossgen of frameworks & tests across all the x64 & x86 + altjits (arm64, arm and x64/ux), aside from a small number of improvements due to some RMW handling changes.