Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace a load with cheaper mov instruction when possible #83458

Merged
merged 1 commit into from
Mar 21, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
84 changes: 80 additions & 4 deletions src/coreclr/jit/emitarm64.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -16183,7 +16183,7 @@ bool emitter::IsRedundantLdStr(
// ins - The instruction code
// reg1Attr - The emit attribute for register 1
// reg1 - Register 1
// reg2 - Encoded register 2
// reg2 - Register 2
// imm - Immediate offset, prior to scaling by operand size
// size - Operand size
// fmt - Instruction format
Expand All @@ -16194,9 +16194,6 @@ bool emitter::IsRedundantLdStr(
bool emitter::ReplaceLdrStrWithPairInstr(
instruction ins, emitAttr reg1Attr, regNumber reg1, regNumber reg2, ssize_t imm, emitAttr size, insFormat fmt)
{
// Register 2 needs conversion to unencoded value.
reg2 = encodingZRtoSP(reg2);

RegisterOrder optimizationOrder = IsOptimizableLdrStrWithPair(ins, reg1, reg2, imm, size, fmt);

if (optimizationOrder != eRO_none)
Expand Down Expand Up @@ -16367,4 +16364,83 @@ emitter::RegisterOrder emitter::IsOptimizableLdrStrWithPair(
return optimisationOrder;
}

//-----------------------------------------------------------------------------------
// IsOptimizableLdrToMov: Check if it is possible to optimize a second "ldr"
// instruction into a cheaper "mov" instruction.
//
// Examples: ldr w1, [x20, #0x10]
// ldr w2, [x20, #0x10] => mov w1, w2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be mov w2, w1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I'll fix this.

//
// Arguments:
// ins - The instruction code
// reg1 - Register 1 number
// reg2 - Register 2 number
// imm - Immediate offset, prior to scaling by operand size
// size - Operand size
// fmt - Instruction format
//
// Return Value:
// true - Optimization of the second instruction is possible
//
bool emitter::IsOptimizableLdrToMov(
instruction ins, regNumber reg1, regNumber reg2, ssize_t imm, emitAttr size, insFormat fmt)
{
if (ins != INS_ldr)
{
// This instruction is not an "ldr" instruction.
return false;
}

if (ins != emitLastIns->idIns())
{
// Not successive "ldr" instructions.
return false;
}

regNumber prevReg1 = emitLastIns->idReg1();
regNumber prevReg2 = emitLastIns->idReg2();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prevReg2 here is encoded. Doesn't it need prevReg2 = encodingZRtoSP(prevReg2)?

(Did you see any diffs with a base register of SP?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check is pulled up in the OptimizeLdrStr().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check you refer to, reg2 = encodingZRtoSP(reg2);, only applies to the reg2 argument. The prevReg2 here is still in the encoded form, meaning if reg2 == SP (after decoding) it will never match prevReg2 == ZR (before decoding).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BruceForstall - Does that mean that we would have lost some opportunities because of not encoding it to SP?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, now I am confused - why we have to do the encodingZRtoSP(reg2) for this optimization?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BruceForstall - Does that mean that we would have lost some opportunities because of not encoding it to SP?

That's my thinking. However, when I add prevReg2 = encodingZRtoSP(prevReg2), there are no diffs. If I add assert(prevReg2 != REG_ZR) there are hits, so obviously there are cases where we are incorrectly comparing against ZR instead of SP. There might not be any opportunities due to the way we generate code.

Actually, now I am confused - why we have to do the encodingZRtoSP(reg2) for this optimization?

The problem is that we store the encoded register value in the instrDesc -- and arm64 encodings sometimes encode SP as ZR in the instruction encoding bits.

It's possibly unfortunate that we (long ago) decided to store the encoded form into igReg1()/idReg2()/etc. An alternative would be to store the "actual" register in the instrDesc and do the SP->ZR encoding when we're building the instruction bits. This would eliminate the need to "un-encode" which we do in the emitDispIns code path (to display/output the textual instruction). And it would simplify thinking about peephole optimizations (which were not considered when this was designed).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative would be to store the "actual" register in the instrDesc

Unfortunately, REG_SP is 64 and REGNUM_BITS is 6, so we don't have enough bits currently to store REG_SP in idReg1/idReg2/etc. Bumping REGNUM_BITS to 7 just to allow this might have negative effects on memory usage or other instrDesc effects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation @BruceForstall, I'll add a new PR addressing the comments 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #83886

insFormat lastInsFmt = emitLastIns->idInsFmt();
emitAttr prevSize = emitLastIns->idOpSize();
ssize_t prevImm = emitGetInsSC(emitLastIns);

if ((reg2 != prevReg2) || !isGeneralRegisterOrSP(reg2))
{
// The "register 2" should be same as previous instruction and
// should either be a general register or stack pointer.
return false;
}

if (prevImm != imm)
{
// Then we are loading from a different immediate offset.
return false;
}

if (!isGeneralRegister(reg1) || !isGeneralRegister(prevReg1))
{
// Either register 1 or previous register 1 is not a general register
// or the zero register, so we cannot optimise.
Comment on lines +16421 to +16422
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is wrong (or confusing) since isGeneralRegister doesn't include ZR. (So we don't optimize if one is ZR, which makes sense)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll change this. Should I coalesce these changes with another PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have an additional PR coming, it's fine to add the comment changes to that PR.

return false;
}

if (lastInsFmt != fmt)
{
// The formats of the two instructions differ.
return false;
}

if (prevReg1 == prevReg2)
{
// Then the previous load overwrote the register that we are indexing against.
return false;
}

if (prevSize != size)
{
// Operand sizes differ.
return false;
}

return true;
}
#endif // defined(TARGET_ARM64)
11 changes: 11 additions & 0 deletions src/coreclr/jit/emitarm64.h
Original file line number Diff line number Diff line change
Expand Up @@ -130,6 +130,7 @@ RegisterOrder IsOptimizableLdrStrWithPair(
instruction ins, regNumber reg1, regNumber reg2, ssize_t imm, emitAttr size, insFormat fmt);
bool ReplaceLdrStrWithPairInstr(
instruction ins, emitAttr reg1Attr, regNumber reg1, regNumber reg2, ssize_t imm, emitAttr size, insFormat fmt);
bool IsOptimizableLdrToMov(instruction ins, regNumber reg1, regNumber reg2, ssize_t imm, emitAttr size, insFormat fmt);

// Try to optimize a Ldr or Str with an alternative instruction.
inline bool OptimizeLdrStr(instruction ins,
Expand All @@ -156,6 +157,9 @@ inline bool OptimizeLdrStr(instruction ins,
return true;
}

// Register 2 needs conversion to unencoded value for following optimisation checks.
reg2 = encodingZRtoSP(reg2);
kunalspathak marked this conversation as resolved.
Show resolved Hide resolved

// If the previous instruction was a matching load/store, then try to replace it instead of emitting.
// Don't do this if either instruction had a local variable.
if ((emitLastIns->idIns() == ins) && !localVar && !emitLastIns->idIsLclVar() &&
Expand All @@ -164,6 +168,13 @@ inline bool OptimizeLdrStr(instruction ins,
return true;
}

// If we have a second LDR instruction from the same source, then try to replace it with a MOV.
if (IsOptimizableLdrToMov(ins, reg1, reg2, imm, size, fmt))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this creates a GC hole? Should this be gated by a !localVar check, or even better emitIns_Mov variant that handles varx/offs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see gc liveness changing in the diffs so probably it should be fine.

image

Here, in 1st example, x1 is reported to have gc value before and after. Same in other places.

I will still run gcstress now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the gcstress pipelines are clean.

{
emitIns_Mov(INS_mov, reg1Attr, reg1, emitLastIns->idReg1(), true);
return true;
}

return false;
}

Expand Down