Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

imul instruction size misprediction #57179

Merged
merged 6 commits into from
Aug 14, 2021
Merged

Conversation

kunalspathak
Copy link
Member

@kunalspathak kunalspathak commented Aug 10, 2021

During emitting imul, we wrongly predict the size of instruction as 16 bytes whereas it should be just 15 bytes (which is debatable, see below) comprising of 1 byte REX prefix + 1 byte opcode + 1 byte modR/M + 4 bytes RIP-relative displacement + 4 bytes of immediate. Last row in https://www.felixcloutier.com/x86/imul table.

Now as per Intel docs:

RIP-relative addressing allows specific ModR/M modes to address memory relative to the 64-bit RIP using a signed
32-bit displacement. 

which means the RIP immediate above should be just 4-bytes and the instruction size should be just 11 bytes. But in emitInsSizeCV() we have it calculated as valSize == 8.

When calculating the size, it seems that we double calculate the REX prefix size, once in

size += emitGetRexPrefixSize(ins);

and then further in emitInsSize() -> emitGetPrefixSize() in

if (hasRexPrefix(code))
{
return 1;
}

Because of above conditions, we end up with size of 16 bytes but when we save the size in instrDesc, we cap it to 15 bytes because of #12840.

if (sz > 15)
{
// This is a temporary workaround for non-precise instr size
// estimator on XARCH. It often overestimates sizes and can
// return value more than 15 that doesn't fit in 4 bits _idCodeSize.
// If somehow we generate instruction that needs more than 15 bytes we
// will fail on another assert in emit.cpp: noway_assert(id->idCodeSize() >= csz).
// Issue https://github.com/dotnet/runtime/issues/12840.
sz = 15;
}

Further, we account the wrong size of instruction 16 bytes instead of the one we have in id->CodeSize() of 15 bytes.

id->idCodeSize(sz);
dispIns(id);
emitCurIGsize += sz;
}

Due to this, the emitCurIGSize and hence ig->igOffs becomes off by 1 byte and during emitting, we think that we just over-estimated the instruction by 4 bytes = 15 (id->CodeSize()) - 11 (actual encoding size) but the offsets were calculated assuming the instruction size of 16 bytes. That difference of 1 byte leads to an assert in #57041.

To summarize, there are 3 problems:

  1. Double counting of REX prefix which happens at multiple places.

This is fixed by tracking if we have already counted REX prefix size and if yes, do not include it in emitGetPrefixSize().

  1. Why is valSize > 4 for RIP-immediate . I have added an assert in this draft PR to see if there are any instructions other than mov for which we get valSize > 4 other wise, we should just update this code to cap valSize to 4 bytes.

This is fixed by making sure we cap valSize to 4 bytes for all instructions except mov.

  1. There are multiple places that has code like this:
sz = calculateSize();
id->idCodeSize(sz);
emitCurIGsize += sz; // This can be wrong for sz >= 15.

This problem should never arise because I have removed the code that caps sz to 15 bytes and added an assert for sz <= 15. With that, it will be safe to use emitCurIGsize += sz.

Thanks @tannergooding for pointing me to manuals.

Fixes: #12840, #57041

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 10, 2021
@kunalspathak kunalspathak changed the title Add assert imul instruction misprediction Aug 10, 2021
@kunalspathak kunalspathak marked this pull request as ready for review August 12, 2021 14:18
@kunalspathak
Copy link
Member Author

@dotnet/jit-contrib

@briansull briansull changed the title imul instruction misprediction imul instruction size misprediction Aug 12, 2021
@kunalspathak
Copy link
Member Author

Can someone take a look so we can get this in before next week? @echesakov or @tannergooding

Copy link
Member

@tannergooding tannergooding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM for a quick fix.

We should log an issue to track fixing this properly (likely by centralizing the size computation logic so we don't risk duplicating checks, etc).

@kunalspathak
Copy link
Member Author

We should log an issue to track fixing this properly (likely by centralizing the size computation logic so we don't risk duplicating checks, etc).

Sure, I have created #57368 and included links to multiple issues that talks about it.

Copy link
Contributor

@briansull briansull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed it and it looks OK
(and better than before)

@JulieLeeMSFT JulieLeeMSFT linked an issue Aug 13, 2021 that may be closed by this pull request
Copy link
Contributor

@echesakov echesakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kunalspathak
Copy link
Member Author

no asmdiff in benchmarks/libraries. Minor improvements in coreclr_tests:

Summary of Allocation Size diffs:
(Lower is better)

Total bytes of base: 5881
Total bytes of diff: 5852
Total bytes of delta: -29 (-0.49% of base)
Total relative delta: -0.07
    diff is an improvement.
    relative diff is an improvement.


Top file improvements (bytes):
          -8 : 82370.dasm (-0.95% of base)
          -4 : 245861.dasm (-0.82% of base)
          -4 : 247166.dasm (-0.32% of base)
          -4 : 245841.dasm (-0.82% of base)
          -4 : 81148.dasm (-3.96% of base)
          -4 : 247130.dasm (-0.33% of base)
          -1 : 240673.dasm (-0.07% of base)

7 total files with Allocation Size differences (7 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
          -8 (-0.95% of base) : 82370.dasm - ldfldstatic:Main():int
          -4 (-0.82% of base) : 245861.dasm - MyClass:TestStaticFields()
          -4 (-0.32% of base) : 247166.dasm - ShiftTest.ulong32Test:Main():int
          -4 (-0.82% of base) : 245841.dasm - MyClass:TestStaticFields()
          -4 (-3.96% of base) : 81148.dasm - Program:Main():int
          -4 (-0.33% of base) : 247130.dasm - ShiftTest.longTest:Main():int
          -1 (-0.07% of base) : 240673.dasm - AA:Static5(System.Double[],byref,byref,ushort,System.Boolean[][],byref,System.UInt32[][,,],long):System.SByte[,][]

Top method improvements (percentages):
          -4 (-3.96% of base) : 81148.dasm - Program:Main():int
          -8 (-0.95% of base) : 82370.dasm - ldfldstatic:Main():int
          -4 (-0.82% of base) : 245841.dasm - MyClass:TestStaticFields()
          -4 (-0.82% of base) : 245861.dasm - MyClass:TestStaticFields()
          -4 (-0.33% of base) : 247130.dasm - ShiftTest.longTest:Main():int
          -4 (-0.32% of base) : 247166.dasm - ShiftTest.ulong32Test:Main():int
          -1 (-0.07% of base) : 240673.dasm - AA:Static5(System.Double[],byref,byref,ushort,System.Boolean[][],byref,System.UInt32[][,,],long):System.SByte[,][]

7 total methods with Allocation Size differences (7 improved, 0 regressed), 0 unchanged.

@kunalspathak kunalspathak merged commit 20f4c7e into dotnet:main Aug 14, 2021
@kunalspathak kunalspathak deleted the imul branch August 14, 2021 00:54
@ghost ghost locked as resolved and limited conversation to collaborators Sep 13, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
4 participants