imul instruction size misprediction #57179

kunalspathak · 2021-08-10T23:33:22Z

While emitting imul, we wrongly predict the size of the instruction as 16 bytes, whereas it should be just 15 bytes (which is debatable, see below): 1 byte REX prefix + 1 byte opcode + 1 byte ModR/M + 4 bytes RIP-relative displacement + 8 bytes of immediate. See the last row of the table at https://www.felixcloutier.com/x86/imul.

Now as per Intel docs:

> RIP-relative addressing allows specific ModR/M modes to address memory relative to the 64-bit RIP using a signed 32-bit displacement.

which means the RIP immediate above should be just 4 bytes and the instruction size should be just 11 bytes. But in emitInsSizeCV() it is calculated with valSize == 8.
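The byte accounting above can be checked with a few lines of arithmetic. This is only a sketch, not JIT code: `EncodingParts`, `encodedSize`, `imulWithImm32`, and `imulWithValSize8` are hypothetical names introduced for illustration, and the byte counts are the ones quoted in this description.

```cpp
#include <cassert>

// Hypothetical breakdown of one x64 instruction encoding (illustration only,
// not a type from the JIT).
struct EncodingParts
{
    unsigned rexPrefix;    // 0 or 1 byte
    unsigned opcode;       // opcode bytes
    unsigned modRM;        // 0 or 1 byte
    unsigned displacement; // 4 bytes for RIP-relative addressing
    unsigned immediate;    // immediate bytes
};

// Sums the parts of a single encoding.
unsigned encodedSize(const EncodingParts& p)
{
    assert(p.rexPrefix <= 1); // at most one REX prefix byte
    assert(p.modRM <= 1);     // at most one ModR/M byte
    return p.rexPrefix + p.opcode + p.modRM + p.displacement + p.immediate;
}

// imul with a RIP-relative operand and a 4-byte immediate.
unsigned imulWithImm32()
{
    return encodedSize(EncodingParts{1, 1, 1, 4, 4});
}

// Same instruction with the immediate counted as 8 bytes (valSize == 8).
unsigned imulWithValSize8()
{
    return encodedSize(EncodingParts{1, 1, 1, 4, 8});
}
```

With a 4-byte immediate the parts add up to 11 bytes; counting the immediate as 8 bytes gives 15.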

When calculating the size, it seems that we count the REX prefix size twice, once in

runtime/src/coreclr/jit/emitxarch.cpp

Line 2612 in 53a1ddc

size += emitGetRexPrefixSize(ins);

and then again in emitInsSize() -> emitGetPrefixSize() in

runtime/src/coreclr/jit/emitxarch.cpp

Lines 1188 to 1191 in 53a1ddc

    if (hasRexPrefix(code))
    {
        return 1;
    }

Because of the above conditions, we end up with a size of 16 bytes, but when we save the size in instrDesc, we cap it to 15 bytes because of #12840.

runtime/src/coreclr/jit/emit.h

Lines 887 to 896 in d20606c

    if (sz > 15)
    {
        // This is a temporary workaround for non-precise instr size
        // estimator on XARCH. It often overestimates sizes and can
        // return value more than 15 that doesn't fit in 4 bits _idCodeSize.
        // If somehow we generate instruction that needs more than 15 bytes we
        // will fail on another assert in emit.cpp: noway_assert(id->idCodeSize() >= csz).
        // Issue https://github.com/dotnet/runtime/issues/12840.
        sz = 15;
    }
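To make the 16-versus-15 mismatch concrete, here is a minimal sketch of the two steps described so far: the REX prefix is added twice, and the result is then clamped to fit a 4-bit field. `InstrDescSketch` and `estimateImulSize` are hypothetical stand-ins, not the JIT's types or helpers; only the field name `_idCodeSize` and the clamp itself are taken from the snippet above.

```cpp
#include <cassert>

// Hypothetical stand-in for the instruction descriptor: only the 4-bit
// size field mentioned in the comment above is modeled.
struct InstrDescSketch
{
    unsigned _idCodeSize : 4; // can hold 0..15

    void idCodeSize(unsigned sz)
    {
        if (sz > 15)
        {
            sz = 15; // the workaround under discussion
        }
        _idCodeSize = sz;
    }

    unsigned idCodeSize() const
    {
        return _idCodeSize;
    }
};

// Size estimate for the imul case. The REX prefix is added once by the
// caller and, optionally, a second time by the prefix helper.
unsigned estimateImulSize(bool countRexTwice)
{
    unsigned size = 1 + 1 + 4 + 8; // opcode + ModR/M + disp32 + valSize of 8
    size += 1;                     // REX prefix, counted by the caller
    if (countRexTwice)
    {
        size += 1; // REX prefix, counted again by the prefix helper
    }
    return size;
}
```

Under these assumptions the double-counted estimate is 16, while the descriptor ends up holding 15.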

Further, we add the wrong instruction size of 16 bytes to the running total instead of the 15 bytes stored in id->idCodeSize().

runtime/src/coreclr/jit/emitxarch.cpp

Lines 5596 to 5601 in 53a1ddc

        id->idCodeSize(sz);
        dispIns(id);
        emitCurIGsize += sz;
    }

Due to this, emitCurIGsize, and hence ig->igOffs, is off by 1 byte. During emission, we think we over-estimated the instruction by only 4 bytes = 15 (id->idCodeSize()) - 11 (actual encoding size), but the offsets were calculated assuming an instruction size of 16 bytes. That difference of 1 byte leads to the assert in #57041.
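The 1-byte disagreement can be reproduced with a small model of the bookkeeping. This is a sketch under stated assumptions: `SizeAccounting`, `accountWithCap`, `visibleOverEstimate`, and `offsetOverEstimate` are hypothetical names, and the model only mirrors the description above rather than the emitter's actual data structures.

```cpp
#include <cassert>

// Hypothetical model of the bookkeeping described above; none of these
// names come from the JIT.
struct SizeAccounting
{
    unsigned groupSize;  // running total, as emitCurIGsize would hold it
    unsigned storedSize; // per-instruction size, as the descriptor would hold it
};

// Old behavior as described: the descriptor receives the capped value while
// the running total receives the uncapped estimate.
SizeAccounting accountWithCap(unsigned estimatedSize)
{
    const unsigned stored = (estimatedSize > 15) ? 15 : estimatedSize;
    return SizeAccounting{estimatedSize, stored};
}

// Over-estimate the emitter can see: stored size minus bytes actually written.
unsigned visibleOverEstimate(const SizeAccounting& a, unsigned actualSize)
{
    assert(a.storedSize >= actualSize);
    return a.storedSize - actualSize;
}

// Over-estimate baked into the offsets: running total minus bytes actually written.
unsigned offsetOverEstimate(const SizeAccounting& a, unsigned actualSize)
{
    assert(a.groupSize >= actualSize);
    return a.groupSize - actualSize;
}
```

For an estimate of 16 bytes and an actual encoding of 11 bytes, the two functions return 4 and 5 respectively, which is the 1-byte gap.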

To summarize, there are 3 problems:

1. Double counting of the REX prefix, which happens in multiple places.

   This is fixed by tracking whether the REX prefix size has already been counted and, if so, not including it again in emitGetPrefixSize().

2. Why is valSize > 4 for the RIP immediate? I added an assert in the draft of this PR to see whether there are any instructions other than mov for which we get valSize > 4; otherwise, we should just update this code to cap valSize to 4 bytes.

   This is fixed by making sure we cap valSize to 4 bytes for all instructions except mov.

3. There are multiple places that have code like this:

       sz = calculateSize();
       id->idCodeSize(sz);
       emitCurIGsize += sz; // This can be wrong for sz > 15.

   This problem should never arise now, because I have removed the code that caps sz to 15 bytes and added an assert for sz <= 15. With that, it is safe to use emitCurIGsize += sz.
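The second and third fixes can be sketched as follows. This is an illustration of the described behavior, not the code from this PR: `Ins`, `cappedValSize`, and `recordSize` are hypothetical names, and the real change lives in the emitter's own size helpers.

```cpp
#include <cassert>

// Hypothetical instruction tag; the JIT has its own instruction enum.
enum class Ins
{
    Mov,
    Imul
};

// Immediate size after the fix described above: every instruction except
// mov is limited to a 4-byte immediate.
unsigned cappedValSize(Ins ins, unsigned valSize)
{
    if ((ins != Ins::Mov) && (valSize > 4))
    {
        valSize = 4;
    }
    return valSize;
}

// Records one instruction size. The silent cap to 15 is gone; an
// over-estimate now trips the assert instead of skewing the offsets.
unsigned recordSize(unsigned sz, unsigned& groupSize)
{
    assert(sz <= 15);
    groupSize += sz; // safe: the same value is stored and accumulated
    return sz;       // value the descriptor would store
}
```

With the immediate capped and the REX prefix counted once, the imul case comes to 1 + 1 + 1 + 4 + 4 = 11 bytes, and the stored size and the running total agree.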

Thanks @tannergooding for pointing me to the manuals.

Fixes: #12840, #57041

kunalspathak · 2021-08-12T14:18:44Z

@dotnet/jit-contrib

kunalspathak · 2021-08-13T18:14:34Z

Can someone take a look so we can get this in before next week? @echesakov or @tannergooding

tannergooding

Changes LGTM for a quick fix.

We should log an issue to track fixing this properly (likely by centralizing the size computation logic so we don't risk duplicating checks, etc).

kunalspathak · 2021-08-13T18:52:03Z

> We should log an issue to track fixing this properly (likely by centralizing the size computation logic so we don't risk duplicating checks, etc).

Sure, I have created #57368 and included links to multiple issues that talk about it.

briansull

I reviewed it and it looks OK
(and better than before)

src/coreclr/jit/emit.h

echesakov

LGTM

kunalspathak · 2021-08-14T00:54:01Z

No asm diffs in benchmarks/libraries. Minor improvements in coreclr_tests:

Summary of Allocation Size diffs:
(Lower is better)

Total bytes of base: 5881
Total bytes of diff: 5852
Total bytes of delta: -29 (-0.49% of base)
Total relative delta: -0.07
    diff is an improvement.
    relative diff is an improvement.


Top file improvements (bytes):
          -8 : 82370.dasm (-0.95% of base)
          -4 : 245861.dasm (-0.82% of base)
          -4 : 247166.dasm (-0.32% of base)
          -4 : 245841.dasm (-0.82% of base)
          -4 : 81148.dasm (-3.96% of base)
          -4 : 247130.dasm (-0.33% of base)
          -1 : 240673.dasm (-0.07% of base)

7 total files with Allocation Size differences (7 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
          -8 (-0.95% of base) : 82370.dasm - ldfldstatic:Main():int
          -4 (-0.82% of base) : 245861.dasm - MyClass:TestStaticFields()
          -4 (-0.32% of base) : 247166.dasm - ShiftTest.ulong32Test:Main():int
          -4 (-0.82% of base) : 245841.dasm - MyClass:TestStaticFields()
          -4 (-3.96% of base) : 81148.dasm - Program:Main():int
          -4 (-0.33% of base) : 247130.dasm - ShiftTest.longTest:Main():int
          -1 (-0.07% of base) : 240673.dasm - AA:Static5(System.Double[],byref,byref,ushort,System.Boolean[][],byref,System.UInt32[][,,],long):System.SByte[,][]

Top method improvements (percentages):
          -4 (-3.96% of base) : 81148.dasm - Program:Main():int
          -8 (-0.95% of base) : 82370.dasm - ldfldstatic:Main():int
          -4 (-0.82% of base) : 245841.dasm - MyClass:TestStaticFields()
          -4 (-0.82% of base) : 245861.dasm - MyClass:TestStaticFields()
          -4 (-0.33% of base) : 247130.dasm - ShiftTest.longTest:Main():int
          -4 (-0.32% of base) : 247166.dasm - ShiftTest.ulong32Test:Main():int
          -1 (-0.07% of base) : 240673.dasm - AA:Static5(System.Double[],byref,byref,ushort,System.Boolean[][],byref,System.UInt32[][,,],long):System.SByte[,][]

7 total methods with Allocation Size differences (7 improved, 0 regressed), 0 unchanged.

Add assert

bc192fa

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 10, 2021

kunalspathak changed the title ~~Add assert~~ imul instruction misprediction Aug 10, 2021

kunalspathak added 4 commits August 11, 2021 10:39

Remove the assert in emitInsSizeCV

cf6efb7

Add a check for includeRexPrefixSize

d15ea32

Remove the codeSize() capping code added to fix dotnet#12840

3f5518e

Make immediate only 4bytes long for non-mov instructions

7861678

kunalspathak marked this pull request as ready for review August 12, 2021 14:18

Delete a commented code

9fc5bbc

briansull changed the title ~~imul instruction misprediction~~ imul instruction size misprediction Aug 12, 2021

tannergooding approved these changes Aug 13, 2021

briansull approved these changes Aug 13, 2021

JulieLeeMSFT linked an issue Aug 13, 2021 that may be closed by this pull request

Assertion failed 'emitOffsAdj == newOffsAdj' during 'Emit code' #57041

Closed

echesakov reviewed Aug 13, 2021

src/coreclr/jit/emit.h

echesakov approved these changes Aug 13, 2021

kunalspathak merged commit 20f4c7e into dotnet:main Aug 14, 2021

kunalspathak deleted the imul branch August 14, 2021 00:54

ghost locked as resolved and limited conversation to collaborators Sep 13, 2021
