Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minimal fix for Issue 620 #40535

Merged
merged 2 commits into from
Aug 8, 2020
Merged

Minimal fix for Issue 620 #40535

merged 2 commits into from
Aug 8, 2020

Conversation

briansull
Copy link
Contributor

Turns a GT_IND(GT_ADDR(GT_LCL_VAR) into a GT_LCL_FLD when necessary for correctness.

Added test case: Runtime_620.cs
Added lvForceLoadNormalize

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 7, 2020
@briansull
Copy link
Contributor Author

Fixes #620

@briansull
Copy link
Contributor Author

briansull commented Aug 7, 2020

Crossgen CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for x64 default jit
Total bytes of diff: 574 (0.00% of base) - diff is a regression.
9 total files with Code Size differences (0 improved, 9 regressed), 257 unchanged.
Top method regressions (bytes):
          48 ( 2.76% of base) : System.Private.CoreLib.dasm - Enumerator:MoveNext():bool:this (7 methods)
          27 ( 2.60% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Microsoft.CodeAnalysis.VisualBasic.Symbols.SourceMethodSymbol:EarlyDecodeWellKnownAttribute(byref):Microsoft.CodeAnalysis.VisualBasic.Symbols.VisualBasicAttributeData:this
          24 ( 2.14% of base) : System.CodeDom.dasm - Microsoft.VisualBasic.VBCodeGenerator:QuoteSnippetString(System.String):System.String:this
          24 (22.43% of base) : System.Private.CoreLib.dasm - System.Text.Rune:ToString():System.String:this
          22 ( 2.71% of base) : System.Private.CoreLib.dasm - System.Reflection.MdConstant:GetValue(System.Reflection.MetadataImport,int,System.RuntimeTypeHandle,bool):System.Object
          21 (12.07% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.Metadata.PE.PEParameterSymbol:get_HasCallerLineNumberAttribute():bool:this
          21 (350.00% of base) : System.Private.CoreLib.dasm - System.Numerics.ConstantHelper:GetInt16WithAllBitsSet():short
          20 (333.33% of base) : System.Private.CoreLib.dasm - System.Numerics.ConstantHelper:GetUInt16WithAllBitsSet():ushort
          20 (57.14% of base) : System.Private.CoreLib.dasm - System.Buffers.Binary.BinaryPrimitives:TryReadInt16BigEndian(System.ReadOnlySpan`1[Byte],byref):bool
          20 (16.39% of base) : System.Text.Json.dasm - System.Text.Json.Utf8JsonReader:GetInt16WithQuotes():short:this
          19 (316.67% of base) : System.Private.CoreLib.dasm - System.Numerics.ConstantHelper:GetSByteWithAllBitsSet():byte
          19 (55.88% of base) : System.Private.CoreLib.dasm - System.Buffers.Binary.BinaryPrimitives:TryReadUInt16BigEndian(System.ReadOnlySpan`1[Byte],byref):bool
          19 (15.57% of base) : System.Text.Json.dasm - System.Text.Json.Utf8JsonReader:GetSByteWithQuotes():byte:this
          19 (15.70% of base) : System.Text.Json.dasm - System.Text.Json.Utf8JsonReader:GetUInt16WithQuotes():ushort:this
          18 (10.47% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.Metadata.PE.PEParameterSymbol:get_HasCallerFilePathAttribute():bool:this
          18 (10.47% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.Metadata.PE.PEParameterSymbol:get_HasCallerMemberNameAttribute():bool:this
          18 (300.00% of base) : System.Private.CoreLib.dasm - System.Numerics.ConstantHelper:GetByteWithAllBitsSet():ubyte
          18 (72.00% of base) : System.Private.CoreLib.dasm - System.Span`1[SByte][System.SByte]:Fill(byte):this
          18 (15.25% of base) : System.Text.Json.dasm - System.Text.Json.Utf8JsonReader:GetByteWithQuotes():ubyte:this
          15 ( 7.77% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.Metadata.PE.PEParameterSymbol:get_IsCallerLineNumber():bool:this
          15 ( 7.28% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.Metadata.PE.PEParameterSymbol:get_IsCallerFilePath():bool:this
          15 ( 6.85% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.Metadata.PE.PEParameterSymbol:get_IsCallerMemberName():bool:this
          15 ( 0.50% of base) : System.Private.Xml.dasm - System.Xml.XmlConvert:DecodeName(System.String):System.String

Top method regressions (percentages):
          21 (350.00% of base) : System.Private.CoreLib.dasm - System.Numerics.ConstantHelper:GetInt16WithAllBitsSet():short
          20 (333.33% of base) : System.Private.CoreLib.dasm - System.Numerics.ConstantHelper:GetUInt16WithAllBitsSet():ushort
          19 (316.67% of base) : System.Private.CoreLib.dasm - System.Numerics.ConstantHelper:GetSByteWithAllBitsSet():byte
          18 (300.00% of base) : System.Private.CoreLib.dasm - System.Numerics.ConstantHelper:GetByteWithAllBitsSet():ubyte
          18 (72.00% of base) : System.Private.CoreLib.dasm - System.Span`1[SByte][System.SByte]:Fill(byte):this
          20 (57.14% of base) : System.Private.CoreLib.dasm - System.Buffers.Binary.BinaryPrimitives:TryReadInt16BigEndian(System.ReadOnlySpan`1[Byte],byref):bool
          19 (55.88% of base) : System.Private.CoreLib.dasm - System.Buffers.Binary.BinaryPrimitives:TryReadUInt16BigEndian(System.ReadOnlySpan`1[Byte],byref):bool
          24 (22.43% of base) : System.Private.CoreLib.dasm - System.Text.Rune:ToString():System.String:this
          20 (16.39% of base) : System.Text.Json.dasm - System.Text.Json.Utf8JsonReader:GetInt16WithQuotes():short:this
          19 (15.70% of base) : System.Text.Json.dasm - System.Text.Json.Utf8JsonReader:GetUInt16WithQuotes():ushort:this
          19 (15.57% of base) : System.Text.Json.dasm - System.Text.Json.Utf8JsonReader:GetSByteWithQuotes():byte:this
          18 (15.25% of base) : System.Text.Json.dasm - System.Text.Json.Utf8JsonReader:GetByteWithQuotes():ubyte:this

@briansull
Copy link
Contributor Author

Sample regression:

Source code:

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static ushort GetUInt16WithAllBitsSet()
        {
            ushort value = 0;
            unsafe
            {
                unchecked
                {
                    *((ushort*)&value) = (ushort)0xffff;
                }
            }
            return value;
        }

Original:

; Assembly listing for method System.Numerics.ConstantHelper:GetUInt16WithAllBitsSet():ushort
; Emitting BLENDED_CODE for X64 CPU with SSE2 - Windows
; ReadyToRun compilation
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;* V00 loc0         [V00,T00] (  0,  0   )  ushort  ->  zero-ref    ld-addr-op
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]   "OutgoingArgSpace"
;
; Lcl frame size = 0

G_M13827_IG01:
						;; bbWeight=1    PerfScore 0.00
G_M13827_IG02:
       mov      eax, 0xFFFF
						;; bbWeight=1    PerfScore 0.25
G_M13827_IG03:
       ret      
						;; bbWeight=1    PerfScore 1.00

; Total bytes of code 6, prolog size 0, PerfScore 1.85, (MethodHash=ae96c9fc) for method System.Numerics.ConstantHelper:GetUInt16WithAllBitsSet():ushort

New version:

; Assembly listing for method System.Numerics.ConstantHelper:GetUInt16WithAllBitsSet():ushort
; Emitting BLENDED_CODE for X64 CPU with SSE2 - Windows
; ReadyToRun compilation
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 loc0         [V00,T00] (  3,  3   )  ushort  ->  [rsp+0x04]   do-not-enreg[F] ld-addr-op
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]   "OutgoingArgSpace"
;
; Lcl frame size = 8

G_M13827_IG01:
       push     rax
						;; bbWeight=1    PerfScore 1.00
G_M13827_IG02:
       xor      eax, eax
       mov      dword ptr [rsp+04H], eax
       mov      word  ptr [rsp+04H], 0xFFFF
       mov      eax, dword ptr [rsp+04H]
       movzx    rax, ax
						;; bbWeight=1    PerfScore 3.50
G_M13827_IG03:
       add      rsp, 8
       ret      
						;; bbWeight=1    PerfScore 1.25

; Total bytes of code 26, prolog size 1, PerfScore 8.35, (MethodHash=ae96c9fc) for method System.Numerics.ConstantHelper:GetUInt16WithAllBitsSet():ushort

Added test case: Runtime_620.cs
Added lvForceLoadNormalize
@briansull briansull force-pushed the orig_620 branch 2 times, most recently from bceadec to 43980f1 Compare August 7, 2020 17:09
@briansull
Copy link
Contributor Author

@CarolEidt @dotnet/jit-contrib PTAL

@briansull
Copy link
Contributor Author

runtime (Libraries Build Linux_musl x64 Debug) Failing after 14m

due to #40416

//
// For loads signed/unsigned differences do matter.
//
if (varTypeIsUnsigned(lvaTable[lclNum].lvType) == varTypeIsUnsigned(typ))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an else if? If not, I don't understand the "otherwise" comment above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will change this to an else if,
I reordered these two during a refactoring

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of additional methods had regressions:

 21 (350.00% of base) : System.Private.CoreLib.dasm - System.Numerics.ConstantHelper:GetInt16WithAllBitsSet():short
 19 (316.67% of base) : System.Private.CoreLib.dasm - System.Numerics.ConstantHelper:GetSByteWithAllBitsSet():byte

Copy link
Contributor Author

@briansull briansull Aug 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally the BYTEMARK test IDEA also has a significant performace/codesize regression.
This existed before the change to "else if".

before: IDEA: Iterations/sec: 5542.04724 Index: 84.76411
after: IDEA: Iterations/sec: 3428.97638 Index: 52.44527

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No other methods in the benchmarks had code diffs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you have a look at the IDEA regression?

Copy link
Contributor Author

@briansull briansull Aug 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it has many unsafe casts and is a awkward port of the C code:
The regression occurs in
private static void cipher_idea(byte[] xin, byte[] xout, int offset, char[] Z)
Where we inline the MUL and low16 methods, into the inner loop.
This forces x1, x4, t2 and t1 to become GT_LCL_FLD and thus memory accesses.

                MUL(ref x1, Z[idx++]);
                MUL(ref x4, Z[idx++]);
                MUL(ref t2, Z[idx++]);
                MUL(ref t1, Z[idx++]);

    // These were macros in the original C code

    /* #define low16(x) ((x) & 0x0FFFF) */
    private static char low16(int x)
    {
        return (char)((x) & 0x0FFFF);
    }

    /* #define MUL(x,y) (x=mul(low16(x),y)) */
    private static void MUL(ref char x, char y)
    {
        x = mul(low16(x), y);
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My other fix does better with a much smaller regression:

before: IDEA: Iterations/sec: 5542.04724 Index: 84.76411
after: IDEA:  Iterations/sec: 5244.63054  Index: 80.21521

Copy link
Contributor Author

@briansull briansull Aug 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can change/fix the source code that could be another option:

Original:

    /* #define MUL(x,y) (x=mul(low16(x),y)) */
    private static void MUL(ref char x, char y)
    {
        x = mul(low16(x), y);
    }

Modification:

    /* #define MUL(x,y) (x=mul(low16(x),y)) */
    private static char MUL(char x, char y)
    {
        return mul(low16(x), y);
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried with the above code modification and with the change all version produce the same Assembly code
and the execution times are about the same as the original version.

Copy link
Contributor

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - and thanks for adding the test!

@CarolEidt
Copy link
Contributor

Thanks @briansull for the analysis. Would you be willing to open an issue to address the regression in .NET 6? I think it would make sense to add the optimization you'd proposed adding in your original fix, as well as possibly suggesting the improvement to the benchmark.

@briansull briansull merged commit e1494e8 into dotnet:master Aug 8, 2020
@briansull briansull deleted the orig_620 branch August 8, 2020 19:59
Jacksondr5 pushed a commit to Jacksondr5/runtime that referenced this pull request Aug 10, 2020
* Minimal fix for Issue 620
Added test case: Runtime_620.cs
Added lvForceLoadNormalize

* Changed conditional to "else if"
@karelz karelz added this to the 5.0.0 milestone Aug 18, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 7, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants