Remove AggressiveInlining from XxHash128.HashLength0To16 #81565

Merged
merged 2 commits into from
Feb 3, 2023

Conversation

EgorBo
Member

@EgorBo EgorBo commented Feb 2, 2023

static byte[] Test(byte[] data) => XxHash128.Hash(data);
Codegen
G_M5885_IG01:              ;; offset=0000H
       4156                 push     r14
       57                   push     rdi
       56                   push     rsi
       55                   push     rbp
       53                   push     rbx
       4881EC90000000       sub      rsp, 144
       C5F877               vzeroupper
       33C0                 xor      eax, eax
       4889442428           mov      qword ptr [rsp+28H], rax
       C5D857E4             vxorps   xmm4, xmm4
       C5F97F642430         vmovdqa  xmmword ptr [rsp+30H], xmm4
       C5F97F642440         vmovdqa  xmmword ptr [rsp+40H], xmm4
       C5F97F642450         vmovdqa  xmmword ptr [rsp+50H], xmm4
       C5F97F642460         vmovdqa  xmmword ptr [rsp+60H], xmm4
                                                ;; size=51 bbWeight=1 PerfScore 15.83
G_M5885_IG02:              ;; offset=0033H
       4885C9               test     rcx, rcx
       0F84DE010000         je       G_M5885_IG16
                                                ;; size=9 bbWeight=1 PerfScore 1.25
G_M5885_IG03:              ;; offset=003CH
       488D7110             lea      rsi, bword ptr [rcx+10H]
       8B7908               mov      edi, dword ptr [rcx+08H]
       48B998A6825AFB7F0000 mov      rcx, 0x7FFB5A82A698      ; ubyte[]
       BA10000000           mov      edx, 16
       E82959885F           call     CORINFO_HELP_NEWARR_1_VC
       488BD8               mov      rbx, rax
       488D6B10             lea      rbp, bword ptr [rbx+10H]
       41BE10000000         mov      r14d, 16
       448BC7               mov      r8d, edi
       4183FE10             cmp      r14d, 16
       0F8C91010000         jl       G_M5885_IG13
                                                ;; size=53 bbWeight=1.00 PerfScore 6.50
G_M5885_IG04:              ;; offset=0071H
       4889742468           mov      bword ptr [rsp+68H], rsi
       488BD6               mov      rdx, rsi
       4183F810             cmp      r8d, 16
       0F870A010000         ja       G_M5885_IG09
       4183F808             cmp      r8d, 8
       7613                 jbe      SHORT G_M5885_IG05
       488D4C2458           lea      rcx, [rsp+58H]
       4533C9               xor      r9d, r9d
       FF15C1FF6000         call     [System.IO.Hashing.XxHash128:HashLength9To16(ulong,uint,ulong):System.IO.Hashing.XxHash128+Hash128]
       E9E3000000           jmp      G_M5885_IG08
                                                ;; size=43 bbWeight=0.50 PerfScore 4.75
G_M5885_IG05:              ;; offset=009CH
       4183F804             cmp      r8d, 4
       7213                 jb       SHORT G_M5885_IG06
       488D4C2458           lea      rcx, [rsp+58H]
       4533C9               xor      r9d, r9d
       FF1590FF6000         call     [System.IO.Hashing.XxHash128:HashLength4To8(ulong,uint,ulong):System.IO.Hashing.XxHash128+Hash128]
       E9CA000000           jmp      G_M5885_IG08
                                                ;; size=25 bbWeight=0.50 PerfScore 3.50
G_M5885_IG06:              ;; offset=00B5H
       4585C0               test     r8d, r8d
       747B                 je       SHORT G_M5885_IG07
       0FB60A               movzx    rcx, byte  ptr [rdx]
       418BC0               mov      eax, r8d
       D1E8                 shr      eax, 1
       0FB60402             movzx    rax, byte  ptr [rdx+rax]
       458D48FF             lea      r9d, [r8-01H]
       420FB6140A           movzx    rdx, byte  ptr [rdx+r9]
       C1E110               shl      ecx, 16
       C1E018               shl      eax, 24
       0BC8                 or       ecx, eax
       0BCA                 or       ecx, edx
       41C1E008             shl      r8d, 8
       410BC8               or       ecx, r8d
       8BC1                 mov      eax, ecx
       0FC8                 bswap    eax
       C1C00D               rol      eax, 13
       8BC9                 mov      ecx, ecx
       BA9B5A2787           mov      edx, 0x87275A9B
       4833CA               xor      rcx, rdx
       8BF0                 mov      esi, eax
       4881F68B202C30       xor      rsi, 0x302C208B
       FF1598186100         call     [System.IO.Hashing.XxHash64:Avalanche(ulong):ulong]
       488BF8               mov      rdi, rax
       C5F857C0             vxorps   xmm0, xmm0, xmm0
       C5F811442438         vmovups  xmmword ptr [rsp+38H], xmm0
       488BCE               mov      rcx, rsi
       FF1582186100         call     [System.IO.Hashing.XxHash64:Avalanche(ulong):ulong]
       4C8BC0               mov      r8, rax
       488D4C2438           lea      rcx, [rsp+38H]
       488BD7               mov      rdx, rdi
       FF15D9026100         call     [System.IO.Hashing.XxHash128+Hash128:.ctor(ulong,ulong):this]
       C5F810442438         vmovups  xmm0, xmmword ptr [rsp+38H]
       C5F811442458         vmovups  xmmword ptr [rsp+58H], xmm0
       EB4A                 jmp      SHORT G_M5885_IG08
                                                ;; size=128 bbWeight=0.50 PerfScore 15.54
G_M5885_IG07:              ;; offset=0135H
       48B9B4F837308A902E68 mov      rcx, 0x682E908A3037F8B4
       FF1553186100         call     [System.IO.Hashing.XxHash64:Avalanche(ulong):ulong]
       488BF0               mov      rsi, rax
       C5F857C0             vxorps   xmm0, xmm0, xmm0
       C5F811442448         vmovups  xmmword ptr [rsp+48H], xmm0
       48B9497DA7D10BD35596 mov      rcx, 0x9655D30BD1A77D49
       FF1536186100         call     [System.IO.Hashing.XxHash64:Avalanche(ulong):ulong]
       4C8BC0               mov      r8, rax
       488D4C2448           lea      rcx, [rsp+48H]
       488BD6               mov      rdx, rsi
       FF158D026100         call     [System.IO.Hashing.XxHash128+Hash128:.ctor(ulong,ulong):this]
       C5F810442448         vmovups  xmm0, xmmword ptr [rsp+48H]
       C5F811442458         vmovups  xmmword ptr [rsp+58H], xmm0
                                                ;; size=74 bbWeight=0.50 PerfScore 8.04
G_M5885_IG08:              ;; offset=017FH
       C5F810442458         vmovups  xmm0, xmmword ptr [rsp+58H]
       C5F811442470         vmovups  xmmword ptr [rsp+70H], xmm0
       EB40                 jmp      SHORT G_M5885_IG12
                                                ;; size=14 bbWeight=0.50 PerfScore 3.00
G_M5885_IG09:              ;; offset=018DH
       4181F880000000       cmp      r8d, 128
       7710                 ja       SHORT G_M5885_IG10
       488D4C2470           lea      rcx, [rsp+70H]
       4533C9               xor      r9d, r9d
       FF15CCFE6000         call     [System.IO.Hashing.XxHash128:HashLength17To128(ulong,uint,ulong):System.IO.Hashing.XxHash128+Hash128]
       EB27                 jmp      SHORT G_M5885_IG12
                                                ;; size=25 bbWeight=0.50 PerfScore 3.50
G_M5885_IG10:              ;; offset=01A6H
       4181F8F0000000       cmp      r8d, 240
       7710                 ja       SHORT G_M5885_IG11
       488D4C2470           lea      rcx, [rsp+70H]
       4533C9               xor      r9d, r9d
       FF15CBFE6000         call     [System.IO.Hashing.XxHash128:HashLength129To240(ulong,uint,ulong):System.IO.Hashing.XxHash128+Hash128]
       EB0E                 jmp      SHORT G_M5885_IG12
                                                ;; size=25 bbWeight=0.50 PerfScore 3.50
G_M5885_IG11:              ;; offset=01BFH
       488D4C2470           lea      rcx, [rsp+70H]
       4533C9               xor      r9d, r9d
       FF15D3FE6000         call     [System.IO.Hashing.XxHash128:HashLengthOver240(ulong,uint,ulong):System.IO.Hashing.XxHash128+Hash128]
                                                ;; size=14 bbWeight=0.50 PerfScore 1.88
G_M5885_IG12:              ;; offset=01CDH
       33D2                 xor      rdx, rdx
       4889542468           mov      bword ptr [rsp+68H], rdx
       C5F810442470         vmovups  xmm0, xmmword ptr [rsp+70H]
       C5F811842480000000   vmovups  xmmword ptr [rsp+80H], xmm0
       48896C2428           mov      bword ptr [rsp+28H], rbp
       4489742430           mov      dword ptr [rsp+30H], r14d
       488D542428           lea      rdx, [rsp+28H]
       488D8C2480000000     lea      rcx, [rsp+80H]
       FF15F8FD6000         call     [System.IO.Hashing.XxHash128:WriteBigEndian128(byref,System.Span`1[ubyte])]
       EB07                 jmp      SHORT G_M5885_IG14
                                                ;; size=53 bbWeight=0.50 PerfScore 6.62
G_M5885_IG13:              ;; offset=0202H
       FF1588FC6000         call     [System.IO.Hashing.NonCryptographicHashAlgorithm:ThrowDestinationTooShort()]
       CC                   int3
                                                ;; size=7 bbWeight=0.50 PerfScore 1.62
G_M5885_IG14:              ;; offset=0209H
       488BC3               mov      rax, rbx
                                                ;; size=3 bbWeight=1 PerfScore 0.25
G_M5885_IG15:              ;; offset=020CH
       4881C490000000       add      rsp, 144
       5B                   pop      rbx
       5D                   pop      rbp
       5E                   pop      rsi
       5F                   pop      rdi
       415E                 pop      r14
       C3                   ret
                                                ;; size=14 bbWeight=1 PerfScore 3.75
G_M5885_IG16:              ;; offset=021AH
       B9C1000000           mov      ecx, 193
       48BA30CAB65AFB7F0000 mov      rdx, 0x7FFB5AB6CA30
       E8B22E535F           call     CORINFO_HELP_STRCNS
       488BC8               mov      rcx, rax
       FF15F99E1C00         call     [System.ArgumentNullException:Throw(System.String)]
       CC                   int3
                                                ;; size=30 bbWeight=0 PerfScore 0.00

; Total bytes of code 568, prolog size 51, PerfScore 136.34, instruction count 135, allocated bytes for code 568 (MethodHash=6f80e902) for method P:Test(ubyte[]):ubyte[]
; ============================================================

The codegen contains several simple calls that were not inlined because the inliner reported "exceeds inlining budget". Here is the tree of inlining decisions:

[image: inlining decision tree before this PR]

After this PR:

[image: inlining decision tree after this PR]
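A minimal sketch of the pattern this PR touches, under stated assumptions: `SmallHashSketch`, `Mix`, `HashShort`, and `HashLong` are hypothetical names, and the mixing arithmetic is a placeholder rather than the real XxHash128 algorithm or the actual System.IO.Hashing source. The idea, as far as the description above suggests, is that a mid-level dispatcher that is not force-inlined is compiled as its own method, so its small helpers are presumably inlined there instead of competing for the caller's inlining budget.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class SmallHashSketch
{
    // Leaf helper (hypothetical): tiny, so asking the JIT to inline it is cheap.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static ulong Mix(ulong h)
    {
        h ^= h >> 33;
        h *= 0xC2B2AE3D27D4EB4FUL; // placeholder multiplier, wraps on overflow
        h ^= h >> 29;
        return h;
    }

    // Mid-level dispatcher (hypothetical): deliberately NOT marked
    // AggressiveInlining, so the JIT is free to keep it as a separate method
    // instead of folding its whole call tree into Hash.
    private static ulong HashShort(ReadOnlySpan<byte> data)
    {
        if (data.Length == 0)
        {
            return Mix(0);
        }

        // Pack first, middle, and last byte plus the length into one value.
        ulong packed = data[0]
            | ((ulong)data[data.Length >> 1] << 8)
            | ((ulong)data[^1] << 16)
            | ((ulong)data.Length << 24);
        return Mix(packed);
    }

    // Fallback for longer inputs (hypothetical): mixes one byte at a time.
    private static ulong HashLong(ReadOnlySpan<byte> data)
    {
        ulong h = (ulong)data.Length;
        foreach (byte b in data)
        {
            h = Mix(h ^ b);
        }
        return h;
    }

    // Public entry point: only dispatches on length.
    public static ulong Hash(ReadOnlySpan<byte> data) =>
        data.Length <= 16 ? HashShort(data) : HashLong(data);
}
```

Calling `SmallHashSketch.Hash(new byte[] { 1, 2, 3 })` goes through `HashShort`. Whether the JIT actually inlines any of these methods depends on the runtime version and its heuristics, so the disassembly should be checked rather than assumed.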

@ghost

ghost commented Feb 2, 2023

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details
Author: EgorBo
Assignees: EgorBo
Labels:

area-System.IO

Milestone: -

@EgorBo
Member Author

EgorBo commented Feb 2, 2023

Benchmarks:

public static IEnumerable<byte[]> TestData()
{
    yield return new byte[1];
    yield return new byte[6];
    yield return new byte[12];
    yield return new byte[17];
}

[Benchmark]
[ArgumentsSource(nameof(TestData))]
public byte[] Test(byte[] data) => XxHash128.Hash(data);

Before:

| Method |     data |     Mean |    Error |   StdDev |
|------- |--------- |---------:|---------:|---------:|
|   Test | Byte[12] | 15.49 ns | 0.147 ns | 0.138 ns |
|   Test | Byte[17] | 15.30 ns | 0.072 ns | 0.060 ns |
|   Test |  Byte[1] | 15.93 ns | 0.050 ns | 0.039 ns |
|   Test |  Byte[6] | 13.74 ns | 0.033 ns | 0.029 ns |

After:

| Method |     data |      Mean |     Error |    StdDev |
|------- |--------- |----------:|----------:|----------:|
|   Test | Byte[12] | 10.031 ns | 0.1387 ns | 0.1297 ns |
|   Test | Byte[17] | 11.185 ns | 0.1147 ns | 0.1073 ns |
|   Test |  Byte[1] |  8.670 ns | 0.0181 ns | 0.0160 ns |
|   Test |  Byte[6] |  8.773 ns | 0.0303 ns | 0.0268 ns |
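
To put the two tables side by side, here is a small sketch that derives the speedup factor and the relative time reduction from the mean values reported above. The class and member names (`BenchmarkSpeedup`, `Rows`, `Speedup`, `ReductionPercent`) are made up for illustration; only the numbers come from the tables.

```csharp
using System;

public static class BenchmarkSpeedup
{
    // Mean times in nanoseconds, copied from the Before/After tables above.
    public static readonly (string Data, double BeforeNs, double AfterNs)[] Rows =
    {
        ("Byte[1]",  15.93,  8.670),
        ("Byte[6]",  13.74,  8.773),
        ("Byte[12]", 15.49, 10.031),
        ("Byte[17]", 15.30, 11.185),
    };

    // Speedup factor: how many times faster the "after" build is.
    public static double Speedup(double beforeNs, double afterNs) =>
        beforeNs / afterNs;

    // Relative reduction in mean time, as a percentage of the "before" time.
    public static double ReductionPercent(double beforeNs, double afterNs) =>
        (beforeNs - afterNs) / beforeNs * 100.0;
}
```

Iterating over `BenchmarkSpeedup.Rows` and passing each pair to `Speedup` and `ReductionPercent` gives the per-input improvement. These are single-machine means, so small differences between rows should not be over-interpreted.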

@EgorBo
Member Author

EgorBo commented Feb 2, 2023

PTAL @stephentoub

I noticed this while inspecting all of the inliner's "exceeds budget" decisions. Unfortunately, there is not much we can do in the JIT here: inlining everything is a lot of work for the JIT, and it has to stop at some point.

@stephentoub
Member

Thanks. Does XxHash3 suffer similarly?

@EgorBo
Member Author

EgorBo commented Feb 2, 2023

> Thanks. Does XxHash3 suffer similarly?

Nope, I only noticed XxHash128. In fact, here is the list of methods in which AggressiveInlining callees fail to inline because the budget is exceeded:

        1137 (174.92% of base) : System.Text.Json.dasm - System.Text.Json.Utf8JsonWriter:WritePropertyName(long):this
         675 (139.75% of base) : System.IO.Hashing.dasm - System.IO.Hashing.XxHash128:Hash(ubyte[]):ubyte[]
        1141 (131.30% of base) : System.Private.CoreLib.dasm - System.UInt128:TryFormat(System.Span`1[ushort],byref,System.ReadOnlySpan`1[ushort],System.IFormatProvider):bool:this
         888 (117.77% of base) : System.Private.CoreLib.dasm - System.Int128:TryFormat(System.Span`1[ushort],byref,System.ReadOnlySpan`1[ushort],System.IFormatProvider):bool:this
        1735 (117.23% of base) : System.Private.CoreLib.dasm - System.Number:TryFormatInt128(System.Int128,System.ReadOnlySpan`1[ushort],System.IFormatProvider,System.Span`1[ushort],byref):bool
         680 (110.93% of base) : System.Text.Json.dasm - System.Text.Json.Utf8JsonWriter:WritePropertyName(System.Text.Json.JsonEncodedText):this
         922 (100.44% of base) : System.Private.CoreLib.dasm - System.Number:TryFormatUInt128(System.UInt128,System.ReadOnlySpan`1[ushort],System.IFormatProvider,System.Span`1[ushort],byref):bool
        1191 (98.84% of base) : System.Formats.Asn1.dasm - System.Formats.Asn1.AsnWriter:WriteUtcTimeCore(System.Formats.Asn1.Asn1Tag,System.DateTimeOffset):this
         636 (74.39% of base) : System.Text.Json.dasm - System.Text.Json.Utf8JsonWriter:WritePropertyName(ulong):this
         535 (70.77% of base) : System.Text.Json.dasm - System.Text.Json.JsonSerializer:WriteMetadataForCollection(System.Text.Json.Serialization.JsonConverter,byref,System.Text.Json.Utf8JsonWriter):ubyte
         433 (68.62% of base) : System.Text.Json.dasm - System.Text.Json.Utf8JsonWriter:WritePropertyNameHelper(System.ReadOnlySpan`1[ubyte]):this
         372 (64.47% of base) : System.Private.CoreLib.dasm - System.IntPtr:TryFormat(System.Span`1[ushort],byref,System.ReadOnlySpan`1[ushort],System.IFormatProvider):bool:this
         433 (58.28% of base) : System.Text.Json.dasm - System.Text.Json.Utf8JsonWriter:WritePropertyName(System.DateTime):this
         433 (58.28% of base) : System.Text.Json.dasm - System.Text.Json.Utf8JsonWriter:WritePropertyName(System.DateTimeOffset):this
         330 (52.22% of base) : System.Private.CoreLib.dasm - System.Int64:TryFormat(System.Span`1[ushort],byref,System.ReadOnlySpan`1[ushort],System.IFormatProvider):bool:this
         399 (51.09% of base) : System.IO.Hashing.dasm - System.IO.Hashing.XxHash128:Hash(System.ReadOnlySpan`1[ubyte],long):ubyte[]
         351 (40.25% of base) : System.IO.Hashing.dasm - System.IO.Hashing.XxHash128:Hash(System.ReadOnlySpan`1[ubyte],System.Span`1[ubyte],long):int
         255 (33.12% of base) : System.IO.Hashing.dasm - System.IO.Hashing.XxHash128:HashToUInt128(System.ReadOnlySpan`1[ubyte],long):System.UInt128
         278 (29.48% of base) : System.IO.Hashing.dasm - System.IO.Hashing.XxHash128:Hash(ubyte[],long):ubyte[]
         150 (25.47% of base) : System.Private.CoreLib.dasm - System.Int32:TryFormat(System.Span`1[ushort],byref,System.ReadOnlySpan`1[ushort],System.IFormatProvider):bool:this
         196 (22.53% of base) : System.Text.Json.dasm - System.Text.Json.Utf8JsonWriter:WritePropertyNameUnescaped(System.ReadOnlySpan`1[ubyte]):this
         212 (21.86% of base) : System.Text.Json.dasm - System.Text.Json.Utf8JsonWriter:WritePropertyName(System.Decimal):this
         208 (21.62% of base) : System.Text.Json.dasm - System.Text.Json.Utf8JsonWriter:WritePropertyName(bool):this
         113 (21.52% of base) : System.IO.Hashing.dasm - System.IO.Hashing.XxHash3:Hash(ubyte[]):ubyte[]
         203 (21.04% of base) : System.Text.Json.dasm - System.Text.Json.Utf8JsonWriter:WritePropertyName(System.Guid):this
         201 (20.57% of base) : System.Text.Json.dasm - System.Text.Json.Utf8JsonWriter:WritePropertyName(float):this
         199 (19.45% of base) : System.Text.Json.dasm - System.Text.Json.Utf8JsonWriter:WritePropertyName(double):this
         174 (18.77% of base) : System.IO.Hashing.dasm - System.IO.Hashing.XxHash128:TryHash(System.ReadOnlySpan`1[ubyte],System.Span`1[ubyte],byref,long):bool
          68 (12.90% of base) : Microsoft.CodeAnalysis.dasm - Microsoft.CodeAnalysis.PEModule:GetMemberRefNameOrThrow(System.Reflection.Metadata.MetadataReader,System.Reflection.Metadata.MemberReferenceHandle):System.String
          93 ( 9.46% of base) : System.Reflection.MetadataLoadContext.dasm - System.Reflection.TypeLoading.Ecma.EcmaDefinitionType:IsTypeNameEqual(System.ReadOnlySpan`1[ubyte],System.ReadOnlySpan`1[ubyte]):bool:this
          47 ( 7.15% of base) : System.Private.CoreLib.dasm - System.Int16:TryFormat(System.Span`1[ushort],byref,System.ReadOnlySpan`1[ushort],System.IFormatProvider):bool:this
          47 ( 7.15% of base) : System.Private.CoreLib.dasm - System.SByte:TryFormat(System.Span`1[ushort],byref,System.ReadOnlySpan`1[ushort],System.IFormatProvider):bool:this
          89 ( 6.32% of base) : Microsoft.CodeAnalysis.dasm - Microsoft.CodeAnalysis.PEModule:GetTypeDefPropsOrThrow(System.Reflection.Metadata.TypeDefinitionHandle,byref,byref,byref,byref):this
          23 ( 5.57% of base) : System.DirectoryServices.Protocols.dasm - System.DirectoryServices.Protocols.TlsOperationException:.ctor(System.DirectoryServices.Protocols.DirectoryResponse):this
          38 ( 4.29% of base) : System.Private.Xml.dasm - System.Xml.Schema.XmlBaseConverter:StringToDate(System.String):System.DateTime
          38 ( 4.29% of base) : System.Private.Xml.dasm - System.Xml.Schema.XmlBaseConverter:StringToDateTime(System.String):System.DateTime
          38 ( 4.29% of base) : System.Private.Xml.dasm - System.Xml.Schema.XmlBaseConverter:StringToGDay(System.String):System.DateTime
          38 ( 4.29% of base) : System.Private.Xml.dasm - System.Xml.Schema.XmlBaseConverter:StringToGMonthDay(System.String):System.DateTime
          38 ( 4.29% of base) : System.Private.Xml.dasm - System.Xml.Schema.XmlBaseConverter:StringToGYear(System.String):System.DateTime
          38 ( 4.29% of base) : System.Private.Xml.dasm - System.Xml.Schema.XmlBaseConverter:StringToGYearMonth(System.String):System.DateTime
          38 ( 4.29% of base) : System.Private.Xml.dasm - System.Xml.Schema.XmlBaseConverter:StringToTime(System.String):System.DateTime
          33 ( 4.24% of base) : System.Security.Cryptography.dasm - System.Security.Cryptography.AesCng:CreateEncryptor(ubyte[],ubyte[]):System.Security.Cryptography.ICryptoTransform:this
          33 ( 4.24% of base) : System.Security.Cryptography.dasm - System.Security.Cryptography.TripleDESCng:CreateEncryptor(ubyte[],ubyte[]):System.Security.Cryptography.ICryptoTransform:this
          38 ( 4.24% of base) : System.Private.Xml.dasm - System.Xml.Schema.XmlBaseConverter:StringToDateOffset(System.String):System.DateTimeOffset
          38 ( 4.24% of base) : System.Private.Xml.dasm - System.Xml.Schema.XmlBaseConverter:StringToDateTimeOffset(System.String):System.DateTimeOffset
          38 ( 4.24% of base) : System.Private.Xml.dasm - System.Xml.Schema.XmlBaseConverter:StringToGDayOffset(System.String):System.DateTimeOffset
          38 ( 4.24% of base) : System.Private.Xml.dasm - System.Xml.Schema.XmlBaseConverter:StringToGMonthDayOffset(System.String):System.DateTimeOffset
          38 ( 4.24% of base) : System.Private.Xml.dasm - System.Xml.Schema.XmlBaseConverter:StringToGYearMonthOffset(System.String):System.DateTimeOffset
          38 ( 4.24% of base) : System.Private.Xml.dasm - System.Xml.Schema.XmlBaseConverter:StringToGYearOffset(System.String):System.DateTimeOffset
          38 ( 4.24% of base) : System.Private.Xml.dasm - System.Xml.Schema.XmlBaseConverter:StringToTimeOffset(System.String):System.DateTimeOffset
          30 ( 3.86% of base) : System.Security.Cryptography.dasm - System.Security.Cryptography.AesCng:CreateDecryptor(ubyte[],ubyte[]):System.Security.Cryptography.ICryptoTransform:this
          30 ( 3.86% of base) : System.Security.Cryptography.dasm - System.Security.Cryptography.TripleDESCng:CreateDecryptor(ubyte[],ubyte[]):System.Security.Cryptography.ICryptoTransform:this
          20 ( 2.18% of base) : Microsoft.CodeAnalysis.dasm - Microsoft.CodeAnalysis.MetadataReaderExtensions:IsTheObjectClass(System.Reflection.Metadata.MetadataReader,System.Reflection.Metadata.TypeDefinition):bool
           9 ( 1.53% of base) : System.Reflection.MetadataLoadContext.dasm - System.Reflection.TypeLoading.Ecma.EcmaField:ComputeName():System.String:this
          16 ( 1.45% of base) : Microsoft.CodeAnalysis.dasm - Microsoft.CodeAnalysis.PEModule:GetMemberRefPropsOrThrow(System.Reflection.Metadata.MemberReferenceHandle,byref,byref,byref):this

The JSON ones are worth a look too, I guess.

@EgorBo
Member Author

EgorBo commented Feb 2, 2023

Ah, XxHash3 is also here.

@bartonjs
Member

bartonjs commented Feb 2, 2023

> 1191 (98.84% of base) : System.Formats.Asn1.dasm - System.Formats.Asn1.AsnWriter:WriteUtcTimeCore(System.Formats.Asn1.Asn1Tag,System.DateTimeOffset):this

I don't see an aggressive inlining marker on that method at all:

    }
    // T-REC-X.680-201508 sec 47
    // T-REC-X.690-201508 sec 11.8
    private void WriteUtcTimeCore(Asn1Tag tag, DateTimeOffset value)
    {
        // Because UtcTime is IMPLICIT VisibleString it technically can have
        // a constructed form.
        // DER says character strings must be primitive.
        // CER says character strings <= 1000 encoded bytes must be primitive.
        // So we'll just make BER be primitive, too.
        Debug.Assert(!tag.IsConstructed);
        WriteTag(tag);
        // BER allows for omitting the seconds, but that's not an option we need to expose.

Perhaps I don't understand what the report is reporting?

@EgorBo
Member Author

EgorBo commented Feb 2, 2023

> Perhaps I don't understand what the report is reporting?

It lists methods that contain failed inlining attempts, e.g. System.Formats.Asn1.AsnWriter:WriteUtcTimeCore(System.Formats.Asn1.Asn1Tag,System.DateTimeOffset):this

G_M33131_IG01:
       push     r15
       push     r14
       push     r13
       push     r12
       push     rdi
       push     rsi
       push     rbp
       push     rbx
       sub      rsp, 136
       vzeroupper 
       xor      eax, eax
       mov      qword ptr [rsp+30H], rax
       mov      qword ptr [rsp+D8H], rdx
       mov      rsi, rcx
       mov      rdi, r8
						;; size=43 bbWeight=1 PerfScore 12.00
G_M33131_IG02:
       mov      rcx, rsi
       mov      rdx, qword ptr [rsp+D8H]
       call     [System.Formats.Asn1.AsnWriter:WriteTag(System.Formats.Asn1.Asn1Tag):this]
       mov      rcx, rsi
       mov      edx, 13
       call     [System.Formats.Asn1.AsnWriter:WriteLength(int):this]
       mov      rdi, qword ptr [rdi+08H]
       mov      rbx, 0xD1FFAB1E
       and      rdi, rbx
       mov      rbx, 0xD1FFAB1E
       or       rdi, rbx
       vxorps   xmm0, xmm0, xmm0
       vmovups  xmmword ptr [rsp+60H], xmm0
       mov      rcx, 0xD1FFAB1E
       and      rcx, rdi
       je       SHORT G_M33131_IG04
						;; size=86 bbWeight=1 PerfScore 13.58
G_M33131_IG03:
       mov      rax, 0xD1FFAB1E
       cmp      rcx, rax
       je       SHORT G_M33131_IG06
						;; size=15 bbWeight=0.50 PerfScore 0.75
G_M33131_IG04:
       mov      rcx, 0xD1FFAB1E      ; data for System.TimeZoneInfo:s_cachedData
       mov      rbx, gword ptr [rcx]
       mov      rcx, gword ptr [rbx+08H]
       test     rcx, rcx
       jne      SHORT G_M33131_IG05
       mov      rcx, rbx
       call     [System.TimeZoneInfo+CachedData:CreateLocal():System.TimeZoneInfo:this]
       mov      rcx, rax
						;; size=34 bbWeight=0.50 PerfScore 4.50
G_M33131_IG05:
       mov      rdx, rdi
       mov      r9, rbx
       mov      r8d, 2
       cmp      dword ptr [rcx], ecx
       call     [System.TimeZoneInfo:GetUtcOffset(System.DateTime,int,System.TimeZoneInfo+CachedData):System.TimeSpan:this]
       mov      rbx, rax
       jmp      SHORT G_M33131_IG07
						;; size=25 bbWeight=0.50 PerfScore 4.50
G_M33131_IG06:
       xor      ebx, ebx
						;; size=2 bbWeight=0.50 PerfScore 0.12
G_M33131_IG07:
       mov      rcx, rbx
       call     [System.DateTimeOffset:ValidateOffset(System.TimeSpan):short]
       mov      word  ptr [rsp+60H], ax
       mov      rcx, rdi
       mov      rdx, rbx
       call     [System.DateTimeOffset:ValidateDate(System.DateTime,System.TimeSpan):System.DateTime]
       mov      qword ptr [rsp+68H], rax
       vmovups  xmm0, xmmword ptr [rsp+60H]
       vmovups  xmmword ptr [rsp+78H], xmm0
       lea      rcx, [rsp+78H]
       call     [System.DateTimeOffset:get_ClockDateTime():System.DateTime:this]
       mov      qword ptr [rsp+58H], rax
       lea      rcx, [rsp+58H]
       call     [System.DateTime:get_Year():int:this]
       mov      edi, eax
       lea      rcx, [rsp+78H]
       call     [System.DateTimeOffset:get_ClockDateTime():System.DateTime:this]
       mov      qword ptr [rsp+50H], rax
       lea      rcx, [rsp+50H]
       call     [System.DateTime:get_Month():int:this]
       mov      ebx, eax
       lea      rcx, [rsp+78H]
       call     [System.DateTimeOffset:get_ClockDateTime():System.DateTime:this]
       mov      qword ptr [rsp+48H], rax
       lea      rcx, [rsp+48H]
       call     [System.DateTime:get_Day():int:this]
       mov      ebp, eax
       lea      rcx, [rsp+78H]
       call     [System.DateTimeOffset:get_ClockDateTime():System.DateTime:this]
       mov      rdx, 0xD1FFAB1E
       and      rdx, rax
       mov      rcx, 0xD1FFAB1E
       mov      rax, rdx
       mul      rdx:rax, rcx
       shr      rdx, 33
       mov      ecx, 0xD1FFAB1E
       mov      eax, edx
       imul     rcx, rax
       shr      rcx, 36
       imul     ecx, ecx, 24
       mov      r14d, edx
       sub      r14d, ecx
       lea      rcx, [rsp+78H]
       call     [System.DateTimeOffset:get_ClockDateTime():System.DateTime:this]
       mov      rdx, 0xD1FFAB1E
       and      rdx, rax
       mov      rcx, 0xD1FFAB1E
       mov      rax, rdx
       mul      rdx:rax, rcx
       mov      r15, rdx
       shr      r15, 26
       mov      rdx, 0xD1FFAB1E
       mov      rax, r15
       mul      rdx:rax, rdx
       imul     rcx, rdx, 60
       sub      r15, rcx
       lea      rcx, [rsp+78H]
       call     [System.DateTimeOffset:get_ClockDateTime():System.DateTime:this]
       mov      rdx, 0xD1FFAB1E
       and      rdx, rax
       mov      rcx, 0xD1FFAB1E
       mov      rax, rdx
       mul      rdx:rax, rcx
       mov      r12, rdx
       shr      r12, 22
       mov      rdx, 0xD1FFAB1E
						;; size=325 bbWeight=1 PerfScore 73.50
G_M33131_IG08:
       mov      rax, r12
       mul      rdx:rax, rdx
       imul     rax, rdx, 60
       sub      r12, rax
       mov      rax, gword ptr [rsi+08H]
       mov      edx, dword ptr [rsi+18H]
       test     rax, rax
       jne      SHORT G_M33131_IG10
						;; size=25 bbWeight=1 PerfScore 10.75
G_M33131_IG09:
       test     edx, edx
       jne      G_M33131_IG23
       xor      r13, r13
       xor      r10d, r10d
       jmp      SHORT G_M33131_IG11
						;; size=16 bbWeight=0.50 PerfScore 1.88
G_M33131_IG10:
       cmp      dword ptr [rax+08H], edx
       jb       G_M33131_IG23
       mov      r10d, edx
       lea      r13, bword ptr [rax+r10+10H]
       mov      r10d, dword ptr [rax+08H]
       sub      r10d, edx
						;; size=24 bbWeight=0.50 PerfScore 3.75
G_M33131_IG11:
       mov      dword ptr [rsp+44H], r10d
       cmp      r10d, 2
       jb       G_M33131_IG23
       mov      edx, 0xD1FFAB1E
       mov      eax, edx
       imul     edx:eax, edi
       mov      ecx, edx
       shr      ecx, 31
       sar      edx, 5
       add      ecx, edx
       imul     ecx, ecx, 100
       sub      edi, ecx
       movsxd   rcx, edi
       xor      r9d, r9d
       test     rcx, rcx
       jge      SHORT G_M33131_IG13
						;; size=50 bbWeight=1 PerfScore 11.25
G_M33131_IG12:
       mov      r9d, 1
       neg      rcx
						;; size=9 bbWeight=0.50 PerfScore 0.25
G_M33131_IG13:
       mov      bword ptr [rsp+30H], r13
       mov      dword ptr [rsp+38H], 2
       lea      rdx, [rsp+70H]
       mov      qword ptr [rsp+20H], rdx
       mov      edx, 2
       lea      r8, [rsp+30H]
       call     [System.Buffers.Text.Utf8Formatter:TryFormatUInt64D(ulong,ubyte,System.Span`1[ubyte],bool,byref):bool]
       test     eax, eax
       je       G_M33131_IG24
       mov      edi, dword ptr [rsp+44H]
       cmp      edi, 4
       jb       G_M33131_IG23
       lea      rcx, bword ptr [r13+02H]
       movsxd   rdx, ebx
       xor      r9d, r9d
       test     rdx, rdx
       jge      SHORT G_M33131_IG15
						;; size=75 bbWeight=1 PerfScore 13.00
G_M33131_IG14:
       mov      r9d, 1
       neg      rdx
						;; size=9 bbWeight=0.50 PerfScore 0.25
G_M33131_IG15:
       mov      bword ptr [rsp+30H], rcx
       mov      dword ptr [rsp+38H], 2
       lea      rcx, [rsp+70H]
       mov      qword ptr [rsp+20H], rcx
       mov      rcx, rdx
       mov      edx, 2
       lea      r8, [rsp+30H]
       call     [System.Buffers.Text.Utf8Formatter:TryFormatUInt64D(ulong,ubyte,System.Span`1[ubyte],bool,byref):bool]
       test     eax, eax
       je       G_M33131_IG24
       cmp      edi, 6
       jb       G_M33131_IG23
       lea      rcx, bword ptr [r13+04H]
       movsxd   rdx, ebp
       xor      r9d, r9d
       test     rdx, rdx
       jge      SHORT G_M33131_IG17
						;; size=74 bbWeight=1 PerfScore 12.25
G_M33131_IG16:
       mov      r9d, 1
       neg      rdx
						;; size=9 bbWeight=0.50 PerfScore 0.25
G_M33131_IG17:
       mov      bword ptr [rsp+30H], rcx
       mov      dword ptr [rsp+38H], 2
       lea      rcx, [rsp+70H]
       mov      qword ptr [rsp+20H], rcx
       mov      rcx, rdx
       mov      edx, 2
       lea      r8, [rsp+30H]
       call     [System.Buffers.Text.Utf8Formatter:TryFormatUInt64D(ulong,ubyte,System.Span`1[ubyte],bool,byref):bool]
       test     eax, eax
       je       G_M33131_IG24
       cmp      edi, 8
       jb       G_M33131_IG23
       lea      rcx, bword ptr [r13+06H]
       movsxd   rdx, r14d
       xor      r9d, r9d
       test     rdx, rdx
       jge      SHORT G_M33131_IG19
						;; size=74 bbWeight=1 PerfScore 12.25
G_M33131_IG18:
       mov      r9d, 1
       neg      rdx
						;; size=9 bbWeight=0.50 PerfScore 0.25
G_M33131_IG19:
       mov      bword ptr [rsp+30H], rcx
       mov      dword ptr [rsp+38H], 2
       lea      rcx, [rsp+70H]
       mov      qword ptr [rsp+20H], rcx
       mov      rcx, rdx
       mov      edx, 2
       lea      r8, [rsp+30H]
       call     [System.Buffers.Text.Utf8Formatter:TryFormatUInt64D(ulong,ubyte,System.Span`1[ubyte],bool,byref):bool]
       test     eax, eax
       je       G_M33131_IG24
       cmp      edi, 10
       jb       G_M33131_IG23
       lea      rcx, bword ptr [r13+08H]
       movsxd   rdx, r15d
       xor      r9d, r9d
       test     rdx, rdx
       jge      SHORT G_M33131_IG21
						;; size=74 bbWeight=1 PerfScore 12.25
G_M33131_IG20:
       mov      r9d, 1
       neg      rdx
						;; size=9 bbWeight=0.50 PerfScore 0.25
G_M33131_IG21:
       mov      bword ptr [rsp+30H], rcx
       mov      dword ptr [rsp+38H], 2
       lea      rcx, [rsp+70H]
       mov      qword ptr [rsp+20H], rcx
       mov      rcx, rdx
       mov      edx, 2
       lea      r8, [rsp+30H]
       call     [System.Buffers.Text.Utf8Formatter:TryFormatUInt64D(ulong,ubyte,System.Span`1[ubyte],bool,byref):bool]
       test     eax, eax
       je       G_M33131_IG24
       cmp      edi, 12
       jb       SHORT G_M33131_IG23
       add      r13, 10
       movsxd   rcx, r12d
       mov      bword ptr [rsp+30H], r13
       mov      dword ptr [rsp+38H], 2
       mov      byte  ptr [rsp+28H], 68
       mov      byte  ptr [rsp+29H], 2
       movsx    r8, word  ptr [rsp+28H]
       mov      dword ptr [rsp+20H], r8d
       lea      r8, [rsp+30H]
       lea      r9, [rsp+70H]
       mov      edx, 0xD1FFAB1E
       call     [System.Buffers.Text.Utf8Formatter:TryFormatInt64(long,ulong,System.Span`1[ubyte],byref,System.Buffers.StandardFormat):bool]
       test     eax, eax
       je       SHORT G_M33131_IG24
       mov      rax, gword ptr [rsi+08H]
       mov      edx, dword ptr [rsi+18H]
       lea      ecx, [rdx+0CH]
       cmp      ecx, dword ptr [rax+08H]
       jae      SHORT G_M33131_IG25
       mov      ecx, ecx
       mov      byte  ptr [rax+rcx+10H], 90
       add      edx, 13
       mov      dword ptr [rsi+18H], edx
						;; size=149 bbWeight=1 PerfScore 35.00
G_M33131_IG22:
       add      rsp, 136
       pop      rbx
       pop      rbp
       pop      rsi
       pop      rdi
       pop      r12
       pop      r13
       pop      r14
       pop      r15
       ret      
						;; size=20 bbWeight=1 PerfScore 5.25
G_M33131_IG23:
       call     [System.ThrowHelper:ThrowArgumentOutOfRangeException()]
       int3     
						;; size=7 bbWeight=0 PerfScore 0.00
G_M33131_IG24:
       mov      rcx, 0xD1FFAB1E      ; System.InvalidOperationException
       call     CORINFO_HELP_NEWSFAST
       mov      rsi, rax
       mov      rcx, rsi
       call     [System.InvalidOperationException:.ctor():this]
       mov      rcx, rsi
       call     CORINFO_HELP_THROW
       int3     
						;; size=36 bbWeight=0 PerfScore 0.00
G_M33131_IG25:
       call     CORINFO_HELP_RNGCHKFAIL
       int3     
						;; size=6 bbWeight=0 PerfScore 0.00

; Total bytes of code 1205, prolog size 43, PerfScore 348.33, instruction count 286, allocated bytes for code 1205 (MethodHash=6afe7e94) for method System.Formats.Asn1.AsnWriter:WriteUtcTimeCore(System.Formats.Asn1.Asn1Tag,System.DateTimeOffset):this
; ============================================================
 call     [System.Buffers.Text.Utf8Formatter:TryFormatInt64(long,ulong,System.Span`1[ubyte],byref,System.Buffers.StandardFormat):bool]

was not inlined even though the method is marked AggressiveInlining.

@stephentoub
Member

Thanks. Can you share XxHash3 numbers, too?

@EgorBo
Member Author

EgorBo commented Feb 2, 2023

> Thanks. Can you share XxHash3 numbers, too?

Same benchmark, but changed to XxHash3.
Before

| Method |     data |      Mean |     Error |    StdDev |
|------- |--------- |----------:|----------:|----------:|
|   Test | Byte[12] | 10.031 ns | 0.1387 ns | 0.1297 ns |
|   Test | Byte[17] | 11.185 ns | 0.1147 ns | 0.1073 ns |
|   Test |  Byte[1] |  8.670 ns | 0.0181 ns | 0.0160 ns |
|   Test |  Byte[6] |  8.773 ns | 0.0303 ns | 0.0268 ns |

After

| Method |     data |     Mean |     Error |    StdDev |
|------- |--------- |---------:|----------:|----------:|
|   Test | Byte[12] | 7.004 ns | 0.0427 ns | 0.0378 ns |
|   Test | Byte[17] | 7.787 ns | 0.0110 ns | 0.0086 ns |
|   Test |  Byte[1] | 6.765 ns | 0.0114 ns | 0.0095 ns |
|   Test |  Byte[6] | 6.856 ns | 0.0263 ns | 0.0233 ns |

Member

@stephentoub stephentoub left a comment


Thanks. These numbers are all a bit bigger than I remember from my machine, so I'll take a look locally to see if we might have regressed, but the change LGTM.

@EgorBo
Member Author

EgorBo commented Feb 2, 2023

> These numbers are all a bit bigger than I remember from my machine

6-year-old CPU 🤷‍♂️

@stephentoub
Member

Ah, your test is calling XxHash3.Hash as opposed to XxHash3.HashToUInt64. The numbers I had in my head were for the latter, which is cheaper. Mind validating that one before/after as well?
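
For reference, a minimal sketch of the two entry points being compared. It assumes the System.IO.Hashing NuGet package is referenced; the class name HashEntryPoints is only illustrative and the 12-byte input mirrors one of the benchmark sizes above.

```csharp
using System;
using System.IO.Hashing; // assumption: System.IO.Hashing package is referenced

static class HashEntryPoints
{
    static void Main()
    {
        byte[] data = new byte[12]; // same size as one of the benchmark inputs

        // Hash returns a newly allocated byte[] holding the 64-bit result.
        byte[] asBytes = XxHash3.Hash(data);

        // HashToUInt64 returns the value directly, with no result array.
        ulong asUInt64 = XxHash3.HashToUInt64(data);

        Console.WriteLine($"{asBytes.Length} bytes vs 0x{asUInt64:x16}");
    }
}
```

The array allocation in Hash is one plausible reason HashToUInt64 measures cheaper; the thread itself does not break the difference down.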

@EgorBo
Member Author

EgorBo commented Feb 3, 2023

@stephentoub XxHash3.HashToUInt64 regressed slightly because, unlike XxHash3.Hash, it has a smaller call graph, so the JIT manages to inline the whole thing without hitting "inline exceeds budget":

Before:

|  Method |     data |     Mean |     Error |    StdDev |
|-------- |--------- |---------:|----------:|----------:|
|   Hash3 | Byte[12] | 1.500 ns | 0.0048 ns | 0.0043 ns |
|   Hash3 | Byte[17] | 3.599 ns | 0.0992 ns | 0.0928 ns |
|   Hash3 |  Byte[1] | 1.922 ns | 0.0012 ns | 0.0010 ns |
|   Hash3 |  Byte[6] | 1.709 ns | 0.0107 ns | 0.0095 ns |
| Hash128 | Byte[12] | 7.829 ns | 0.0423 ns | 0.0331 ns |
| Hash128 | Byte[17] | 6.241 ns | 0.0030 ns | 0.0028 ns |
| Hash128 |  Byte[1] | 8.011 ns | 0.0008 ns | 0.0007 ns |
| Hash128 |  Byte[6] | 7.382 ns | 0.0012 ns | 0.0010 ns |


After:

|  Method |     data |     Mean |     Error |    StdDev |
|-------- |--------- |---------:|----------:|----------:|
|   Hash3 | Byte[12] | 2.057 ns | 0.0210 ns | 0.0197 ns |
|   Hash3 | Byte[17] | 3.518 ns | 0.0015 ns | 0.0012 ns |
|   Hash3 |  Byte[1] | 2.428 ns | 0.0006 ns | 0.0005 ns |
|   Hash3 |  Byte[6] | 2.264 ns | 0.0066 ns | 0.0061 ns |
| Hash128 | Byte[12] | 4.424 ns | 0.0010 ns | 0.0008 ns |
| Hash128 | Byte[17] | 6.264 ns | 0.0136 ns | 0.0120 ns |
| Hash128 |  Byte[1] | 4.195 ns | 0.0057 ns | 0.0050 ns |
| Hash128 |  Byte[6] | 3.579 ns | 0.0004 ns | 0.0003 ns |

Hash128 is fine, so I can remove the workaround for Hash3. In the real world XxHash3.HashToUInt64 could be part of a larger call graph, so it won't have the same budget. It's only in my benchmark that it's a root call:

public ulong Hash3(byte[] data) => XxHash3.HashToUInt64(data);
Inlines into Prog:Hash3(ubyte[]):ulong:this:
  [INLINED: below ALWAYS_INLINE size] System.ReadOnlySpan`1[ubyte]:op_Implicit(ubyte[]):System.ReadOnlySpan`1[ubyte]
    [INLINED: aggressive inline attribute] System.ReadOnlySpan`1[ubyte]:.ctor(ubyte[]):this
  [INLINED: profitable inline] System.IO.Hashing.XxHash3:HashToUInt64(System.ReadOnlySpan`1[ubyte],long):ulong
    [INLINED: below ALWAYS_INLINE size] System.Runtime.InteropServices.MemoryMarshal:GetReference[ubyte](System.ReadOnlySpan`1[ubyte]):byref
    [INLINED: aggressive inline attribute] System.IO.Hashing.XxHash3:HashLength0To16(ulong,uint,ulong):ulong
      [INLINED: aggressive inline attribute] System.IO.Hashing.XxHash3:HashLength9To16(ulong,uint,ulong):ulong
        [INLINED: aggressive inline attribute] System.IO.Hashing.XxHashShared:ReadUInt64LE(ulong):ulong
          [INLINED: aggressive inline attribute] System.Runtime.CompilerServices.Unsafe:ReadUnaligned[ulong](ulong):ulong
        [INLINED: aggressive inline attribute] System.IO.Hashing.XxHashShared:ReadUInt64LE(ulong):ulong
          [INLINED: aggressive inline attribute] System.Runtime.CompilerServices.Unsafe:ReadUnaligned[ulong](ulong):ulong
        [INLINED: aggressive inline attribute] System.IO.Hashing.XxHashShared:Multiply64To128ThenFold(ulong,ulong):ulong
          [INLINED: aggressive inline attribute] System.IO.Hashing.XxHashShared:Multiply64To128(ulong,ulong,byref):ulong
            [INLINED: aggressive inline attribute] System.Math:BigMul(ulong,ulong,byref):ulong
        [INLINED: aggressive inline attribute] System.IO.Hashing.XxHashShared:Avalanche(ulong):ulong
          [INLINED: aggressive inline attribute] System.IO.Hashing.XxHashShared:XorShift(ulong,int):ulong
          [INLINED: aggressive inline attribute] System.IO.Hashing.XxHashShared:XorShift(ulong,int):ulong
      [INLINED: aggressive inline attribute] System.IO.Hashing.XxHash3:HashLength4To8(ulong,uint,ulong):ulong
        [INLINED: aggressive inline attribute] System.IO.Hashing.XxHashShared:ReadUInt32LE(ulong):uint
          [INLINED: aggressive inline attribute] System.Runtime.CompilerServices.Unsafe:ReadUnaligned[uint](ulong):uint
        [INLINED: aggressive inline attribute] System.IO.Hashing.XxHashShared:ReadUInt32LE(ulong):uint
          [INLINED: aggressive inline attribute] System.Runtime.CompilerServices.Unsafe:ReadUnaligned[uint](ulong):uint
        [INLINED: aggressive inline attribute] System.IO.Hashing.XxHashShared:Rrmxmx(ulong,uint):ulong
          [INLINED: aggressive inline attribute] System.IO.Hashing.XxHashShared:XorShift(ulong,int):ulong
      [INLINED: aggressive inline attribute] System.IO.Hashing.XxHash3:HashLength1To3(ulong,uint,ulong):ulong
        [INLINED: aggressive inline attribute] System.IO.Hashing.XxHash64:Avalanche(ulong):ulong
      [INLINED: aggressive inline attribute] System.IO.Hashing.XxHash64:Avalanche(ulong):ulong
    [FAILED: too many il bytes] System.IO.Hashing.XxHash3:HashLength17To128(ulong,uint,ulong):ulong
    [FAILED: too many il bytes] System.IO.Hashing.XxHash3:HashLength129To240(ulong,uint,ulong):ulong
    [FAILED: inline exceeds budget] System.IO.Hashing.XxHash3:HashLengthOver240(ulong,uint,ulong):ulong

Although HashLengthOver240 does actually exceed the budget.

@stephentoub
Member

There's probably too much aggressive inlining in these types, anyway. We should strike the right balance between throughput, code size at each call site, and predictability given that the inlining heuristics will lead to different results based on usage (which itself is a little disconcerting). What should we do differently in these types to strike that balance better?

@EgorBo
Member Author

EgorBo commented Feb 3, 2023

> There's probably too much aggressive inlining in these types, anyway. We should strike the right balance between throughput, code size at each call site, and predictability given that the inlining heuristics will lead to different results based on usage (which itself is a little disconcerting). What should we do differently in these types to strike that balance better?

It seems we can think about doing a proper rejection in JIT instead. Imagine we have a callgraph like this:

A()
    B()
        C()
        D()
        E()
        F()

B() has [AggressiveInlining] and eats the inliner's whole budget (because B() has a lot of IL), so we successfully inline B() into A() but then have no budget left for C()...F(), even though they are small and very useful to inline. In this case we should go back and reject the decision to inline B() into A() so we can optimize B() properly. cc @AndyAyersMS
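
A minimal sketch of that call-graph shape in C#, for illustration only. The method bodies are hypothetical stand-ins, not code from System.IO.Hashing, and this tiny B() would not really exhaust the JIT's budget; it only shows where the attribute sits relative to the small callees.

```csharp
using System;
using System.Runtime.CompilerServices;

static class CallGraphSketch
{
    // Small leaf helpers: cheap, and individually attractive inline candidates.
    static int C(int x) => x + 1;
    static int D(int x) => x * 2;
    static int E(int x) => x ^ 0x55;
    static int F(int x) => x - 3;

    // Stand-in for a large force-inlined method. In the scenario described,
    // B's own IL is big enough to use up the inlining budget of its caller.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    static int B(int x)
    {
        // Imagine many more IL bytes here.
        return F(E(D(C(x))));
    }

    // Root method: the inlining budget is computed for this method.
    public static int A(int x) => B(x);

    static void Main() => Console.WriteLine(A(1)); // prints 78
}
```

If B() were not inlined into A(), it would be compiled as its own root with its own budget, which is what would let C() through F() be inlined into it.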

@AndyAyersMS
Member

> There's probably too much aggressive inlining in these types, anyway. We should strike the right balance between throughput, code size at each call site, and predictability given that the inlining heuristics will lead to different results based on usage (which itself is a little disconcerting). What should we do differently in these types to strike that balance better?

> It seems we can think about doing a proper rejection in JIT instead. Imagine we have a callgraph like this:

I would say we have enough trouble not inlining AI methods, and I would not like to see even more cases where we decide not to inline them.

InlineStrategy::NoteOutcome has provisions to increase the budget after the jit inlines a force inline, maybe it just needs to increase it more?

@EgorBo
Member Author

EgorBo commented Feb 3, 2023

> maybe it just needs to increase it more?

I assume we can tweak that based on the cases we have in the BCL, but it's hard to predict results for the real world. I'll experiment with it.

@EgorBo EgorBo merged commit a6741d9 into dotnet:main Feb 3, 2023
@EgorBo EgorBo deleted the remove-aggrinlining branch February 3, 2023 16:02
@ghost ghost locked as resolved and limited conversation to collaborators Mar 8, 2023