Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Remove BB limit from importer_vectorization #66534

Merged
merged 1 commit into from
Mar 14, 2022

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Mar 12, 2022

#66529 where due to large method size we end up with the following codegen for all loweredValue.SequenceEqual("<literal>".AsSpan()):

       mov       rdx,1FCA7409060	
       mov       rdx,[rdx]	
       lea       rcx,[rbp+4C0]	
       call      System.MemoryExtensions.AsSpan(System.String)	
       mov       [rbp+10],rbx	
       mov       [rbp+18],r14d	
       mov       rcx,[rbp+4C0]	
       mov       [rbp+20],rcx	
       mov       ecx,[rbp+4C8]	
       mov       [rbp+28],ecx	
       lea       rcx,[rbp+10]	
       lea       rdx,[rbp+20]	
       call      System.MemoryExtensions.SequenceEqual[[System.Char, System.Private.CoreLib]](System.Span`1<Char>, System.ReadOnlySpan`1<Char>)

so let's just unconditionally enable #65288 - surprisingly, it makes code even smaller.

#66529 Benchmark:

Method Toolchain Mean Error StdDev Ratio
SpanBlack \Core_Root_base\corerun.exe 49.13 ns 0.048 ns 0.037 ns 3.05
SpanBlack \Core_Root_PR\corerun.exe 16.11 ns 0.317 ns 0.297 ns 1.00
SpanLightGoldenrodYellowk \Core_Root_base\corerun.exe 65.72 ns 0.150 ns 0.126 ns 2.08
SpanLightGoldenrodYellowk \Core_Root_PR\corerun.exe 31.55 ns 0.229 ns 0.215 ns 1.00

2-3x improvement

Codegen diff for GetNamedColorSpan: https://www.diffchecker.com/bFuRMHNP

SPMI Diffs are interesting - https://dev.azure.com/dnceng/public/_build/results?buildId=1660127&view=ms.vss-build-web.run-extensions-tab (I assume they caught all switches over string literals)

benchmarks.run.Linux.x64.checked.mch:
Total bytes of delta: -10162 (-0.06 % of base)

coreclr_tests.pmi.Linux.x64.checked.mch:
Total bytes of delta: -5643 (-0.00 % of base)

libraries.crossgen2.Linux.x64.checked.mch:
Total bytes of delta: 3730 (0.03 % of base)

libraries.pmi.Linux.x64.checked.mch:
Total bytes of delta: -30098 (-0.06 % of base)

libraries_tests.pmi.Linux.x64.checked.mch:
Total bytes of delta: -36095 (-0.03 % of base)

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 12, 2022
@ghost ghost assigned EgorBo Mar 12, 2022
@ghost
Copy link

ghost commented Mar 12, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

#66529 where due to large method size we end up with the following codegen for all loweredValue.SequenceEqual("<literal>".AsSpan()):

       mov       rdx,1FCA7409060	
       mov       rdx,[rdx]	
       lea       rcx,[rbp+4C0]	
       call      System.MemoryExtensions.AsSpan(System.String)	
       mov       [rbp+10],rbx	
       mov       [rbp+18],r14d	
       mov       rcx,[rbp+4C0]	
       mov       [rbp+20],rcx	
       mov       ecx,[rbp+4C8]	
       mov       [rbp+28],ecx	
       lea       rcx,[rbp+10]	
       lea       rdx,[rbp+20]	
       call      System.MemoryExtensions.SequenceEqual[[System.Char, System.Private.CoreLib]](System.Span`1<Char>, System.ReadOnlySpan`1<Char>)

so let's just unconditionally enable #65288 - surprisingly, it makes code even smaller.

#66529 Benchmark:

Method Toolchain Mean Error StdDev Ratio
SpanBlack \Core_Root_base\corerun.exe 49.13 ns 0.048 ns 0.037 ns 3.05
SpanBlack \Core_Root_PR\corerun.exe 16.11 ns 0.317 ns 0.297 ns 1.00
SpanLightGoldenrodYellowk \Core_Root_base\corerun.exe 65.72 ns 0.150 ns 0.126 ns 2.08
SpanLightGoldenrodYellowk \Core_Root_PR\corerun.exe 31.55 ns 0.229 ns 0.215 ns 1.00

Codegen diff for GetNamedColorSpan: https://www.diffchecker.com/bFuRMHNP

Author: EgorBo
Assignees: EgorBo
Labels:

area-CodeGen-coreclr

Milestone: -

@EgorBo
Copy link
Member Author

EgorBo commented Mar 12, 2022

@dotnet/jit-contrib @AndyAyersMS PTAL

@stephentoub
Copy link
Member

surprisingly, it makes code even smaller.

Thanks, Egor. I was surprised in the repro when Eric showed it to me last night that even AsSpan wasn't being inlined. Separate from this change, have we done an experiment around what things would look like if we always, unconditionally inlined AsSpan and span's ctors? I wonder if there's a set of critical, core, tiny APIs where we end up falling off a cliff due to inlining budget and we'd be better off with an internal MethodImplOptions.ForceInliningNoReallyThisNeedsToBeInlined setting.

@EgorBo
Copy link
Member Author

EgorBo commented Mar 12, 2022

surprisingly, it makes code even smaller.

Thanks, Egor. I was surprised in the repro when Eric showed it to me last night that even AsSpan wasn't being inlined. Separate from this change, have we done an experiment around what things would look like if we always, unconditionally inlined AsSpan and span's ctors? I wonder if there's a set of critical, core, tiny APIs where we end up falling off a cliff due to inlining budget and we'd be better off with an internal MethodImplOptions.ForceInliningNoReallyThisNeedsToBeInlined setting.

We plan to address it eventually with partial pre-scan to be able to go over-budget, I think there must be already an issue filed for it.

@EgorBo
Copy link
Member Author

EgorBo commented Mar 12, 2022

Failures are not related, I suspect it's DST difference + #66540

@kasperk81
Copy link
Contributor

652 total methods with Code Size differences (428 improved, 224 regressed), 1 unchanged.

what caused the regressions?

@EgorBo
Copy link
Member Author

EgorBo commented Mar 13, 2022

652 total methods with Code Size differences (428 improved, 224 regressed), 1 unchanged.

what caused the regressions?

It seems like it's always a size improvement to unroll SequenceEquals for spans, but it's often a size regression for string objects.
However, I checked few size regressions and all of them were "performance" improvements so I leave it as is.

@jkotas
Copy link
Member

jkotas commented Mar 13, 2022

I suspect it's DST difference

Opened #66555

@JulieLeeMSFT JulieLeeMSFT added this to the 7.0.0 milestone Mar 14, 2022
@EgorBo
Copy link
Member Author

EgorBo commented Mar 14, 2022

ping @dotnet/jit-contrib simple change

@EgorBo EgorBo merged commit 3fd6148 into dotnet:main Mar 14, 2022
radekdoulik pushed a commit to radekdoulik/runtime that referenced this pull request Mar 30, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Apr 14, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants