Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PGO: Enable profiled casts by default #96597

Merged
merged 3 commits into from
Jan 8, 2024
Merged

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Jan 8, 2024

This PR enables PGO for castclass/isinst by default. Since we now have a dedicated tier for instrumentation it should not be a big deal to enable more probes IMO. Profiled casts seem to help LINQ for small inputs since LINQ is typically full of type casts for fast paths (e.g. this).

Benchmark:

[HideColumns("Error", "Job", "StdDev", "Median", "RatioSD")]
public class LinqBenchmarks
{
    IEnumerable<int> _enum1 = new int[10];
    IEnumerable<int> _enum2 = new int[10];

    [Benchmark]
    public bool SequenceEqual() => _enum1.SequenceEqual(_enum2);

    [Benchmark]
    public int ElementAt() => _enum1.ElementAt(5);

    [Benchmark]
    public int Count() => _enum1.Count();

    [Benchmark]
    public List<int> ToList() => _enum1.ToList();
}
Method Toolchain Mean Ratio
SequenceEqual \Core_Root\corerun.exe 9.372 ns 2.96
SequenceEqual \Core_Root_PR\corerun.exe 3.171 ns 1.00
ElementAt \Core_Root\corerun.exe 7.163 ns 4.69
ElementAt \Core_Root_PR\corerun.exe 1.529 ns 1.00
Count \Core_Root\corerun.exe 3.482 ns 3.14
Count \Core_Root_PR\corerun.exe 1.127 ns 1.00
ToList \Core_Root\corerun.exe 16.334 ns 1.30
ToList \Core_Root_PR\corerun.exe 12.332 ns 1.00

PS: SequenceEqual should benefit from #96571 too

Codegen example

// Promote IsList to Tier1 with PGO
for (int i = 0; i < 200; i++)
{
    IsList(new int[10]); // source is always int[]
    Thread.Sleep(10);
}

[MethodImpl(MethodImplOptions.NoInlining)]
bool IsList<T>(IEnumerable<T> source) => source is IList<T>;

Codegen diff (Main vs PR) for IsList:

; Assembly listing for method IsList
G_M12490_IG01:
       sub      rsp, 40
G_M12490_IG02:
+      mov      rax, rcx
+      test     rax, rax                 ; if (source == null)
+      je       SHORT G_M12490_IG05
+G_M12490_IG03:
+      mov      rdx, 0x7FF941DE9750
+      cmp      qword ptr [rax], rdx
+      je       SHORT G_M12490_IG05      ; if (source is int[])
+G_M12490_IG04:
       mov      rdx, rcx
       mov      rcx, 0x7FF942439A60
       call     CORINFO_HELP_ISINSTANCEOFINTERFACE ; slow fallback
+G_M12490_IG05:
       test     rax, rax
       setne    al
       movzx    rax, al
-G_M12490_IG03:
+G_M12490_IG06:
       add      rsp, 40
       ret      
; Total bytes of code 59

@AndyAyersMS @dotnet/jit-contrib do you think this should go through an internal run of dotnet/performance first?
I tested impact on TE (no impact on startup and rps) and a few desktop app. Also, I did a run of 250 LINQ benchmarks in dotnet/performance (10% of them improved, none regressed).

@ghost ghost assigned EgorBo Jan 8, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 8, 2024
@ghost
Copy link

ghost commented Jan 8, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

This PR enables PGO for castclass/isinst by default. Since we now have a dedicated tier for instrumentation it's no longer a big deal to enable more probes IMO. Profiled casts seem to help a lot to LINQ for small inputs since LINQ is typically full of type casts for fast paths (e.g. this).

Benchmark:

IEnumerable<int> _enum1 = new int[10];
IEnumerable<int> _enum2 = new int[10];

[Benchmark]
public bool SequenceEqual() => _enum1.SequenceEqual(_enum2);

[Benchmark]
public int ElementAt() => _enum1.ElementAt(5);

[Benchmark]
public int Count() => _enum1.Count();
Method Job Toolchain Mean Ratio
SequenceEqual Job-YKPWSW \Core_Root\corerun.exe 9.372 ns 2.96
SequenceEqual Job-AYWKCJ \Core_Root_PR\corerun.exe 3.171 ns 1.00
ElementAt Job-YKPWSW \Core_Root\corerun.exe 7.163 ns 4.69
ElementAt Job-AYWKCJ \Core_Root_PR\corerun.exe 1.529 ns 1.00
Count Job-YKPWSW \Core_Root\corerun.exe 3.482 ns 3.14
Count Job-AYWKCJ \Core_Root_PR\corerun.exe 1.127 ns 1.00

Codegen example

// Promote IsList to Tier1 with PGO
for (int i = 0; i < 200; i++)
{
    IsList(new int[10]); // source is always int[]
    Thread.Sleep(10);
}

[MethodImpl(MethodImplOptions.NoInlining)]
bool IsList<T>(IEnumerable<T> source) => source is IList<T>;
; Assembly listing for method IsList
G_M12490_IG01:
       sub      rsp, 40
G_M12490_IG02:
+      mov      rax, rcx
+      test     rax, rax                 ; if (source == null)
+      je       SHORT G_M12490_IG05
+G_M12490_IG03:
+      mov      rdx, 0x7FF941DE9750
+      cmp      qword ptr [rax], rdx
+      je       SHORT G_M12490_IG05      ; if (source is int[])
+G_M12490_IG04:
       mov      rdx, rcx
       mov      rcx, 0x7FF942439A60
       call     CORINFO_HELP_ISINSTANCEOFINTERFACE ; slow fallback
+G_M12490_IG05:
       test     rax, rax
       setne    al
       movzx    rax, al
-G_M12490_IG03:
+G_M12490_IG06:
       add      rsp, 40
       ret      
; Total bytes of code 59

@AndyAyersMS do you think this should go through an internal run of dotnet/performance first?
I tested impact on TE (no impact on startup and rps) and a few desktop app. Also, I did a run of 250 LINQ benchmarks in dotnet/performance (10% of them improved, none regressed).

Author: EgorBo
Assignees: EgorBo
Labels:

area-CodeGen-coreclr

Milestone: -

@EgorBo
Copy link
Member Author

EgorBo commented Jan 8, 2024

/azp run runtime-coreclr pgo, runtime-coreclr libraries-pgo, runtime-coreclr pgostress

Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@stephentoub
Copy link
Member

Thanks. If this is isinst/castclass only, I assume it doesn't cover a pattern like this?

// Use `GetType() == typeof(...)` rather than `is` to avoid cast helpers. This is measurably cheaper
// but does mean we could end up missing some rare cases where we could get a span but don't (e.g. a uint[]
// masquerading as an int[]). That's an acceptable tradeoff. The Unsafe usage is only after we've
// validated the exact type; this could be changed to a cast in the future if the JIT starts to recognize it.
// We only pay the comparison/branching costs here for super common types we expect to be used frequently
// with LINQ methods.
bool result = true;
if (source.GetType() == typeof(TSource[]))
{
span = Unsafe.As<TSource[]>(source);
}
else if (source.GetType() == typeof(List<TSource>))
{
span = CollectionsMarshal.AsSpan(Unsafe.As<List<TSource>>(source));
}
else
{
span = default;
result = false;
}

(or is that covered separately?)

@EgorBo
Copy link
Member Author

EgorBo commented Jan 8, 2024

Thanks. If this is isinst/castclass only, I assume it doesn't cover a pattern like this?

// Use `GetType() == typeof(...)` rather than `is` to avoid cast helpers. This is measurably cheaper
// but does mean we could end up missing some rare cases where we could get a span but don't (e.g. a uint[]
// masquerading as an int[]). That's an acceptable tradeoff. The Unsafe usage is only after we've
// validated the exact type; this could be changed to a cast in the future if the JIT starts to recognize it.
// We only pay the comparison/branching costs here for super common types we expect to be used frequently
// with LINQ methods.
bool result = true;
if (source.GetType() == typeof(TSource[]))
{
span = Unsafe.As<TSource[]>(source);
}
else if (source.GetType() == typeof(List<TSource>))
{
span = CollectionsMarshal.AsSpan(Unsafe.As<List<TSource>>(source));
}
else
{
span = default;
result = false;
}

(or is that covered separately?)

This code looks like it's already devirtualized by hands so there is nothing JIT can do additionally here. Although, you can convert it back to is TSource[] arr if you want (but it might regress NAOT that typically doesn't have any PGO data and Mono).

One benefit of is TSource[] arr would be that uint[]/int[] case 🙂

@stephentoub
Copy link
Member

you can convert it back to is TSource[] arr if you want
One benefit of is TSource[] arr would be that uint[]/int[] case 🙂

It was written this way to explicitly avoid the variance costs

@EgorBo EgorBo marked this pull request as ready for review January 8, 2024 09:54
@EgorBo EgorBo requested a review from AndyAyersMS January 8, 2024 09:55
@EgorBo
Copy link
Member Author

EgorBo commented Jan 8, 2024

PGO pipelines fail with #96612 (unrelated issue)

@AndyAyersMS
Copy link
Member

As a follow up you might look at whether cloning can profitably key off of one of these casts. Right now type test cloning requires OMF_HAS_GUARDEDDEVIRT.

@EgorBo
Copy link
Member Author

EgorBo commented Jan 8, 2024

As a follow up you might look at whether cloning can profitably key off of one of these casts. Right now type test cloning requires OMF_HAS_GUARDEDDEVIRT.

Ah good point, will file an issue for it!

@EgorBo EgorBo merged commit f21dc6c into dotnet:main Jan 8, 2024
129 checks passed
@EgorBo EgorBo deleted the enable-pgo-casts branch January 8, 2024 23:48
@github-actions github-actions bot locked and limited conversation to collaborators Mar 10, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants