Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove fgMorphCastedBitwiseOp #86491

Merged
merged 1 commit into from
May 19, 2023
Merged

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented May 19, 2023

Seems like this opt is not very useful (only a few diffs in libraries.pmi and benchmarks), but at the same time it manages to significantly regress some benchmarks due to store-forward stalls #85957

Benchmark:

[Benchmark]
[Arguments(ulong.MinValue)]
[Arguments((ulong)12345)]
public bool FormatterUInt64(ulong value) => 
    Utf8Formatter.TryFormat(value, _destination, out _);
Method Toolchain value Mean
FormatterUInt64 \Core_Root_base\corerun.exe 0 4.254 ns
FormatterUInt64 \Core_Root_PR\corerun.exe 0 1.620 ns
FormatterUInt64 \Core_Root_base\corerun.exe 12345 4.989 ns
FormatterUInt64 \Core_Root_PR\corerun.exe 12345 2.577 ns

It's not a solution for such stalls in JIT's codegen, but I don't see clear benefits in having this opt in JIT

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 19, 2023
@ghost ghost assigned EgorBo May 19, 2023
@ghost
Copy link

ghost commented May 19, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Seems like this opt is not very useful (only a few diffs in libraries.pmi and benchmarks), but at the same time it manages to significantly regress some benchmarks due to store-forward stalls #85957

Benchmark:

[Benchmark]
[Arguments(ulong.MinValue)]
[Arguments((ulong)12345)]
public bool FormatterUInt64(ulong value) => 
    Utf8Formatter.TryFormat(value, _destination, out _);
Method Toolchain value Mean
FormatterUInt64 \Core_Root_base\corerun.exe 0 4.254 ns
FormatterUInt64 \Core_Root_PR\corerun.exe 0 1.620 ns
FormatterUInt64 \Core_Root_base\corerun.exe 12345 4.989 ns
FormatterUInt64 \Core_Root_PR\corerun.exe 12345 2.577 ns

It's not a solution for such stalls in JIT's codegen, but I don't see clear benefits in having this opt in JIT

Author: EgorBo
Assignees: EgorBo
Labels:

area-CodeGen-coreclr

Milestone: -

@EgorBo
Copy link
Member Author

EgorBo commented May 19, 2023

image

@dotnet/jit-contrib @AndyAyersMS thoughts? I was thinking first to ignore NOL locals but to me the overall diffs aren't motivating enough to keep the optimizating judging that some of the "size improvements" actually regress perf

Copy link
Member

@jakobbotsch jakobbotsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, given that it has low hits. We discussed a bit offline and long term we should probably also consider whether we should fix codegen for normalize-on-load locals to always do a small access.

@AndyAyersMS
Copy link
Member

Yeah I think this is good. I looked to see if we missed spotting regressions back when this went in (#58727) but didn't find anything.

https://github.com/dotnet/perf-autofiling-issues/issues?q=is%3Aissue+created%3A2021-10-12..2021-10-19+label%3Aperf-regression+label%3Aarch-x64+label%3Aruntime-coreclr

//
// Return Value:
// A folded GenTree* instance, or nullptr if it couldn't be folded
GenTree* Compiler::fgMorphCastedBitwiseOp(GenTreeOp* tree)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR, but is this something we can handle in lowering instead where we can make the decision to keep or bypass the cast based on whether the operands are coming from memory?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, let's see if perf triage will find any regressions (hoping to see improvements instead)

@AndyAyersMS
Copy link
Member

I wonder if we are seeing this on arm64 too. System.Buffers.Text.Tests.Utf8FormatterTests.FormatterInt64(value: 12345) is slower with PGO for win-x64, lin-x64, and lin-arm64.

The big drop on April 23 was from #85277 but it helped non PGO much more than PGO (we only had PGO data for win-x64 at that point)

image

@EgorBo EgorBo merged commit a331b56 into dotnet:main May 19, 2023
@EgorBo EgorBo deleted the remove-bitwise-op-fwd-stolls branch May 19, 2023 20:38
@ghost ghost locked as resolved and limited conversation to collaborators Jun 19, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants