[Perf] Linux/x64: 4 Regressions on 2/25/2024 4:37:10 PM #99002

performanceautofiler · 2024-02-27T10:21:52Z

Run Information

Name	Value
Architecture	x64
OS	ubuntu 22.04
Queue	TigerUbuntu
Baseline	f32c428c86b4cc41e88e2e5a750c37dfb354e33a
Compare	5ef47c852ffd51aaeb52e34391fe4fed261c9f26
Diff	Diff
Configs	CompilationMode:tiered, RunKind:micro

Regressions in System.Text.Tests.Perf_StringBuilder

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio
ToString_MultipleSegments - Duration of single invocation 📝 - Benchmark Source 📈 - ADX Test Multi Config Graph	87.46 μs	101.66 μs	1.16	0.00	False

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Text.Tests.Perf_StringBuilder*'

Payloads

Baseline
Compare

System.Text.Tests.Perf_StringBuilder.ToString_MultipleSegments(length: 100000)

ETL Files

Histogram

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

DrewScoggins · 2024-02-27T17:14:25Z

From #98623

Improvements:
dotnet/perf-autofiling-issues#29825
dotnet/perf-autofiling-issues#29808

mangod9 · 2024-06-26T22:24:54Z

@EgorBo is this by design, so it could be closed?

EgorBo · 2024-06-26T23:02:40Z

@EgorBot -amd64 -intel -perf -commit 79dd9ba vs fab69ef --disasm

using BenchmarkDotNet.Attributes;
using System.Text;

public class MyProgram
{
    const int LOHAllocatedStringSize = 100_000;
    private string _stringLOH;
    private string _string100;
    private StringBuilder _builderSingleSegment100;
    private StringBuilder _builderSingleSegmentLOH;
    private StringBuilder _builderMultipleSegments100;
    private StringBuilder _builderMultipleSegmentsLOH;

    [GlobalSetup(Target = nameof(ToString_MultipleSegments))]
    public void Setup_ToString_MultipleSegments()
    {
        _builderMultipleSegments100 = Append_Char(100); // 16 + 32 + 48 + 96 char segments
        _builderMultipleSegmentsLOH = Append_Char(LOHAllocatedStringSize);
    }

    public StringBuilder Append_Char(int length)
    {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < length; i++)
            builder.Append('a');
        return builder;
    }

    [Benchmark]
    [Arguments(LOHAllocatedStringSize)]
    public string ToString_MultipleSegments(int length) => (length == 100 ? _builderMultipleSegments100 : _builderMultipleSegmentsLOH).ToString();
}

EgorBot · 2024-06-26T23:21:49Z

Benchmark results on Intel

BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Intel Xeon Platinum 8370C CPU 2.80GHz, 1 CPU, 16 logical and 8 physical cores
  Job-JXBPJF : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-BMGTJF : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

Method	Toolchain	length	Mean	Error	Ratio	Code Size
ToString_MultipleSegments	Main	100000	72.87 μs	1.077 μs	1.00	262 B
ToString_MultipleSegments	PR	100000	110.64 μs	1.693 μs	1.52	262 B

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

EgorBot · 2024-06-26T23:23:09Z

Benchmark results on Amd

BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
AMD EPYC 7763, 1 CPU, 16 logical and 8 physical cores
  Job-CNQFTW : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-FMFWZL : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2

Method	Toolchain	length	Mean	Error	Ratio	Code Size
ToString_MultipleSegments	Main	100000	110.0 μs	0.37 μs	1.00	262 B
ToString_MultipleSegments	PR	100000	145.6 μs	0.31 μs	1.32	262 B

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

mangod9 · 2024-07-03T21:06:33Z

@EgorBo so the regression still exists?

EgorBo · 2024-07-03T21:17:33Z

@EgorBo so the regression still exists?

It does, but there is not much we can do here.
Previously we used to call native memmove directly (in coop mode), now we call managed SpanHelpers.Memmove instead that forwards back to native memmove (with gc transition) for large sizes, so we effectively pay for that wrapper. It only happens for large sizes.

Since there are more improvements (11) than regressions (4) and the fact that that PR:

Fixed a bug on NAOT: NativeAOT: Memset/Memcpy helpers don't throw NRE #95517
Made Memmove more suspension friendly: memcpy/memset helpers on CoreCLR may not be suspension-friendly #98620

we can close it.

performanceautofiler bot added arch-x64 os-linux Linux OS (any supported distro) runtime-coreclr specific to the CoreCLR runtime untriaged New issue has not been triaged by the area owner labels Feb 27, 2024

performanceautofiler bot mentioned this issue Feb 27, 2024

[SENTINEL] Autofile run complete at 2/27/2024 10:24:45 AM. 14 issues filed. dotnet/perf-autofiling-issues#29826

Closed

DrewScoggins removed the untriaged New issue has not been triaged by the area owner label Feb 27, 2024

DrewScoggins transferred this issue from dotnet/perf-autofiling-issues Feb 27, 2024

ghost added the untriaged New issue has not been triaged by the area owner label Feb 27, 2024

dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Feb 27, 2024

jeffschwMSFT added the area-VM-coreclr label Feb 28, 2024

vcsjones removed the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Feb 28, 2024

dotnet deleted a comment from EgorBot Jun 26, 2024

EgorBo closed this as completed Jul 3, 2024

dotnet-policy-service bot removed the untriaged New issue has not been triaged by the area owner label Jul 3, 2024

github-actions bot locked and limited conversation to collaborators Aug 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Perf] Linux/x64: 4 Regressions on 2/25/2024 4:37:10 PM #99002

[Perf] Linux/x64: 4 Regressions on 2/25/2024 4:37:10 PM #99002

performanceautofiler bot commented Feb 27, 2024 •

edited

Loading

Payloads

System.Text.Tests.Perf_StringBuilder.ToString_MultipleSegments(length: 100000)

ETL Files

Histogram

JIT Disasms

Docs

DrewScoggins commented Feb 27, 2024 •

edited

Loading

mangod9 commented Jun 26, 2024

EgorBo commented Jun 26, 2024

EgorBot commented Jun 26, 2024

EgorBot commented Jun 26, 2024

mangod9 commented Jul 3, 2024

EgorBo commented Jul 3, 2024 •

edited

Loading

[Perf] Linux/x64: 4 Regressions on 2/25/2024 4:37:10 PM #99002

[Perf] Linux/x64: 4 Regressions on 2/25/2024 4:37:10 PM #99002

Comments

performanceautofiler bot commented Feb 27, 2024 • edited Loading

Run Information

Regressions in System.Text.Tests.Perf_StringBuilder

Repro

Payloads

System.Text.Tests.Perf_StringBuilder.ToString_MultipleSegments(length: 100000)

ETL Files

Histogram

JIT Disasms

Docs

DrewScoggins commented Feb 27, 2024 • edited Loading

mangod9 commented Jun 26, 2024

EgorBo commented Jun 26, 2024

EgorBot commented Jun 26, 2024

EgorBot commented Jun 26, 2024

mangod9 commented Jul 3, 2024

EgorBo commented Jul 3, 2024 • edited Loading

performanceautofiler bot commented Feb 27, 2024 •

edited

Loading

DrewScoggins commented Feb 27, 2024 •

edited

Loading

EgorBo commented Jul 3, 2024 •

edited

Loading