stackalloc long[3] is slower than [0,0,0] #121248

@EgorBo

Noticed while working on #121225

Here is the minimal repro:

using System.Runtime.CompilerServices;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Engines;
using BenchmarkDotNet.Running;

BenchmarkSwitcher.FromAssembly(typeof(Benchmarks).Assembly).Run(args);

public class Benchmarks
{
    [Benchmark]
    public long Bench_stackalloc() => ParseNonCanonical_stackalloc("11");

    [Benchmark]
    public long Bench_InlineArray() => ParseNonCanonical_InlineArray("11");


    [MethodImpl(MethodImplOptions.NoInlining)]
    int ParseNonCanonical_stackalloc(ReadOnlySpan<char> name)
    {
        Span<long> parts = stackalloc long[3];
        Consume(parts);
        return name[1];
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    int ParseNonCanonical_InlineArray(ReadOnlySpan<char> name)
    {
        Span<long> parts = [0, 0, 0];
        Consume(parts);
        return name[1];
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Consume(Span<long> parts) { }
}
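
As a quick sanity check that the two forms are interchangeable here, the following sketch compares the spans they produce. It is not part of the original repro: the variable names are made up for illustration, and it assumes the default compiler behavior where locals (including `stackalloc` memory) are zero-initialized, i.e. no `[SkipLocalsInit]` is in effect.

```csharp
using System;

// Illustrative names only; not taken from the original repro.
// Assumption: default locals-init behavior, so stackalloc memory is zeroed.
Span<long> viaStackalloc = stackalloc long[3];
Span<long> viaCollectionExpr = [0, 0, 0];   // collection expression (C# 12+)

bool sameLength = viaStackalloc.Length == viaCollectionExpr.Length;
bool sameContents = viaStackalloc.SequenceEqual(viaCollectionExpr);

Console.WriteLine($"same length: {sameLength}, same contents: {sameContents}");
```

If both checks hold, any timing gap between the two methods should come from the generated code rather than from a difference in what they compute.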

Benchmark results on Linux-x64:

| Method            | Mean     | Error     | StdDev    |
|------------------ |---------:|----------:|----------:|
| Bench_stackalloc  | 6.967 ns | 0.1560 ns | 0.2135 ns |
| Bench_InlineArray | 1.608 ns | 0.0043 ns | 0.0034 ns |
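
For a rough cross-check without BenchmarkDotNet, a plain `Stopwatch` loop along these lines could be used. This is only a sketch: the iteration counts, the local function names, and the warm-up strategy are assumptions made for illustration, and the numbers it prints are far less reliable than the BenchmarkDotNet results above (no statistical analysis, and tiered compilation may still affect the timed loops).

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

const int Iterations = 50_000_000;   // arbitrary count chosen for this sketch
const int WarmupIterations = 100_000; // arbitrary; gives the JIT a chance to tier up

// Warm up both methods; results are discarded.
for (int i = 0; i < WarmupIterations; i++)
{
    ViaStackalloc("11");
    ViaCollectionExpression("11");
}

long sink = 0; // accumulate results so the calls have an observable effect

var sw = Stopwatch.StartNew();
for (int i = 0; i < Iterations; i++) sink += ViaStackalloc("11");
sw.Stop();
double nsStackalloc = sw.Elapsed.TotalMilliseconds * 1_000_000.0 / Iterations;

sw.Restart();
for (int i = 0; i < Iterations; i++) sink += ViaCollectionExpression("11");
sw.Stop();
double nsCollectionExpr = sw.Elapsed.TotalMilliseconds * 1_000_000.0 / Iterations;

Console.WriteLine($"stackalloc:            {nsStackalloc:F3} ns/call");
Console.WriteLine($"collection expression: {nsCollectionExpr:F3} ns/call");
Console.WriteLine($"(sink = {sink})");

// Same bodies as the repro, written as local functions.
[MethodImpl(MethodImplOptions.NoInlining)]
static int ViaStackalloc(ReadOnlySpan<char> name)
{
    Span<long> parts = stackalloc long[3];
    Consume(parts);
    return name[1];
}

[MethodImpl(MethodImplOptions.NoInlining)]
static int ViaCollectionExpression(ReadOnlySpan<char> name)
{
    Span<long> parts = [0, 0, 0];
    Consume(parts);
    return name[1];
}

[MethodImpl(MethodImplOptions.NoInlining)]
static void Consume(Span<long> parts) { }
```

Run it in Release configuration; the per-call figures include loop overhead, so only the relative difference between the two lines is meaningful.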

Presumably, the perf penalty comes from a store-forwarding stall:

       vmovdqu  xmm0, xmmword ptr [rsp+0x30]
       vmovdqu  xmmword ptr [rsp+0x20], xmm0

I haven't looked at the JitDump yet to find out why.

Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), untriaged (New issue has not been triaged by the area owner)
