System.Threading.Tests.Perf_Timer.SynchronousContention has regressed on macOS x64 #66774

Closed
adamsitnik opened this issue Mar 17, 2022 · 9 comments

@adamsitnik
Member

System.Threading.Tests.Perf_Timer.SynchronousContention

This particular benchmark has regressed by up to 10x. The regression happened between preview 1 and preview 2 and is specific to macOS x64. All other configs are fine.

Repro:

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_monthly.py net6.0 net7.0-preview2 --filter 'System.Threading.Tests.Perf_Timer.SynchronousContention'

6.0 vs 7.0-preview2

| Result | Base | Diff | Ratio | Operating System | Bit |
| --- | --- | --- | --- | --- | --- |
| Faster | 7592192700.00 | 6422264200.00 | 1.18 | Windows 11 | X64 |
| Faster | 3445500800.00 | 2991632200.00 | 1.15 | Windows 11 | X64 |
| Slower | 2753284900.00 | 3795765650.00 | 0.73 | Windows 11 | X64 |
| Same | 4362702200.00 | 4351387000.00 | 1.00 | Windows 10 | X64 |
| Same | 5140070400.00 | 5374294100.00 | 0.96 | Windows 11 | X64 |
| Slower | 3936259350.00 | 4533561150.00 | 0.87 | Windows 11 | X64 |
| Same | 3142999426.00 | 3060466877.00 | 1.03 | ubuntu 18.04 | X64 |
| Same | 3190544427.00 | 2896090257.00 | 1.10 | ubuntu 20.04 | X64 |
| Same | 4975816946.00 | 5315933153.00 | 0.94 | ubuntu 18.04 | X64 |
| Same | 3223415168.00 | 3261809279.00 | 0.99 | ubuntu 18.04 | X64 |
| Slower | 870459584.00 | 1018058722.50 | 0.86 | pop 20.04 | X64 |
| Same | 2250512250.00 | 2085430500.00 | 1.08 | alpine 3.13 | X64 |
| Same | 2175078950.00 | 2285564750.00 | 0.95 | debian 11 | X64 |
| Faster | 6984751875.00 | 2829376125.00 | 2.47 | macOS Monterey 12.2.1 | Arm64 |
| Same | 7679052700.00 | 8032133900.00 | 0.96 | Windows 10 | Arm64 |
| Same | 8556989400.00 | 8450397500.00 | 1.01 | Windows 11 | Arm64 |
| Same | 3196248800.00 | 3052868200.00 | 1.05 | Windows 10 | X86 |
| Same | 2562783800.00 | 2557121100.00 | 1.00 | Windows 10 | X86 |
| Same | 2660015150.00 | 2554015100.00 | 1.04 | Windows 10 | X86 |
| Same | 5298881900.00 | 5701486950.00 | 0.93 | Windows 10 | Arm |
| Same | 1978640207.50 | 1936352036.00 | 1.02 | macOS Big Sur 11.6.3 | X64 |
| Slower | 2720032892.00 | 30488358818.00 | 0.09 | macOS Monterey 12.2.1 | X64 |
| Slower | 1625397605.00 | 9434815791.50 | 0.17 | macOS Monterey 12.2.1 | X64 |
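
Judging by the rows above, the Ratio column looks like Base divided by Diff, so values below 1 mean the newer runtime is slower. The sketch below recomputes it for the two regressed macOS Monterey x64 rows. The `RatioCheck` class and `Ratio` helper are hypothetical names used only for illustration, and the formula is an assumption inferred from the numbers, not taken from the comparison tool's source.

```csharp
using System;
using System.Globalization;

public static class RatioCheck
{
    // Assumption: Ratio = Base / Diff, inferred from the table rows.
    public static double Ratio(double baseValue, double diffValue) => baseValue / diffValue;

    public static void Main()
    {
        // Base and Diff values copied from the two "Slower" macOS Monterey x64 rows.
        (string Os, double Base, double Diff)[] rows =
        {
            ("macOS Monterey 12.2.1 X64", 2720032892.00, 30488358818.00),
            ("macOS Monterey 12.2.1 X64", 1625397605.00, 9434815791.50),
        };

        foreach (var row in rows)
        {
            double ratio = Ratio(row.Base, row.Diff);
            double slowdown = row.Diff / row.Base;
            Console.WriteLine(string.Format(
                CultureInfo.InvariantCulture,
                "{0}: ratio {1:0.00}, slowdown {2:0.0}x",
                row.Os, ratio, slowdown));
        }
        // Prints:
        // macOS Monterey 12.2.1 X64: ratio 0.09, slowdown 11.2x
        // macOS Monterey 12.2.1 X64: ratio 0.17, slowdown 5.8x
    }
}
```

Read this way, a ratio of 0.09 corresponds to the "up to 10x" slowdown mentioned in the description.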

6.0 vs 7.0-preview1

no regression

| Result | Base | Diff | Ratio | Operating System | Bit |
| --- | --- | --- | --- | --- | --- |
| Slower | 7592192700.00 | 9025713200.00 | 0.84 | Windows 11 | X64 |
| Same | 3445500800.00 | 3497805750.00 | 0.99 | Windows 11 | X64 |
| Same | 3503478400.00 | 3521395300.00 | 0.99 | Windows 10 | X64 |
| Slower | 2753284900.00 | 4392316050.00 | 0.63 | Windows 11 | X64 |
| Same | 4480273700.00 | 4425113900.00 | 1.01 | Windows 10 | X64 |
| Faster | 5903656900.00 | 4395902900.00 | 1.34 | Windows 11 | X64 |
| Same | 5961020800.00 | 5530985700.00 | 1.08 | Windows 11 | X64 |
| Same | 3936259350.00 | 6987893650.00 | 0.56 | Windows 11 | X64 |
| Slower | 4086706650.00 | 4782135950.00 | 0.85 | Windows 11 | X64 |
| Same | 3190544427.00 | 3533905849.00 | 0.90 | ubuntu 20.04 | X64 |
| Same | 4975816946.00 | 5295691582.00 | 0.94 | ubuntu 18.04 | X64 |
| Same | 3940314238.00 | 3937037070.00 | 1.00 | centos 7 | X64 |
| Same | 3244566566.00 | 3200415653.00 | 1.01 | ubuntu 18.04 | X64 |
| Same | 2178689950.00 | 2201137150.00 | 0.99 | alpine 3.13 | X64 |
| Same | 1993444350.00 | 2193017300.00 | 0.91 | ubuntu 18.04 | X64 |
| Same | 2064021500.00 | 2262614500.00 | 0.91 | ubuntu 20.04 | X64 |
| Same | 7934386550.00 | 7985192600.00 | 0.99 | Windows 10 | Arm64 |
| Same | 3196248800.00 | 3061675600.00 | 1.04 | Windows 10 | X86 |
| Same | 5786057350.00 | 5734487600.00 | 1.01 | Windows 10 | Arm |
| Same | 1978640207.50 | 2018054987.00 | 0.98 | macOS Big Sur 11.6.3 | X64 |
| Faster | 1492929237.50 | 1163942736.50 | 1.28 | macOS Big Sur 11.4 | X64 |

cc @kouvel

@ghost

ghost commented Mar 17, 2022

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

Author: adamsitnik
Assignees: -
Labels:

area-System.Threading, os-mac-os-x, tenet-performance, arch-x64

Milestone: -

@kouvel
Member

kouvel commented Mar 17, 2022

It looks like the perf on "macOS Big Sur" is about the same on preview 2 as on preview 1. "macOS Monterey" appears to be new in the preview 2 runs, so this may be something specific to Monterey.

@kouvel kouvel added this to the 7.0.0 milestone Mar 17, 2022
@adamsitnik
Member Author

There are some simpler benchmarks, like System.Collections.CtorGivenSizeNonGeneric.Queue, that have also regressed only on macOS Monterey x64. Since System.Threading.Tests.Perf_Timer.SynchronousContention allocates 0.5 GB, the issue may be related to the GC and #65198.

System.Collections.CtorGivenSizeNonGeneric.Queue(Size: 512)

| Result | Ratio | Operating System | Bit |
| --- | --- | --- | --- |
| Same | 0.99 | Windows 11 | X64 |
| Same | 1.02 | Windows 11 | X64 |
| Same | 1.06 | Windows 11 | X64 |
| Same | 1.01 | Windows 10 | X64 |
| Same | 1.11 | Windows 11 | X64 |
| Same | 0.97 | Windows 11 | X64 |
| Same | 1.01 | ubuntu 18.04 | X64 |
| Same | 0.97 | ubuntu 20.04 | X64 |
| Same | 1.00 | ubuntu 18.04 | X64 |
| Same | 1.01 | ubuntu 18.04 | X64 |
| Same | 0.95 | pop 20.04 | X64 |
| Same | 1.03 | alpine 3.13 | X64 |
| Same | 1.03 | debian 11 | X64 |
| Faster | 1.58 | macOS Monterey 12.2.1 | Arm64 |
| Same | 0.96 | Windows 10 | Arm64 |
| Same | 1.01 | Windows 11 | Arm64 |
| Same | 0.94 | Windows 10 | X86 |
| Same | 0.98 | Windows 10 | X86 |
| Same | 1.02 | Windows 10 | X86 |
| Same | 0.97 | Windows 10 | Arm |
| Same | 0.96 | macOS Big Sur 11.6.3 | X64 |
| Slower | 0.15 | macOS Monterey 12.2.1 | X64 |
| Slower | 0.19 | macOS Monterey 12.2.1 | X64 |

@kouvel
Member

kouvel commented Aug 9, 2022

Here's a narrower repro:

using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Threading;

static class Program
{
    private static void Main()
    {
        int threadCount = Environment.ProcessorCount;
        var threadReady = new AutoResetEvent(false);
        var continueThreads = new ManualResetEvent(false);
        int completedThreadCount = 0;
        var allThreadsCompleted = new AutoResetEvent(false);
        TimerCallback timerCallback = _ => { };
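        ThreadStart threadStart = () =>
        {
            // Report that this thread has started, then wait until all threads are released together.
            threadReady.Set();
            continueThreads.WaitOne();
            for (int i = 0; i < 1_000_000; i++)
            {
                CreateFinalizable();
            }

            // The last thread to finish resets the shared state and wakes the main thread.
            if (Interlocked.Increment(ref completedThreadCount) >= threadCount)
            {
                continueThreads.Reset();
                completedThreadCount = 0;
                allThreadsCompleted.Set();
            }
        };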

        var sw = new Stopwatch();

        for (int j = 0; j < 4; j++)
        {
            for (int i = 0; i < threadCount; i++)
            {
                var t = new Thread(threadStart);
                t.IsBackground = true;
                t.Start();
                threadReady.WaitOne();
            }

            sw.Restart();
            continueThreads.Set();
            allThreadsCompleted.WaitOne();
            sw.Stop();

            Console.WriteLine($"{sw.Elapsed.TotalMilliseconds,10:0.000}");
        }
    }

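    // Not inlined, so each loop iteration makes a separate call that allocates.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void CreateFinalizable() => new TestObjWithFinalizable();

    // Each instance also allocates a TestFinalizable, which declares a finalizer.
    private sealed class TestObjWithFinalizable
    {
        private readonly TestFinalizable _testFinalizable = new TestFinalizable();
    }

    private sealed class TestFinalizable
    {
        // Finalization is suppressed in the constructor, so the finalizer below is not expected to run.
        public TestFinalizable() => GC.SuppressFinalize(this);
        ~TestFinalizable() { }
    }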
}

On my 2-core 4-thread x64 macOS Monterey machine (elapsed milliseconds per iteration, as printed by the repro):

.NET 6:

   661.627
   651.091
   653.747
   659.741

.NET 7 preview 1:

   689.781
   694.637
   667.180
   691.618

.NET 7 preview 2:

  1460.798
  1230.840
  1520.936
  1555.570

On the latest .NET 7, the numbers ranged between 1300 and 2000.

I'm not sure what might have changed, but it seems related to the GC. On other systems I'm seeing a large portion of CPU time spent under RegisterForFinalization(), in EnterFinalizeLock(). The Sleep(5) path is being hit a lot and is likely slowing things down unnecessarily (and perhaps inconsistently). The test also performs differently on Windows because Sleep(5) there behaves more like Sleep(15). It may be worthwhile to improve that spin-lock implementation, though I'm not sure whether it's relevant to the difference on Monterey.
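
To make the concern concrete, here is a minimal sketch of the general pattern: a spin lock whose waiters mostly yield but periodically fall back to a 5 ms sleep. This is not the runtime's code (EnterFinalizeLock() lives in the native GC and its details may differ). The class names, the one-in-eight sleep cadence, and the sleep counter are all assumptions made for illustration.

```csharp
using System;
using System.Threading;

// Hypothetical illustration only; not the GC's finalize lock.
public sealed class SleepingSpinLock
{
    private int _held;        // 0 = free, 1 = held
    private long _sleepCount; // number of times a waiter fell back to Thread.Sleep

    public long SleepCount => Interlocked.Read(ref _sleepCount);

    public void Enter()
    {
        int attempts = 0;
        while (Interlocked.CompareExchange(ref _held, 1, 0) != 0)
        {
            attempts++;
            if ((attempts & 7) != 0)
            {
                Thread.Yield(); // cheap: give up the rest of the time slice
            }
            else
            {
                // Assumed cadence: every eighth failed attempt sleeps for 5 ms.
                Interlocked.Increment(ref _sleepCount);
                Thread.Sleep(5);
            }
        }
    }

    public void Exit() => Volatile.Write(ref _held, 0);
}

public static class SpinLockDemo
{
    // Runs threadCount threads that each take the lock iterationsPerThread times.
    public static (long Total, long Sleeps) Run(int threadCount, int iterationsPerThread)
    {
        var gate = new SleepingSpinLock();
        long total = 0;
        var threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++)
        {
            threads[t] = new Thread(() =>
            {
                for (int i = 0; i < iterationsPerThread; i++)
                {
                    gate.Enter();
                    total++; // protected by the lock
                    gate.Exit();
                }
            });
            threads[t].Start();
        }

        foreach (Thread thread in threads)
        {
            thread.Join();
        }

        return (total, gate.SleepCount);
    }

    public static void Main()
    {
        var (total, sleeps) = Run(Environment.ProcessorCount, 100_000);
        Console.WriteLine($"increments: {total}, sleep fallbacks: {sleeps}");
    }
}
```

The number of sleep fallbacks varies from run to run and between machines. Each fallback adds at least the requested sleep duration to a waiter's latency, which is one way a lock like this could produce slow and inconsistent timings under heavy contention.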

CC @dotnet/gc

@ghost

ghost commented Aug 9, 2022

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Author: adamsitnik
Assignees: kouvel
Labels:

os-mac-os-x, tenet-performance, arch-x64, area-GC-coreclr

Milestone: 7.0.0

@kouvel kouvel removed their assignment Aug 9, 2022
@mangod9
Member

mangod9 commented Aug 10, 2022

so there is a regression only on macOS x64 and not arm64?

@kouvel
Member

kouvel commented Aug 10, 2022

> so there is a regression only on macOS x64 and not arm64?

In the 6.0 vs 7.0-preview2 comparison above, 7.0p2 appears to be faster on the benchmark on macOS Monterey arm64.

@mangod9
Member

mangod9 commented Aug 15, 2022

Since this is macOS x64 only, moving to 8.0.0.

@mangod9 mangod9 modified the milestones: 7.0.0, 8.0.0 Aug 15, 2022
@mangod9
Member

mangod9 commented Aug 2, 2023

Closing since this was macOS x64 specific.

@mangod9 mangod9 closed this as completed Aug 2, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Sep 2, 2023