ManyConcurrentAddsTakes_ForceContentionWithToArray fails intermittently on osx-arm64 #76501

Closed
jkotas opened this issue Oct 2, 2022 · 13 comments · Fixed by #78142
Labels: arch-arm64, area-System.Collections, Known Build Error, os-mac-os-x, tenet-reliability, test-failure

@jkotas
Member

jkotas commented Oct 2, 2022

Assert.DoesNotContain() Failure
Found:    0
In value: Int32[] [0]
   at System.Collections.Concurrent.Tests.ProducerConsumerCollectionTests.ManyConcurrentAddsTakes_ForceContentionWithToArray(Double seconds) + 0x110
   at System.Collections.Concurrent!<BaseAddress>+0x9fa6bc
   at System.Reflection.DynamicInvokeInfo.Invoke(Object, IntPtr, Object[], BinderBundle, Boolean) + 0x148
 {
    "ErrorMessage" : "System.Collections.Concurrent.Tests.ProducerConsumerCollectionTests.ManyConcurrentAddsTakes_ForceContentionWithToArray",
    "BuildRetry": false
 }

Failed in #76499. Log: https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-76499-merge-3c4c964d671349bdbd/System.Collections.Concurrent.Tests/1/console.b1cceddc.log?helixlogtype=result

Report

| Build | Definition | Test | Pull Request |
| --- | --- | --- | --- |
| 74353 | dotnet/runtime | System.Collections.Concurrent.Tests.ConcurrentQueueTests.ManyConcurrentAddsTakes_ForceContentionWithToArray(seconds: 1) | |
| 74067 | dotnet/runtime | System.Collections.Concurrent.Tests.ConcurrentQueueTests.ManyConcurrentAddsTakes_ForceContentionWithToArray(seconds: 1) | |
| 73941 | dotnet/runtime | System.Collections.Concurrent.Tests.ConcurrentQueueTests.ManyConcurrentAddsTakes_ForceContentionWithToArray(seconds: 1) | #77934 |
| 73002 | dotnet/runtime | System.Collections.Concurrent.Tests.ConcurrentQueueTests.ManyConcurrentAddsTakes_ForceContentionWithToArray(seconds: 1) | |
| 72557 | dotnet/runtime | System.Collections.Concurrent.Tests.ConcurrentQueueTests.ManyConcurrentAddsTakes_ForceContentionWithToArray(seconds: 1) | |
| 71724 | dotnet/runtime | System.Collections.Concurrent.Tests.ConcurrentQueueTests.ManyConcurrentAddsTakes_ForceContentionWithGetEnumerator | |
| 70399 | dotnet/runtime | System.Collections.Concurrent.Tests.ConcurrentQueueTests.ManyConcurrentAddsTakes_ForceContentionWithToArray(seconds: 1) | #77770 |
| 67757 | dotnet/runtime | System.Collections.Concurrent.Tests.ConcurrentQueueTests.ManyConcurrentAddsTakes_ForceContentionWithToArray(seconds: 1) | |
| 67272 | dotnet/runtime | System.Collections.Concurrent.Tests.ConcurrentQueueTests.ManyConcurrentAddsTakes_ForceContentionWithToArray(seconds: 1) | |
| 67179 | dotnet/runtime | System.Collections.Concurrent.Tests.ConcurrentQueueTests.ManyConcurrentAddsTakes_ForceContentionWithToArray(seconds: 1) | |
| 66666 | dotnet/runtime | System.Collections.Concurrent.Tests.ConcurrentQueueTests.ManyConcurrentAddsTakes_ForceContentionWithToArray(seconds: 1) | #76630 |
| 64951 | dotnet/runtime | System.Collections.Concurrent.Tests.ConcurrentQueueTests.ManyConcurrentAddsTakes_ForceContentionWithToArray(seconds: 1) | #77522 |

Summary

| 24-Hour Hit Count | 7-Day Hit Count | 1-Month Count |
| --- | --- | --- |
| 0 | 5 | 12 |

@jkotas added the arch-arm64, os-mac-os-x, tenet-reliability, area-NativeAOT-coreclr, and Known Build Error labels Oct 2, 2022
@ghost added the untriaged label Oct 2, 2022
@jkotas
Member Author

jkotas commented Oct 2, 2022

Also failed in #75421 (comment)

@jkotas
Member Author

jkotas commented Oct 2, 2022

cc @VSadov @filipnavara

@jkotas added this to the 8.0.0 milestone Oct 2, 2022
@ghost removed the untriaged label Oct 2, 2022
@filipnavara
Member

I believe the failure is genuine, but I have not yet been able to reproduce it reliably on my machine. Should we mark the test with ActiveIssue until we find the root cause?

@jkotas
Member Author

jkotas commented Oct 2, 2022

The "Known Build Error" label turns on automation that monitors the failure and collects statistics about it. Let's see how often it fails. If it fails too often, we will disable the test.

@filipnavara
Member

filipnavara commented Oct 4, 2022

It reproduces in less than a minute when running the single test in a loop on an M1.

@filipnavara
Member

filipnavara commented Oct 4, 2022

In fact, it reproduces under regular CoreCLR as well:

./.dotnet/dotnet /Users/filipnavara/Projects/ConcurrentDict/bin/Debug/net7.0/ConcurrentDict.dll 
FAIL
Unhandled exception. System.Exception: Exception of type 'System.Exception' was thrown.
   at Assert.DoesNotContain(Int32 value, Int32[] array) in /Users/filipnavara/Projects/ConcurrentDict/ConcurrentDict.cs:line 92
   at BringUpTest.Test() in /Users/filipnavara/Projects/ConcurrentDict/ConcurrentDict.cs:line 50
   at BringUpTest.Main() in /Users/filipnavara/Projects/ConcurrentDict/ConcurrentDict.cs:line 18

ConcurrentDict.cs:

// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using System;
using System.Diagnostics;
using System.Text;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public class BringUpTest
{
    const int Pass = 100;
    const int Fail = -1;

    public static int Main()
    {
        for (int i = 0; i < 200; i++)
            Test();
        return Pass;
    }

    public static void Test()
    {
        IProducerConsumerCollection<int> c = new ConcurrentQueue<int>();
        const int MaxCount = 4;

        DateTime end = DateTime.UtcNow + TimeSpan.FromSeconds(1);

        Task addsTakes = Task.Factory.StartNew(() =>
        {
            while (DateTime.UtcNow < end)
            {
                for (int i = 1; i <= MaxCount; i++)
                {
                    Assert.True(c.TryAdd(i));
                }
                for (int i = 1; i <= MaxCount; i++)
                {
                    int item;
                    Assert.True(c.TryTake(out item));
                    Assert.InRange(item, 1, MaxCount);
                }
            }
        });

        while (DateTime.UtcNow < end)
        {
            int[] arr = c.ToArray();
            Assert.InRange(arr.Length, 0, MaxCount);
            Assert.DoesNotContain(0, arr); // make sure we didn't get default(T)
        }

        addsTakes.GetAwaiter().GetResult();
        Assert.Equal(0, c.Count);
    }
}

class Assert
{
    public static void Equal(int expected, int actual)
    {
        if (expected != actual)
        {
            Console.WriteLine($"{actual} != {expected}");
            throw new Exception();
        }
    }

    public static void True(bool actual)
    {
        if (!actual)
        {
            Console.WriteLine($"!{actual}");
            throw new Exception();
        }
    }

    public static void InRange(int actual, int min, int max)
    {
        if (actual < min || actual > max)
        {
            Console.WriteLine($"{actual} not in {min}-{max}");
            throw new Exception();
        }
    }

    public static void DoesNotContain(int value, int[] array)
    {
        if (Array.Exists(array, e => e == value))
        {
            Console.WriteLine($"FAIL");
            throw new Exception();
        }
    }
}

@ghost

ghost commented Oct 4, 2022

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

Issue Details
Report

| Build | Definition | Test |
| --- | --- | --- |
| 38234 | dotnet/runtime | System.Collections.Concurrent.Tests.ConcurrentQueueTests.ManyConcurrentAddsTakes_ForceContentionWithToArray(seconds: 1) |
| 38068 | dotnet/runtime | System.Collections.Concurrent.Tests.ConcurrentQueueTests.ManyConcurrentAddsTakes_ForceContentionWithToArray(seconds: 1) |

Summary

| 24-Hour Hit Count | 7-Day Hit Count | 1-Month Count |
| --- | --- | --- |
| 1 | 2 | 2 |

Author: jkotas
Assignees: -
Labels: arch-arm64, area-System.Collections, os-mac-os-x, tenet-reliability, Known Build Error
Milestone: 8.0.0

@filipnavara
Member

Note that the failure is also reproducible on 7.0 RC1, so it may be appropriate to change the milestone.

@jkotas changed the title from "[NativeAOT] ManyConcurrentAddsTakes_ForceContentionWithToArray fails intermittently on osx-arm64" to "ManyConcurrentAddsTakes_ForceContentionWithToArray fails intermittently on osx-arm64" Oct 4, 2022
@akoeplinger
Member

Might be related: #76141

@filipnavara
Member

> Might be related: #76141

Likely not; I already checked the existing issues. This one involves ConcurrentQueue and the other one involves ConcurrentStack.

@VSadov
Member

VSadov commented Nov 10, 2022

Looks like an ordering issue on arm64. I will take a look.
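
For readers following along, here is a minimal sketch of what an "ordering issue" can look like on a weakly ordered architecture such as arm64. It assumes a simplified single-slot publish pattern. The names `Slot`, `Publish`, and `TryRead` are hypothetical and chosen for illustration only. This is not the ConcurrentQueue implementation and not the change made in #78142. The idea is that a reader who sees the "ready" flag must also see the item written before it. If either `Volatile` call were a plain field access, the reader could in principle observe the flag set and still read default(T), which is the symptom the test asserts against.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical single-slot example. Slot, Publish, and TryRead are illustrative
// names, not members of System.Collections.Concurrent.
public sealed class Slot
{
    private int _item;   // payload; 0 (default) means "not written yet"
    private int _ready;  // 0 = empty, 1 = item has been published

    public void Publish(int value)
    {
        _item = value;
        // Release store: the write to _item above may not move after this write.
        Volatile.Write(ref _ready, 1);
    }

    public bool TryRead(out int value)
    {
        // Acquire load: the read of _item below may not move before this read.
        if (Volatile.Read(ref _ready) == 1)
        {
            value = _item;
            return true;
        }

        value = 0;
        return false;
    }
}

public static class Program
{
    public static void Main()
    {
        for (int iteration = 0; iteration < 10_000; iteration++)
        {
            var slot = new Slot();
            Task writer = Task.Run(() => slot.Publish(42));

            int seen;
            while (!slot.TryRead(out seen))
            {
                // Spin until the writer has published the item.
            }

            writer.Wait();
            if (seen != 42)
            {
                throw new Exception($"observed {seen} instead of 42");
            }
        }

        Console.WriteLine("done");
    }
}
```

With the acquire/release pair in place, the check in `Main` should never throw. Whether a variant with plain accesses actually fails depends on the hardware and the JIT, so treat this as an illustration of the pattern rather than a reliable repro.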

@VSadov
Member

VSadov commented Nov 10, 2022

I am able to reproduce this on OSX-arm64-CoreCLR.

@ghost added the in-pr label Nov 10, 2022
@VSadov
Member

VSadov commented Nov 10, 2022

PR with a fix: #78142

I have been running the repro in multiple processes for about 20 minutes now with no failures. Without the fix, it typically fails in under one minute.

@ghost removed the in-pr label Nov 10, 2022
@ghost locked as resolved and limited conversation to collaborators Dec 10, 2022