Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[runtime-coreclr pgo] nativeruntimeeventsource test fails arm32 and x86 #71331

Closed
EgorBo opened this issue Jun 27, 2022 · 10 comments · Fixed by #72010
Closed

[runtime-coreclr pgo] nativeruntimeeventsource test fails arm32 and x86 #71331

EgorBo opened this issue Jun 27, 2022 · 10 comments · Fixed by #72010
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@EgorBo
Copy link
Member

EgorBo commented Jun 27, 2022

This test seems to be failing on all outerloop runtime-coreclr pgo pipelines

Unhandled exception. System.Exception: Condition 'Saw the ThreadPoolIOPack event' is not true
at Tracing.Tests.Common.Assert.True(String name, Boolean condition)
at Tracing.Tests.NativeRuntimeEventSourceTest.Main(String[] args)

Fails in various PRs, e.g.:
https://dev.azure.com/dnceng/public/_build/results?buildId=1843344&view=ms.vss-test-web.build-test-results-tab
https://dev.azure.com/dnceng/public/_build/results?buildId=1832137&view=ms.vss-test-web.build-test-results-tab
https://dev.azure.com/dnceng/public/_build/results?buildId=1840992&view=ms.vss-test-web.build-test-results-tab

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jun 27, 2022
@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@EgorBo EgorBo added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 27, 2022
@EgorBo EgorBo added this to the 7.0.0 milestone Jun 27, 2022
@ghost
Copy link

ghost commented Jun 27, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

This test seems to be failing on all outerloop runtime-coreclr pgo pipelines

Unhandled exception. System.Exception: Condition 'Saw the ThreadPoolIOPack event' is not true
at Tracing.Tests.Common.Assert.True(String name, Boolean condition)
at Tracing.Tests.NativeRuntimeEventSourceTest.Main(String[] args)

Fails in various PRs, e.g.:
https://dev.azure.com/dnceng/public/_build/results?buildId=1843344&view=ms.vss-test-web.build-test-results-tab
https://dev.azure.com/dnceng/public/_build/results?buildId=1832137&view=ms.vss-test-web.build-test-results-tab
https://dev.azure.com/dnceng/public/_build/results?buildId=1840992&view=ms.vss-test-web.build-test-results-tab

Author: EgorBo
Assignees: -
Labels:

area-CodeGen-coreclr, untriaged

Milestone: -

@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Jun 27, 2022
@AndyAyersMS
Copy link
Member

I'll take a look.

@AndyAyersMS AndyAyersMS self-assigned this Jun 29, 2022
@AndyAyersMS
Copy link
Member

I can't repro this locally yet either running the full pri-1 suite or running just this wrapper. May be some kind of timing issue with other concurrent tests so I will keep trying.

CI logs show the IOPack event is emitted.

        [Simple] ThreadID = 8584 ID = 65 Name = ThreadPoolIOPack
        TimeStamp: 6/29/2022 12:04:01 PM
        LocalTime: 6/29/2022 12:04:03 PM
        Difference: 00:00:02.3316022
        Name = "NativeOverlapped" Value = "150706496"
        Name = "Overlapped" Value = "89279024"
        Name = "ClrInstanceID" Value = "0"

not sure why it is not recognized.

@EgorBo
Copy link
Member Author

EgorBo commented Jun 30, 2022

I'll try to reproduce it locally too, it seems it's a quite stable fail on CI - https://dev.azure.com/dnceng/public/_build?definitionId=1000

@AndyAyersMS
Copy link
Member

I can repro if the machine is loaded (eg run the single test case while the machine running a build in a different clone of the runtime repo).

c:\repos\runtime1\artifacts\tests\coreclr\windows.x86.Checked\Tests\Core_Root\corerun.exe nativeruntimeeventsource.dll  1>D:\bugs\r71331\test1.log
Unhandled exception. System.Exception: Condition 'Saw the ThreadPoolIOPack event' is not true
   at Tracing.Tests.Common.Assert.True(String name, Boolean condition) in C:\repos\runtime1\src\tests\tracing\common\Assert.cs:line 14
   at Tracing.Tests.NativeRuntimeEventSourceTest.Main(String[] args) in C:\repos\runtime1\src\tests\tracing\runtimeeventsource\NativeRuntimeEventSourceTest.cs:line 48

c:\repos\runtime1\artifacts\tests\coreclr\windows.x86.Checked\tracing\runtimeeventsource\nativeruntimeeventsource>echo ERRORLEVEL=-532462766

Pretty sure it's just a timing issue with the test

// If on Windows, attempt some Overlapped IO (triggers ThreadPool events)
if (OperatingSystem.IsWindows())
DoOverlappedIO();
// Wait for events.
Thread.Sleep(1000);
// Generate some GC events.
GC.Collect(2, GCCollectionMode.Forced);
// Wait for more events.
Thread.Sleep(1000);
// Ensure that we've seen some events.
foreach (string s in listener.SeenProvidersAndEvents)
Console.WriteLine(s);
Assert.True("listener.EventCount > 0", listener.EventCount > 0);
if (OperatingSystem.IsWindows())
Assert.True("Saw the ThreadPoolIOPack event", listener.SeenProvidersAndEvents.Contains("Microsoft-Windows-DotNETRuntime/EVENTID(65)"));
}

In the "good" runs we have

[Simple] ThreadID = 18116 ID = 65 Name = ThreadPoolIOPack
TimeStamp: 6/29/2022 5:56:35 PM
LocalTime: 6/29/2022 5:56:37 PM
Difference: 00:00:01.4429606
	Name = "NativeOverlapped" Value = "148986208"
	Name = "Overlapped" Value = "87809744"
	Name = "ClrInstanceID" Value = "4"

and in the bad one

[Simple] ThreadID = 12144 ID = 65 Name = ThreadPoolIOPack
TimeStamp: 6/29/2022 5:54:01 PM
LocalTime: 6/29/2022 5:54:04 PM
Difference: 00:00:02.7299304
	Name = "NativeOverlapped" Value = "156569912"
	Name = "Overlapped" Value = "95742188"
	Name = "ClrInstanceID" Value = "4"

I think the event needs to arrive within 2 seconds and in the bad case it takes 2.7 seconds.

@tommcdon can you suggest somebody to verify/refute that this is a timing issue?

@AndyAyersMS
Copy link
Member

Also see what may be related timing issues: #70966, #68690

@AndyAyersMS
Copy link
Member

@tommcdon can you suggest somebody to verify/refute that this is a timing issue?

@tommcdon ping

@tommcdon
Copy link
Member

@tommcdon can you suggest somebody to verify/refute that this is a timing issue?
Sorry! I wasn't getting the notifications for this and didn't mean to miss this. @davmason could you take a look?

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jul 12, 2022
@davmason
Copy link
Member

I verified on my machine that I hit it almost 100% when the machine was at 100 CPU usage, and after the fix in #72010 it passes every time.

The only way this could not be a timing issue is if somehow the JIT recognizes that the code here doesn't actually do anything and removes it. From my understanding that is not currently something the JIT would do.

private static unsafe void DoOverlappedIO()
{
Console.WriteLine("DOOVERLAPPEDIO");
Overlapped overlapped = new();
NativeOverlapped* pOverlap = overlapped.Pack(null, null);
Overlapped.Free(pOverlap);
}
}

@davmason davmason assigned davmason and unassigned AndyAyersMS Jul 13, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jul 14, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Aug 13, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants