-
Notifications
You must be signed in to change notification settings - Fork 10.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Quarantine VerifyCountersFireWithCorrectValues
and EventCountersAndMetricsValues
#57259
Comments
@JamesNK Do you know of any changes that might have affected Hosting's "requests-per-second", "total-requests", "current-requests", and/or "failed-requests" counters? Both of these tests verify those counters using the |
No, I don't. There have been SignalR telemetry changes, but they're isolated to SignalR. And the Kestrel metrics connection improvement was merged a month ago. It looks like errors started a few days ago: There was a runtime update on that day. Related? #57060 |
I looked at what @tarekgh @noahfalk Have there been any changes in event counters recently? |
I am not touching event counters at all and not aware of any changes there. @noahfalk should know better here. |
This change merged recently and might be related - dotnet/runtime#105548 I suspect what happened is the test is making assumptions that both requests will land in the same measurement interval so that some counter event will get reported where Increment=2. Such an assumption has always been relying on a race condition and not guaranteed to be true, but prior to 105548 it was probably very likely. The requests just needed to finish in less than 1 second so that they wouldn't miss the first measurement interval. After change 105548 the start of the first measurement interval is also asynchronous so its now possible for the requests to run fast enough that they complete before the first measurement interval starts. If that is the issue, a couple things that could make the test more timing resillient:
|
That sounds like it. We can refactor the test to track increments across events. This could affect more people. Although I doubt many people take the time to unit test counters. |
PR with improvements: #57269 |
Yeah, its a risk but I think a worthwhile one in order to fix the bug. I'm not aware of a good option that preserves the fix without also disrupting tests that were making implicit assumptions about the 1st callback being synchronous. |
It looks like this is still failing, but with a different error. Failing Test(s)
Error Message
Stacktrace
Logs
Build |
Still flaky: Another PR to try to fix: #59892 |
Failing Test(s)
Microsoft.AspNetCore.Hosting.Tests.HostingApplicationDiagnosticsTests.EventCountersAndMetricsValues
Microsoft.AspNetCore.Hosting.HostingEventSourceTests.VerifyCountersFireWithCorrectValues
Error Message
Stacktrace
Logs
Build
https://dev.azure.com/dnceng-public/public/_build/results?buildId=771079&view=ms.vss-test-web.build-test-results-tab
https://dev.azure.com/dnceng-public/public/_build/results?buildId=771156&view=ms.vss-test-web.build-test-results-tab
https://dev.azure.com/dnceng-public/public/_build/results?buildId=771449&view=ms.vss-test-web.build-test-results-tab
https://dev.azure.com/dnceng-public/public/_build/results?buildId=769803&view=ms.vss-test-web.build-test-results-tab
The text was updated successfully, but these errors were encountered: