Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Long running test: ReadAsync_CancelPendingTask_ThrowsCancellationException and ReadAsync_CancelPendingValueTask_ThrowsCancellationException #72586

Closed
tlakollo opened this issue Jul 21, 2022 · 14 comments · Fixed by #77573
Labels
area-System.Net.Http bug disabled-test The test is disabled in source code against the issue in-pr There is an active PR which will close this issue when it is merged Known Build Error Use this to report build issues in the .NET Helix tab test-run-core Test failures in .NET Core test runs
Milestone

Comments

@tlakollo
Copy link
Contributor

tlakollo commented Jul 21, 2022

Long running tests:

  • CancelPendingTask - ReadAsync_CancelPendingTask_ThrowsCancellationException (disabled in PR Disable long running ReadAsync_CancelPendingTask_ThrowsCancellationException tests for some Http1 cases #72596 on 7/21)
    • System.Net.Http.Functional.Tests.Http1CloseResponseStreamConformanceTests.ReadAsync_CancelPendingTask_ThrowsCancellationException - Http1CloseResponseStreamConformanceTests
    • System.Net.Http.Functional.Tests.Http1RawResponseStreamConformanceTests.ReadAsync_CancelPendingTask_ThrowsCancellationException - Http1RawResponseStreamConformanceTests
  • CancelPendingValueTask - ReadAsync_CancelPendingValueTask_ThrowsCancellationException (disabled in PR Disable ReadAsync_CancelPendingValueTask_ThrowsCancellationException for Http1 #72854 on 7/27)
    • System.Net.Http.Functional.Tests.Http1RawResponseStreamConformanceTests.ReadAsync_CancelPendingValueTask_ThrowsCancellationException - Http1RawResponseStreamConformanceTests
    • System.Net.Http.Functional.Tests.Http1CloseResponseStreamConformanceTests.ReadAsync_CancelPendingValueTask_ThrowsCancellationException - Http1CloseResponseStreamConformanceTests

Regression on 7/20

Failures 6/30-8/5:

Date Runs
8/5 1x PR occurrence - likely older source code
8/4 1x PR occurrence - likely older source code
8/3 1x PR occurrence - likely older source code
7/27 24x (1x in Official runs)
7/26 29x (2x in Official runs)
7/25 12x (2x in Official runs)
7/24 8x (3x in Official runs)
7/23 5x (1x in Official runs)
7/22 20x (0x in Official runs)
7/21 61x (5x in Official runs)
7/20 35x (7x in Official runs)

Given the high frequency, we should disable the test ASAP.

let timeouts = (includePR : bool) {
cluster('engsrvprod.kusto.windows.net').database('engineeringdata').WorkItems
| where Status == "Timeout"
| where FriendlyName startswith "System.Net.Http.Functional.Tests"
//| where Finished > datetime('2021-09-07')
| distinct JobId, WorkItemId, Name, FriendlyName, ConsoleUri
| join kind=inner (cluster('engsrvprod.kusto.windows.net').database('engineeringdata').Jobs
    | where ((Branch == 'refs/heads/main') or (Branch == 'refs/heads/master') or (includePR and (Source startswith "pr/")) or (Branch startswith 'refs/heads/release/6.0'))
    | where Type startswith "test/functional/cli/"
        and not(Properties contains "runtime-staging")
    | summarize arg_max(Finished, Properties, Type, Branch, Source, Started, QueueName) by JobId
| project-rename JobType = Type) on JobId
| order by Finished desc
};
timeouts(true);

Original report

Couldn't add runfo information since the work item has the same name as previously fixed issues. But this timeout behavior has occurred in the following rolling build runs:
https://dev.azure.com/dnceng/public/_build/results?buildId=1893373&view=results
https://dev.azure.com/dnceng/public/_build/results?buildId=1893617&view=results
https://dev.azure.com/dnceng/public/_build/results?buildId=1893157&view=results
https://dev.azure.com/dnceng/public/_build/results?buildId=1893763&view=results

The timeout occurs when executing the long-running test System.Net.Http.Functional.Tests.Http1RawResponseStreamConformanceTests.ReadAsync_CancelPendingTask_ThrowsCancellationException
These timeout are occurring on both runtime and runtime-extra-platforms

Related to #72209

{
  "ErrorMessage": "[Long Running Test] 'System.Net.Http.Functional.Tests.Http1RawResponseStreamConformanceTests.ReadAsync_CancelPending",
  "BuildRetry": false
}

Report

Build Definition Test Pull Request
64914 dotnet/runtime System.Net.Http.Functional.Tests.WorkItemExecution #77169
64782 dotnet/runtime System.Net.Http.Functional.Tests.WorkItemExecution #77487
64481 dotnet/runtime System.Net.Http.Functional.Tests.WorkItemExecution #76859
64009 dotnet/runtime System.Net.Http.Functional.Tests.WorkItemExecution #77449
63786 dotnet/runtime System.Net.Http.Functional.Tests.WorkItemExecution #77386
63754 dotnet/runtime System.Net.Http.Functional.Tests.WorkItemExecution #77261
62760 dotnet/runtime System.Net.Http.Functional.Tests.WorkItemExecution #77398
62577 dotnet/runtime System.Net.Http.Functional.Tests.WorkItemExecution #74820
62186 dotnet/runtime System.Net.Http.Functional.Tests.WorkItemExecution
61912 dotnet/runtime System.Net.Http.Functional.Tests.WorkItemExecution #77398
61472 dotnet/runtime System.Net.Http.Functional.Tests.WorkItemExecution #77400
60736 dotnet/runtime System.Net.Http.Functional.Tests.WorkItemExecution #77245
60520 dotnet/runtime System.Net.Http.Functional.Tests.WorkItemExecution
60215 dotnet/runtime System.Net.Http.Functional.Tests.WorkItemExecution #77336
59963 dotnet/runtime System.Net.Http.Functional.Tests.WorkItemExecution #77336
59761 dotnet/runtime System.Net.Http.Functional.Tests.WorkItemExecution #77196
59151 dotnet/runtime System.Net.Http.Functional.Tests.WorkItemExecution #77293
59055 dotnet/runtime System.Net.Http.Functional.Tests.WorkItemExecution #77196
58799 dotnet/runtime System.Net.Http.Functional.Tests.WorkItemExecution #74820
58224 dotnet/runtime System.Net.Http.Functional.Tests.WorkItemExecution #72717
57227 dotnet/runtime System.Net.Http.Functional.Tests.WorkItemExecution #77219
57205 dotnet/runtime System.Net.Http.Functional.Tests.WorkItemExecution #77220
56479 dotnet/runtime System.Net.Http.Functional.Tests.WorkItemExecution #77149
56287 dotnet/runtime System.Net.Http.Functional.Tests.WorkItemExecution #77052
55730 dotnet/runtime System.Net.Http.Functional.Tests.WorkItemExecution
55653 dotnet/runtime System.Net.Http.Functional.Tests.WorkItemExecution #77107
55580 dotnet/runtime System.Net.Http.Functional.Tests.WorkItemExecution #77127
55028 dotnet/runtime System.Net.Http.Functional.Tests.WorkItemExecution #76942
54938 dotnet/runtime System.Net.Http.Functional.Tests.WorkItemExecution #62863
54792 dotnet/runtime System.Net.Http.Functional.Tests.WorkItemExecution #77116
54391 dotnet/runtime System.Net.Http.Functional.Tests.WorkItemExecution #76361
54104 dotnet/runtime System.Net.Http.Functional.Tests.WorkItemExecution #76361
53859 dotnet/runtime System.Net.Http.Functional.Tests.WorkItemExecution #77088

Summary

24-Hour Hit Count 7-Day Hit Count 1-Month Count
3 18 33
@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jul 21, 2022
@ghost
Copy link

ghost commented Jul 21, 2022

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Couldn't add runfo information since the work item has the same name as previously fixed issues. But this timeout behavior has occurred in the following rolling build runs:
https://dev.azure.com/dnceng/public/_build/results?buildId=1893373&view=results
https://dev.azure.com/dnceng/public/_build/results?buildId=1893617&view=results
https://dev.azure.com/dnceng/public/_build/results?buildId=1893157&view=results
https://dev.azure.com/dnceng/public/_build/results?buildId=1893763&view=results

The timeout occurs when executing the long-running test System.Net.Http.Functional.Tests.Http1RawResponseStreamConformanceTests.ReadAsync_CancelPendingTask_ThrowsCancellationException
These timeout are occurring on both runtime and runtime-extra-platforms

Related to #72209

Author: tlakollo
Assignees: -
Labels:

area-System.Net.Http, untriaged

Milestone: -

@tlakollo tlakollo added the blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' label Jul 21, 2022
@karelz karelz added this to the 7.0.0 milestone Jul 21, 2022
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Jul 21, 2022
@karelz karelz changed the title System.Net.Http.Function.Tests timeout when executing long running test Long running test: ReadAsync_CancelPendingTask_ThrowsCancellationException Jul 21, 2022
@karelz
Copy link
Member

karelz commented Jul 21, 2022

Triage: regression introduced on 7/20. High frequency, we should disable it ASAP.

@karelz
Copy link
Member

karelz commented Jul 21, 2022

@stephentoub I asked @rzikm to disable the test. Do you plan to do more (curious about your self-assignment).

@stephentoub
Copy link
Member

stephentoub commented Jul 21, 2022

curious about your self-assignment

I changed this test yesterday.

@stephentoub stephentoub removed their assignment Jul 21, 2022
@stephentoub
Copy link
Member

I'm having trouble reproing locally. The changes I made yesterday were:

Re-reviewing, I think the tests are valid. Most likely there's some race condition somewhere in the implementation of either SocketsHttpHandler or the test stream it's using that causes the waiting for cancellation in these tests to hang. But it's hard to say for sure without a repro.

I've unassigned myself.

@stephentoub stephentoub added disabled-test The test is disabled in source code against the issue and removed blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' labels Jul 21, 2022
@rzikm rzikm removed their assignment Jul 22, 2022
@karelz karelz added the test-run-core Test failures in .NET Core test runs label Jul 22, 2022
@rzikm rzikm self-assigned this Jul 26, 2022
@rzikm rzikm removed their assignment Jul 26, 2022
@CarnaViire
Copy link
Member

System.Net.Http.Functional.Tests.Http1CloseResponseStreamConformanceTests.ReadAsync_CancelPendingValueTask_ThrowsCancellationException disabled in #72854

@stephentoub
Copy link
Member

The issue here stems from HttpConnection not passing the CancellationToken down into the underlying Stream.ReadAsync operations and instead registering with the token to Dispose of the stream if cancellation is requested. That causes a problem for this specific test because the test is simply issuing a read (with no writes) and waiting for a cancellation request. And while it sounds like the issue there has to do with the token not canceling the read, it's actually that the token is the only thing keeping alive all of the state associated with the test. By not passing the token in, everything becomes available for collection because the timer that would be rooting the whole operation doesn't get handed a reference to the leaf async state machine.

@wfurt
Copy link
Member

wfurt commented Aug 3, 2022

I was running with change @stephentoub suggested as try and the issue disappears

+++ b/src/libraries/Common/tests/StreamConformanceTests/System/IO/StreamConformanceTests.cs
@@ -536,6 +536,7 @@ protected async Task ValidateCancelableReadAsyncTask_AfterInvocation_ThrowsCance
             }

             var cts = new CancellationTokenSource();
+            cts.Token.Register(_ => { }, stream);

             Task<int> t = stream.ReadAsync(new byte[1], 0, 1, cts.Token);

While this is test change, we should still keep this issue open for product fix. Since this is now understood and it seems like it is very corner case specific to not using the stream after cancelation and using special in-memory stream, I'm going to move it to future (and leave the tests disabled for now)

@wfurt wfurt modified the milestones: 7.0.0, Future Aug 3, 2022
@karelz
Copy link
Member

karelz commented Aug 5, 2022

@wfurt can you please clarify what kind of product change is needed?
Is it "just": HttpConnection should pass CancellationToken to Stream.ReadAsync operations?

@karelz karelz added the bug label Aug 5, 2022
@karelz karelz unassigned wfurt Aug 5, 2022
@karelz karelz changed the title Long running test: ReadAsync_CancelPendingTask_ThrowsCancellationException Long running test: ReadAsync_CancelPendingTask_ThrowsCancellationException and ReadAsync_CancelPendingValueTask_ThrowsCancellationException Aug 5, 2022
@wfurt
Copy link
Member

wfurt commented Aug 5, 2022

yes @karelz. HttpConnection use token registration to dispose underlying stream on cancelation. However, because we don't pass it into the stream, it is not root and eligible for collection in this particular test. When the finalizer runs, we skip the cleanup logic in Dispose and the task hangs.
There may be perf penalty or other consequences for changing the cancelation so we should approach it carefully.

@MihaZupan
Copy link
Member

@stephentoub isn't the issue more so that when we register for cancellation on HttpConnection, we do so using a weak reference?
If we had passed a strong ref to the connection instead, the CTS would be rooting the connection (and therefore also the stream).

I am not sure I understand this comment given that we always control the registrations ourselves internally:

// - A weak reference to this HttpConnection is stored in the cancellation token, to prevent the token from
// artificially keeping this connection alive.

@jkotas jkotas added the Known Build Error Use this to report build issues in the .NET Helix tab label Oct 16, 2022
@stephentoub
Copy link
Member

stephentoub commented Oct 27, 2022

isn't the issue more

Not really. That was by design, and because of that, what I wrote holds.

I am not sure I understand this comment given that we always control the registrations ourselves internally:

If memory serves, the concern was that we're awaiting on the Read/WriteAsync operations on the underlying stream, and if those operations didn't end up rooting their continuations, the connection object could/should get collected, but storing a strong reference to it into the token would thwart that. Basically exactly what this test is showing is a possibility.

We could decide we no longer care about that, of course.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Oct 27, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Nov 27, 2022
@karelz karelz modified the milestones: Future, 8.0.0 Mar 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Net.Http bug disabled-test The test is disabled in source code against the issue in-pr There is an active PR which will close this issue when it is merged Known Build Error Use this to report build issues in the .NET Helix tab test-run-core Test failures in .NET Core test runs
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants