[HTTP/3] Stress test regression #56310
I couldn't figure out the best area label to add to this issue. If you have write permissions, please help me learn by adding exactly one area label.
Tagging subscribers to this area: @dotnet/ncl

Issue Details:

Last success run was on 7/20:
https://dev.azure.com/dnceng/public/_build/results?buildId=1248255&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=1451f5f3-0108-5a08-5b92-e984b2a85bbd&l=700
(strange that the numbers are even lower than those reported in #55810)

Regression seen from 7/21:
https://dev.azure.com/dnceng/public/_build/results?buildId=1250670&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=1451f5f3-0108-5a08-5b92-e984b2a85bbd
The log is big, so it doesn't show in the UI; you'll need the 'View raw log' button.

It goes like this: requests start failing with
Stream aborted by peer (4294967295)
(which means the stream was disposed prematurely), or with a timeout or some other unexpected cancellation. After the last statistics I was able to see, all requests start to fail with
Connection has been shutdown by transport. Error Code: CONNECTION_IDLE
and
Requesting HTTP version 3.0 with version policy RequestVersionExact while unable to establish HTTP/3 connection.
and these errors keep showing until the end.
(also note the numbers that are now significantly bigger than in #55810)
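For context on that last error: the stress requests pin the HTTP version, so once the QUIC connection is gone there is no fallback to H/2 or H/1.1. Below is a minimal sketch of such a request, assuming a generic HttpClient setup and a hypothetical local endpoint (this is not the actual stress-client code, and depending on the runtime version HTTP/3 support may also need to be enabled explicitly):

```csharp
using System;
using System.Net;
using System.Net.Http;

// Hedged sketch: a request that insists on HTTP/3 and refuses to downgrade.
using var client = new HttpClient();

var request = new HttpRequestMessage(HttpMethod.Get, "https://localhost:5001/") // hypothetical endpoint
{
    Version = HttpVersion.Version30,                       // ask for HTTP/3
    VersionPolicy = HttpVersionPolicy.RequestVersionExact  // fail instead of falling back to H/2 or H/1.1
};

using HttpResponseMessage response = await client.SendAsync(request);
Console.WriteLine(response.Version); // 3.0 while the QUIC connection is healthy
```

This matches the failure mode in the log: once the connection is shut down (CONNECTION_IDLE) and a new one cannot be established, every subsequent request fails with the "Requesting HTTP version 3.0 with version policy RequestVersionExact" error.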
I realized that it's not really a regression; it's more that #55810 somehow got fixed. Before, we had a super small number of requests and it didn't change after the first minute of running. Now we finally have more decent-ish numbers, and that surfaced issues we weren't able to see before.

I wonder if the connection getting closed and then not reopening is yet another symptom of #55979. Sadly, I'm unable to reproduce the problem (the closed connection) locally, even though it's the same Docker container :( It seems I run out of ports earlier due to #56151; I'll need to check again after that gets fixed.
Triage: no time to investigate yet; Natalia wasn't able to reproduce it locally in the same container. If I can't reproduce it either, we might punt it.
This issue is now masked by a new issue that affects stress, #57647, so it is not actionable at the moment.
120 errors out of 249,931 requests in 30 minutes looks much better now. I'll keep investigating the specific errors, but at least we now have some reasonable data. EDIT: comparing to the numbers from #55810, we still have only 10%-20% of the H/1.1 / H/2 throughput...
So the latest numbers (from #58442 - Expect 100 Continue excluded) are:
Double the throughput (582 k vs 249 k) and only 30 errors (down from 120), where I see 2 major types:
I'll continue investigating, but in general it looks like the state is getting better and better 👏
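(For scale, and assuming both runs cover a comparable 30-minute window, that is roughly 120 / 249,931 ≈ 0.05% of requests failing before versus 30 / 582,000 ≈ 0.005% now, close to an order-of-magnitude drop in the error rate.)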
I'm wondering if we can ever achieve reliable stress results with an unreliable listener.
If we can detect the "connection rejected" case that msquic produces, then we can exclude it from the test results. I'm not sure whether the "connection aborted" errors above are this or not... @ManickaP?
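One possible shape for that exclusion, as a hedged sketch only: the helper below is hypothetical (the type name, the message matching, and the counter are assumptions, not the stress suite's actual code), but it shows the idea of walking the exception chain and counting server-side rejections separately instead of as client failures.

```csharp
using System;

static class StressErrorFilter
{
    // Hypothetical filter: walks the inner-exception chain looking for a
    // QUIC-level "connection refused/rejected" message. The exact wording
    // surfaced by msquic is an assumption here.
    public static bool IsConnectionRejected(Exception ex)
    {
        for (Exception? e = ex; e is not null; e = e.InnerException)
        {
            if (e.Message.Contains("connection refused", StringComparison.OrdinalIgnoreCase) ||
                e.Message.Contains("connection rejected", StringComparison.OrdinalIgnoreCase))
            {
                return true;
            }
        }
        return false;
    }
}

// Hypothetical usage in the request loop:
// catch (HttpRequestException ex) when (StressErrorFilter.IsConnectionRejected(ex))
// {
//     Interlocked.Increment(ref rejectedByServer); // reported separately, not as a client failure
// }
```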
Also seeing crashes in H/3 stress from time to time: |
Report:
EDIT:
I was able to get a clean 30-minute run locally. It'll take time to get to this state in the official pipeline. I'm closing this, since we don't have any regressions in stress anymore.