Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make NuGet tests more stable #10931

Merged
merged 1 commit into from
Nov 18, 2024
Merged

Conversation

brettfo
Copy link
Contributor

@brettfo brettfo commented Nov 13, 2024

There have been some recent stability issues with the NuGet tests, specifically the test hanging for several hours. Being perfectly honest, I have no idea what's happening, but I've made the following changes that should make the issues easier to find:

  1. C# unit tests are no longer wrapped in a Ruby unit test; they've been migrated directly to ci-test. This makes the test output easier to read.
  2. The C# tests are run with a new argument --blame-hang-timeout 5m which means any individual test that is still running after 5 minutes will be killed. I think the longest running C# test is currently ~10 seconds, so the 5 minute limit should be plenty. If this timeout is hit, we'll know exactly what test caused it.
  3. The Ruby unit tests are no longer running in parallel (turbo_tests was removed). I suspect this was the real culprit, because in some of the hang cases the C# tests never even got a chance to start.
  4. The Ruby unit tests were wrapped in a timeout of 5 minutes. Most Ruby unit tests finish in under a second and the longest is around 5 seconds, so the 5 minute limit should be plenty. If the timeout is hit, we'll know exactly what test caused it.

Downsides to this change. The Ruby unit tests are no longer run in parallel across 4 cores; they're run serially, however this only results in a CI increase of a few seconds; all pure Ruby tests usually run in under 30 seconds anyway.

While repeatedly running the CI, I found some flaky Ruby unit tests in file_updater_spec.rb. I've moved the handling of the temp job file closer to where it's used and I haven't seen the flakiness return.

So far I've run the CI 20 times with the current changes (modulo some minor tweaks) and haven't seen the timeout/hang, where before I would see it ~1/10 times. I will run this several more times to be sure. Apologies to whoever pays the CI bill.

@github-actions github-actions bot added the L: dotnet:nuget NuGet packages via nuget or dotnet label Nov 13, 2024
@brettfo brettfo force-pushed the dev/brettfo/nuget-test-hang branch 6 times, most recently from 814c71f to d58b6fa Compare November 13, 2024 22:52
@brettfo brettfo changed the title TEST: turn off native C# tests directly call dotnet test instead of wrapping call in a unit test Nov 13, 2024
@brettfo brettfo force-pushed the dev/brettfo/nuget-test-hang branch 12 times, most recently from 0d4eb48 to b53440d Compare November 15, 2024 19:04
@brettfo brettfo changed the title directly call dotnet test instead of wrapping call in a unit test make NuGet tests more stable Nov 15, 2024
@brettfo
Copy link
Contributor Author

brettfo commented Nov 17, 2024

At this point I've run the CI ~50 times and not seen the original hang, and ~40 times and not seen the flaky Ruby tests. I'm calling this ready.

@brettfo brettfo marked this pull request as ready for review November 17, 2024 16:54
@brettfo brettfo requested a review from a team as a code owner November 17, 2024 16:54
@sachin-sandhu sachin-sandhu merged commit b7ae5f8 into main Nov 18, 2024
72 checks passed
@sachin-sandhu sachin-sandhu deleted the dev/brettfo/nuget-test-hang branch November 18, 2024 17:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
L: dotnet:nuget NuGet packages via nuget or dotnet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants