runtime-staging (i.e. iOS and tvOS) legs are timing out in PRs #58549
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label. |
Tagging subscribers to this area: @directhex
Issue Details: Looking at the runtime-staging builds from the last 24 hours, a lot of the mobile legs (i.e. iOS and tvOS) timed out. Those legs already have quite a large timeout (180 min). As this is causing PRs to turn red, we should fix this immediately. @steveisok @akoeplinger @MattGal
|
@ViktorHofer when you say a lot, how many are we talking? Also, do some fall in the classification of getting a super slow mac? Or are they just getting slightly slower macs? @akoeplinger Can you think of any immediate action we can take outside of shutting PR runs down temporarily? |
I did a manual check as I don't know how to query for timed out legs in AzDO or in Kusto. I looked at ~10 builds and half of those timed out. |
The ones that I looked at weren't slow mac machines. Building the repo didn't take longer than 20-30min but generating the app bundles took over two hours. |
I found these runs while trying to hunt super-slow macs, but I don't see any evidence of real slowness when looking at their stages that should be basically constant, e.g. cloning the repo. There's two major problems IMO:
Some sort of system where we don't build (number of test assemblies) macOS app packages up front, and only build them when we actually need them, would improve performance a lot. |
Still timing out, e.g. https://github.com/dotnet/runtime/pull/58011/checks?check_run_id=3543565778. |
Your archives here are just too big (it's something in the 15-20 GB range). I'd like to point out something about one of the listed runs to make my case.
It took this hosted macos machine 39 minutes to upload all those zips (wow), but the job it sent? It ran the entire 3 hours, 20 minutes of test work items in a mere 5 minutes, 18 seconds.
I can't tell you what to do here, but something has to make these payloads smaller somehow. I've long thought about combining test assemblies into batches for app packages, but the real decision will of course be owned by the team. |
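To make the batching idea above a bit more concrete, here is a minimal sketch of grouping test assemblies into a smaller number of app packages under a size cap. It is only an illustration: `batch_assemblies`, the assembly names, and the sizes are all hypothetical, and nothing here reflects how the repo actually builds app bundles. It also assumes that each app package carries some fixed overhead, so that fewer packages means a smaller total payload.

```python
from typing import Dict, List


def batch_assemblies(sizes_mb: Dict[str, float], max_batch_mb: float) -> List[List[str]]:
    """Group assemblies into batches with a greedy first-fit-decreasing pass.

    sizes_mb maps a (hypothetical) test assembly name to its size in MB.
    An assembly larger than max_batch_mb still gets a batch of its own.
    """
    batches: List[List[str]] = []
    totals: List[float] = []
    # Largest first; ties broken by name so the result is deterministic.
    for name, size in sorted(sizes_mb.items(), key=lambda kv: (-kv[1], kv[0])):
        for i, total in enumerate(totals):
            if total + size <= max_batch_mb:
                batches[i].append(name)
                totals[i] += size
                break
        else:
            # No existing batch has room, so start a new one.
            batches.append([name])
            totals.append(size)
    return batches


# Made-up sizes, purely for illustration.
example = {"A.Tests": 60, "B.Tests": 50, "C.Tests": 40, "D.Tests": 30}
result = batch_assemblies(example, max_batch_mb=100)
print(result)  # [['A.Tests', 'C.Tests'], ['B.Tests', 'D.Tests']]
```

With these made-up numbers, four per-assembly packages collapse into two. Whether that helps in practice depends on how much of each archive is shared runtime content versus test-specific content, which this sketch does not model.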
This is something we are actively working on. We're trying to prove / validate a couple of ways to reduce build times and maybe payload sizes as well. No lightning quick solution though. |
#58965 shaves off a considerable amount of the time it takes to build the app bundles. I did also see what @MattGal noticed in that PR, but there's an interesting twist: The So there must be something else going on than "just" large archives. |
Yes, though I think the largeness is part of the problem, since the network and disk bandwidth of a virtual machine running on a host is commonly shared between the two agents running there. We've been hunting down examples of slow mac stuff recently and I realized I should re-enable the helix side of it, since perhaps that's what I'm looking for. This job does seem to have interesting data points for our tracking issue, https://github.com/dotnet/core-eng/issues/14027
2 minutes (to upload 4.8 GB) version: Name: Build tvOSSimulator x64 Release AllSubsets_Mono
50 minutes (to upload 3.64 GB) version: Name: Build iOSSimulator x64 Release AllSubsets_Mono
So the slower one actually uploaded 1.2 GB less, that's very interesting. We'll keep pushing on the IcM linked in the issue, but please have patience because there just isn't enough instrumentation yet to understand the problem. |
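For a rough sense of scale, the two upload data points above can be turned into average throughput. This is just arithmetic on the quoted numbers, assuming 1 GB = 1000 MB and that the quoted durations cover only the upload; `throughput_mb_per_s` is a helper made up for this illustration.

```python
def throughput_mb_per_s(size_gb: float, minutes: float) -> float:
    """Average upload throughput in MB/s, assuming 1 GB = 1000 MB."""
    return size_gb * 1000 / (minutes * 60)


fast = throughput_mb_per_s(4.8, 2)    # Build tvOSSimulator x64 Release AllSubsets_Mono
slow = throughput_mb_per_s(3.64, 50)  # Build iOSSimulator x64 Release AllSubsets_Mono

print(f"{fast:.1f} MB/s vs {slow:.2f} MB/s, ratio {fast / slow:.0f}x")
# prints "40.0 MB/s vs 1.21 MB/s, ratio 33x"
```

A gap of roughly 33x between two otherwise similar legs is hard to explain by archive size alone, which fits the point that the smaller payload was the slower one.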
We don't have runtime-staging anymore, closing this old issue. |