OSX builds on Travis sometimes aren't scheduled #40672
Question: As an authorized TravisCI user, do you see a "Build" or "Rebuild" button on an individual OSX build that failed to run? For example, on this page: https://travis-ci.org/rust-lang/rust/jobs/212689371
In both of the example links, there are Linux and macOS builds that started but were cancelled while running. In addition, there are macOS builds that never started. It appears that the entire build may have been cancelled before running to completion, and there were simply more macOS builds at the end of the list, so they hadn't started before the build was cancelled. From my public account I can't see whether the build was cancelled by a specific person, by a feature like auto cancellation, or by something else. I don't even know whether TravisCI presents that information to privileged users.
Oh, thanks for the re-ping @CleanCut! The problem here ended up being an upstream Travis issue which has since been fixed, so I'm going to close this. To answer your questions, though: yes, we can definitely retry or cancel individual builds if necessary. The problem here was that the build was scheduled for a VM, but some (presumably Travis) bug meant the build made no progress. That clogged our whole parallelism capacity with "queued" builds while nothing was actually progressing. I've now enabled auto cancellation for PRs and branches as well, but unfortunately that probably wouldn't have helped much here :(
Okay, cool. It's always nice when it's upstream's fault AND they fix it. ;-)
I'm going to go ahead and reopen this because we've seen at least once (and I seem to recall more often) that the OS X builders just don't start (e.g., #41661). Edit: Fixed PR link.
@Mark-Simulacrum The PR link is wrong 😄 The macOS machines typically take more than 1 minute to boot, and at most 216 builds can run simultaneously across all of Travis (compared with the Linux machines, which support over 700 concurrent builds). So the macOS builds probably don't start simply because the queue is too long.
Fixed the PR link. It's certainly possible that this is entirely out of our hands, but I think it'd be nice to track it anyway.
Ok, let's try to collect some samples over time to get an idea of frequency (along with concrete logs), and then we can look into emailing Travis.
I think I'm seeing some instances of this. I'm not sure if these links will stick around, so I'm taking screenshots. On Build #52099, I see 4 Apple builds with a ❗️ status. Job #52099.37 looks like this when I click into it: The other 3 look the same as well. It just looks like they never started?
Saw 2 more of these in https://travis-ci.org/rust-lang/rust/builds/234609993 for #42142.
Another occurrence in #42167, without any backlog on the status page.
More cases in #42275, a normal PR run (w/o r+): xcode8.2 and all three xcode7 builders didn't start with the
More cases in #43221 (PR w/o r+): 3 of 5 xcode builds.
https://travis-ci.org/rust-lang/rust/builds/266074368 I've dropped Travis an e-mail. Please don't cancel the above; I'm hoping a live example might help them fix it, and since it's RustConf I figure the PR queue might be slow for a few days :) Edit: new plan - I killed one of them to get the queue moving, but left the other since it's just a PR build and doesn't block anything.
The upstream infra provider for Travis has restarted a faulty host. Apparently this should improve things, and builds from this point on should not hang when starting on OSX (already-started builds may stay hanging). I guess now we wait and see.
Not seen this for a while, closing.