OSX builds on Travis sometimes aren't scheduled #40672

Closed
alexcrichton opened this issue Mar 20, 2017 · 16 comments
Labels
A-spurious Area: Spurious failures in builds (spuriously == for no apparent reason) C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC

Comments

@alexcrichton
Member

Examples:

@alexcrichton alexcrichton added the A-spurious Area: Spurious failures in builds (spuriously == for no apparent reason) label Mar 20, 2017
@CleanCut
Contributor

@alexcrichton

Question: As an authorized TravisCI user, do you see a "Build" or "Rebuild" button on an individual OSX build that failed to run? For example, on this page: https://travis-ci.org/rust-lang/rust/jobs/212689371

@CleanCut
Contributor

In both of the example links, there are both Linux and macOS builds that started, but were cancelled while running. In addition, there are macOS builds that never started.

It appears that the entire build may have been cancelled before running to completion, and there were simply more macOS builds at the end of the list, so they hadn't started before the build was cancelled.

I can't see from my public user whether the build was cancelled by a specific person, or by a feature like auto cancellation, or something else. I don't even know if TravisCI presents that information to privileged users.

@alexcrichton
Member Author

Oh thanks for the re-ping @CleanCut! So the problem here ended up being an upstream Travis issue which has since been fixed, so I'm going to close this.

To answer your questions, though, yeah we can definitely retry/cancel individual builds if necessary. The problem here was that the build was scheduled for a VM but some (presumably Travis) bug ended up making it so the build made no progress. That clogged our whole parallelism capacity with "queued" builds but nothing was making progress.

I've enabled auto cancellation as well now for PRs and branches, but unfortunately that probably wouldn't have helped this much :(

@CleanCut
Contributor

Okay, cool. It's always nice when it's upstream's fault AND they fix it. ;-)

@Mark-Simulacrum
Member

Mark-Simulacrum commented May 2, 2017

I'm going to go ahead and reopen this because we've seen at least once, and I seem to recall more, that the OS X builders just don't start (e.g., #41661).

Edit: Fixed PR link.

@kennytm
Member

kennytm commented May 2, 2017

@Mark-Simulacrum The PR link is wrong 😄

The macOS machines typically take more than 1 minute to boot, and at most 216 macOS builds can run simultaneously across all of Travis (compared with the Linux machines, which support over 700 concurrent builds). So the macOS builds probably don't start simply because the queue is too long.

@Mark-Simulacrum
Member

Fixed the PR link. It's certainly possible that this is entirely out of our hands, but it'd be nice to track it anyway, I think.

@alexcrichton
Member Author

Ok, let's try to collect some samples over time to get an idea of frequency (along with concrete logs) and then we can look into emailing Travis.
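
As a rough illustration of what collecting samples could look like, here is a minimal sketch that counts, per build, the macOS jobs that errored with a blank log. The `Job` record and all of its field names are hypothetical and do not mirror the Travis API; the records would have to be filled in by hand or from whatever source is available.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Job:
    # All fields are hypothetical; they are not taken from the Travis API.
    build: str      # identifier of the build the job belongs to
    os: str         # "osx" or "linux"
    state: str      # e.g. "passed", "failed", "errored", "canceled"
    log_bytes: int  # size of the job log; 0 means the log is blank


def never_started(job: Job) -> bool:
    """Treat a job as a sample if it is macOS, errored, and has a blank log."""
    return job.os == "osx" and job.state == "errored" and job.log_bytes == 0


def tally(jobs: Iterable[Job]) -> Counter:
    """Count never-started macOS jobs per build."""
    return Counter(job.build for job in jobs if never_started(job))
```

Calling `tally` on the recorded jobs gives a per-build count that can be tracked over time before contacting Travis.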

@carols10cents
Member

I think I'm seeing some instances of this. Not sure if these links will stick around so taking screenshots.

On Build #52099, I see 4 apple builds with a ❗️ status:

[Screenshot, 2017-05-08 10:12:50 AM]

Job #52099.37 looks like this when I click into it:

[Screenshot, 2017-05-08 10:10:33 AM]

The other 3 look the same as well. It just looks like they never started?

@carols10cents
Member

Saw 2 more of these in https://travis-ci.org/rust-lang/rust/builds/234609993 for #42142.

@ishitatsuyuki
Contributor

Another occurrence in #42167, without any backlog on status page.

@Mark-Simulacrum
Member

More cases in #42275, a normal PR run (w/o r+): xcode8.2 and all three xcode7 builders didn't start and show the ! status.

@MaulingMonkey
Contributor

More cases in #43221 (PR w/o r+), 3 of 5 xcode builds !ed with blank logs:

@Mark-Simulacrum Mark-Simulacrum added the C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC label Jul 27, 2017
@aidanhs
Member

aidanhs commented Aug 19, 2017

https://travis-ci.org/rust-lang/rust/builds/266074368
https://travis-ci.org/rust-lang/rust/builds/266074915

I've dropped Travis an e-mail. Please, nobody cancel the above: I'm hoping a live example might help them fix it, and since it's RustConf I figure the PR queue might be slow for a few days :)

Edit: new plan - I killed one of them to get the queue moving, but left the other since it's just a PR build and doesn't block anything.

@aidanhs
Member

aidanhs commented Aug 22, 2017

The upstream infra provider for Travis has restarted a faulty host. Apparently this should improve things, and any builds from this point on should not hang when starting OSX jobs (already-started builds may stay hanging). I guess now we wait and see.

@aidanhs
Member

aidanhs commented Sep 19, 2017

Not seen this for a while, closing.

@aidanhs aidanhs closed this as completed Sep 19, 2017
8 participants