Increase the number of retries (1 -> 3) #444
Conversation

Thanks for your pull request, @Geod24!

@Geod24 I hope this won't cause a retry when a test-suite failure is actually caused by a bug rather than a networking failure?

@WalterBright: That's the downside of it - it will.

The obvious question: can we get a proper fix?

Given that it can be difficult even for a human to decide whether a bug is a Heisenbug, what would the algorithm be to determine that automatically?

Fine with me; the bill for those runners is fairly small. I did lower it to 1 in the past since many PR problems were not intermittent, but some are, and human time is quite valuable.

@MartinNowak: Perhaps you could take a look at https://github.com/dlang/ci/blob/master/buildkite/Dockerfile so contributors could run an agent as well? I have a few servers that I would gladly use as permanent runners.
All networking errors would be a great first approximation. |
Obviously, yes. The question is how to determine whether a failure is networking-related. For example, IIRC some (all?)

Over here dlang/dmd#12409 (comment) the failure is: Surely that's detectable.
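For illustration only, a first approximation of "detect networking failures" could be plain pattern matching on the build log. This is a minimal sketch, not anything that exists in dlang/ci; the helper name and the pattern list are assumptions:

```python
import re

# Log snippets that usually point at a transient network problem rather than a
# genuine test failure. Purely illustrative; a real list would be curated from
# actual CI logs.
TRANSIENT_PATTERNS = [
    r"could not resolve host",
    r"connection (timed out|reset by peer|refused)",
    r"tls handshake timeout",
    r"the remote end hung up unexpectedly",
    r"temporary failure in name resolution",
]


def looks_like_network_failure(build_log: str) -> bool:
    """Return True if a failed build's log matches a known transient pattern."""
    log = build_log.lower()
    return any(re.search(pattern, log) for pattern in TRANSIENT_PATTERNS)


if __name__ == "__main__":
    # Hypothetical failure output, similar in spirit to the linked dmd failure.
    log = "fatal: unable to access 'https://github.com/dlang/dmd.git/': Could not resolve host: github.com"
    print("retry" if looks_like_network_failure(log) else "report as a real failure")
```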
There seems to be almost zero benefit to a smart retry over a blunt 3x retry; it won't even be noticeably faster (a blunt retry of that kind is sketched after this comment).
IIRC there is a 5 min. wait time for running jobs when downscaling agents. If the problem occurs often, we could bump that a bit if there are many long-running jobs.
What's the benefit of someone else running servers? It sounds nice in theory, but reliability on a heterogeneous infrastructure run by an uncoordinated group is likely to suffer.
I guess a simpler dependency file might indeed help us update the machines. Is this a real problem?
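For comparison, the blunt approach amounts to re-running the job up to a fixed number of times regardless of why it failed, which is in effect what raising the retry limit from 1 to 3 does. A rough sketch of that idea; the command name and the wrapper itself are purely illustrative:

```python
import subprocess


def run_with_retries(cmd, limit=3):
    """Run `cmd` up to `limit` times; succeed as soon as one attempt passes."""
    for attempt in range(1, limit + 1):
        if subprocess.run(cmd).returncode == 0:
            return True
        print(f"attempt {attempt}/{limit} failed")
    return False


if __name__ == "__main__":
    # Hypothetical test-suite entry point. In the real pipeline this is handled
    # by the CI's built-in automatic-retry setting, not a wrapper script.
    ok = run_with_retries(["./run-tests.sh"], limit=3)
    raise SystemExit(0 if ok else 1)
```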
@MartinNowak thanks for the evaluation. I'll defer to your expertise on the matter!
Any opinion on whether this is an actual problem, @Geod24?

@MartinNowak: The lack of machines has definitely hit us in the past. Sometimes there are no agents running for a noticeable amount of time, although I don't recall it ever being more than an hour. I wasn't overly bothered by it because I just hit the retry button, but @WalterBright was.

Something that is a bit more lacking is the ability for projects to control their dependencies. With the changes we're seeing in the CI ecosystem (Travis disappearing, GitHub Actions rising), I was hoping we could leverage the GitHub runner to simplify our current pipeline. That could theoretically make it easier for core contributors to run agents, too.

Indeed, we could rebuild the service in GitHub Actions 👍. It might be more accessible for everyone, though it would require some additional setup time (hopefully fine). Not sure how long their free open-source CI will last; I'd guess a while with MSFT's current strategy.

CC @WalterBright