Can use of the "resume build" button be tracked? #1969

sam-github · 2019-10-18T18:50:06Z

Detecting flakiness of builds is currently distributed to human beings, since if a PR fails to build it isn't necessarily a problem with flaky tests or flaky infrastructure... it could be the PR has a problem.

It occurs to me that there is a case where we can be pretty sure that the problem isn't the PR: when the build is "resumed" and the same SHA builds successfully. This could mean that a change introduced in the PR is actually flaky, because it only sometimes passes, but humans usually interpret this as "my PR is good, something else was flaky", which is also the interpretation of node-core-utils.

Is it possible to get from Jenkins a report on when builds were resumed, and what specifically passed on resume that had failed last time?

It strikes me it might be a gold mine for fixing flakiness in our CI.
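To make the comparison concrete, here is a minimal sketch of "what passed on resume that had failed last time". It assumes the per-sub-job results of both runs have already been collected into plain dicts; the function name, the dict shape, and the sub-job names in the usage note are all hypothetical. Treating both `FAILURE` and `UNSTABLE` as "failed" is also an assumption.

```python
from typing import Dict, List

# Jenkins result strings treated as "did not pass" (an assumption for this sketch).
FAILED_RESULTS = {"FAILURE", "UNSTABLE"}


def passed_on_resume(original: Dict[str, str], resumed: Dict[str, str]) -> List[str]:
    """Return sub-jobs that failed in the original run but succeeded on resume.

    Both arguments map a sub-job name to its Jenkins result string.
    """
    return sorted(
        job
        for job, result in original.items()
        if result in FAILED_RESULTS and resumed.get(job) == "SUCCESS"
    )
```

For example, with hypothetical sub-jobs `job-a`, `job-b` and `job-c`, where `job-b` and `job-c` failed the first time and only `job-b` passed on resume, the function returns `["job-b"]`. Since the SHA is identical across both runs, anything in that list is a candidate for flakiness rather than a real PR failure.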

rvagg · 2019-10-20T23:03:52Z

I did some grepping on the CI machine and it looks like there is a signifier but it doesn't look like that shows up on the UI.

com.tikal.jenkins.plugins.multijob.MultiJobResumeControl only shows up on a small number of builds and may be what we're after.

For example, in node-test-commit-linux, the 30253 build has:

    <com.tikal.jenkins.plugins.multijob.MultiJobResumeControl plugin="jenkins-multijob-plugin@1.32">
      <run class="matrix-build" resolves-to="hudson.model.Run$Replacer" plugin="matrix-project@1.14">
        <id>node-test-commit-linux#30251</id>
      </run>
    </com.tikal.jenkins.plugins.multijob.MultiJobResumeControl>

If we look at https://ci.nodejs.org/job/node-test-commit-linux/30253/, it's not obvious that this is anything special, but flip to https://ci.nodejs.org/job/node-test-commit-linux/30251/, the one linked in the config, and we find the identical gitref and some failures. #30253 passed, #30251 did not, so these tests are flaky.
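A rough sketch of pulling that link out programmatically, assuming the element always has the `run/id` shape shown above. The function name is made up, and dropping the XML declaration is a precaution in case the file declares a version that Python's parser does not accept.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

RESUME_TAG = "com.tikal.jenkins.plugins.multijob.MultiJobResumeControl"


def resumed_from(build_xml: str) -> Optional[str]:
    """Return the id of the build that this build resumed, or None."""
    # Drop any leading XML declaration; the parser does not need it here.
    body = re.sub(r"^\s*<\?xml[^>]*\?>", "", build_xml, count=1)
    root = ET.fromstring(body)
    # iter() walks the whole tree, so the element may sit at any depth.
    for control in root.iter(RESUME_TAG):
        build_id = control.findtext("run/id")
        if build_id:
            return build_id.strip()
    return None
```

Fed the build.xml of 30253, this should give back `node-test-commit-linux#30251`, and `None` for a build that was never resumed.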

Complications:

- We don't keep builds for long, I think we might be on a 7 or 5 day cycle, so any analysis would need to be done regularly(ish).
- It's all in XML, yay.
- It's all locked away on ci.nodejs.org, which build/infra folks have access to, although it's not a super critical resource and we could discuss being slightly less restrictive if someone wants to spend time coming up with an analysis solution that can also exfiltrate the results to a usable place.

rvagg · 2019-10-20T23:07:14Z

FYI here's how they can be found, and the list of resumed builds in node-test-commit-linux in the last week:

    $ grep -i '<com.tikal.jenkins.plugins.multijob.MultiJobResumeControl' /var/lib/jenkins/jobs/node-test-commit-linux/builds/*/build.xml | awk -F/ '{print $(NF-1)}'
    30203
    30216
    30220
    30253
    30258
    30286
    30288
    30312
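If someone wanted to run this regularly instead of by hand, the same search could be sketched in Python roughly as below. The function name is hypothetical, and skipping non-numeric directory names is an assumption that the grep above does not make.

```python
from pathlib import Path
from typing import List

MARKER = "<com.tikal.jenkins.plugins.multijob.MultiJobResumeControl"


def resumed_build_numbers(builds_dir: str) -> List[int]:
    """List the build numbers under builds_dir whose build.xml has the resume marker."""
    found = []
    for build_xml in Path(builds_dir).glob("*/build.xml"):
        name = build_xml.parent.name
        if not name.isdigit():
            continue  # assumption: only numbered build directories matter
        text = build_xml.read_text(encoding="utf-8", errors="replace")
        # Case-insensitive match, mirroring grep -i.
        if MARKER.lower() in text.lower():
            found.append(int(name))
    return sorted(found)
```

On the CI machine this would be called as `resumed_build_numbers("/var/lib/jenkins/jobs/node-test-commit-linux/builds")`. Because builds are only kept for a few days, the output would need to be stored somewhere outside Jenkins on each run to build up any history.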

github-actions · 2020-08-16T00:33:03Z

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

davidstanke mentioned this issue Feb 9, 2020

Introduction: I'm Dave. How can I help? #2171

Closed

github-actions bot added the stale label Aug 16, 2020

github-actions bot closed this as completed Sep 15, 2020
