Posting build progress to github #236

Closed
jbergstroem opened this issue Nov 2, 2015 · 33 comments

Comments

@jbergstroem
Member

I've been tinkering with a set of scripts that post feedback to GitHub. Here's a quick outline of how it works:

  • There's a boolean called POST_STATUS_TO_PR in the node-test-pull-request job. Make sure that's checked and enter your PR.
  • Each worker in node-test-commit will ping GitHub and say it has started building/testing (as of now; see Unfold make-ci in jenkins #229 for improvements in this regard). These workers are currently: ci/linter, ci/freebsd, ci/smartos, ci/linux, ci/plinux, ci/windows-fanned, ci/osx, ci/arm and ci/arm-fanned.
  • Once the test suite completes we check whether it returned either an unstable or a success status. If that's the case we post a success and a text mentioning the total tests run plus any skips (the linter acts differently and needs more work since we need TAP output; see this example PR for work in progress).
  • If the suite has any other status (currently only failed tests are supported) we post the number of failed, skipped and total tests.

In order to achieve the above I had to install a new Jenkins plugin that lets us execute logic as part of the post-build phase. I'm using the XML API and parsing the output in Python (for portability) when needed.
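
For reference, the posting part boils down to something like this (simplified from the actual job scripts; GITHUB_TOKEN is assumed to be the repo:status-scoped token, and all the values below are placeholders):

# simplified sketch -- GITHUB_TOKEN is assumed to hold a token with only the
# repo:status scope; org/repo/sha/context/url would come from the job
import json
import os
import urllib.request

def post_status(org, repo, sha, state, description, context, target_url):
    # POST /repos/:owner/:repo/statuses/:sha (the GitHub Statuses API)
    url = "https://api.github.com/repos/%s/%s/statuses/%s" % (org, repo, sha)
    body = json.dumps({
        "state": state,              # pending, success, failure or error
        "description": description,  # e.g. "1234 tests run, 12 skipped"
        "context": context,          # e.g. "ci/linux"
        "target_url": target_url,    # link back to the Jenkins build
    }).encode()
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("Authorization", "token " + os.environ["GITHUB_TOKEN"])
    req.add_header("Content-Type", "application/json")
    urllib.request.urlopen(req)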

Have a look at a few of the above jobs (edit in Jenkins) to see how it currently works. There are a few things that still need to be done, at least one of which is up for discussion:

So, the last one is a bit of a tricky problem, since we can't really trust input from a PR. Jenkins has a few ways of storing passwords (credential store and global passwords) and tries to mask them in some cases, but that sandbox was pretty easy to escape from.

@jbergstroem
Member Author

One option to solve the credentials issue would be to create a small proxy script that adds this credential. This creates another trust issue since we then lack logic to control when it should fire, but that might be the better trade-off.

@Starefossen
Member

Just making sure, though you are probably aware: there is a GitHub OAuth scope which only grants access to the Status API, namely the repo:status scope.

@jbergstroem
Member Author

@Starefossen yep, that's what we're using, but it's still leakable in its current form.

@DavidTPate

Credentials are where it really starts to get difficult; I haven't seen a good way yet. Look at Travis CI, for example: when dealing with encrypted things (such as credentials or keys), it just doesn't provide them to PRs that aren't from the same repository.

You could totally limit the impact of the credentials being discovered by limiting the scope of the keys (and you want to do this regardless). But the ideal case would be to not have the keys leaked in the first place. What typically happens with most SaaS products that do this right now is that their status is reported by a service that they manage which has open access. It's not exactly ideal since anyone could update the status, but it gets us to the point where we have kept our credentials safe.

The last part would be limiting access to the service for updating statuses. I'm not familiar with the infrastructure, but if a web service can be created which simply updates the status for PRs and has its access limited by CIDRs, routing, or some other method, that would get us pretty much there.
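
Just to illustrate what I mean, something along these lines (everything here -- the allowed networks, the port, the payload shape -- is made up; the point is that the GitHub token never leaves this host):

# hypothetical status-proxy sketch; ALLOWED_NETS and the payload are made up
import ipaddress
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

ALLOWED_NETS = [ipaddress.ip_network("10.0.0.0/8")]  # e.g. the CI subnet

class StatusProxy(BaseHTTPRequestHandler):
    def do_POST(self):
        client = ipaddress.ip_address(self.client_address[0])
        if not any(client in net for net in ALLOWED_NETS):
            self.send_error(403)
            return
        length = int(self.headers.get("Content-Length", 0))
        status = json.loads(self.rfile.read(length))
        # here we'd forward `status` to the GitHub Statuses API using a
        # token stored only on this machine
        self.send_response(202)
        self.end_headers()

HTTPServer(("", 8080), StatusProxy).serve_forever()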

@jbergstroem
Member Author

@DavidTPate that's roughly what I suggested with my 'proxy' script. The problem is still that anyone can call it if they know what they're doing. The layer of security by obscurity is tricky to get rid of when you allow people to modify source code.

@thefourtheye

Given that our CI is mostly red these days, if we could somehow show the list of failing tests and their corresponding environments, it would be awesome.

@DavidTPate

Yeah, someone who knows what they are doing would still be able to manipulate it; it would just have a tougher barrier to get to that point. It's definitely a tough problem.

The only way that I can think to really do this and limit exposure would be to have some credential generated for each build that allows exactly one call to update the build status for each job.

@jbergstroem
Member Author

Thing is, if you know what you're doing you can escalate privileges (or rm -rf) as well. I think this is more about finding "good enough" security, then trusting that people who start jobs actually glance over a PR before submitting it for execution.

@DavidTPate

@jbergstroem Yeah, that seems to be the case to me; there just doesn't seem to be a good way to completely secure something like this, and "good enough" is a great start.

@orangemocha

Is there any way that we can distinguish Jenkins' success from unstable (only flaky tests failed)? I am concerned that if people start relying on the status checks to vet their PRs, we'll lose visibility on flaky tests. Reporting the list of failed flaky tests back to GitHub would be ideal.

@jbergstroem
Member Author

@orangemocha at github, not really -- we've got in progress, success or failed. What I've done, though, is add a note in the text mentioning how many flaky tests were run.

@jbergstroem
Member Author

After giving it some thought I'm thinking we should do what @rvagg has been suggesting:

  1. have a hook in node-test-pull-request that pings a server that starts polling
  2. poll node-test-commit for slaves
  3. poll each slave for updates until the parent is closed

Polling sucks, but this completely avoids any security-related issues and makes it more portable, meaning others can benefit from our work.

@Starefossen
Member

Makes sense. Nothing wrong with taking the secure route here.

@DavidTPate

That sounds like a good solution, polling does suck but it seems like a very acceptable tradeoff here.

@Starefossen
Member

@jbergstroem what is the status (pun not intended) here? I will have more time in the next few weeks to help out with this if needed.

@jbergstroem
Member Author

@Starefossen great news! Haven't started with this yet. Let's coordinate something.

@Starefossen
Member

Great! Is the polling service still the plan? I have played around with the node-test-pull-request REST API and it looks like we can get all the status we need from that single endpoint without having to poll the individual node-test-commit jobs.

https://ci.nodejs.org/job/node-test-pull-request/932/api/json

  "subBuilds": [
    {
      "abort": false,
      "build": {
        "subBuilds": [
          {
            "abort": false,
            "build": {

            },
            "buildNumber": 1395,
            "duration": "16 min",
            "icon": "blue.png",
            "jobName": "node-test-commit-arm",
            "parentBuildNumber": 1359,
            "parentJobName": "node-test-commit",
            "phaseName": "Tests",
            "result": "SUCCESS",

Just need to know how this endpoint behaves during a build and we should be good to go; my guess is that it reports "building": true while it is building.
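
Something like this should be enough to verify that guess (the "building" and "result" fields are standard in the Jenkins JSON API; "result" should stay null until a run finishes):

# quick way to inspect that endpoint mid-build
import json
import urllib.request

url = "https://ci.nodejs.org/job/node-test-pull-request/932/api/json"
with urllib.request.urlopen(url) as res:
    job = json.loads(res.read().decode())

print(job["building"])                        # presumably true while running
for sub in job.get("subBuilds", []):
    print(sub["jobName"], sub.get("result"))  # e.g. node-test-commit SUCCESS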

@jbergstroem
Member Author

While we're at it, I think we should make something more generic. My current thoughts are something along these lines:

  • create jobs with endpoints; a job would represent a job at Jenkins. Also understand the notion of sub-jobs.
  • create an API endpoint which receives a POST for job notifications. This could come from GitHub or Jenkins (Jenkins in our case, every time a job is created)
  • poll the specific job for connected slaves
  • store and update states:
    • what's going on right now?
    • is GitHub successfully updated about it?
  • once a slave finishes:
    • report back to GitHub (fail or pass)
    • store all information, since things like flaky tests might be available at GitHub in a later version.

We could also have a generic poller -- wouldn't be my preferred route though.
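
Roughly, the state part could be as simple as this (purely illustrative; the names are made up):

# illustrative data model for the "store and update states" part
from dataclasses import dataclass, field

@dataclass
class SubJob:
    name: str                  # e.g. "node-test-commit-arm"
    result: str = "PENDING"    # what's going on right now?
    gh_updated: bool = False   # is github successfully updated about it?

@dataclass
class Job:
    jenkins_job: str           # e.g. "node-test-pull-request"
    build_number: int
    pr_id: int
    sub_jobs: dict = field(default_factory=dict)  # name -> SubJob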

@Starefossen
Member

Not sure I got the first create jobs part, but otherwise this is my understanding of how this service could be implemented:

# post jenkins build status to github pull request
algorithm jenkins-github-status is
  input: Integer job_id
         Integer pr_id
  output: Void

  # save sub-build result between loop intervals
  cache ← new Map()

  do
    job ← jenkins.getJob(job_id)

    for build in job.subBuilds do
      # only update GitHub if status has changed since last loop
      if cache[build.jobName] ≠ build.result then
        cache[build.jobName] ← build.result
        github.postStatus(pr_id, build.jobName, build.result)
      end if
    end for
  while job.building == true

  return

end algorithm

This is obviously a simplification: you cannot post a status to GitHub without the sha of one of the commits in the pull request, and since our builds take 16+ minutes to complete there should probably be a 60-second delay between loop iterations, etc.
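
In Python it could look something like the following; jenkins_get, github_pr_head_sha and github_post_status are just placeholders for the actual API calls:

# rough Python version of the pseudocode above, with placeholder helpers
import time

def jenkins_github_status(job_id, pr_id):
    sha = github_pr_head_sha("nodejs", "node", pr_id)  # statuses need a sha
    cache = {}                                         # jobName -> last result

    while True:
        job = jenkins_get("/job/node-test-pull-request/%d/api/json" % job_id)
        for build in job.get("subBuilds", []):
            # only update GitHub if the status changed since the last pass
            if cache.get(build["jobName"]) != build["result"]:
                cache[build["jobName"]] = build["result"]
                github_post_status(sha, build["jobName"], build["result"])
        if not job["building"]:
            break
        time.sleep(60)  # builds take 16+ minutes, no need to hammer the API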

@jbergstroem
Member Author

Generally looks good. A few comments:

  • Jenkins has a guesstimate for job length; we can use that as part of our "polling interval frequency" algorithm (sketched below)
  • the job definition probably needs to be thought through; it can contain things like what input is expected to launch a poller against a job (looking it up via the sha might not be impossible since that information is available in the node-test-commit sub-task)
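
Something like this, using the estimatedDuration field (milliseconds) that Jenkins exposes in the build JSON -- the bounds and slice count here are just guesses:

# possible polling-interval calculation based on Jenkins' own estimate
def poll_interval(build_json, min_s=30, max_s=300, slices=20):
    estimate_s = build_json.get("estimatedDuration", 0) / 1000.0
    # aim for roughly `slices` polls over the expected build time
    return max(min_s, min(max_s, estimate_s / slices))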

@Starefossen
Member

  • Jenkins has a guesstimate for job length; we can use that as part of our "polling interval frequency" algorithm

Good suggestion!

  • the job definition probably needs to be thought through; it can contain things like what input is expected to launch a poller against a job (looking it up via the sha might not be impossible since that information is available in the node-test-commit sub-task)

From the (current) node-test-pull-request action parameters we can query the GitHub API to get the commits for the pull request under test (TARGET_GITHUB_ORG + TARGET_REPO_NAME + PR_ID), assuming the service has access to that repo of course.

We can also use the POST_STATUS_TO_PR parameter to control whether the service should post statuses to GitHub or not.

  "actions": [
    {
      "parameters": [
        {
          "name": "TARGET_GITHUB_ORG",
          "value": "nodejs"
        },
        {
          "name": "TARGET_REPO_NAME",
          "value": "node"
        },
        {
          "name": "PR_ID",
          "value": "4116"
        },
        {
          "name": "POST_STATUS_TO_PR",
          "value": true

@jbergstroem
Member Author

Yes, but we need to unfold this into the workers at node-test-commit. At that level we'll have sha1 as well. Just saying that it'd be pretty easy to find a job based on sha1 (what we would get from github if we chose that route) since in most cases there'll only be one test-commit running the same sha.

Not saying using the sha is the way to go here; I just see this utility being useful for more people than us.

@jbergstroem
Member Author

The main problem with a hook from Jenkins is that we'd have to share a secret, similar to the constraints of the current solution. Polling would remedy that, but so would a sha1 from GitHub; for instance, having a hook receive input from GitHub on new PRs or comments on a PR, checking the PR id and/or matching it with the link in the comment.

@Fishrock123
Contributor

As a note, nodejs-github-bot now posts GitHub statuses. The live bot hasn't been updated yet, but it should first roll out for readable-stream, with nodejs.org either at the same time or later (pr#15).

I'll try to look into how we might do this, but any help on the build end would be great.

Some important notes:

Possible GitHub statuses: success, pending, failure, error. Additional "description" info can also be provided.
Statuses also have a url parameter to link directly to the build.

We currently do it by PR for Travis, but doing it fully by commit is totally possible. So we could either do the linking via PR id or by sha.

@jbergstroem
Member Author

@Fishrock123 I have a few suggestions/ideas; will post them shortly.

@Fishrock123
Contributor

Is there any way that we can distinguish Jenkins' success from unstable (only flaky tests failed)? I am concerned that if people start relying on the status checks to vet their PRs, we'll lose visibility on flaky tests. Reporting the list of failed flaky tests back to GitHub would be ideal.

I contacted GitHub support about this, and their suggestion was to report flaky tests back as a separate status. I'm not really sure it's possible to separate that out of Jenkins easily, though?

(Actually maybe I'm overthinking it and it isn't that hard..)

@Fishrock123
Contributor

If it's green we can just report double green (or just a single green) status. If it is flaky we change/add one that notes that flaky tests failed.
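
Something like this, assuming a post_status(sha, context, state, description) helper wrapping the Statuses API:

# rough idea -- post_status is just a stand-in for the bot's status helper
def report(sha, context, result, flaky_failures):
    # green (or unstable) build -> a plain success status
    state = "success" if result in ("SUCCESS", "UNSTABLE") else "failure"
    post_status(sha, context, state, "build finished")
    # flaky failures get their own context so they stay visible on the PR
    if flaky_failures:
        post_status(sha, context + " (flaky)", "failure",
                    "%d known flaky tests failed" % flaky_failures)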

@jbergstroem
Member Author

My plan of attack is to report each sub-worker of node-test-pr with results (if there are skips or failures, mention fails, skips and total), as well as introducing linter results and writing new code for commit messages.

@Fishrock123
Contributor

@jbergstroem do you plan on doing this directly from the CI, or just providing hooks to the bot? (I sorta prefer the latter because then we can do a lot of it in JS..)

@jbergstroem
Member Author

@Fishrock123 No, we can't do it from CI. We need to have the bot poll both GitHub (PubSubHubbub events) and CI (the API) to match what's being run, then poll each sub-job and post pending/ok/fail. I'm a bit in transit this week but am putting together a document that should outline what I see needs to be done.

@Fishrock123
Contributor

@jbergstroem ok, sounds good; the bot should be able to adapt to that. I'm pretty sure anything we do will be less of a mess than trying to proxy Travis, haha. :)

@jbergstroem
Member Author

@Fishrock123: yeah; really looking forward to seeing this in action.

@maclover7
Contributor

Can this be closed in favor of #790?

@rvagg closed this as completed Nov 4, 2017