Verify step signatures #2210

Merged
merged 11 commits into from
Aug 10, 2023

Conversation

Contributor

@moskyb moskyb commented Jul 17, 2023

This PR enables the buildkite-agent to verify incoming jobs against the signatures included in their pipeline uploads.

Roughly, the process is:

  • When accepting a job, the Job sent by the backend will (WIP) now include a step attribute, which contains a subset of the information in the step upload. Currently that is the step's command and signature; as we sign more elements of the step, this will expand to include them
  • After accepting the job, but before kicking off the bootstrap process, the agent computes a signature for the step, and compares it to the signature shipped in the accept payload
  • With the step verified, we need to check that the Job's elements match the Step's
  • If they match, run the job
  • If they don't (or there's a signature mismatch, or any other error happens during this process), fail the job with exit status -1, and an informational signal reason - this is all handled very similarly to the existing pre-bootstrap hook

@moskyb moskyb force-pushed the pdp-1125-verify-steps branch 2 times, most recently from 29b475c to ea0f66e Compare July 17, 2023 05:59
@moskyb moskyb force-pushed the pdp-1125-verify-steps branch 3 times, most recently from ece0f77 to 856cce4 Compare July 25, 2023 05:38
@triarius triarius requested a review from a team July 25, 2023 06:47
Contributor

@triarius triarius left a comment

Looking good.

I think all my comments are just various degrees of nits.

But I would encourage exploring what putting this stuff in a different package would look like.

Comment on lines +29 to +34
JobVerificationNoSignatureBehavior string
JobVerificationInvalidSignatureBehavior string
Contributor

Maybe explain what these are. The other one seems obvious, but it deserves a doc comment too.

Contributor

I see they're explained below, but this is a higher level. I think it's good to talk about them here as well.

Comment on lines +127 to +138
case "/jobs/" + jobID + "/chunks":
sequence := req.URL.Query().Get("sequence")
seqNo, _ := strconv.Atoi(sequence)
r, _ := gzip.NewReader(bytes.NewBuffer(b))
uz, _ := io.ReadAll(r)
t.logChunks[seqNo] = string(uz)
rw.WriteHeader(http.StatusCreated)
Contributor Author

error handling? what's that?

@moskyb moskyb force-pushed the pdp-1125-verify-steps branch 4 times, most recently from 5a4bc99 to 9ac378a Compare July 27, 2023 01:16
@moskyb moskyb requested review from DrJosh9000 and a team July 27, 2023 03:05
@moskyb moskyb force-pushed the pdp-1125-verify-steps branch 2 times, most recently from 8e2c62c to 06fc81e Compare July 31, 2023 23:38
@moskyb moskyb force-pushed the pdp-1125-verify-steps branch 2 times, most recently from 402843b to 02ffc45 Compare August 2, 2023 00:53
@moskyb moskyb marked this pull request as ready for review August 2, 2023 03:31
@moskyb moskyb changed the title from "[WIP] Verify step signatures" to "Verify step signatures" Aug 2, 2023
Contributor

@DrJosh9000 DrJosh9000 left a comment

Good stuff ...but still reviewing...

@@ -34,6 +34,7 @@ import (
"github.com/mitchellh/go-homedir"
"github.com/urfave/cli"
"golang.org/x/exp/maps"
"golang.org/x/exp/slices"
Contributor

Oh goodness. When did we start depending on exp packages?

At least these will be making their way into Go 1.21 largely the same.

Contributor Author

@moskyb moskyb Aug 2, 2023

um, it's, um, it's hard to say, exactly, when (May 5th 2022) this started or who (extremely me) started it

Contributor

Welp, looks like maps.Keys isn't making it to stdlib maps (yet): golang/go#61538

(slices.Contains is still in, so I'll let this PR slide)

@moskyb moskyb force-pushed the pdp-1125-verify-steps branch 2 times, most recently from 1f7845a to 6cc495d Compare August 2, 2023 05:31
Contributor

@triarius triarius left a comment

This is shaping up to be amazing!

exit.Status = -1
exit.SignalReason = "job_verification_failed_with_error"
return nil
}
Contributor

I do think the if/else chain I suggested last time looks better here.

I, too, generally prefer switches to if/else chains, but here:

  1. There is no added efficiency in using a switch as all the case statements have to be evaluated in order in the worst case as there is no value to switch on.
  2. This statement leaks the ise variable. The if/else chain would scope it exclusively to the appropriate branch

But I'm not going to die on this hill.

Contributor

If I were to change anything, it would be to replace default: with case err != nil:, and not make err == nil a case in the switch.

As in, the other side of "indent error flow" is "don't indent non-error flow".

Contributor

@triarius triarius left a comment

🎉

Contributor

triarius commented Aug 3, 2023

This is great!

@moskyb moskyb force-pushed the pdp-1125-verify-steps branch 3 times, most recently from 5bf9034 to c78f0de Compare August 3, 2023 05:41
Contributor

@DrJosh9000 DrJosh9000 left a comment

Huzzah! 🧃

@@ -18,6 +18,7 @@ type CommandStep struct {
Command string `yaml:"command"`
Plugins Plugins `yaml:"plugins,omitempty"`
Signature *Signature `yaml:"signature,omitempty"`
Matrix any `yaml:"matrix,omitempty"`
Contributor

We'll also have to set the new field in unmarshalMap.

Contributor Author

oooh good point

Contributor

... and interpolate ...

@@ -53,71 +54,90 @@ Example:

$ buildkite-agent start --token xxx`

var noSignatureBehaviors = []string{agent.VerificationBehaviourBlock, agent.VerificationBehaviourWarn}
Contributor

It's used between signature-missing and signature-invalid, so perhaps

Suggested change
- var noSignatureBehaviors = []string{agent.VerificationBehaviourBlock, agent.VerificationBehaviourWarn}
+ var verificationFailureBehaviors = []string{agent.VerificationBehaviourBlock, agent.VerificationBehaviourWarn}
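
For illustration, a shared list like this could validate both the signature-missing and signature-invalid settings with one helper. This is only a sketch: the constant values `"block"` and `"warn"`, the `validateBehavior` helper, and the flag names in `main` are assumptions, and it uses the standard library `slices` package (Go 1.21+) rather than `golang.org/x/exp/slices` so that it stands alone.

```go
package main

import (
	"fmt"
	"slices"
)

// Assumed values; the real constants live in the agent package.
const (
	VerificationBehaviourBlock = "block"
	VerificationBehaviourWarn  = "warn"
)

// One list serves both the signature-missing and signature-invalid settings.
var verificationFailureBehaviors = []string{VerificationBehaviourBlock, VerificationBehaviourWarn}

// validateBehavior is a hypothetical helper that rejects unknown behaviours.
func validateBehavior(flag, value string) error {
	if !slices.Contains(verificationFailureBehaviors, value) {
		return fmt.Errorf("invalid %s %q: must be one of %v", flag, value, verificationFailureBehaviors)
	}
	return nil
}

func main() {
	// The flag names below are made up for the example.
	fmt.Println(validateBehavior("no-signature-behavior", "warn"))
	fmt.Println(validateBehavior("invalid-signature-behavior", "ignore"))
}
```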

api/jobs.go Outdated
ChunksFailedCount int `json:"chunks_failed_count,omitempty"`
}

func (j Job) ValuesForFields(fields []string) (map[string]string, error) {
Contributor

Job looks big enough that I'd make this receiver *Job (for not-copying-it-over-stack reasons, not for modifying-the-fields reasons).
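
A minimal sketch of the pointer-receiver version, assuming a much smaller `Job` than the real one. The field-to-value mapping shown here (reading `command` from `BUILDKITE_COMMAND` in the job's env) is a guess for the example, not necessarily what the PR does.

```go
package main

import "fmt"

// Job is a hypothetical, trimmed-down stand-in for api.Job.
type Job struct {
	ID  string
	Env map[string]string
}

// ValuesForFields uses a pointer receiver so the struct is not copied on each
// call. It does not modify the Job.
func (j *Job) ValuesForFields(fields []string) (map[string]string, error) {
	out := make(map[string]string, len(fields))
	for _, f := range fields {
		switch f {
		case "command":
			// Assumption: the job's command is carried in its environment.
			out[f] = j.Env["BUILDKITE_COMMAND"]
		default:
			return nil, fmt.Errorf("unknown or unsupported field %q", f)
		}
	}
	return out, nil
}

func main() {
	j := &Job{ID: "job-1", Env: map[string]string{"BUILDKITE_COMMAND": "make test"}}
	fmt.Println(j.ValuesForFields([]string{"command"}))
	fmt.Println(j.ValuesForFields([]string{"plugins"}))
}
```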

Comment on lines 62 to 63
// Now that the signature of the job's step is verified, we need to check if the fields on the job match those on the
// step. If they don't, we need to fail the job
Contributor

@DrJosh9000 DrJosh9000 Aug 3, 2023

I was going to suggest it would be simpler to override BUILDKITE_COMMAND with step.Command, than to carefully compare each field.

But thinking about it, they would only differ if the backend (ours or the hypothetical attacker's) is doing crimes, in which case we probably shouldn't run anything. And the backend can't migrate away from passing BUILDKITE_COMMAND without breaking every old agent version. (Ah, but it could sniff the agent version then vary the job...)

Might be worth documenting that?

Contributor Author

just to double check, are you saying that it might be good to document that the only reason that step and job fields should differ is if an attacker is doing crimes on the backend?

Contributor

It might not be the only reason, but it definitely seems like a reason

agent/run_job.go Outdated
r.verificationFailureLogs(err, r.InvalidSignatureBehavior)
if r.InvalidSignatureBehavior == VerificationBehaviourBlock {
exit.Status = -1
exit.SignalReason = "job_verification_failed_invalid_signature"
Contributor

My initial thought was that AGENT_REFUSED might already serve the purpose here, but I imagine you're trying to tick off the objective of showing verification failure in the timeline at the same time.

It feels a little strange to be embedding the failure mode into the signal reason like this. We probably don't want to pack signal reason with a whole bunch of values - more a class of error, rather than an error string, if that makes sense. Beyond feeling a bit wrong, think about how they're used - if a user wanted to catch the errors for an automatic retry, for example, they'd need to list them all. Have you considered one signal reason for these (perhaps VERIFICATION_FAILED), and showing the actual error in the agent/job logs?

Contributor

If signal reason was designed for users to catch and retry, then I don't think we should be overloading its use to also display information in the timeline. Is there another way of creating a job event for signing failures? If not, I think we should examine building one when the time to build the timeline feature arrives.

@moskyb moskyb force-pushed the pdp-1125-verify-steps branch 2 times, most recently from cd13203 to 5777275 Compare August 7, 2023 03:38
@moskyb moskyb merged commit f8118ff into main Aug 10, 2023
@moskyb moskyb deleted the pdp-1125-verify-steps branch August 10, 2023 01:33
4 participants