
Add Github workflow run information to the signing certificate #624

Closed
tonistiigi opened this issue May 31, 2022 · 22 comments
Labels
enhancement New feature or request npm-ga

Comments

@tonistiigi

tonistiigi commented May 31, 2022

Description

Currently, the certificates created via the GitHub token include the following GitHub info fields:

            X509v3 Subject Alternative Name: critical
                URI:https://github.com/<user>/<repo>/.github/workflows/<workflow>.yml@refs/heads/<branch>
            1.3.6.1.4.1.57264.1.3:
                <commit>
            1.3.6.1.4.1.57264.1.6:
                refs/heads/<branch>
            1.3.6.1.4.1.57264.1.5:
                <user>/<repo>
            1.3.6.1.4.1.57264.1.4:
                image
            1.3.6.1.4.1.57264.1.2:
                push
            1.3.6.1.4.1.57264.1.1:
                https://token.actions.githubusercontent.com

It would be good if this info also included the workflow run information. Based on https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/about-security-hardening-with-openid-connect#understanding-the-oidc-token the token includes the run ID, run number, and run attempt.

Having this info available would make a more direct connection to where the process that did the signing actually ran, and would let you look up the build logs if they are available. I could imagine a case where an untrusted party has managed to trigger a workflow run on their own terms and then tries to make it look like a legitimate release/branch build.
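
As a rough illustration of how these extension values can be read back out of a certificate (not part of the original report; the cert.pem path is a placeholder and the OID-to-name mapping simply mirrors the fields listed above), Go's standard x509 package is enough:

package main

import (
    "crypto/x509"
    "encoding/pem"
    "fmt"
    "os"
)

func main() {
    // Placeholder path; point this at a Fulcio-issued signing certificate.
    pemBytes, err := os.ReadFile("cert.pem")
    if err != nil {
        panic(err)
    }
    block, _ := pem.Decode(pemBytes)
    cert, err := x509.ParseCertificate(block.Bytes)
    if err != nil {
        panic(err)
    }

    // The SAN URI identifies the workflow: repo + workflow file + ref.
    for _, uri := range cert.URIs {
        fmt.Println("SAN URI:", uri.String())
    }

    // The 1.3.6.1.4.1.57264.1.x extensions carry the GitHub OIDC claims
    // shown in the dump above.
    names := map[string]string{
        "1.3.6.1.4.1.57264.1.1": "OIDC issuer",
        "1.3.6.1.4.1.57264.1.2": "workflow trigger",
        "1.3.6.1.4.1.57264.1.3": "commit SHA",
        "1.3.6.1.4.1.57264.1.4": "workflow name",
        "1.3.6.1.4.1.57264.1.5": "repository",
        "1.3.6.1.4.1.57264.1.6": "ref",
    }
    for _, ext := range cert.Extensions {
        if name, ok := names[ext.Id.String()]; ok {
            // These early Fulcio extensions hold the claim as a raw string.
            fmt.Printf("%s: %s\n", name, string(ext.Value))
        }
    }
}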

@tonistiigi tonistiigi added the enhancement New feature or request label May 31, 2022
@haydentherapper
Contributor

@asraa Any thoughts on this? Do you have the context on the set of information we initially chose to include in issued certs for GitHub?

Some of this information seems more like build provenance than identity. It also depends on where we draw the line for what represents a GitHub Actions identity. If we add this additional information, each run will get its own certificate with different identity information in the OIDs, so we would be changing the certificate identity from a per-workflow identity to a per-run identity.

@asraa
Contributor

asraa commented May 31, 2022

Hey @tonistiigi - thanks for the issue!

I could imagine a case where an untrusted party has managed to trigger a workflow run on their own terms and then tries to make it look like a legitimate release/branch build.

Yes, I totally see the concern here.

To reiterate Hayden's point on identity: I think the information we include in the cert is meant to pinpoint the workflow itself as an identity, not the run (the repo, commit hash, and ref pin the workflow content). A rough analogy: GitHub logins identify a GitHub username/email and do not include extensions like "login time" to map to the actual actor, given that usernames can be taken over.

Do you have a specific use-case in mind? I work on a project that creates build provenance and signs over the specific WorkflowRun information with Fulcio-issued certs; that may be a similar path to what you are trying to do.

@tonistiigi
Author

I agree that fields like these (e.g. links to build logs) could also be part of the provenance, but in the example case the attacker has already managed to trigger a workflow, so they have full control of the builder. Where the builder writes a provenance payload from the environment saying the build logs are at path /logs/234, they can modify it to say /logs/123 instead. The one thing they can't fake is an identity token from GitHub carrying the 123 value, and therefore they can't modify the signing identity for the payload.

There could even be a policy that verifies that the build logs in the provenance match up with the signer.

@haydentherapper
Contributor

I think it'd be difficult to build a verification policy that enforces run ID rather than workflow ID. I can have a policy that says "I only trust builds from workflow X" because the workflow identity is provided out of band or is some trusted, well-known workflow. Run IDs aren't known beforehand, and I can't think of a way to publish the "trusted" run IDs and differentiate them from run IDs where the workflow run was started by an attacker.

@tonistiigi
Author

@haydentherapper Not a policy that verifies a specific run ID, but one that verifies the signer was allowed to sign a provenance that points to build logs at a specific run ID.
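
A minimal sketch of the kind of policy check being described here, purely hypothetical since Fulcio certificates do not currently carry a run ID, and with the provenance field treated as a plain build-logs URL:

package policy

import (
    "fmt"
    "strings"
)

// Hypothetical check: the run ID in the provenance's build-log link must
// match the run ID that would be carried in the signer's certificate.
// The certificate extension holding the run ID does not exist today; this
// only sketches the policy described above.
func runIDsMatch(certRunID, provenanceLogURL string) error {
    // e.g. provenanceLogURL = "https://github.com/org/repo/actions/runs/123"
    parts := strings.Split(provenanceLogURL, "/")
    provRunID := parts[len(parts)-1]
    if provRunID != certRunID {
        return fmt.Errorf("provenance points to run %s but the signer's certificate was issued during run %s",
            provRunID, certRunID)
    }
    return nil
}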

@laurentsimon

laurentsimon commented Jun 1, 2022

I agree that fields like these (e.g. links to build logs) could also be part of the provenance, but in the example case the attacker has already managed to trigger a workflow, so they have full control of the builder. Where the builder writes a provenance payload from the environment saying the build logs are at path /logs/234, they can modify it to say /logs/123 instead. The one thing they can't fake is an identity token from GitHub carrying the 123 value, and therefore they can't modify the signing identity for the payload.

Can you explain more what you mean by "So they have full control of the builder"? In the case of SLSA 3+, the attacker cannot control what the build service writes into the provenance, besides build information such as compiler arguments, env variables, and the repo source.

For example, for our SLSA 3+ Go builder, we record the run ID, run number, and run attempt: https://github.com/slsa-framework/slsa-github-generator/blob/main/internal/builders/go/README.md#example-provenance

From there you can fetch the logs.
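
For context, the run information such a builder records is available directly from the Actions environment. A minimal sketch (not the slsa-github-generator code, just an illustration using the documented GITHUB_RUN_ID, GITHUB_RUN_NUMBER and GITHUB_RUN_ATTEMPT variables):

package main

import (
    "fmt"
    "os"
)

// Sketch of how a builder step inside GitHub Actions can capture the run
// information discussed above. Not the slsa-github-generator code; the env
// variables are the ones GitHub Actions documents.
func main() {
    runInfo := map[string]string{
        "github_run_id":      os.Getenv("GITHUB_RUN_ID"),
        "github_run_number":  os.Getenv("GITHUB_RUN_NUMBER"),
        "github_run_attempt": os.Getenv("GITHUB_RUN_ATTEMPT"),
    }
    // A real builder would embed these in the provenance it signs; here we
    // just print them.
    for k, v := range runInfo {
        fmt.Printf("%s=%s\n", k, v)
    }
}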

@tonistiigi
Author

@laurentsimon It depends on what tool you use to generate the provenance. The condition for this case is that the attacker has already triggered a workflow they can control. So if, as part of your workflow, you run a process that invokes the builder/generator process, the attacker can modify that process before it is invoked. Or they can instead run their own process that generates the same JSON bytes that the provenance generator in the actual release build would generate.

@laurentsimon

laurentsimon commented Jun 1, 2022

Gotcha. What tool are you using to generate the provenance?

We are going to release a generic provenance generator in a few weeks (see https://github.com/slsa-framework/slsa-github-generator/blob/main/internal/builders/generic/README.md), which I think may allow you to do what you're asking. It lets you compile your project and "attach" a non-forgeable provenance, which contains the run information you're looking for.

The slsa-verifier will let you verify the provenance, and you can then peek into the provenance and get the run info.

Please let us know if this would work for your use case; and if it does not, what would :-)

/cc @ianlewis

@tonistiigi
Author

@laurentsimon I consider anything that runs in the GitHub VM insecure for this case. It is not even that the attacker needs to find a place to inject into your build (which is possible); they can just run whatever they want and make it output the same value that your generator/builder does. Then they can get it properly signed as well. If the signer identity included the run ID, they could not fake the build logs link and would be caught, instead of being able to clean up the traces that this run ever existed.

@laurentsimon

laurentsimon commented Jun 1, 2022

@laurentsimon I consider anything that runs in the GitHub VM insecure for this case. It is not even that the attacker needs to find a place to inject into your build (which is possible); they can just run whatever they want and make it output the same value that your generator/builder does. Then they can get it properly signed as well. If the signer identity included the run ID, they could not fake the build logs link and would be caught, instead of being able to clean up the traces that this run ever existed.

I'm not following. The attacker cannot influence the content of the provenance, except for the build steps (and the final hash). The attacker cannot influence the run ID, run attempt, and run number: the trusted builder (the one I provided links for) retrieves this information itself, without any input from the attacker. The attacker does not control the builder: it has an interface the attacker can call, but that's it. The builder I linked above uses a reusable workflow, which enforces isolation from the developer workflow (see 1, 2). The provenance generation is done in a different VM than the build itself, so the attacker cannot control it.

I must be misunderstanding something. Please correct me.

@tonistiigi
Author

I don't know what you mean by "the attacker", but it is not what I have in mind.

Going back to Fulcio: I (the attacker) have obtained the capability to run my code in workflows on GitHub. It doesn't really matter how; for simplicity, let's say the credentials of a project maintainer were stolen. Inside my code, I can contact Fulcio and request a signing certificate with the fields listed above. With this certificate I can now sign absolutely anything: any random bytes, any provenance, including the bytes that were previously generated by your "trusted builder". Users viewing that provenance will see that it has a verified signature, that it points to (fake) build logs that look legit, etc. If the attacker now cleans up after themselves, removing the GitHub runs etc., the only way they could be caught is by someone auditing the Fulcio transparency log.

@laurentsimon

laurentsimon commented Jun 2, 2022

I don't know what you mean by "the attacker", but it is not what I have in mind.

Can you elaborate on your threat model? What do you mean by attacker?

Going back to Fulcio: I (the attacker) have obtained the capability to run my code in workflows on GitHub. It doesn't really matter how; for simplicity, let's say the credentials of a project maintainer were stolen. Inside my code, I can contact Fulcio and request a signing certificate with the fields listed above. With this certificate I can now sign absolutely anything: any random bytes, any provenance, including the bytes that were previously generated by your "trusted builder".

The certificate used by the trusted builder is not accessible by your repo. The trusted builder has its own identity. When you verify the provenance, you first verify that it was generated by the trusted builder. So in your case, verification would fail because it would have the wrong identity (your repo's identity). Once the identity of the builder is verified, verification then checks the repo the source code came from, in this case your repo. Both are available in the cert issued by Fulcio in the trusted builder we have built, so they are distinguishable from one another.

"I can contact Fulcio and request a signing certificate with the fields listed above": this is incorrect. Your code cannot forge the builder's identity. You can sign anything you want with the cert, but it will be rejected at verification time, as described above, because it contains your repo's identity, not the trusted builder's.

One attack that provenance does not solve is an attacker who pushes code to your repo. In this case, adding a run ID to the cert does not help either, unless you parse the logs and search for malice. In an incident response, that is probably fine, but in the general case it's unlikely you're going to do this for every verification. Either way, the trusted builder gives you the same guarantees: you can get the run ID etc. from the provenance and trust it if the builder is SLSA 3+.

@tonistiigi
Author

tonistiigi commented Jun 2, 2022

So in your case, verification would fail because it would have the wrong identity (your repo's identity)

Which of the fields listed above is different? If I, as the repo author, can't trigger a workflow with the correct identity, then who can?

adding a run ID to the cert does not help either, unless you parse the logs

That's exactly the case I'm talking about. Even when access to workflows has been compromised, the attacker should not be able to fake the build logs (as the logs are managed by GitHub itself).

@ianlewis

ianlewis commented Jun 2, 2022

@tonistiigi As I understand it, in your threat model the attacker has control over the developer's GitHub credentials and can start a workflow in a repository they have access to. Am I understanding correctly?

So in your case, verification would fail because it would have the wrong identity (your repo's identity)

Which of the fields listed above is different? If I, as the repo author, can't trigger a workflow with the correct identity, then who can?

I think what @laurentsimon is saying is that, by using the reusable workflow he linked to, the OIDC token that is retrieved from the GitHub provider and used to sign the provenance is linked to the identity of the reusable workflow and not the workflow of the developer. The workflow code is also present in a repo not controlled by the developer. An attacker in your scenario can trigger the reusable workflow but cannot forge the provenance unless they take over the slsa-github-generator repo as well,

or unless they are somehow able to escape the build job's VM and jump to the job VM generating the provenance. However, in our threat model we consider ourselves protected from this by the VM security boundary and GitHub's control over VM execution.

So, because the attacker cannot forge the reusable workflow's identity, if they sign the provenance in the user's workflow instead, any verification you do would be able to catch that the ID used to sign the provenance is not the ID of the reusable workflow.
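
To make that verification step concrete: before looking at anything else, a verifier pins the signer's SAN URI to the trusted reusable workflow. A minimal sketch, with the expected prefix shown as an illustrative value rather than the exact string a real verifier would pin (a real check would also pin the ref after the "@"):

package verify

import (
    "crypto/x509"
    "fmt"
    "strings"
)

// The identity the verifier trusts is the reusable workflow itself, not the
// calling repo. The prefix below is illustrative; a real verifier would also
// pin the ref/tag after the "@".
const trustedBuilderPrefix = "https://github.com/slsa-framework/slsa-github-generator/"

func checkBuilderIdentity(cert *x509.Certificate) error {
    for _, uri := range cert.URIs {
        if strings.HasPrefix(uri.String(), trustedBuilderPrefix) {
            return nil
        }
    }
    return fmt.Errorf("certificate was not issued to the trusted builder workflow")
}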

@tonistiigi
Author

attacker has control over the developer's GitHub credentials

maybe

and can start a workflow in a repository they have access to.

yes

and used to sign the provenance is linked to the identity of the reusable workflow and not the workflow of the developer.

Thanks. So Subject in the cert is always https://github.com/slsa-framework/slsa-github-generator@...? (I couldn't test because the example failed for me in the build/provenance step.) The policy verifies that it is that exact string, irrespective of the project being built. It does make the generator repo a single point of failure, but I guess that is acceptable and not an issue in practice.

That being said, I did open this issue against Fulcio and not against slsa-github-generator. Unless the maintainers want to declare that the only safe way to use Fulcio/Cosign with GitHub is to use a reusable workflow from an external repo that the maintainers always trust, I think the concerns are still valid.

@ianlewis

ianlewis commented Jun 3, 2022

Thanks.

No worries, I hope my explanations are making some sense.

So Subject in the cert is always https://github.com/slsa-framework/slsa-github-generator@...?

If you use the reusable workflow, yes, as that is the ID used to create the certificate and do the signing.

That being said, I did open this issue against Fulcio and not against slsa-github-generator. Unless the maintainers want to declare that the only safe way to use Fulcio/Cosign with GitHub is to use a reusable workflow from an external repo that the maintainers always trust, I think the concerns are still valid.

I understand. I think it depends on what you consider safe. If tying a signature back to source code and/or specific build run IDs is needed to be safe, then I agree with you that more metadata is needed. Others might be fine with just the signature as it is.

I think we got to this point because the proposal to add the run ID, run number, and run attempt to the signature sounded like it was solving a problem that you could instead solve by generating provenance and signing that. Provenance formats like SLSA support a more flexible format than adding metadata to the certificate does. slsa-github-generator was brought up just as an example of a tool that could be used to generate provenance; in fact, we include exactly this kind of information in our implementation. Though there is a dearth of such tools currently, you could theoretically use any tool to do this; we just happen to be working on one.

Ultimately, I think the idea of tools like GitHub's OIDC and Fulcio is to create short-lived certificates that are used and immediately discarded. So having metadata on the cert itself is likely not what we really want, because users who follow this practice can't go back and look at it. What we really want is to include this info in provenance metadata and use Fulcio to sign that. That way we have the metadata in a verifiable signature format, and we can also immediately get rid of the certs and avoid having to store them.

@tonistiigi
Author

tonistiigi commented Jun 3, 2022

What we really want is to include this info in provenance metadata and use Fulcio to sign that.

Interesting. So you mean it is signed by Fulcio root directly, not by the user's key? Is there some discussion/work going on that I could read on this?

I guess that would mean that you can only sign very strictly defined payloads. This isn't just about signing provenance attestations, but artifacts and other related objects as well. You can make some strict rules about how a provenance definition matches up with GitHub's OIDC token, but it already gets fuzzy when I use my Gmail token instead. And if you define a specific payload object that pins the token scope precisely for each object, then the user would need to keep hold of that payload the same way they keep the cert today.

@kommendorkapten
Member

What we really want is to include this info in provenance metadata and use Fulcio to sign that.

Interesting. So you mean it is signed by Fulcio root directly, not by the user's key? Is there some discussion/work going on that I could read on this?

I think what was meant is that "use Fulcio to generate a short lived certificate", and sign with that. This is for instance how the https://github.com/slsa-framework/slsa-github-generator works.

@ianlewis

ianlewis commented Jun 3, 2022

Interesting. So you mean it is signed by Fulcio root directly, not by the user's key? Is there some discussion/work going on that I could read on this?

I think what was meant is that "use Fulcio to generate a short lived certificate", and sign with that. This is for instance how the https://github.com/slsa-framework/slsa-github-generator works.

Apologies it wasn't clear. I don't mean signing using Fulcio's root, but rather via an "ephemeral certificate", which is a short-lived certificate issued by the Fulcio server and used once. The expectation is that using ephemeral certificates is not necessarily specific to any kind of data.

For example, cosign lets you sign data in one go if you enable experimental features.

$ echo "some arbitrary data" > foo 
$ COSIGN_EXPERIMENTAL=1 cosign sign-blob foo

Here it...

  1. Gets an ID token from the sigstore OIDC provider (using Google login). I'm a bit fuzzy on these details but I believe oauth2.sigstore.dev just trusts Google as a federated identity.
  2. Uses fulcio to get an ephemeral cert using the OIDC token
  3. Signs the blob with the cert
  4. Saves the signature to the rekor transparency log
  5. Exits without saving the cert (it looks like it prints it out but the expectation is that it's thrown away)

You could similarly use the GitHub OIDC provider rather than oauth2.sigstore.dev/Google if you wanted to. So the idea is that this is what a builder would do when generating provenance. The provenance includes a sha256sum (or whatever) of the binary, plus other metadata like the run ID, and is signed with an ephemeral cert. After that, the provenance can be verified via the signature, the binary via the sha256sum, and the cert itself is never stored or used again.
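
To connect this to the verification side discussed below: once the blob, the signature, and the (ephemeral) certificate are back in hand, checking the signature is plain ECDSA verification against the cert's public key. A minimal sketch, assuming an ECDSA key and the base64-encoded signature that cosign prints; the file names are placeholders:

package main

import (
    "crypto/ecdsa"
    "crypto/sha256"
    "crypto/x509"
    "encoding/base64"
    "encoding/pem"
    "fmt"
    "os"
    "strings"
)

// Sketch of the verification counterpart to the flow above: recompute the
// blob digest and check the signature against the public key in the
// (ephemeral) certificate. Assumes an ECDSA key and a base64-encoded
// ASN.1/DER signature; file names are placeholders.
func main() {
    blob, err := os.ReadFile("foo")
    if err != nil {
        panic(err)
    }
    sigB64, err := os.ReadFile("foo.sig")
    if err != nil {
        panic(err)
    }
    certPEM, err := os.ReadFile("cert.pem")
    if err != nil {
        panic(err)
    }

    sig, err := base64.StdEncoding.DecodeString(strings.TrimSpace(string(sigB64)))
    if err != nil {
        panic(err)
    }
    block, _ := pem.Decode(certPEM)
    cert, err := x509.ParseCertificate(block.Bytes)
    if err != nil {
        panic(err)
    }
    pub, ok := cert.PublicKey.(*ecdsa.PublicKey)
    if !ok {
        panic("expected an ECDSA public key")
    }

    digest := sha256.Sum256(blob)
    if ecdsa.VerifyASN1(pub, digest[:], sig) {
        fmt.Println("signature verified")
    } else {
        fmt.Println("signature did NOT verify")
    }
}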

@tonistiigi
Author

tonistiigi commented Jun 3, 2022

I think I understand how cosign/fulcio work but your latest comments have me quite confused.

In both comments you say that the cert is thrown away and never used again. So what is the point of signing, and how do you verify the signature?

For example, cosign lets you sign data in one go if you enable experimental features.

cosign sign will store the certificate (and trust chain) with the image in the descriptor. cosign sign-blob has --output-certificate (or prints the cert to stdout). If you don't store the cert and don't provide it to verify-blob, the certificate is downloaded from the Rekor log by the payload SHA so that verification can run. So by throwing away the certificate you meant storing it in the Rekor server and then not requiring the application flow to store it? The transparency log has some pretty good security aspects (and some pretty bad ones, like privacy and DoS), but I'm not really sure how it changes any fundamentals like requiring the cert for verification (no matter where it is stored).

You could throw away the cert metadata and just keep the public key component, but as I mentioned in the previous comment, your payload then needs to be in a strict format to verify the signing policies.

@ianlewis

ianlewis commented Jun 3, 2022

Ah, yes. You're right. I confused things a bit. The signature is included in the DSSE that wraps the provenance info. The public key cert is stored in Rekor and is retrieved from there for validation. The private key is what matters to throw out, and it is discarded after signing (the public key is what is printed).

I'm not sure it convinces me that public keys are a good place to store build metadata, but you're right that storage is needed for public keys (either in a transparency log or otherwise), and my bringing it up probably didn't help.

@haydentherapper
Contributor

#945 will obsolete this request, so marking as closed.
