Builder name #655

Closed
laurentsimon opened this issue Mar 2, 2023 · 12 comments

Comments

@laurentsimon
Contributor

laurentsimon commented Mar 2, 2023

I have a question regarding #651.

It seems that by defining the builder as the server_url + job_workflow_ref, the number of builders will grow linearly with the number of projects. I have always thought of builders as a limited set of services that a user can configure as trust anchors, and that are independent of the projects being verified, e.g.:

https://cloudbuild.googleapis.com/GoogleHostedWorker, level X
https://github.com/some/repo/.github/workflows/reusable.yml, level X
...

So with the convention builder.id = server_url + job_workflow_ref, if a user is consuming a single project and wants to verify its provenance, their config file would say:

https://github.com/some/repo/.github/workflows/some-workflow.yml, level X

But how can you express the same "at scale", e.g. for 3P dependencies? Something like (?):

https://github.com/*/*/.github/workflows/*, level X

with the implicit assumption that */* is the same as the expected source.uri in the provenance?
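To make the wildcard idea concrete, here is a minimal sketch, assuming a hypothetical verifier that receives the provenance's builder ID and source URI as plain strings. The function name, the regular expression, and the optional "@ref" suffix are illustrative assumptions, not part of any SLSA tooling.

```python
import re

# Hypothetical pattern for "https://github.com/*/*/.github/workflows/*".
# The optional "@ref" suffix is an assumption about how the ID is versioned.
WORKFLOW_ID = re.compile(
    r"^https://github\.com/(?P<owner>[^/]+)/(?P<repo>[^/]+)"
    r"/\.github/workflows/[^/@]+(?:@.+)?$"
)


def matches_project_workflow(builder_id: str, source_uri: str) -> bool:
    """Return True if builder_id is a workflow living in source_uri's repo."""
    m = WORKFLOW_ID.match(builder_id)
    if m is None:
        return False
    # The implicit assumption: the */* part must equal the expected source.
    expected = f"https://github.com/{m['owner']}/{m['repo']}"
    return source_uri == expected
```

Note that the policy cannot be evaluated from the builder ID alone: the verifier has to cross-check it against the source URI, which is part of what makes this convention awkward.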

This seems a bit awkward. As a consumer, I would like to be able to say, concisely:

https://github.com/actions/runner/github-hosted, level X

I'm wondering whether we should consider 2 cases:

  1. Reusable workflows: these are "reusable" by design, so they can be called as a "service". They are builders. Builders can be called / invoked by different projects.
  2. Normal workflows: these are not exactly builders; they are more like project build scripts, similar to GCB's cloudbuild.yml. Normal workflows cannot be called / invoked by different projects. If the builder ID is set to a GitHub normal workflow, one could argue by the same logic that the path to the cloudbuild.yml should be the builder ID for GCB.

Thoughts?

@kommendorkapten

To make sure I'm following the reasoning here.

The convention is builder.id = server_url + job_workflow_ref, both when a project's own build script is used and when a reusable (trusted builder) workflow is used. I.e., the construction of these values is consistent.
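As a rough illustration of that convention, the sketch below joins the two values into one ID. The helper name and the example workflow references are hypothetical; only the "server_url + job_workflow_ref" rule comes from the discussion.

```python
def builder_id(server_url: str, job_workflow_ref: str) -> str:
    """Join the two values with exactly one slash between them."""
    return f"{server_url.rstrip('/')}/{job_workflow_ref.lstrip('/')}"


# Project-owned ("normal") workflow: the ID embeds the repository being built,
# so every project ends up with its own builder ID.
normal = builder_id(
    "https://github.com",
    "some/repo/.github/workflows/some-workflow.yml@refs/heads/main",
)

# Reusable workflow: the ID points at the shared workflow's repository,
# so it stays the same no matter which project calls it.
reusable = builder_id(
    "https://github.com",
    "some/repo/.github/workflows/reusable.yml@refs/tags/v1",
)
```

Both IDs have the same shape, so nothing in the string itself tells a verifier which kind of workflow produced it.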

For the scenario when the project's builder script is used, you are correct, the number of builders will grow unbounded. But for the reusable workflow, the set should be limited (albeit there may be some versions we need to track).

I generally don't see a problem with this?

The number of trusted builders is small, so they can be vetted and an allow list can be created via a policy.

For the other case, we typically cannot have the same trust, as we cannot vet the build scripts; we can only trust the parts of the provenance that are not forgeable (i.e. what goes into the cert if using Sigstore/Fulcio). Even if a specific commit is vetted, we cannot trust the next build/release, as it's impossible to know whether the build script was changed, since it resides in the same repository as the code. So for that scenario, we have to fall back to trusting only that the build was produced by some script, from a repository using a specific build infrastructure.
A verifier can have a policy around the expected repository and build infrastructure used; the rest (i.e. the build instructions) can change without any possibility of detecting it.

Having written that, I wonder whether the similarity in how the builder.id is constructed is the issue, as we cannot easily distinguish the type of the build and therefore the expected level of trust.
We can have an allow list of trusted builders, or a regexp for builds coming from e.g. GitHub, but discriminating between them can be awkward.

@laurentsimon
Contributor Author

laurentsimon commented Mar 3, 2023

For the scenario when the project's builder script is used, you are correct, the number of builders will grow unbounded. But for the reusable workflow, the set should be limited (albeit there may be some versions we need to track).

I generally don't see a problem with this?

The issue is that we're conflating build script and builder ID. The trigger workflow is not a builder, it's a script that belongs to the repo that is being built. Therefore the builder ID should not be set to its path. I think the builder ID for the normal workflow should simply say "GitHub Action" or something to this effect.

A verifier can have a policy around expected repository

It's hard to do that with the current convention for normal workflows, because there is no stable builder ID: the builder ID depends on the repo. (I'm reading expected repository as provenance's source.uri)

and build infrastructure used

That's where it starts to be a bit fuzzy. The config file can define more than commands: it can also define jobs that may be self-hosted. Here you could argue it's a "builder" :)

Having written that, I wonder whether the similarity in how the builder.id is constructed is the issue, as we cannot easily distinguish the type of the build and therefore the expected level of trust.
We can have an allow list of trusted builders, or a regexp for builds coming from e.g. GitHub, but discriminating between them can be awkward.

I think that's the issue, yes. A Builder ID should be independent of the repo itself and be "callable" by other projects. Otherwise it's not a builder service, it's a config / build file: cloudbuild.yml, normal workflow.yml

@behnazh-w

A Builder ID should be independent of the repo itself and be "callable" by other projects. Otherwise it's not a builder service, it's a config / build file: cloudbuild.yml, normal workflow.yml

I wonder how this definition of a builder service aligns with the spec v1.0 Build service definition:

All build steps ran using some build service, not on a maintainer’s workstation.

Examples: GitHub Actions, Google Cloud Build, Travis CI.

The spec does not mandate that the builder should be callable (like reusable workflows), and as long as it is run on a CI service (and not locally), it's considered a build service.

@laurentsimon Does the spec need to change to be aligned with your definition?

@mlieberman85
Member

Perhaps we can use clearer language. The intent of a build service is the infrastructure along with any controls to enforce SLSA. For example, GitHub Actions alone, with nothing else, would be SLSA level 1 compliant. However, using reusable workflows helps with reaching SLSA 3.

For example, FRSCA, another SLSA-compliant build system, makes a clear distinction between user-defined steps and what the build system itself does. The Tekton Chains component in FRSCA enforces workload identities, and the build steps don't have access to the secrets, ephemeral or long-lived, that it uses to sign SLSA provenance. From my understanding, this was the intent of using reusable workflows: you call a set of predefined steps that cannot be tampered with by the repo/project that "calls" them.

It might make sense to clarify that the build service is the infrastructure and software used to run build steps, along with any configuration or controls outside the control of the project being built, to enforce SLSA requirements.

@laurentsimon
Contributor Author

laurentsimon commented Mar 6, 2023

A Builder ID should be independent of the repo itself and be "callable" by other projects. Otherwise it's not a builder service, it's a config / build file: cloudbuild.yml, normal workflow.yml

I wonder how this definition of a builder service aligns with the spec v1.0 Build service definition:

I think I over-stated some claims. A build service may be project-specific. I think I was thinking of the guiding principles https://slsa.dev/spec/v1.0/principles which state that only a small number of "systems" (which I read as "builders") need to be trusted in order to scale verification. I think this means that a build service is a somewhat generic system that can build many projects. A normal workflow is a piece of config that is specific to a project and cannot be used by other projects, so it does not qualify as a "build service". A reusable workflow can be generic and can be used by other projects to build in practice: it can therefore be considered a "build service".

In practice, we anchor our trust in the GitHub runner's claims (embedded in the Fulcio certs) based on the identity of the OIDC provider (GitHub). For "normal workflows", we're really trusting the GitHub platform to run a workflow script, not trusting the script itself. For reusable workflows, we can audit the code once and trust it "at scale" to build other projects, which I think is the motivation behind https://slsa.dev/spec/v1.0/principles.

There are nuances: An org-level reusable workflow is generic, but not to the extent that a Maven re-usable workflow would be, etc. Note that in this case, the org-level reusable workflow may call other generic reusable (ecosystem) workflows, so it's possible for both 1) the org level builder and 2) the ecosystem-specific builders to attest to artifacts.

The spec does not mandate that the builder should be callable (like reusable workflows), and as long as it is run on a CI service (and not locally), it's considered a build service.

I see. I think the GitHub runner is "callable" with a workflow as its input, so it still counts as a build service. The runner may be started manually (workflow_dispatch) or in response to other "events", but it's still "callable".. or "trigger-able"?

@laurentsimon Does the spec need to change to be aligned with your definition?

I don't know if my definition is better :)

It might make sense to clarify that the build service is the infrastructure and software used to run build steps, along with any configuration or controls outside the control of the project being built, to enforce SLSA requirements.

I think this would help.

@kommendorkapten

In practice, we anchor our trust in the GitHub runner's claims (embedded in the Fulcio certs) based on the identity of the OIDC provider (GitHub). For "normal workflows", we're really trusting the GitHub platform to run a workflow script, not trusting the script itself. For reusable workflows, we can audit the code once and trust it "at scale" to build other projects, which I think is the motivation behind https://slsa.dev/spec/v1.0/principles.

I agree with this, and that is what I was trying to describe in my previous comment: in practice it's impossible to trust a normal workflow, as it can't be audited in a way that scales. Each repo and commit has to be vetted.

With some bespoke parsing rules we can detect that a SLSA provenance comes from GitHub, and so opt to trust it to a lower SLSA level. Compare this with a reusable workflow, which can be audited and scale to build a vast number of projects, at a higher SLSA level, by identifying the exact builder being used.

So one builder would be "GitHub", and to identify it we must check whether the builder.id matches a pattern (and is signed by a certificate bound to an identity issued by GitHub).
Other builders would be reusable workflows, identified by an exact builder.id.

And therein lies the problem: the identification is somewhat awkward and may depend on the order in which builders are checked, since a reusable workflow can also be identified as a generic GitHub builder (with weaker guarantees) rather than being properly identified as the reusable workflow.
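A minimal sketch of the ordering concern, assuming a hypothetical policy with one audited reusable workflow and a generic GitHub pattern. The names, the example ID, and the numeric levels are illustrative assumptions, and certificate verification is taken to have happened already.

```python
import re

# Hypothetical policy: exact IDs of audited reusable workflows and their levels.
TRUSTED_REUSABLE = {
    "https://github.com/some/repo/.github/workflows/reusable.yml": 3,
}

# Hypothetical catch-all for any other workflow that ran on GitHub.
GITHUB_PATTERN = re.compile(r"^https://github\.com/.+")
GITHUB_GENERIC_LEVEL = 1


def builder_level(builder_id: str) -> int:
    """Return the trust level for builder_id, or 0 if it is unknown."""
    # The exact match must come first: a reusable workflow's ID also
    # matches the generic pattern and would otherwise be downgraded.
    if builder_id in TRUSTED_REUSABLE:
        return TRUSTED_REUSABLE[builder_id]
    if GITHUB_PATTERN.match(builder_id):
        return GITHUB_GENERIC_LEVEL
    return 0
```

Swapping the two checks would silently assign the lower level to every audited reusable workflow, which is the order dependence described above.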

@Nikokrock
Contributor

Wouldn't it be easier to have only one builder.id for general workflows on github.com and consider it L1? In that case I would consider the workflow defined in the repo to be an input artifact. (The SLSA terminology is a bit ambiguous here, I think, when defining Build. For me, in the first example the build would be only the Travis platform, and the travis.yml just an input artifact/source.)

Then if GitHub implements other, more restricted workflows, we use other builder.ids.

@ianlewis
Member

ianlewis commented Mar 8, 2023

With some bespoke parsing rules we can detect that a SLSA provenance comes from GitHub, and so opt to trust it to a lower SLSA level. Compare this with a reusable workflow, which can be audited and scale to build a vast number of projects, at a higher SLSA level, by identifying the exact builder being used.

Wouldn't it be easier to have only one builder.id for general workflows on github.com and consider it L1? In that case I would consider the workflow defined in the repo to be an input artifact.

I'm wondering the same thing; if the security properties don't change among all non-reusable workflows, why do we need a lot of different builder.ids? I think what we really want is:

  1. A builder.id per SLSA3 reusable workflow
  2. One builder.id for all other GitHub workflows (i.e. where npm publish is run directly in the user's workflow).

We just need case 2 to indicate that the artifact was built on GitHub, so that it can be differentiated from builds on other platforms like GitLab in the future.

This also makes it a bit easier to make claims about individual builders from the perspective of the SLSA conformance program because we are expecting to be able to map the builder PKI & builder.id to a set of claims about the SLSA conformance level.

@feelepxyz
Contributor

2. One builder.id for all other GitHub workflows (i.e. where npm publish is run directly in the user's workflow).

This seems reasonable to me. We could set builder.id to something like https://github.com/actions/runner/github-hosted for npm provenance. There's currently no way of getting a runner version in GHA, and it is also not in the Fulcio signing certificate, so it could easily be forged if we only added it to the provenance statement.

Sounds like this should become a spec definition in SLSA v1.0 when not using a SLSA3+ workflow on GHA?

@MarkLodato
Member

Are there any requested changes to the spec as part of this issue? If not, should we consider this issue closed?

MarkLodato added the status:waiting-for-answer label (This issue is blocked by feedback from a user.) Mar 29, 2023
@laurentsimon
Contributor Author

laurentsimon commented Mar 29, 2023

Fine to close, and we'll follow up with the npm team about the final name they selected for their builder.
@feelepxyz have you landed on https://github.com/actions/runner/github-hosted and https://github.com/actions/runner/self-hosted then?

@feelepxyz
Contributor

have you landed on https://github.com/actions/runner/github-hosted and https://github.com/actions/runner/self-hosted then?

Yep 👍 we're just waiting for the runner environment to be added to the actions ENV to include it in the builder id.
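A small sketch of how the two agreed IDs could be selected once the runner environment is available. The function name and the assumed input values "github-hosted" and "self-hosted" are hypothetical; only the two builder ID URLs come from this thread.

```python
GITHUB_HOSTED = "https://github.com/actions/runner/github-hosted"
SELF_HOSTED = "https://github.com/actions/runner/self-hosted"


def generic_builder_id(runner_environment: str) -> str:
    """Map a runner environment value (assumed spelling) to a builder ID."""
    if runner_environment == "github-hosted":
        return GITHUB_HOSTED
    if runner_environment == "self-hosted":
        return SELF_HOSTED
    # Refuse to guess: an unknown environment should not get a builder ID.
    raise ValueError(f"unknown runner environment: {runner_environment!r}")
```

Raising on unknown values, rather than defaulting to one of the two IDs, keeps a missing or misspelled value from being reported as a GitHub-hosted build.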


8 participants