
editorial: clarify requirements around cache use by the build platform #901

Closed
wants to merge 6 commits into from


arewm
Member

@arewm arewm commented Jul 6, 2023

There was an inconsistency between the provenance model and the Build L3 requirements regarding how build caches might affect a build:

  • The provenance model indicated that communication with a cache MAY influence the definition of a build (and if it does, then the communication SHOULD go in resolvedDependencies).
  • The build isolation requirements indicated that the build output MUST be identical whether or not a cache is used.

Clarification includes:

  • At build L3, the provenance MUST be identical whether a build platform's cache is used or not.
  • This implies that all communication with the build cache serves only to reuse byproducts, which should not appear in the provenance.
  • Builds are not precluded from using a cache outside the build platform and therefore outside these isolation requirements.

Clarification does NOT include:

  • Requirements relating cache use to reproducible builds, since reproducibility is not covered by the Build track.

Addresses #894

Signed-off-by: arewm <arewm@users.noreply.github.com>

@netlify

netlify bot commented Jul 6, 2023

Deploy Preview for slsa ready!

  • 🔨 Latest commit: a57d635
  • 🔍 Latest deploy log: https://app.netlify.com/sites/slsa/deploys/64e4bfb1aa734d0008baf409
  • 😎 Deploy Preview: https://deploy-preview-901--slsa.netlify.app

@arewm arewm force-pushed the cache-clarification branch from b984852 to 88c070e Compare July 6, 2023 19:43
@arewm arewm changed the title Clarify requirements around cache use by the build platform. spec-editorial: clarify requirements around cache use by the build platform. Jul 6, 2023
@arewm arewm force-pushed the cache-clarification branch 3 times, most recently from ae2cac9 to 9ea5416 Compare July 7, 2023 18:27
@arewm arewm changed the title spec-editorial: clarify requirements around cache use by the build platform. spec-editorial: clarify requirements around cache use by the build platform Jul 7, 2023
docs/spec/v1.0/requirements.md (resolved)
docs/spec/v1.0/requirements.md (outdated, resolved)
@arewm arewm changed the title spec-editorial: clarify requirements around cache use by the build platform editorial: clarify requirements around cache use by the build platform Jul 11, 2023
@arewm arewm force-pushed the cache-clarification branch from 9ea5416 to f754e88 Compare July 11, 2023 16:51
docs/spec/v1.0/requirements.md (outdated, resolved)
@MarkLodato
Member

I'm kind of losing the rationale for this change. Could you update the PR description to explain? I'm no longer clear on the problem in the original text, so it's hard for me to say whether this updated text is an improvement.

@arewm
Member Author

arewm commented Jul 13, 2023

> I'm kind of losing the rationale for this change. Could you update the PR description to explain? I'm no longer clear on the problem in the original text, so it's hard for me to say whether this updated text is an improvement.

The PR description is updated to restate the problem that needs clarification.

@arewm arewm force-pushed the cache-clarification branch from b53211c to 88eb436 Compare July 13, 2023 19:22
@MarkLodato
Member

Thank you for updating the PR description. That really helps define the problem.

I'm still not sure about the proposed changes. I find the new wording even more confusing than before, without resolving any of the original problems. In particular, I don't understand the bits about external communication and the provenance being identical.

I feel like there are two main problems:

  1. The original wording of the Build L3 requirement is unclear when the rule about a "cache" applies. I think we can make an argument that, if there is some sort of automatic caching enabled by default, then it MUST NOT be susceptible to cache poisoning, the build output MUST be the same whether or not the cache is enabled, and (by virtue of the previous statement) the provenance SHOULD be identical whether or not the cache is enabled. When the cache is opt-in or user-provided, such as actions/cache, then I'm not sure how much we can say at L3.
  2. We are trying to provide two different things: requirements at L3 about the minimum necessary level of tamper prevention, and guidelines at all levels about how to construct provenance. That is basically what you're saying in the PR description, if I'm understanding correctly. So maybe we should just be more clear about that?

@CircuitSwan

I feel like the following description to me was most clear in the desired outcome (matching cache vs no cache)

  • If the build platform is capable of providing the provenance for an external resource without a cache,
    then the provenance should remain unchanged if a cache is used. In other words, the output of the provenance
    MUST be identical whether or not the cache is used

@mlieberman85
Member

> I feel like the following description to me was most clear in the desired outcome (matching cache vs no cache)
>
>   • If the build platform is capable of providing the provenance for an external resource without a cache,
>     then the provenance should remain unchanged if a cache is used. In other words, the output of the provenance
>     MUST be identical whether or not the cache is used

Agreed, while clarifying that the subject (which exists outside the provenance predicate but within the in-toto ITE-6 statement) doesn't have to be identical unless the builds are bit-for-bit reproducible.
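To make the split between the subject and the rest of the statement concrete, here is a minimal Python sketch of checking the property discussed above; it is not part of SLSA or any in-toto library. It assumes SLSA v1.0 provenance carried in an in-toto v1 Statement, and it further assumes that only `predicate.buildDefinition` (which holds `resolvedDependencies`) must match, while `subject` and the per-run `runDetails` may differ. The function names and all URIs and digests are hypothetical.

```python
import copy
from typing import Any, Dict


def build_definition(statement: Dict[str, Any]) -> Dict[str, Any]:
    # Assumes an in-toto Statement whose predicate is SLSA v1.0 provenance;
    # `buildDefinition` holds buildType, parameters and resolvedDependencies.
    return statement["predicate"]["buildDefinition"]


def provenance_matches(with_cache: Dict[str, Any], without_cache: Dict[str, Any]) -> bool:
    """True if both statements describe the same build definition.

    `subject` is ignored: artifact digests only match when the build is
    bit-for-bit reproducible. `runDetails` is ignored too, because it carries
    per-run metadata such as the invocation ID.
    """
    # Dict equality compares nested values recursively, ignoring key order.
    return build_definition(with_cache) == build_definition(without_cache)


# Hypothetical example: identical inputs, different artifact digests and runs.
uncached = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [{"name": "app.tar", "digest": {"sha256": "1111"}}],
    "predicateType": "https://slsa.dev/provenance/v1",
    "predicate": {
        "buildDefinition": {
            "buildType": "https://example.com/hypothetical-build-type/v1",
            "externalParameters": {"ref": "refs/heads/main"},
            "resolvedDependencies": [
                {"uri": "https://example.com/dep.tar", "digest": {"sha256": "2222"}}
            ],
        },
        "runDetails": {
            "builder": {"id": "https://example.com/hypothetical-builder"},
            "metadata": {"invocationId": "run-1"},
        },
    },
}
cached = copy.deepcopy(uncached)
cached["subject"][0]["digest"]["sha256"] = "3333"  # non-reproducible output
cached["predicate"]["runDetails"]["metadata"]["invocationId"] = "run-2"

print(provenance_matches(cached, uncached))  # True
```

Under these assumptions, a cache that changed which dependency was resolved (a different digest in `resolvedDependencies`) would make the check fail, while a differing artifact hash alone would not.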

@arewm
Member Author

arewm commented Aug 21, 2023

I resolved all open threads in this PR because they were hurting its readability. @adityasaky, I think most of the open discussion was between us, so if you want to restate any of it, could you open a new thread?

The part referenced above is the core part of this PR. This would involve removing the change to the provenance specification as well as lines 325-326.

Member

@joshuagl joshuagl left a comment


Thanks for the attention to detail in this thread all and to @arewm for continuing to drive this change forward.

> The part referenced above is the core part of this PR. This would involve removing the change to the provenance specification as well as lines 325-326.

I think removing these hunks helps clarify the change.

docs/spec/v1.0/requirements.md (outdated, resolved)
arewm added 5 commits August 22, 2023 09:49
Addresses slsa-framework#894

Signed-off-by: arewm <arewm@users.noreply.github.com>
Signed-off-by: arewm <arewm@users.noreply.github.com>
Signed-off-by: arewm <arewm@users.noreply.github.com>
Signed-off-by: arewm <arewm@users.noreply.github.com>
Signed-off-by: arewm <arewm@users.noreply.github.com>
@arewm arewm force-pushed the cache-clarification branch from 88eb436 to 7fc41a4 Compare August 22, 2023 13:53
@arewm
Member Author

arewm commented Aug 22, 2023

The PR has been distilled to the core change as previously indicated.

@arewm arewm force-pushed the cache-clarification branch from 7fc41a4 to b6c7401 Compare August 22, 2023 13:54
Signed-off-by: arewm <arewm@users.noreply.github.com>
@arewm arewm force-pushed the cache-clarification branch from b6c7401 to a57d635 Compare August 22, 2023 14:01
- If the build platform is capable of providing the provenance information for
an external resource when a cache is not in use, then the provenance
information MUST remain unchanged if a cache is used. In other words, the
information in the provenance MUST be identical whether or not the cache is
Member


I think this, plus @mlieberman85's suggestion about the artifact hash, addresses the reproducibility issue. There may be a separate question about whether intermediate artifacts that can affect the target artifact should be recorded for completeness, but that's out of scope here.

Member Author


@mlieberman85, do you have a suggestion for the content of this PR related to your earlier comment?

Member


So, this makes sense to me, but I wonder whether it's clear enough to someone who isn't familiar with what we're implying. Maybe just add a clarification that we're talking about unique identifiers such as checksums?

Member Author

@arewm arewm Sep 6, 2023


Unique identifiers of what? The produced artifact or the cached entries used?

If the former, are you suggesting that we indicate that the hash of the artifact can differ if the cache is used because we are not claiming anything about reproducibility of the artifact?

If the latter, that seems like it might fit better in the previous bullet like so:

-  It MUST NOT be possible for one build to inject false entries into a build
    cache used by another build, also known as "cache poisoning". In other
    words, the output of the build MUST be identical whether or not the cache is
    used. This SHOULD be achieved using unique identifiers in the cache, such
    as checksums.
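The checksum idea in the suggestion above can be illustrated with a minimal Python sketch of a content-addressed cache. The class and its methods are hypothetical and not the API of any real build platform: entries are keyed by the SHA-256 digest of their contents, so a build cannot store false content under another entry's key, and every read re-checks the digest before the entry is used.

```python
from __future__ import annotations

import hashlib
from typing import Dict, Optional


class ContentAddressedCache:
    """Hypothetical build cache keyed by the SHA-256 digest of each entry."""

    def __init__(self) -> None:
        self._entries: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The key is derived from the content itself, not chosen by the writer.
        digest = hashlib.sha256(data).hexdigest()
        self._entries[digest] = data
        return digest

    def get(self, digest: str) -> Optional[bytes]:
        data = self._entries.get(digest)
        if data is None:
            return None  # cache miss: the caller fetches from the original source
        if hashlib.sha256(data).hexdigest() != digest:
            # Stored bytes no longer match their key: treat the entry as poisoned.
            raise ValueError(f"cache entry {digest} failed integrity check")
        return data


cache = ContentAddressedCache()
key = cache.put(b"dependency-v1")
print(cache.get(key) == b"dependency-v1")  # True
```

This only covers entries whose expected digest is known before the lookup (for example, a dependency pinned by checksum). Caches keyed by a hash of the inputs, whose stored outputs cannot be checked against the key itself, need other protections.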

Member

@adityasaky adityasaky Sep 11, 2023


> In other words, the output of the build MUST be identical whether or not the cache is used.

This implicitly requires reproducibility, no? Unless this only applies to the provenance predicate, allowing the hash of the produced artifact to still change.

@arewm
Member Author

arewm commented Oct 12, 2023

I don't think we definitively agreed on wording that is a strict improvement. I am going to close the PR; discussion can continue in the issue before another attempt at clarification is made.

@arewm arewm closed this Oct 12, 2023
Labels
None yet
Projects
Status: Done

6 participants