Author: lodato@google.com
Date: March 2021
Status: IN REVIEW
Standardize the terminology, data model, layers, and conventions for software artifact metadata.
A software attestation is a signed statement (metadata) about a software artifact or collection of software artifacts. (Sometimes called a "software bill of materials" or SBoM. Not to be confused with remote attestation in the trusted computing world.)
An attestation is the generalization of raw artifact/code signing, where the signature is directly over the artifact or a hash of the artifact:
- With raw signing, a signature implies a single bit of metadata about the artifact, based on the public key. The exact meaning must be negotiated between signer and verifier, and a new keyset must be provisioned for each bit of information. For example, a signature might denote who produced an artifact, or it might denote fitness for some purpose, or something else entirely.
- With an attestation, the metadata is explicit and the signature only denotes who created the attestation. A single keyset can express an arbitrary amount of information, including things that are not possible with raw signing. For example, an attestation might state exactly how an artifact was produced, including the build command that was run and all of its dependencies.
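The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration under loud assumptions: HMAC-SHA256 stands in for a real public-key signature so that the sketch runs with the standard library alone, and the key names, fact names, and predicate fields (`build_command`, `dependencies`) are all hypothetical.

```python
import hashlib
import hmac
import json


def sign(key: bytes, data: bytes) -> str:
    """Stand-in for a real signature: HMAC-SHA256 (symmetric, illustration only)."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(key: bytes, data: bytes, signature: str) -> bool:
    return hmac.compare_digest(sign(key, data), signature)


artifact = b"example artifact bytes"
digest = hashlib.sha256(artifact).hexdigest()

# Raw signing: the signature covers only the artifact hash, so every fact
# needs its own key, and each key's meaning is agreed on out of band.
RAW_KEYS = {
    "built_by_ci": b"hypothetical-key-1",
    "approved_for_release": b"hypothetical-key-2",
}
raw_signatures = {fact: sign(key, digest.encode()) for fact, key in RAW_KEYS.items()}

# Attestation: one key signs an explicit message, and the message carries
# the metadata. The signature only says who made the statement.
ATTESTER_KEY = b"hypothetical-attester-key"
message = json.dumps(
    {
        "subject": {"sha256": digest},
        "predicate": {
            "build_command": "make release",
            "dependencies": ["libfoo@1.2.3"],
        },
    },
    sort_keys=True,
).encode()
attestation = {"message": message, "signature": sign(ATTESTER_KEY, message)}


def facts_from_raw(signatures: dict) -> list:
    """Each valid raw signature yields exactly one bit: 'this fact holds'."""
    return sorted(
        fact for fact, sig in signatures.items()
        if verify(RAW_KEYS[fact], digest.encode(), sig)
    )


def predicate_from_attestation(att: dict) -> dict:
    """A valid attestation yields arbitrary structured metadata."""
    if not verify(ATTESTER_KEY, att["message"], att["signature"]):
        raise ValueError("bad signature")
    return json.loads(att["message"])["predicate"]
```

With raw signing the verifier learns only which keys signed the hash; with the attestation it recovers the build command and dependency list from a single signature.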
The primary intended use case is to feed into an automated policy framework. See that doc for more info.
Other use cases are "nice-to-haves", including ad-hoc analysis.
We define the following model to represent any software attestation, regardless of format. Not all formats will have all fields or all layers, but to be called an "attestation" a format must fit this general model.
The key words MUST, SHOULD, and MAY are to be interpreted as described in RFC 2119.
Example in English:
Summary:
- Artifact: Immutable blob of data, usually identified by cryptographic content hash. Examples: file content, git commit, Docker image. May also include a mutable locator, such as a package name or URI.
- Attestation: Authenticated, machine-readable metadata about one or more software artifacts. MUST contain at least:
  - Envelope: Authenticates the message. At a minimum, it contains:
    - Message: Content (statement) of the attestation. The message type SHOULD be authenticated and unambiguous to avoid confusion attacks.
    - Signature: Denotes the attester who created the attestation.
  - Statement: Binds the attestation to a particular set of artifacts. This is a separate layer to allow for predicate-agnostic processing and storage/lookup. MUST contain at least:
    - Subject: Identifies which artifacts the predicate applies to.
    - Predicate: Metadata about the subject. The predicate type SHOULD be explicit to avoid misinterpretation.
  - Predicate: Arbitrary metadata in a predicate-specific schema. MAY contain:
    - Link: (repeated) Reference to a related artifact, such as a build dependency. Effectively forms a hypergraph where the nodes are artifacts and the hyperedges are attestations. It is helpful for the link to be standardized to allow predicate-agnostic graph processing.
- Bundle: A collection of Attestations, which are usually but not necessarily related.
  - Note: The bundle itself is unauthenticated. Authenticating multiple attestations as a unit is TBD.
- Storage/Lookup: Convention for where attesters place attestations and how verifiers find attestations for a given artifact.
See Survey for examples.
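As a rough illustration of the model above, the layers can be written down as plain Python dataclasses. This is a sketch, not a proposed format: all class and field names are hypothetical, the envelope holds a parsed `Statement` rather than serialized bytes, and signature checking is omitted. It also shows why a standardized link is useful: the lookup index and the graph walk never inspect the predicate-specific `body`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass(frozen=True)
class Artifact:
    digest: str        # cryptographic content hash, e.g. "sha256:..."
    locator: str = ""  # optional mutable locator (package name, URI)


@dataclass
class Predicate:
    predicate_type: str                                  # explicit type
    links: List[Artifact] = field(default_factory=list)  # related artifacts
    body: Dict[str, Any] = field(default_factory=dict)   # predicate-specific


@dataclass
class Statement:
    subject: List[Artifact]
    predicate: Predicate


@dataclass
class Envelope:
    message_type: str
    message: Statement  # a real envelope would carry serialized bytes
    signature: str      # denotes the attester; verification is omitted here


def index_by_subject(bundle: List[Envelope]) -> Dict[str, List[Envelope]]:
    """Storage/lookup sketch: find attestations for a given artifact digest."""
    index: Dict[str, List[Envelope]] = {}
    for envelope in bundle:
        for artifact in envelope.message.subject:
            index.setdefault(artifact.digest, []).append(envelope)
    return index


def reachable(index: Dict[str, List[Envelope]], start: str) -> Set[str]:
    """Predicate-agnostic graph walk: every digest linked from `start`."""
    seen: Set[str] = set()
    queue = [start]
    while queue:
        digest = queue.pop()
        for envelope in index.get(digest, []):
            for link in envelope.message.predicate.links:
                if link.digest not in seen:
                    seen.add(link.digest)
                    queue.append(link.digest)
    return seen


def attest(subject: str, links: List[str]) -> Envelope:
    """Helper that builds an unsigned example attestation."""
    predicate = Predicate("hypothetical/build", [Artifact(d) for d in links])
    statement = Statement([Artifact(subject)], predicate)
    return Envelope("hypothetical/statement", statement, signature="")


bundle = [
    attest("sha256:app", ["sha256:src", "sha256:lib"]),
    attest("sha256:lib", ["sha256:libsrc"]),
]
index = index_by_subject(bundle)
```

Here `reachable(index, "sha256:app")` walks from the application through its library to the library's source without knowing anything about the predicate schema.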
We recommend a single suite of formats and conventions that work well together and have desirable security properties. Our hope is to align the industry around this particular suite because it makes everything easier. That said, we recognize that other choices may be necessary in various cases.
Summary: Generate in-toto attestations.
- Envelope: secure-systems-lab/signing-spec (TODO: Recommend Crypto/PKI)
- Statement: in-toto/attestation
- Predicate: Choose as appropriate.
  - Provenance
  - SPDX
  - If none are a good fit, invent a new one.
- Bundle and Storage/Lookup:
  - Local Filesystem: TODO
  - Docker/OCI Registry: sigstore/cosign
See survey for other options.
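To make the recommended layering concrete, here is a minimal sketch of a statement wrapped in an envelope. Several things are assumptions and should be checked against the referenced specs: the field names (`_type`, `subject`, `predicateType`, `payloadType`, `payload`, `signatures`) follow the shape we understand in-toto/attestation and signing-spec to use; the `signing_input` encoding is a hypothetical length-prefixed scheme, not the one signing-spec defines; HMAC-SHA256 stands in for a real signature; and the predicate type URI and key are invented.

```python
import base64
import hashlib
import hmac
import json

PAYLOAD_TYPE = "application/vnd.in-toto+json"


def signing_input(payload_type: str, payload: bytes) -> bytes:
    """Hypothetical length-prefixed encoding binding the type to the payload."""
    type_bytes = payload_type.encode("utf-8")
    return b"%d %b %d %b" % (len(type_bytes), type_bytes, len(payload), payload)


def make_envelope(statement: dict, key: bytes, keyid: str) -> dict:
    payload = json.dumps(statement, sort_keys=True).encode("utf-8")
    # HMAC is a stand-in for a real signature scheme.
    mac = hmac.new(key, signing_input(PAYLOAD_TYPE, payload), hashlib.sha256).digest()
    return {
        "payloadType": PAYLOAD_TYPE,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [{"keyid": keyid, "sig": base64.b64encode(mac).decode("ascii")}],
    }


def open_envelope(envelope: dict, key: bytes) -> dict:
    """Return the statement only if a signature covers both type and payload."""
    payload = base64.b64decode(envelope["payload"])
    expected = hmac.new(
        key, signing_input(envelope["payloadType"], payload), hashlib.sha256
    ).digest()
    for signature in envelope["signatures"]:
        if hmac.compare_digest(base64.b64decode(signature["sig"]), expected):
            return json.loads(payload)
    raise ValueError("no valid signature")


statement = {
    "_type": "https://in-toto.io/Statement/v0.1",
    "subject": [
        {"name": "app.tar.gz", "digest": {"sha256": hashlib.sha256(b"app").hexdigest()}}
    ],
    "predicateType": "https://example.com/HypotheticalPredicate/v1",
    "predicate": {"note": "predicate-specific content goes here"},
}
KEY = b"hypothetical-key"
envelope = make_envelope(statement, KEY, keyid="hypothetical-key-id")
```

Because the payload type is part of what gets signed, relabeling the envelope with a different `payloadType` makes `open_envelope` fail, which is the property the model asks for when it says the message type should be authenticated.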
TODO: Can a subject of an attestation be something like "GCP project at time T"? That is logically immutable since the "at time T" cannot change.
TODO: One subject but multiple predicates. Should we offer an opinion on whether this is represented at the Statement layer (repeated predicate) or Predicate layer (a "compound" type predicate)?
TODO: One envelope has multiple statements (separate subject+predicate pairs) signed as a unit, which are not valid individually. Is this one attestation or multiple?
TODO: Should we represent this as multiple messages within the envelope (i.e. a shim) or as a new type of Statement that refers to the other Statements (perhaps too complicated)?
TODO: If you have separate signed attestations and want to refer to the collection (e.g. a signed bundle), you can create a statement referring to all of them as the subject.
TODO: Figure out serialization. Previously I had been thinking envelopes didn't have to be serialized deterministically, but if an envelope is itself an Artifact, its serialization does have to be deterministic/immutable.
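One way to explore the two TODOs above is the sketch below, which assumes envelopes are JSON objects and uses sorted-key, whitespace-free JSON as a deterministic serialization. That choice is only an assumption for illustration: it is not a full canonicalization scheme (it does not address number formatting or Unicode normalization, for example), and the envelope contents and predicate shown are hypothetical.

```python
import hashlib
import json


def canonical_bytes(obj) -> bytes:
    """Deterministic serialization: sorted keys, no insignificant whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def digest_of(obj) -> str:
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()


# Two hypothetical envelopes; the second copy of A differs only in key order.
envelope_a = {"payload": "aGVsbG8=", "signatures": [{"sig": "c2ln"}]}
envelope_a_reordered = {"signatures": [{"sig": "c2ln"}], "payload": "aGVsbG8="}
envelope_b = {"payload": "d29ybGQ=", "signatures": [{"sig": "c2ln"}]}

# A statement whose subjects are the attestations themselves, identified by
# the digest of their deterministic serialization.
collection_statement = {
    "subject": [
        {"digest": {"sha256": digest_of(envelope)}}
        for envelope in (envelope_a, envelope_b)
    ],
    "predicate": {"kind": "hypothetical-collection"},
}
```

With a deterministic serialization, `envelope_a` and `envelope_a_reordered` hash to the same digest, so a statement can refer to an attestation as a subject without ambiguity.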
TODO(lodato) Provide a survey of possible names we considered, along with pros/cons: Attestation, Testimony, Testament, Claim, Voucher, Statement, Predicate, Message, Finding.