Clarify that digests don't have to be cryptographic ones. (#338)

Signed-off-by: Tom Hennen <tomhennen@google.com>
TomHennen authored May 6, 2024
1 parent 64db8d9 commit 06eafe3
Showing 7 changed files with 57 additions and 15 deletions.
11 changes: 11 additions & 0 deletions spec/v1/CHANGELOG.md
@@ -0,0 +1,11 @@
# Changelog

## v1.1

- Clarified that subjects are assumed to be immutable and that it is
acceptable to use a non-cryptographic digest (though cryptographic
digests are still strongly recommended).

## v1

Initial release.
2 changes: 1 addition & 1 deletion spec/v1/README.md
@@ -1,6 +1,6 @@
# Specification for in-toto attestation layers

Version: v1.0
Version: v1.1

Index:

2 changes: 0 additions & 2 deletions spec/v1/bundle.md
@@ -1,7 +1,5 @@
# Bundle layer specification

Version: v1.0

An attestation Bundle is a collection of multiple attestations in a single
file. This allows attestations from multiple different points in the software
supply chain (e.g. Provenance, Code Review, Test Result, vuln scan, ...) to
48 changes: 42 additions & 6 deletions spec/v1/digest_set.md
@@ -1,16 +1,14 @@
# DigestSet field type specification

Version: v1.0

Set of one or more cryptographic digests for a single software artifact or
metadata object.
Set of one or more cryptographic digests, or other immutable references,
for a single software artifact or metadata object.

## Schema

```json
{
"<ALGORITHM_1>": "<HEX/BASE64 VALUE>",
"<ALGORITHM_2>": "<HEX/BASE64 VALUE>",
"<ALGORITHM_1>": "<VALUE>",
"<ALGORITHM_2>": "<VALUE>",
...
}
```
@@ -23,6 +21,12 @@ algorithms below use lowercase hex encoding. Usually there is just a
single key/value pair, but multiple entries MAY be used for algorithm
agility.

Each entry in a DigestSet MUST be an immutable reference to an artifact. It is
STRONGLY RECOMMENDED to use a commonly accepted, cryptographically secure digest
algorithm to achieve this immutability. See [Use cases for non-cryptographic,
immutable, digests](#use-cases-for-non-cryptographic-immutable-digests) for
further guidance.
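
As a rough illustration of the recommended case, the sketch below builds a single-entry DigestSet from raw bytes using `sha256` with lowercase hex encoding. The helper name `sha256_digest_set` is an assumption made for this example and is not defined by the specification.

```python
import hashlib
from typing import Dict


def sha256_digest_set(content: bytes) -> Dict[str, str]:
    """Return a single-entry DigestSet keyed by the `sha256` algorithm name.

    Hypothetical helper for illustration only; the spec does not define it.
    """
    # hexdigest() returns lowercase hex, the encoding used for `sha256` values.
    return {"sha256": hashlib.sha256(content).hexdigest()}


# Example: digest the bytes of an artifact already held in memory.
digest_set = sha256_digest_set(b"artifact bytes")
```

Because the value is derived from the content itself, any change to the artifact yields a different entry, which is what makes the reference immutable.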

### Supported algorithms

#### `sha256`, `sha224`, `sha384`, `sha512`, `sha512_224`, `sha512_256`, `sha3_224`, `sha3_256`, `sha3_384`, `sha3_512`, `shake128`, `shake256`, `blake2b`, `blake2s`, `ripemd160`, `sm3`, `gost`, `sha1`, `md5`
@@ -144,6 +148,38 @@ matches.
New algorithms MUST document how the value is encoded, e.g. URL-safe base64,
lowercase hex, etc.

### Use cases for non-cryptographic, immutable, digests

While cryptographic digests are the strongly recommended immutable identifier,
users might need to refer to an artifact by some other means. For example,
it might be technically infeasible to compute a digest over the content, or
the user might interact with the content through an interface that doesn't
expose them to the entirety of the content.

In these situations, users MAY use a non-cryptographic identifier in a DigestSet
so long as the risk of the object being mutated is acceptable for the
application.

One concrete example of where a non-cryptographic identifier can be useful is
when referring to Virtual Machine images. These images are often very large
(making it impractical to run a cryptographic hash over them), and users often
interact with them through platform-provided APIs that never give the user
complete custody of the content. Platforms like AWS and GCP provide 'ids' for
users to use when referring to these images. A user may say something like
"create an instance with image 123". In that case the user doesn't actually have
the bits that correspond to 'image 123', so they cannot digest it themselves. And
once the instance has started, it can be difficult, if not impossible, to
digest the original content that was used to boot it.

These IDs can often be treated as immutable and may be perfectly suited to users'
threat profiles. Allowing DigestSets to use these types of identifiers allows
providers to make statements about the content of these VM images using the
identifiers their users have ready access to.

In addition, using an ID like this does not preclude including a cryptographic
hash in the DigestSet as well. If possible, including both may provide the most
flexibility for the user's various use cases.
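
The VM image case might be sketched as follows. The key `exampleImageId` is a hypothetical algorithm name invented for this example; this diff does not show which key a platform ID should use, and the helper `vm_image_digest_set` is likewise an assumption.

```python
from typing import Dict, Optional


def vm_image_digest_set(image_id: str, sha256_hex: Optional[str] = None) -> Dict[str, str]:
    """Build a DigestSet for a VM image from a platform-assigned ID.

    "exampleImageId" is a hypothetical key used only for illustration.
    """
    digest_set = {"exampleImageId": image_id}
    # Include a cryptographic digest alongside the ID when one is available.
    if sha256_hex is not None:
        digest_set["sha256"] = sha256_hex
    return digest_set


id_only = vm_image_digest_set("123")
id_and_hash = vm_image_digest_set("123", "abcd")
```

Carrying both entries lets consumers who only know the platform ID and consumers who hold the image bytes refer to the same artifact.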

## Examples

- `{"sha256": "abcd", "sha512": "1234"}` matches `{"sha256": "abcd"}`
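
The example above is consistent with a rule where two DigestSets match if they share at least one algorithm with an equal value. The matching rules themselves fall outside the lines shown in this diff, so the sketch below treats that rule as an assumption, and `digest_sets_match` is a hypothetical name.

```python
from typing import Dict


def digest_sets_match(a: Dict[str, str], b: Dict[str, str]) -> bool:
    """Return True if any algorithm present in both sets has the same value.

    Assumed rule for illustration; consult the full spec for the normative text.
    """
    shared_algorithms = a.keys() & b.keys()
    return any(a[alg] == b[alg] for alg in shared_algorithms)


matches = digest_sets_match({"sha256": "abcd", "sha512": "1234"}, {"sha256": "abcd"})
```

A real verifier would likely also restrict the comparison to algorithms it trusts, since an "any entry matches" rule is only as strong as the weakest entry it accepts.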
2 changes: 0 additions & 2 deletions spec/v1/predicate.md
@@ -1,7 +1,5 @@
# Predicate layer specification

Version: v1.0

The Predicate is the innermost layer of the attestation, containing arbitrary
metadata about the [Statement]'s `subject`.

2 changes: 0 additions & 2 deletions spec/v1/resource_descriptor.md
@@ -1,7 +1,5 @@
# ResourceDescriptor field type specification

Version: v1.0

A size-efficient description of any software artifact or resource (mutable
or immutable).

5 changes: 3 additions & 2 deletions spec/v1/statement.md
@@ -1,7 +1,5 @@
# Statement layer specification

Version: v1.0

The Statement is the middle layer of the attestation, binding it to a
particular subject and unambiguously identifying the types of the
[Predicate].
@@ -38,6 +36,9 @@ Additional [parsing rules] apply.
> Set of software artifacts that the attestation applies to. Each element
> represents a single software artifact. Each element MUST have `digest` set.
>
> Subjects are assumed to be _immutable_, i.e. the artifacts identified by the
> subject SHOULD NOT change.
>
> The `name` field may be used as an identifier to distinguish this artifact
> from others within the `subject`. Similarly, other ResourceDescriptor fields
> may be used as required by the context. The semantics are up to the producer
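
The requirement that each `subject` element has `digest` set could be checked along these lines. This is a minimal sketch that assumes the Statement has already been parsed into a Python dict; `subjects_missing_digest` is a hypothetical helper, not part of any in-toto library.

```python
from typing import Any, Dict, List


def subjects_missing_digest(statement: Dict[str, Any]) -> List[int]:
    """Return the indexes of subject elements with no non-empty `digest`.

    Hypothetical check for illustration; it does not verify immutability.
    """
    subjects = statement.get("subject", [])
    return [i for i, element in enumerate(subjects) if not element.get("digest")]


example_statement = {
    "subject": [
        {"name": "a", "digest": {"sha256": "abcd"}},
        {"name": "b"},  # no digest, so this element is reported
    ]
}
missing = subjects_missing_digest(example_statement)
```

The check only confirms that a digest is present. Whether the referenced artifact can actually change depends on the kind of identifier stored in the DigestSet.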
