Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Update spec to reflect Artifact ID changes #81

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

alilleybrinker
Copy link
Member

@alilleybrinker alilleybrinker commented Sep 27, 2024

Rendered Changes

Closes #74, #70, #59, #60, #53


Over an extensive number of OmniBOR Working Group meetings, we've discussed a number of changes to the OmniBOR specification, specifically around the design of Artifact IDs, which had not yet been reflected in the spec. This change now incorporates those changes, plus several others, which I will attempt to delineate below.

  • Reduce supported hash algorithms for Artifact IDs to explicitly exclude SHA-1, and thus only support SHA-256, while reserving the right in the future to extend the list if, for example, SHA-256 is found to be broken.
  • Eliminating use of the term "Input Manifest Identifier," which I believe muddies the conceptual picture of OmniBOR as being about only two relevant objects: Artifact IDs and Input Manifests.
  • Brings all annexes into the main document.
  • Assigns numbers to all sections and subsections to make referencing specific parts of the specification easier.
  • Many formatting and grammatical corrections, including more consistent use and capitalization of terminology, many fixes to Markdown formatting, adjustments to (outside of code blocks) ensure we remain within an 80-character column limit, and more.
  • Updates to the filesystem storage definition to reflect use of a target index and a renaming of the "objects/" folder to "manifests/"
  • Updates to the Input Manifest format to no longer include a "blob " prefix, and without use of the "input type" concept I had experimented with in the Rust implementation. After discussion with the WG I have concluded that this additional piece of metadata in the input format is a mistake.
  • Removal of the specification for the OmniBOR metadata fields. These may be returned in the future if we deem them truly necessary, but I have removed them here as I believe from our discussions that they are quite underbaked and indicative of some scope creep we would do well to avoid.
  • Clarification of rules around ELF embedding and text file embedding.
  • Removal of specification of the Artifact Dependency Graph beyond the introduction. The Artifact Dependency Graph is a key conceptual part of OmniBOR, as the ability to construct it is the fruit of our labor in defining Artifact IDs and Input Manifests as they are defined. That said, it is not necessary to specify here beyond the conceptual, and attempting to do so is likely to bog us down in extra complexity we can otherwise avoid.

One notable item I did not pursue in this set of changes, per discussion with the Working Group, is to change the textual representation of Artifact IDs from gitoid URI scheme to a new artifactid URI scheme. I still think such a change could be worthwhile, but also that it would warrant more discussion than it has received so far, and should not block progress on the other changes.

Altogether, my goal here is to bring the spec more up to date with the discussions happening in the Working Group, and more generally to make it easier for others to understand and build on OmniBOR by solidifying more of the foundations of the project.

@alilleybrinker alilleybrinker added c-spec Category: Improvements or additions to the OmniBOR specification t-enhancement Type: New feature or request labels Sep 27, 2024
@alilleybrinker alilleybrinker self-assigned this Sep 27, 2024
Over an extensive number of OmniBOR Working Group meetings, we've discussed a
number of changes to the OmniBOR specification, specifically around the design
of Artifact IDs, which had not yet been reflected in the spec. This change
now incorporates those changes, plus several others, which I will attempt to
delineate below.

- Reduce supported hash algorithms for Artifact IDs to explicitly exclude
  SHA-1, and thus only support SHA-256, while reserving the right in the future
  to extend the list if, for example, SHA-256 is found to be broken.
- Eliminating use of the term "Input Manifest Identifier," which I believe
  muddies the conceptual picture of OmniBOR as being about only two relevant
  objects: Artifact IDs and Input Manifests.
- Brings all annexes into the main document.
- Assigns numbers to all sections and subsections to make referencing specific
  parts of the specification easier.
- Many formatting and grammatical corrections, including more consistent use
  and capitalization of terminology, many fixes to Markdown formatting,
  adjustments to (outside of code blocks) ensure we remain within an
  80-character column limit, and more.
- Updates to the filesystem storage definition to reflect use of a target
  index and a renaming of the "objects/" folder to "manifests/"
- Updates to the Input Manifest format to no longer include a "blob " prefix,
  and _without_ use of the "input type" concept I had experimented with in
  the Rust implementation. After discussion with the WG I have concluded that
  this additional piece of metadata in the input format is a mistake.
- Removal of the specification for the OmniBOR metadata fields. These may be
  returned in the future if we deem them truly necessary, but I have removed
  them here as I believe from our discussions that they are quite underbaked
  and indicative of some scope creep we would do well to avoid.
- Clarification of rules around ELF embedding and text file embedding.
- Removal of specification of the Artifact Dependency Graph beyond the
  introduction. The Artifact Dependency Graph is a key conceptual part of
  OmniBOR, as the ability to construct it is the fruit of our labor in
  defining Artifact IDs and Input Manifests as they are defined. That said, it
  is not necessary to specify here beyond the conceptual, and attempting to do
  so is likely to bog us down in extra complexity we can otherwise avoid.

One notable item I did _not_ pursue in this set of changes, per discussion
with the Working Group, is to change the textual representation of Artifact
IDs from `gitoid` URI scheme to a new `artifactid` URI scheme. I still think
such a change could be worthwhile, but also that it would warrant more
discussion than it has received so far, and should not block progress on
the other changes.

Altogether, my goal here is to bring the spec more up to date with the
discussions happening in the Working Group, and more generally to make it
easier for others to understand and build on OmniBOR by solidifying more of
the foundations of the project.

Signed-off-by: Andrew Lilley Brinker <alilleybrinker@gmail.com>
@alilleybrinker alilleybrinker force-pushed the alilleybrinker/structure branch from 2134c7e to 09ec42a Compare September 27, 2024 22:36
@alilleybrinker
Copy link
Member Author

Meeting feedback: split out removal of ADG wording and SHA-1 exclusion into a separate PR.

whose inputs they describe.
can be identified by their own Artifact Identifier. The Artifact ID for the
manifest can be embedded directly into executable files, or can be provided
in a separate file alongside the artifact whose inputs they describe.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a slight concern that removing the IMID as a concept itself introduces confusion. Given an artifact... one would like a simple way to distinguish conversationally between the artifact id of the artifact, and the artifact id of the input manifest of the artifact. Artifact ID and Input Manifest ID (IMID) filled that role. We introduced the concept because we had confusion downstream by communities like SPDX as to whether the artifact id field for an artifact should be the artifact id of the artifact or the artifact id of the artifacts input manifest.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible idea: "Input Manifest ID" -> "Input Manifest AID"

"AID" being a shorthand for "Artifact ID". This would be a new shorthand, but hopefully makes it clearer that we're discussing the Artifact ID of the Input Manifest, and not some new type of ID.


Two artifacts are equivalent if and only if their binary representations are
equal. This can be expressed in pseudocode with the following expression:
`[]byte(artifact1) == []byte(artifact2)`
equal.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bytewise identical. This rewrite reads a bit too much like "two artifacts are equivalent of they are equal".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can reintroduce the byte-equivalence language, but without the Go-ish pseudocode.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all n less than the length in bytes of the artifact artifact1[n] == artifact2[n] as bytes.

- Node.js
- Python interpreter
- __Code Generators__

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we work patch into here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the use of colons is also inconsistent, I ought to fix that.


This index file may then be used to improve the performance and reliability of
operations related to Input Manifests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did I miss something here about how to persist these targets in a way we can look them up easily?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on discussion I am inclined to remove this section for now while we gather more experience and figure out what makes more sense.

@alilleybrinker
Copy link
Member Author

Update from yesterday's WG meeting: I'll be going through and addressing Ed's comments, including backing out some less-baked sections of the edits, which will have issues opened for them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c-spec Category: Improvements or additions to the OmniBOR specification t-enhancement Type: New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Define "Mandatory Hashes" and permit additional hashes for Artifact Identifier construction.
2 participants