diff --git a/spec/GITOID_URI_SPEC.txt b/spec/GITOID_URI.txt similarity index 100% rename from spec/GITOID_URI_SPEC.txt rename to spec/GITOID_URI.txt diff --git a/spec/SCOPE.md b/spec/SCOPE.md deleted file mode 100644 index 5473b1e..0000000 --- a/spec/SCOPE.md +++ /dev/null @@ -1,16 +0,0 @@ -# Scope - -Specifies procedures for constructing and conveying artifact dependency graphs (ADGs), and other related -data structures for artifacts. Including but not limited to: - -- formats for specifying software artifact identity -- formats for specifying graph relationships between artifacts -- manner of embedding identifiers for ADGs, and other related -data structures in artifacts of various types -- guidance on metadata which references ADGs, and other related data structures -- guidance for build tools for: - - constructing ADGs, and other related data structures - - conveying ADGs, and other related data structures - - embedding identifiers for ADGs, and other related data structures ids in artifacts - - manners of conveyance of ADGs, and other related data structures - - descriptions of use cases for which ADGs, and other related data structures may be used diff --git a/spec/SPEC-annex-A-filesystem-storage.md b/spec/SPEC-annex-A-filesystem-storage.md deleted file mode 100644 index 312f831..0000000 --- a/spec/SPEC-annex-A-filesystem-storage.md +++ /dev/null @@ -1,43 +0,0 @@ -## Annex A - -Annex A documents known methods of persisting OmniBOR Documents to various stores. - -### Input Manifest persistence by a Build Tool to its local filesystem - -If a build tool persists an Input Manifest to its local filesystem, the build tool should write out the Input Manifest to ```${OMNIBOR_DIR}/objects/${Artifact Identifier Type uri prefix with ':' replaced by '_'}/${Input Manifest Identifier:0:2}/${Input Manifest Identifier:2:}``` where ```${Input Manifest Identifier}``` is Input Manifest Identifier in lowercase hexadecimal with leading zeros NOT suppressed. 
- -Example: - -If ```OMNIBOR_DIR=.omnibor``` then the Input Manifest with ```gitoid:blob:sha1`` Input Manifest Identifier -```0e8efd4cdf0d5bafcfcae658c2662a73b199b301``` would be stored in: - -``` -.omnibor/objects/gitoid_blob_sha1/0e/8efd4cdf0d5bafcfcae658c2662a73b199b301 -``` - -#### Build tool persistence of related metadata - -A build tool may persist additional metadata to that makes reference to the Artifact Dependency Graph (ADG). -It should persist such metadata to a subdirectory of the directory to which the output artifact is being written of the form: ```${OMNIBOR_DIR}/metadata/${context}/```. - -For metadata specific to a particular build tool ```${context}``` should be a name uniquely associated with the build tool. For example: - -- ```${OMNIBOR_DIR}/metadata/llvm``` -- ```${OMNIBOR_DIR}/metadata/clang``` -- ```${OMNIBOR_DIR}/metadata/go``` -- ```${OMNIBOR_DIR}/metadata/rustc``` -- ```${OMNIBOR_DIR}/metadata/gcc``` - -Build tools should report their selection of ```${context}``` subdirectory name to the OmniBOR spec for inclusion in a list to preclude ```${context}``` collision. - -Metadata persisted by multiple build tools in the same way should be documented in a specification for that metadata. Such specs must include the ```${context}``` for that metadata. Such specs should be reported to the OmniBOR spec for inclusion in a list to preclude ```${context}``` collision. For example, if a group of build tools decide to store metadata about file locations in a common format, they might choose to define a ```${context}``` ```filelocation``` in which case the metadata would be stored in ```${OMNIBOR_DIR}/metadata/filelocation``` - -Subdirectory structure, filenaming, and file schema below ```${OMNIBOR_DIR}/metadata/${context}/``` are at the discretion of the build tool for build tool specific metadata or the metadata spec for common metadata. 
- -#### Build tool selection of OMNIBOR_DIR - -OMNIBOR_DIR may be set by the following methods, listed in order of precedence: -1. A build tool specific flag -2. A non-empty env variable named OMNIBOR_DIR - -The absence of specification of a location to write omnibor data via either the build tool specific flag or OMNIBOR_DIR variable may be taken as a signal to skip OmniBOR generation. diff --git a/spec/SPEC-annex-B-elf-embedding.md b/spec/SPEC-annex-B-elf-embedding.md deleted file mode 100644 index e57a5ac..0000000 --- a/spec/SPEC-annex-B-elf-embedding.md +++ /dev/null @@ -1,62 +0,0 @@ -# Annex B - -Annex B contains a method of embedding Input Manifest Identifiers into ELF -files. - -## Input Manifest Identifiers - -Input Manifest Identifiers are Artifact Identifiers (Git Object Identifiers -\[GitOIDs\]) for Artifact Input Manifests. They identify an Artifact Input -Manifest and MAY be embedded into an artifact to relate the artifact to its -Artifact Input Manifests. - -If an ELF artifact contains an embedded Input Manifest Identifier, then -implementations MUST conform to the format specified in this document. - -Note that multiple Input Manifests MUST be produced for a single artifact, -reflecting the use of different hash functions to produce the Artifact -Identifiers. - -### Input Manifest Identifier persistence in ELF Objects/Executables - -Input Manifest Identifiers MUST be persisted by build tools when they build -an artifact and produce an Artifact Input Manifest for that artifact. - -When persisting Input Manifest Identifiers into an ELF object or an ELF -executable, the build tool MUST create a [section][elf_section] -`.note.omnibor` and place the Input Manifest Identifiers in the descriptor -field of the note entry. This section MUST be of type `SHT_NOTE` and MUST have -the attribute `SHF_ALLOC`. Multiple Note entries MUST be created, one for each -Artifact Identifier type when multiple Artifact Identifier types are involved. 
-Each note entry MUST contain the following fields in the same order as given -below: - -1. `namesz` (4 bytes): This field MUST be set to a value of `8`, the length of - the 'owner' field `OMNIBOR\0` in bytes. -2. `descz` (4 bytes): This field MUST contain the length of the Input Manifest - Identifier in bytes, including a byte for the null terminator. -3. `type` (4 bytes): This field MUST contain the value associated with one of - the reserved Artifact Identifier types. The values for the reserved types - are in the range of `0x00000000` to `0x7fffffff`. Permissible types with - reserved values are: - - ``` - NT_GITOID_BLOB_SHA1 = 0x1, - NT_GITOID_BLOB_SHA256 = 0x2, - ``` - -4. `owner` (8 bytes): This field MUST contain the string `OMNIBOR\0`, padded to - 8 bytes. -5. `descriptor`: This field MUST contain the Input Manifest Identifiers as raw - bytes.The length of this field is the same as the value in the `descz` field. - -When recording multiple Input Manifest Identifiers in the note section, - -1. There MUST be only one note entry for each Input Manifest Identifier type. -2. The note entries MUST be in ascending order of Input Manifest Identifier - type. - -Conforming build tools MUST generate all Input Manifest Identifier types, -currently SHA1 and SHA256 Artifact Identifiers. - -[elf_section]: https://refspecs.linuxfoundation.org/LSB_3.0.0/LSB-PDA/LSB-PDA.junk/sections.html diff --git a/spec/SPEC-annex-C-source-embedding.md b/spec/SPEC-annex-C-source-embedding.md deleted file mode 100644 index 40b3ff4..0000000 --- a/spec/SPEC-annex-C-source-embedding.md +++ /dev/null @@ -1,101 +0,0 @@ -## Annex C - -Annex C contains a method of embedding an Input Manifest Identifier into source code files. - -### Embedded Input Manifest Identifier - -Most source code files are hand coded by humans. Some however are generated from other input(s) by a build tool. 
- -A build tool outputing a source code file may embed the Input Manifest Identifier for the output source code file into -the output source code file by adding a comment line containing a string of the form: - -``` -OmniBOR-Input-Manifest-ID: [ ${comma separated list of Input Manifest Identifier URIs} ] -``` - -For a file with C commenting semantics (like C, C++, Java, Go, etc) a concrete example might be: - -``` -// OmniBOR-Input-Manifest-ID: [ gitoid:blob:sha1:261eeb9e9f8b2b4b0d119366dda99c6fd7d35c64, gitoid:blob:sha256:09c825ac02df9150e4f93d12ba1da5d1ff5846c3e62503c814aa3a300c535772 ] -``` - -For a file with shell commenting semantics (like a shell script, Python, etc) a concrete example might be: -``` -# OmniBOR-Input-Manifest-ID: [ gitoid:blob:sha1:261eeb9e9f8b2b4b0d119366dda99c6fd7d35c64, gitoid:blob:sha256:09c825ac02df9150e4f93d12ba1da5d1ff5846c3e62503c814aa3a300c535772 ] -``` - -When interpretting an OmniBOR-Input-Manifest-ID comment line a reader should ignore any leading and trailing spaces around '[' or ']' -or ','. - -### Placement of OmniBOR-Input-Manifest-ID Comment Line - -The OmniBOR-Input-Manifest-ID comment line should be placed as the last line in the source code file. The OmniBOR-Input-Manifest-ID comment line should be preceded by a blank line to ensure it is not interpretted as part of another comment block. - -A tool reading the source code file should interpret the last OmniBOR-Input-Manifest-ID comment line it encounters in the file as being the Input Manifest Identifier, and ignore previous comment lines in the file which may contain Input Manifest Identifiers. - -Example: - -If the input source code file begins with: - -```go -// Code generated by stringer DO NOT EDIT. - -import ( - "fmt" -) -... -``` - -The output source code file should look like: -```go -// Code generated by stringer DO NOT EDIT. - -import ( - "fmt" -) -... 
- -// OmniBOR-Input-Manifest-ID: [ gitoid:blob:sha1:261eeb9e9f8b2b4b0d119366dda99c6fd7d35c64, gitoid:blob:sha256:09c825ac02df9150e4f93d12ba1da5d1ff5846c3e62503c814aa3a300c535772 ] -``` - -If the input source code file begins with: - -```c -/* - * Copyright 2023 Yoyodyne Inc - * SPDX-License-Identifier: - */ - -#include -int main() { - // printf() displays the string inside quotation - printf("Hello, World!"); - return 0; -} -``` - -The output source code file should look like: - -```c -/* - * Copyright 2023 Yoyodyne Inc - * SPDX-License-Identifier: - */ - -#include -int main() { - // printf() displays the string inside quotation - printf("Hello, World!"); - return 0; -} - -//* OmniBOR-Input-Manifest-ID: [ gitoid:blob:sha1:261eeb9e9f8b2b4b0d119366dda99c6fd7d35c64, gitoid:blob:sha256:09c825ac02df9150e4f93d12ba1da5d1ff5846c3e62503c814aa3a300c535772 ] */ -``` - -### Tools which mutate existing source code files - -Many source code generation tools, like patch, specifically mutate an existing input source code file which may contain -an existing OmniBOR-Input-Manifest comment. In such circumstances the tool should either - -1. Replace an existing OmniBOR-Input-Manifest comment if found -2. Insert the OmniBOR-Input-Manifest normally, which will cause it to be placed *after* the existing OmniBOR-Input-Manifest comment line. diff --git a/spec/SPEC.md b/spec/SPEC.md index fdbac60..471b264 100644 --- a/spec/SPEC.md +++ b/spec/SPEC.md @@ -2,10 +2,10 @@ | Field | Value | |:--------|:------| -| Version | 0.1 | +| Version | 0.2 | | Status | Draft | -## Foreword +## 1. Foreword This specification is subject to the Community Specification License 1.0, available at . @@ -41,21 +41,12 @@ WITH RESPECT TO THIS DELIVERABLE OR ITS GOVERNING AGREEMENT, WHETHER BASED ON BREACH OF CONTRACT, TORT (INCLUDING NEGLIGENCE), OR OTHERWISE, AND WHETHER OR NOT THE OTHER MEMBER HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -## Introduction +## 2. 
Introduction -Software supply chains face many challenges: security and compliance chief -among them. Often, projects are hamstrung by the inability to easily and -reliably capture a complete, concise, verifiable accounting of exactly -__what__ inputs were built into software. Without this information, -identifying vulnerable software to patch or replace is difficult. While -Software Bills of Material (SBOMs) help identify third-party components, -they do not go far enough to precisely identify the exact inputs necessary -for vulnerability management. - -The OmniBOR standard defines three concepts, which together enable the -consistent, reproducible, and embeddable encoding of the exact inputs used to -build a software artifact: Artifact Identifiers, Input Manifests, and Artifact -Dependency Graphs. +The OmniBOR standard defines two concepts, which together enable the +consistent, reproducible, and embeddable identification of software +artifacts and the encoding of the exact inputs used to build software +artifacts: Artifact Identifiers and Input Manifests. An Artifact Identifier is a content-based identifier of a single input (for example, a single file) used to build a software artifact. Identifiers are @@ -68,16 +59,15 @@ Next, an Input Manifest lists the Artifact Identifier of every input used to produce an artifact. For example, if an executable is compiled by linking together a collection of object files, the Artifact Identifier of every object file would be listed in the Input Manifest for the executable. Input Manifests -can be identified by treating them as artifacts and applying the same identifier -heuristic to them applied to any other artifact. For purposes of discussion, -these are typically called Input Manifest Idenftiers or Input Manifest IDs -or IMIDs. The Input Manifest Identifier can be embedded directly into -executable files, or can be provided in a separate file alongside the artifact -whose inputs they describe. 
+can be identified by their own Artifact Identifier. The Artifact ID for the +manifest can be embedded directly into executable files, or can be provided +in a separate file alongside the artifact whose inputs the manifest describes. -Finally, a collection of Input Manifests can be combined to produce an Artifact -Dependency Graph. The Artifact Dependency Graph is a complete description of -all inputs, direct or transitive, used to produce a software artifact. +The central purpose in the design of Artifact Identifiers and Input Manifests +is to enable creation of Artifact Dependency Graphs. An Artifact Dependency +Graph is a fine-grained description of all inputs, direct or transitive, used +to produce a software artifact. This graph can be constructed from collections +of Input Manifests for all dependencies used in an artifact's construction. Returning to the example of building an executable: the executable's Input Manifest would list the Artifact Identifier of every object file, and each @@ -89,154 +79,160 @@ produced the executable. With the Artifact Dependency Graph, consumers of this information could then exactly identify when two artifacts were produced with exactly identical inputs, and if inputs vary, could identify the exact inputs which vary and -observe how that affects the entirety of the graph. When coupled with -SBOM information about third-party dependencies, this can provide highly -specific and accurate identification of supply chain differences and their -causes. +observe how that affects the entirety of the graph. Changes in inputs anywhere +in the graph would result in new Artifact Identifiers for the changed input +and all inputs derived from it, enabling easy detection of changes in +the graph. 
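The graph construction described above can be sketched in Python. This is a minimal illustration and not part of the specification. It assumes each Input Manifest has already been parsed into a list of `(input Artifact ID, input manifest Artifact ID or None)` pairs and stored in a dict keyed by the manifest's own Artifact ID. The names `Manifest` and `transitive_inputs` are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical in-memory form of one parsed Input Manifest:
# a list of (input Artifact ID, Artifact ID of the input's manifest or None).
Manifest = List[Tuple[str, Optional[str]]]


def transitive_inputs(root_manifest_id: str,
                      manifests: Dict[str, Manifest]) -> Set[str]:
    """Collect every direct or transitive input reachable from one manifest.

    Inputs with no known manifest are treated as leaves. A visited set keeps
    the walk finite even if the data unexpectedly contains a cycle.
    """
    inputs: Set[str] = set()
    visited: Set[str] = set()
    stack = [root_manifest_id]
    while stack:
        manifest_id = stack.pop()
        if manifest_id in visited or manifest_id not in manifests:
            continue  # already expanded, or manifest not available locally
        visited.add(manifest_id)
        for input_id, input_manifest_id in manifests[manifest_id]:
            inputs.add(input_id)
            if input_manifest_id is not None:
                stack.append(input_manifest_id)
    return inputs


# Placeholder IDs stand in for real hex Artifact IDs to keep this readable.
manifests = {
    "m-exe": [("a.o", "m-a"), ("b.o", None)],  # executable linked from a.o, b.o
    "m-a": [("a.c", None), ("a.h", None)],     # a.o compiled from a.c, a.h
}
print(sorted(transitive_inputs("m-exe", manifests)))
# ['a.c', 'a.h', 'a.o', 'b.o']
```

Inputs without a known Input Manifest end the walk at that node, which matches the optional manifest reference carried by each Input Manifest record.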
This Artifact Dependency Graph may also be used to supplement vulnerability -information by precisely identifying affects files or resolving the impacts +information by precisely identifying affected files or resolving the impacts of changes to those files across all users of those projects. By leveraging transparent inclusion of Input Manifests into executable and other formats, -users would also gain the benefits of high precision supply chain information +users would also gain the benefits of high precision dependency information without manually recording or updating those manifests as projects develop over time. -## Scope +## 3. Scope -Specifies procedures for constructing and conveying Input Manifests, -Artifact Dependency Graphs (ADGs), and other related data structures -for artifacts. Including but not limited to: +This document specifies: -- formats for artifact identifiers -- formats for specifying graph relationships between artifacts -- manner of embedding identifiers for Input Manifests, ADGs, and other related - data structures in artifacts of various types -- guidance on metadata which references Input Manifests, ADGs, and other related data structures -- guidance for build tools for: - - constructing Input Manifests, ADGs, and other related data structures - - conveying Input Manifests, ADGs, and other related data structures - - embedding identifiers for Input Manifests, and other related data structures' ids in artifacts - - manners of conveyance of Input Manifests, and other related data structure's - - descriptions of use cases for which Input Manifests, ADGs, and other related data structures may be used +- The in-memory format of Artifact Identifiers +- The textual representation of Artifact Identifiers +- The process for constructing Artifact Identifiers +- The textual representation of Input Manifests +- How Input Manifests should be stored in a file system +- How Input Manifests should be embedded into the artifacts whose build + inputs 
they are describing +- How Input Manifests should be constructed by build tools. -## Normative References +## 4. Normative References - [GitOID URI][gitoid_uri] -## Terms and Definitions +## 5. Terms and Definitions For the purposes of this document, the following terms and definitions apply. -### Artifact +### 5.1. Artifact -An artifact is any object of interest that can be represented as arrays of -bytes (`[]byte`). +An artifact is any object of interest that can be represented as an array of +bytes. -### Artifact Equivalency +### 5.2. Artifact Equivalency Two artifacts are equivalent if and only if their binary representations are -equal. This can be expressed in pseudocode with the following expression: -`[]byte(artifact1) == []byte(artifact2)` +equal. + +### 5.3. Build Tools + +A build tool is anything which reads one or more input artifacts and writes +one or more output artifacts. Examples of build tools include: + +- __Compilers:__ + - `llvm-clang` + - `gcc` + - `javac` + - `rustc` + - `go` +- __Linkers:__ + - `llvm-lld` + - `binutils-ld` +- __Runtimes:__ + - Java JVM + - Node.js + - Python interpreter +- __Code Generators:__ + +## 6. Specifications -### Artifact Identifiers +### 6.1. Artifact Identifier -It should be possible to identify each artifact with an artifact identifier with +It MUST be possible to identify each artifact with an Artifact Identifier with the following characteristics: - __Reproducible__: Independent parties, presented with equivalent artifacts, - derive the same artifact identity. -- __Unique__: Non-equivalent artifacts have distinct identities. -- __Immutable__: An identified artifact can not be modified without also - changing its identity. - -## Build Tools - -A build tool is something which reads one or more input artifacts and writes -one or more output artifacts. 
Examples of build tools include: - -- compilers: - - llvm-clang - - gcc - - javac - - rustc - - go -- linkers: - - llvm-lld - - binutils-ld -- runtimes - - Java JVM - - Node.js - - Python interpreter -- code generators - -## Specifications + derive the same Artifact Identifier. +- __Unique__: Non-equivalent artifacts have distinct Identifiers. +- __Immutable__: An identified artifact cannot be modified without also + changing its Identifier. -### Artifact ID +The term Artifact Identifier may be shortened to Artifact ID. Because two artifacts are equivalent if and only if their binary representations are equal, a hash function may be applied to the binary representation of an artifact to yield an identifier which satisfies the -canonical, unique, and immutable requirements of artifact identifiers. +reproducible, unique, and immutable requirements of Artifact Identifiers. -### Artifact Identifier Types +### 6.2. Artifact Identifier Types -The majority of source code artifacts are already stored in git and -indexed by their git object identifiers ("gitoids") as git objects of type +The majority of source code artifacts are already stored in Git and +indexed by their Git Object Identifiers ("GitOIDs") as Git objects of type "blob". -For this reason, OmniBOR has chosen to use the "gitoid" of an Artifact as +For this reason, OmniBOR has chosen to use the "GitOID" of an Artifact as its Artifact Identifier. -Git currently supports two varieties of gitoids. One is based on SHA1 and is -in common use. The other is based on SHA256 and has been very slow to garner -adoption. The [gitoid URI spec][gitoid_uri] uses different prefixes, +Git currently supports two varieties of GitOIDs. One is based on SHA-1 and is +in common use. The other is based on SHA-256 and has been slow to garner +adoption. 
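The GitOID of an artifact is the hash Git computes for a blob object. The digest covers a header (the word `blob`, a space, the content length in decimal, and a NUL byte), followed by the raw content. Here is a minimal Python sketch that assumes this standard Git blob encoding. The function name `gitoid` is hypothetical.

```python
import hashlib


def gitoid(content: bytes, hash_name: str = "sha256") -> str:
    """Return the GitOID URI of `content`, hashed the way Git hashes a blob.

    Git feeds the hash a header ("blob", a space, the length in decimal,
    a NUL byte) followed by the raw content bytes.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    digest = hashlib.new(hash_name, header + content).hexdigest()
    return f"gitoid:blob:{hash_name}:{digest}"


# Cross-check against `git hash-object` on an empty file in a SHA-1 repository.
print(gitoid(b"", "sha1"))
# gitoid:blob:sha1:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

SHA-1 appears here only so that the output can be compared with `git hash-object` in an ordinary SHA-1 repository. For inputs that are not collision attacks, Git's collision-detecting SHA-1 produces the same digest as plain SHA-1. This specification uses `gitoid:blob:sha256` as its Artifact Identifier Type.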
+ +Git's use of SHA-1 is additionally complicated by the fact that Git actually +uses a variant of SHA-1 in newer versions, called SHA-1DC (SHA-1 with collision +detection), which tries to detect attempts to engineer +purposeful hash collisions in SHA-1 and subverts them by modifying the +operation of the hash algorithm in those cases. Git refers to both as "SHA-1" +and rarely distinguishes between these two similar but not equivalent hash +algorithms. + +The [GitOID URI spec][gitoid_uri] uses different prefixes, `gitoid:blob:sha1` or `gitoid:blob:sha256`, to distinguish which algorithm is -being used for computing the gitoid of a blob. This document adopts the gitoid -URI prefixes to distinguish Artifact Identifier Types. This approach is -anticipated to extend gracefully as git adopts new hash types in the future. +being used for computing the GitOID of an artifact, with the caveat +that `gitoid:blob:sha1` may describe either SHA-1 or SHA-1DC depending on the +version of Git being used. -All subsequent references to mandatory identifier types in this document should -be interpreted to mean the list: +This document adopts the GitOID URI prefixes to distinguish Artifact Identifier +Types. This approach is anticipated to extend gracefully as Git adopts new hash +types in the future. -- `gitoid:blob:sha1` -- `gitoid:blob:sha256` +Given the challenges with SHA-1, including: -### Artifact Input Manifest +- Its weakness as a hash algorithm today, with some attacks already being + known publicly which permit collisions in some contexts, +- The fact that Git itself is expending effort to transition away from the use + of SHA-1 and toward SHA-256, +- A concern that some governments could, within the next five years, order a + transition away from SHA-1, -An Artifact Input Manifest for an Artifact enumerates the inputs to the -build tool that produced the artifact. 
+We have decided to only permit the use of SHA-256 as a hash algorithm for +Artifact Identifiers for OmniBOR. -Hereafter in the spec Artifact Input Manifest will simply be referred to as Input Manifest. +We reserve the right to extend this list to support additional hash algorithms +in the future, for example if SHA-256 is determined to be broken by future +computing capabilities. -A given Input Manifest utilizes precisely one Artifact Identifier Type. +All subsequent references to Artifact Identifier types in this document should +be interpreted to mean the list: -#### Input Manifest Identifier +- `gitoid:blob:sha256` + +### 6.3. Input Manifest -An Input Manifest is identified by computing its identifier as an artifact -with the Artifact Identifier Type used for identifiers within the Input Manifest -itself. +An Input Manifest for an artifact enumerates the inputs to the build tool that +produced the artifact. -The Input Manifest Identifier for the Input Manifest of an artifact is sometimes -referred to as the Input Manifest Identifier of the artifact. +A given Input Manifest utilizes precisely one Artifact Identifier Type. -#### Input Manifest Header +#### 6.3.1. Input Manifest Header In order to distinguish the type of identifier used in the Input Manifest, -it begins with a single newline terminated header line: +it begins with a single newline-terminated header line: ``` -${Artifact Identifier Type uri prefix}\n +${Artifact Identifier Type URI prefix}\n ``` For example: -``` -gitoid:blob:sha1\n -``` - -or - ``` gitoid:blob:sha256\n ``` @@ -244,104 +240,308 @@ gitoid:blob:sha256\n All identifiers in a Input Manifest MUST be of the Artifact Identifier Type declared in the header. -#### Input Manifest Records +#### 6.3.2. Input Manifest Records The Input Manifest after the header consists of a list of newline terminated -input records +input records. 
+ +Each input record consists of: -An input record for an artifact for which no Input Manifest Identifier is known is represented as: +- The Artifact ID of the input artifact +- Optionally, the Artifact ID of the Input Manifest for the input artifact + +An Input Manifest record for an artifact for which no Input Manifest +is known is represented as: ``` -blob⎵${artifact identifier of the input artifact}\n +${artifact identifier of the input artifact}\n ``` -An input record for an artifact for which an Input Manifest Identifier is known is represented as: +An Input Manifest record for an artifact for which an Input Manifest is known +is represented as: ``` -blob⎵${artifact identifier of the input artifact}⎵bom⎵${input manifest identifier of the input artifact}\n +${artifact identifier of the input artifact}⎵manifest⎵${artifact identifier of the input manifest for the input artifact}\n ``` -```⎵``` above refers to the ASCII space character (0x20). +`⎵` above refers to the ASCII space character (0x20). + +Artifact Identifiers in Input Manifests should be represented as strings in +lowercase hexadecimal. For example: -Artifact identifiers in Input Records should be represented as a strings in lower case hexadecimal. For example -514516097a2f95c893f2a9685bcecfb85b7598e6. +``` +09c825ac02df9150e4f93d12ba1da5d1ff5846c3e62503c814aa3a300c535772 +``` -The input artifact records must be written to the Input Manifest in lexical -order. +The input artifact records MUST be written to the Input Manifest in lexical +order, that is, sorted by the Artifact ID of the input artifact. -The Artifact Identifier and Input Manifest Identifier must both be of the Artifact Identifier -Type declared in the Input Manifest header. +The Artifact Identifier for the input artifact and for the input artifact's +Input Manifest MUST both be of the Artifact Identifier Type declared in the +Input Manifest header. -#### Input Manifest Character Encoding +#### 6.3.3. 
Input Manifest Character Encoding All characters in an Input Manifest are encoded in ASCII. Please note: all '\n' -must be encoded as '\n' characters, _not_ the line delimiter of the platform. +MUST be encoded as '\n' characters, _not_ the line delimiter of the platform. +This is necessary because the Input Manifest will be hashed to produce its +Artifact Identifier, and these Artifact Identifiers MUST be consistent +regardless of the platform on which the Input Manifest generation is performed. -#### Input Manifest Identifier Embedding +#### 6.3.4. Input Manifest Embedding -Each build tool should embed into the output artifact a deterministically -ordered list of Input Manifest Identifiers for each mandatory Artifact +Each build tool SHOULD embed into the output artifact a deterministically +ordered list of Artifact IDs for the Input Manifest for each mandatory Artifact Identifier Type in a manner: 1. Appropriate to the type of artifact 2. Generally agreed upon for that artifact -#### Input Manifest Construction by a Build Tool +If embedding is not possible — for example, if the format of the output +artifact does not permit a method to embed additional information without +breaking the functionality of that artifact — then embedding SHOULD be +skipped. + +#### 6.3.5. Input Manifest Construction + +A build tool creating an output artifact MUST compute an Input Manifest of +each mandatory Artifact Identifier Type. + +For each input artifact the build tool MUST: + +1. Compute the artifact identifier of the input: `${artifact identifier}` +2. Examine the input for an embedded Artifact ID for an Input Manifest: + `${input manifest artifact id}` + +The build tool MUST persist an Input Manifest using the +`${artifact identifier}` and `${input manifest artifact id}` for each input. + +#### 6.3.6. 
Input Manifest Example + +``` +gitoid:blob:sha256 +09c825ac02df9150e4f93d12ba1da5d1ff5846c3e62503c814aa3a300c535772 +230f3515d1306690815bd9c3da0d15d8b6fcf43894d17100eb44b6d329a92f61 +2f4a51b16b76bbc87c4c27af8ae062b1b50b280f1ab78e3eec155334588dc88e manifest 4f3a822f776412c049dda53c3277bf2225b51b805ce8a99222af23a7d9f55636 +c71d239df91726fc519c6eb72d318ec65820627232b2f796219e87dcf35d0ab4 +f47ffb3518f236eea6525fd29f057ddd5cda1bb803ccc662e6bc5925afd1e4af +``` + +## 7. Storing Input Manifests + +This section documents known methods of persisting Input Manifests to +various stores. + +If a build tool persists an Input Manifest to its local filesystem, +the build tool should write out the Input Manifest to +`${OMNIBOR_DIR}/manifests/${Artifact Identifier Type URI prefix with ':' replaced by '_'}/${Input Manifest Artifact ID:0:2}/${Input Manifest Artifact ID:2:}` +where `${Input Manifest Artifact ID}` is the Artifact ID of the Input +Manifest in lowercase hexadecimal with leading zeros NOT suppressed. + +Example: + +If `OMNIBOR_DIR=.omnibor` then the Input Manifest with Artifact ID +`gitoid:blob:sha256:09c825ac02df9150e4f93d12ba1da5d1ff5846c3e62503c814aa3a300c535772` +would be stored in: + +``` +.omnibor/manifests/gitoid_blob_sha256/09/c825ac02df9150e4f93d12ba1da5d1ff5846c3e62503c814aa3a300c535772 +``` + +### 7.1. Use of a Target Index + +Some desirable operations related to Input Manifests require associating an +Input Manifest with the artifact it is describing. Ideally, the artifact itself +has embedded in it the Artifact ID of its Input Manifest. However, in cases +where this is not true, or as a performance optimization even when it _is_ true, +it is desirable to maintain a "Target Index" which associates the Artifact ID +of an artifact with the Artifact ID of its Input Manifest. -A build tool creating an output artifact must compute an Input Manifest of -each mandatory artifact identifier type. 
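Sections 6.3 and 7 together define the bytes of an Input Manifest and the location where those bytes are written. The following Python sketch covers both steps under three assumptions: `gitoid:blob:sha256` is the only identifier type, records are passed as lowercase-hex `(input Artifact ID, Input Manifest Artifact ID or None)` pairs, and the helper names are hypothetical. The sketch orders records by sorting whole lines. Because every line starts with a fixed-length ID, this sorts the records by input Artifact ID.

```python
import hashlib
import os
from typing import Iterable, Optional, Tuple

# Only Artifact Identifier Type currently permitted (Section 6.2).
ID_TYPE = "gitoid:blob:sha256"


def artifact_id_hex(content: bytes) -> str:
    """SHA-256 GitOID of `content` in lowercase hex (Git blob header + bytes)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha256(header + content).hexdigest()


def serialize_manifest(records: Iterable[Tuple[str, Optional[str]]]) -> bytes:
    """Render an Input Manifest as ASCII bytes.

    `records` holds (input Artifact ID, Input Manifest Artifact ID or None),
    both lowercase hex. Every line, including the last, ends in a bare
    newline (0x0A) regardless of platform.
    """
    lines = []
    for input_id, manifest_id in records:
        if manifest_id is None:
            lines.append(input_id)
        else:
            lines.append(f"{input_id} manifest {manifest_id}")
    # Each line starts with a fixed-length hex ID, so sorting whole lines
    # orders the records by the input's Artifact ID.
    body = "".join(line + "\n" for line in sorted(lines))
    return (ID_TYPE + "\n" + body).encode("ascii")


def manifest_path(omnibor_dir: str, manifest_hex: str) -> str:
    """${OMNIBOR_DIR}/manifests/gitoid_blob_sha256/<first 2 hex>/<rest>."""
    type_dir = ID_TYPE.replace(":", "_")
    return os.path.join(omnibor_dir, "manifests", type_dir,
                        manifest_hex[:2], manifest_hex[2:])


def write_manifest(omnibor_dir: str, manifest: bytes) -> str:
    """Store `manifest` under its own Artifact ID and return that ID."""
    manifest_hex = artifact_id_hex(manifest)
    path = manifest_path(omnibor_dir, manifest_hex)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:  # binary mode: no newline translation
        f.write(manifest)
    return manifest_hex
```

Given the records of the Input Manifest example in Section 6.3.6 in any order, `serialize_manifest` should reproduce that example byte for byte. Writing in binary mode keeps the `\n` line endings, which matters because the manifest's own Artifact ID is computed over those exact bytes.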
+This Target Index MUST be stored at `${OMNIBOR_DIR}/targets`, and MUST take
+the form of a text file where each line (separated only by the `\n` character)
+MUST be formatted as follows:
-For each input artifact the build tool must:
+```
+${Artifact ID of the target artifact}⎵${Artifact ID of the Input Manifest}
+```
+
+`⎵` above refers to the ASCII space character (0x20).
+
+This index file may then be used to improve the performance and reliability of
+operations related to Input Manifests.
+
+### 7.2. Selection of Storage Location
+
+The storage location for Input Manifests MAY be set by the following methods,
+listed in order of increasing precedence:
+
+1. A non-empty environment variable named `OMNIBOR_DIR`
+2. A build-tool-specific flag
+
+If no storage location is specified via either the build-tool-specific flag
+or the `OMNIBOR_DIR` variable, build tools MAY take this as a signal to skip
+generating Input Manifests.
+
+## 8. Embedding Artifact IDs in ELF Files
+
+This section contains a method of embedding Artifact IDs for Input Manifests
+into ELF files.
+
+If an ELF artifact contains an embedded Artifact ID for an Input Manifest,
+then implementations MUST conform to the format specified in this document.
+
+Artifact IDs for Input Manifests MUST be persisted by build tools when they
+build an artifact and produce an Input Manifest for that artifact.
+
+When persisting Artifact IDs for Input Manifests into an ELF object or an ELF
+executable, the build tool MUST create a [section][elf_section] named
+`.note.omnibor` and place the Artifact IDs in the descriptor field of the note
+entry.
+
+This section MUST be of type `SHT_NOTE` and MUST have the attribute
+`SHF_ALLOC`. Multiple note entries MAY be created, one for each Artifact
+Identifier Type used.
+
+Each note entry MUST contain the following fields in the order given below:
+
+1. `namesz` (4 bytes): This field MUST be set to a value of `8`, the length of
+   the 'owner' field `OMNIBOR\0` in bytes.
+2. `descsz` (4 bytes): This field MUST contain the length of the Artifact ID for
+   the Input Manifest in bytes, including a byte for the null terminator.
+3. `type` (4 bytes): This field MUST contain the value associated with one of
+   the reserved Artifact Identifier types. The values for the reserved types
+   are in the range of `0x00000000` to `0x7fffffff`. Permissible types with
+   reserved values are:
+   - `NT_GITOID_BLOB_SHA256` is `0x1`
+4. `owner` (8 bytes): This field MUST contain the string `OMNIBOR\0`, padded
+   to 8 bytes.
+5. `descriptor`: This field MUST contain the Artifact IDs for the Input
+   Manifests as raw bytes. The length of this field is the same as the value
+   in the `descsz` field.
+
+When recording multiple Artifact IDs for Input Manifests in the note section:
+
+1. There MUST be only one note entry for each Artifact Identifier Type.
+2. The note entries MUST be in ascending order of Artifact Identifier Type.
+
+Build tools MUST generate all Artifact Identifier Types (currently only
+`gitoid:blob:sha256`).
+
+## 9. Embedding Artifact IDs in Text Files
+
+This section contains a method of embedding an Artifact ID for an Input
+Manifest into source code files.
+
+Most source code files are hand-coded by humans. Some, however, are generated
+from other inputs by a build tool.
+
+A build tool outputting a source code file may embed the Artifact ID of that
+file's Input Manifest into the file by adding a comment line containing a
+string of the form:
-1. Compute the artifact identifier of the input
-   `${artifact identifier}`
-2. Examine the input for an embedded Input Manifest Identifier
-   `${input manifest identifier}`
+```
+OmniBOR-Input-Manifests: [ ${comma separated list of Artifact ID URIs for Input Manifests} ]
+```
+
+For a file with C commenting semantics (like C, C++, Java, Go, etc.) a concrete
+example might be:
-The build tool must persist an Input Manifest using the
-`${artifact identifier}` and `${input manifest identifier}` for each input.
+```
+// OmniBOR-Input-Manifests: [ gitoid:blob:sha256:09c825ac02df9150e4f93d12ba1da5d1ff5846c3e62503c814aa3a300c535772 ]
+```
-#### Input Manifest Examples
+For a file with shell commenting semantics (like a shell script, Python, etc.) a
+concrete example might be:
-```gitoid:blob:sha1
-blob 06a6891154fff74e1ddb6245f4a0467b09c617c5
-blob 06dd79bc831bb06a6267a36ad2d62beccd7900b2 bom a9a64def763517df596fbb4348a8561069b5dc4b
-blob 0bc39408c1e5feaadd6f0420d14324b477420b93
-blob 15acd4427ca14000111aad5071563bc7f2dc09f4
-blob 1be90e6fab4ab9b7dd3b27cea5bb1fe29acc0204
-blob 1d8a4e28d1b62a2bfeba837fe18422cd106e6ddf bom 5bda8237d1676df0a2d0b8682d40f99a27ef5b13
-blob 28488e0b05954ccf87c779f5f9258987e4d68ac5
-blob 2c0cde251f1a9f05563a5f7a7f32588f04aaa235
 ```
+# OmniBOR-Input-Manifests: [ gitoid:blob:sha256:09c825ac02df9150e4f93d12ba1da5d1ff5846c3e62503c814aa3a300c535772 ]
+```
+
+When interpreting an `OmniBOR-Input-Manifests` comment line, a reader should
+ignore any leading and trailing spaces around `[`, `]`, or `,`.
+
+### 9.1. Placement of Embedded Artifact IDs
+
+The `OmniBOR-Input-Manifests` comment line should be placed as the last line
+of the source code file, and should be preceded by a blank line to ensure it
+is not interpreted as part of another comment block.
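The comment-line format and the whitespace rule above can be illustrated with a small Go sketch. It is not normative: the helper names `formatManifestLine` and `parseManifestLine` are hypothetical, the key is taken to be `OmniBOR-Input-Manifests:` as in the format definition, and the parser assumes nothing but whitespace follows the closing `]` (true for `//` and `#` line comments, not for `/* ... */` block comments).

```go
package main

import (
	"fmt"
	"strings"
)

// manifestKey is the marker that introduces the list of Artifact IDs.
const manifestKey = "OmniBOR-Input-Manifests:"

// formatManifestLine (hypothetical) renders the comment line using a
// line-comment prefix such as "//" (C-like files) or "#" (shell-like files).
func formatManifestLine(commentPrefix string, ids []string) string {
	return commentPrefix + " " + manifestKey + " [ " + strings.Join(ids, ", ") + " ]"
}

// parseManifestLine (hypothetical) extracts Artifact IDs from a comment line,
// ignoring spaces around '[', ']' and ','. ok is false when the line has no
// key or no bracketed list after it.
func parseManifestLine(line string) (ids []string, ok bool) {
	i := strings.Index(line, manifestKey)
	if i < 0 {
		return nil, false
	}
	rest := strings.TrimSpace(line[i+len(manifestKey):])
	if len(rest) < 2 || rest[0] != '[' || rest[len(rest)-1] != ']' {
		return nil, false
	}
	// Split the bracket contents on commas and drop surrounding spaces.
	for _, part := range strings.Split(rest[1:len(rest)-1], ",") {
		if id := strings.TrimSpace(part); id != "" {
			ids = append(ids, id)
		}
	}
	return ids, true
}

func main() {
	line := formatManifestLine("#", []string{
		"gitoid:blob:sha256:09c825ac02df9150e4f93d12ba1da5d1ff5846c3e62503c814aa3a300c535772",
	})
	fmt.Println(line)

	ids, ok := parseManifestLine("//OmniBOR-Input-Manifests:[ a ,b ]")
	fmt.Println(ids, ok)
}
```

Formatting with `", "` between IDs and one space inside the brackets reproduces the single-ID examples above; the parser accepts any spacing, as the whitespace rule requires.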
+
+A tool reading the source code file should interpret the last
+`OmniBOR-Input-Manifests` comment line it encounters in the file as the
+authoritative list of Input Manifest Artifact IDs, and ignore any earlier such
+comment lines in the file, which may contain other Artifact IDs.
+
+Example:
+
+If the input source code file begins with:
-```gitoid:blob:sha256
-blob 09c825ac02df9150e4f93d12ba1da5d1ff5846c3e62503c814aa3a300c535772
-blob 230f3515d1306690815bd9c3da0d15d8b6fcf43894d17100eb44b6d329a92f61
-blob 2f4a51b16b76bbc87c4c27af8ae062b1b50b280f1ab78e3eec155334588dc88e bom 4f3a822f776412c049dda53c3277bf2225b51b805ce8a99222af23a7d9f55636
-blob c71d239df91726fc519c6eb72d318ec65820627232b2f796219e87dcf35d0ab4
-blob f47ffb3518f236eea6525fd29f057ddd5cda1bb803ccc662e6bc5925afd1e4af
+```go
+// Code generated by stringer DO NOT EDIT.
+
+package main
+
+import (
+	"fmt"
+)
+...
+```
+
+The output source code file should look like:
+
+```go
+// Code generated by stringer DO NOT EDIT.
+
+package main
+
+import (
+	"fmt"
+)
+...
+
+// OmniBOR-Input-Manifests: [ gitoid:blob:sha256:09c825ac02df9150e4f93d12ba1da5d1ff5846c3e62503c814aa3a300c535772 ]
 ```
-### Artifact Dependency Graph (ADG)
+If the input source code file begins with:
-The Artifact Dependency Graph (ADG) of an artifact is the recursive DAG
-(Directed Acyclic Graph) of all the "input artifacts" that are transformed
-by a build tool into that artifact. It includes the direct input artifacts,
-and the recursive set of input artifacts to each input artifact, all the way
-down the graph.
+```c
+/*
+ * Copyright 2023 Yoyodyne Inc
+ * SPDX-License-Identifier:
+ */
-Concretely the Artifact Dependency Graph (ADG) of an Artifact is:
+
+#include <stdio.h>
+int main() {
+    // printf() displays the string inside quotation marks
+    printf("Hello, World!");
+    return 0;
+}
+```
+
+The output source code file should look like:
+
+```c
+/*
+ * Copyright 2023 Yoyodyne Inc
+ * SPDX-License-Identifier:
+ */
+
+#include <stdio.h>
+int main() {
+    // printf() displays the string inside quotation marks
+    printf("Hello, World!");
+    return 0;
+}
+
+// OmniBOR-Input-Manifests: [ gitoid:blob:sha256:09c825ac02df9150e4f93d12ba1da5d1ff5846c3e62503c814aa3a300c535772 ]
+```
-- The set of Input Manifests defined by:
-  - The Input Manifest of the Artifact
-  - Any Input Manifest referenced in an Input Manifest in the set (ie the transitive closure of the Input Manifests)
-- The Input Manifest Identifier of the Artifact
-## Annexes
+### 9.2. Modifying Existing Text Files
-- [Annex A - File System Storage](SPEC-annex-A-filesystem-storage.md)
-- [Annex B - Elf Embedding](SPEC-annex-B-elf-embedding.md)
-- [Annex C - Source Embedding](SPEC-annex-C-source-embedding.md)
+Many source code generation tools, like `patch`, specifically mutate an
+existing input source code file which may contain an existing
+`OmniBOR-Input-Manifests` comment. In such circumstances the tool SHOULD
+either:
-## Bibliography
+1. Replace an existing `OmniBOR-Input-Manifests` comment if found, or
+2. Insert a new `OmniBOR-Input-Manifests` comment normally, which will cause it
+   to be placed _after_ the existing `OmniBOR-Input-Manifests` comment line.
 
+[elf_section]: https://refspecs.linuxfoundation.org/LSB_3.0.0/LSB-PDA/LSB-PDA.junk/sections.html
 [rfc_2119]: https://tools.ietf.org/html/rfc2119
 [gitoid_uri]: https://www.iana.org/assignments/uri-schemes/prov/gitoid
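Option 1 of section 9.2, combined with the placement rule of section 9.1, can be sketched in Go. This is an illustration under stated assumptions rather than a reference implementation: `upsertManifestLine` is a hypothetical helper, it treats any line containing `OmniBOR-Input-Manifests:` as the existing comment (so a string literal containing that text would also match), and it assumes `\n` line endings.

```go
package main

import (
	"fmt"
	"strings"
)

const manifestKey = "OmniBOR-Input-Manifests:"

// upsertManifestLine (hypothetical) returns src with commentLine recorded as
// its OmniBOR comment. If a line containing manifestKey exists, the last one
// is replaced in place, since readers honor only the last such line.
// Otherwise commentLine is appended as the final line after a blank line.
func upsertManifestLine(src, commentLine string) string {
	lines := strings.Split(src, "\n")
	for i := len(lines) - 1; i >= 0; i-- {
		if strings.Contains(lines[i], manifestKey) {
			lines[i] = commentLine
			return strings.Join(lines, "\n")
		}
	}
	body := strings.TrimRight(src, "\n")
	if body == "" {
		return commentLine + "\n"
	}
	// The blank line keeps the comment from joining a preceding comment block.
	return body + "\n\n" + commentLine + "\n"
}

func main() {
	// The short hashes below are placeholders, not real Artifact IDs.
	first := upsertManifestLine("package main\n",
		"// OmniBOR-Input-Manifests: [ gitoid:blob:sha256:aaaa ]")
	second := upsertManifestLine(first,
		"// OmniBOR-Input-Manifests: [ gitoid:blob:sha256:bbbb ]")
	fmt.Print(first, "---\n", second)
}
```

Replacing rather than appending keeps a single comment line in the file; an existing comment that is not on the last line is updated where it stands rather than moved.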