diff --git a/text/0000-signing-registry-commits.md b/text/0000-signing-registry-commits.md new file mode 100644 index 00000000000..4ad9e4651f9 --- /dev/null +++ b/text/0000-signing-registry-commits.md @@ -0,0 +1,358 @@ +- Feature Name: verified_registry_commits +- Start Date: 2018-05-08 +- RFC PR: (leave this empty) +- Rust Issue: (leave this empty) + +# Summary +[summary]: #summary + +Enable commits to crate registry indexes to be signed & verified, providing +stronger guarantees as to the authenticity of their content and (transitively) +guaranteeing the authenticity of the content of the packages in the registry. + +# Motivation +[motivation]: #motivation + +Crate registries like the crates.io registry are a way of distributing source +code which is ultimately built and executed on end user machines. It is vital +that users receive valid source files for the packages they download, +especially because a malicious package could be a major attack vector to harm +users of Rust, or users of those users' projects. + +The index of a registry is a git repository downloaded by cargo over HTTPS or +SSH. This index contains information about each package, including a SHA-256 +checksum of the content of that package. When cargo downloads a package from +the registry, its contents are verified against that checksum. This is intended +to guarantee that the content the user downloads is the same data this referred +to by the registry index. + +However, a malicious or otherwise ill-behaved third party could conceivably +(such as with a MITM attack) intercept the download of the index and modify the +checksum of a package, allowing them later to modify the content of that +package when the user requests it. They could also, conceivably, modify the +index repository at its central storage, making similar malicious edits which +will be accepted by every user. + +If the content of the registry index could be authenticated in a stronger way, +this would make it more difficult for an attacker to modify index data. Because +the registry is a git repository, and a git repository is a kind of [merkle +tree][merkle], signing a commit verifies the content of all the data that +commit contains (modulo the security properties of the repository's hash +function - a discussion of SHA-1 is later in this RFC). Because a hash of each +package contents is contained in the index repository, the index as a whole can +be thought of as a merkle tree, some of the leaves of which are all of the +packages in the registry. + +Signing commits is an effective and cheap way to provide stronger authenticity +of package contents. It enables cheap key rotation, because a new signature on +the head of the index repository validates all of the content in the registry. +Because git already supports commit signing, it is a natural extension of our +existing practices, rather than a large scale re-engineering of the registry +system. + +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +For normal users, this change should be completely transparent. Whenever the +index is updated if the registry in question is a registry that ought to be +signed (a 'signed registry'), the signature of HEAD is verified against the +public keys which are permitted to sign commits to the index. + +## The keys TOML + +Inside the `.cargo` directory, a TOML file exists for each signed registry +containing the public key data for all signers. Public key data is an object +containing these members: + +- `key`: An ASCII armored OpenPGP public key (conforming to RFC 4880) +- `info` (optional): Information about this key, such as the user it is + supposed to belong to, which is not cryptographically verified. +- `can-commit` (optional): Whether or not this key can be used to create + commits to the registry. Boolean; Defaults to false. +- `can-rotate` (optional): Whether or not this key can be used to perform a key + rotation on this registry. Boolean; Defaults to false. + +This file is maintained through a key rotation mechanism described later in the +RFC. It is plain text, and users may freely review or edit it for each registry +they depend on (though if they remove keys that are not considered invalid by +the registry administrators, they may break their dependency on that registry). + +## When is a registry a signed registry? + +Registries are considered signed registries if either of these hold true: + +1. Current `HEAD` is a signed commit (by any key). +2. A keys file exists for that registry. + +An attempt to update the `HEAD` of a signed registry to a commit that is not +signed by one of the existing committer keys is a hard failure that will +prevent cargo from resolving dependencies in that registry any longer. Until +the state is fixed, cargo will not generate lockfiles for crates that depend on +packages in that registry. This includes a commit which is not signed at all. + +If the HEAD of a registry is signed but that registry has no signing keys TOML +file, that registry will be considered a signed registry, but in a broken state, +because HEAD is signed but there are no trusted signing keys. In general, this +definition of signed registry is supposed to "fail closed." + +## Why PGP formatted keys? + +This RFC specifies that keys and signatures are exchanged using PGP format, +even though it also adds that cargo will not ship a full PGP implementation to +verify signatures. The PGP format is a rather complex binary format which +supports many options that are not relevant for our use case: as a result, +we've specified that we only support a subset of the PGP format. One could +fairly ask why we use PGP at all instead of a more straightforward solution. + +The primary reason to use the PGP format is to integrate with existing git & +gpg tooling. The subset of PGP we support is compatible with keys generated by +gpg2 and with signatures on commits made with `git commit -S`. This allows +users to manually produce and verify signatures as necessary either for +administrative purposes or to check the correctness of cargo's behavior. +Additionally, GitHub has a mechanism to host GPG keys associated with a GitHub +account, which would not be possible for a custom key format. + +## Advisory for people running registries + +cargo will only validate the signature of the `HEAD` commit, no intermediate +commits. All committers to the registry **MUST** verify the signature of +parents to their commits, or the security properties of this system will be +violated. + +Registries are free to make their own policies regarding the distribution of +keys and when to perform rotations, but if a registry operator commits +without verifying the signature of that commit's parents, they have +nullified the benefits of running a signed registry over an unsigned +registry. + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +## Signature schemes & data formats supported by cargo + +Though keys are stored in the OpenPGP format, cargo will not actually verify +signatures using a complete OpenPGP implementation such as gpg, which would be +a significant new C dependency to ship to all users. Instead, cargo will use +pure Rust implementations of established signature schemes, as well as a pure +Rust parser for a subset of the OpenPGP format. + +All signatures are valid OpenPGP signatures, which means they are signatures of +hashes of the data being signed as well as metadata in accordance with the +OpenPGP spec ([IETF RFC 4880][rfc4880]). + +### OpenPGP subset + +All signatures and public keys will be distributed in a subset of the data +format described in RFC 4880 which is consistent with the behavior of recent +versions of gpg2 (allowing the manual creation of signatures and keys as +necessary). + +These are the requirements for a signature or public key to be supported by +cargo: + +- It uses what RFC 4880 calls the "old format packet header" for all packets +- All packets have a two octet length (so the packet header is 3 octets long). +- Signatures and public keys are both version 4 of their format. +- It uses one of cargo's supported signature and hash algorithms. +- The first hashed subpacket of any signature is the public key fingerprint + with subpacket type 33 (note this is the 20 byte fingerprint, not the 8 byte + key ID). + +This conforms to the default behavior of gpg2 and is what is accepted and +generated by [a Rust implementation of a subset of OpenPGP written by the +RFC author][pbp]. + +### Signature and hash algorithms + +Our initial implementation will support only one signature algorithm and one +hash algorithm. This may be extended in time. The signature algorithm is EdDSA +with curve 25519 (ed25519) and the hash algorithm is SHA-256. + +## Signature distribution format + +Signatures are distributed in the same manner that git distributes commit +signatures - as an additional header on the commit, verifying the commits +contents, commit message, and other headers. This way, users can verify +index commits using their own version of git, and administrations making manual +edits to the registry can use git to generate the signatures. + +## Key rotation + +Key rotation is performed by adding an annotated tag to the index repository, +called a key rotation tag. A key rotation tag must be signed by a key with the +`can-rotate` privilege. + +A key rotation tag should point to the HEAD commit of the registry index at the +time it is made. The commit it points to should be signed using a key which is +in the post-rotation key set. + +The name of a key rotation tag must be `rotation-$FORMAT-$N` where $N is an +integer and $FORMAT is the format used for this rotation; the only format +specified by this RFC is `v1`. We recommend that registries use a counter +starting at `0` for `$N`. The number is only significant to distinguish +rotations from each other, and has no semantic role. + +The message body of a key rotation tag in the `v1` format is a TOML document, +having the same structure as the keys toml which contains the authorized +signing keys described previously in the RFC. + +When cargo updates the index, it will iterate through the new tags matching +this format that have been added to the repository in their order in the commit +history. cargo will verify that the tag is signed by a key with the +`can-rotate` privilege and then update the trusted key set by completely +replacing it with the signing keys. + +## crates.io initial policy + +crates.io keys will always contain a GitHub user account name in their `info` +field (possibly along with other info), and will publish those keys to that +GitHub account. The current set of crates.io keys will be published on +rust-lang.org as well. Users can use these other forms of publication to verify +that the keyset they have is valid. + +A single key, belonging to the bors account, will be stored in the crates.io +service. This key will only have the commit privelege, not the rotate +privilege. Other keys, with various priveleges, will be stored in an offline +format, and belong to individual members of the Rust core team. These keys will +be used for administrative purposes. + +Keys, especially the online bors key, may be rotated at irregular intervals, +not necessarily because of a known compromise. An explanation of the rotation +will always be published at rust-lang.org. We do not commit to any particular +rotation schedule. + +## Security considerations + +### Trust on first use + +When a user first downloads a registry index, or transitions an index from +unsigned to signed, they have no pre-existing trusted keys for that registry. +For this reason, the first access of a signed registry is a leap of faith in +which the user also obtains the keys to trust for future updates. A successful +attack at that point would leave the user will with an invalid index. + +We can harden crates.io against attacks at this point by distributing the +current trusted keys with the rustup distribution, allowing any security +mechanisms we put in place for the distribution of Rust and cargo binaries to +also secure this operation. + +### An attacker with no keys + +An attacker with no keys cannot sign commits or rotation tags. Because of this, +an attacker with no keys would not be able to modify the registry index, even +if they were able to defeat the existing security measures like our use of +HTTPS or SSH to transmit the index data over the network. + +However, such an attacker could still prevent the user from receiving updates +to the index by - for example - MITMing their connection to the git server to +report that there is no update. Such an attack could keep users from receiving +essential security updates to their dependencies. Hardening cargo against this +sort of attack is left as future work. + +### An attacker with a `can-commit` key + +An attacker who has compromised a key with the `can-commit` privilege could +make commits to the index, modifying the data. This would essentially revert +the system to its security properties before this RFC. + +However, if a key compromise of this sort is discovered, an automated key +rotation could remove the compromised key from the set, restoring the security +properties of the system. + +### An attacker with a `can-rotate` key + +An attacker who has compromised a key with the `can-rotate` privilege could +rotate the key set, adding and removing keys at will. This would allow them to +take control of the index, preventing legitimate updates (such as key +rotations) from reaching users. If a `can-rotate` key were compromised, it +would likely be very disruptive to all users. + +For that reason, crates.io will adopt policy that `can-rotate` keys are stored +in an offline medium, with due security precautions. As a future hardening, we +could also support a threshold signature scheme, requiring signatures from +multiple `can-rotate` keys to perform a key rotation, reducing the impact of +compromising a single key with this privilege. + +### SHA-1 + +The security of this system hinges on the security of the hash function used to +implement the git repository. Unfortunately, git currently uses SHA-1, a hash +function with known flaws that allow for collision attacks. A successful +collision attack against a crate index would nullify the security benefits of +this RFC: an attacker would be able to swap out one commit for another which +would both appear to be signed. + +However, there are a few mitigating factors which make this RFC worthwhile to +pursue despite these problems: + +1. The SHAttered collision attack depends on the ability to control data in +both colliding objects in order to create a collision (that is, they cannot +create a collision with an arbitrary hash). Hypothetically, an attacker could +upload a crate to the registry with index metadata that they can use to create +a collision; however, this increases the difficulty significantly in comparison +to the SHAttered case. +2. Even the SHAttered case was prohibitively expensive. Breaking into one of +our administrators' homes to copy their signing key is probably cheaper. +3. Our current git host, GitHub, checks for the signs of this sort of collision +attack and would not accept an object containing a collision. + +That said, we take the weakness of SHA-1 seriously, and will commit to +switching to a stronger hash function as soon as it is possible for us to do +so. In order to do that, git needs to be updated to support a new hash +function, and that upgrade needs to be supported by both GitHub (which hosts +our index) and libgit2 (which cargo uses for git operations). + +# Drawbacks +[drawbacks]: #drawbacks + +The primary drawback of this is that it increases the operational complexity of +managing crates.io, and the complexity of the cargo codebase. The Rust +infrastructure and dev-tools teams would be taking on the burden of maintaining +this system & provisioning and protecting secret keys. The additional security +benefits of this RFC will depend on their key management practices. + +# Rationale and alternatives +[alternatives]: #alternatives + +The primary alternative to a security enhancement like this would be to switch +wholesale to [TUF][tuf], a complete, designed framework for secure upgrades. + +TUF as described by its spec is not backward compatible with the existing +registry index format that cargo uses. We could not simply bolt on TUF to +cargo, we would have to do a complete system change over. Such a change over is +expensive in terms of our resource allocation and in terms of the user +experience of migration: realistically, we would not be able to prioritize it +in the near future. Even if we do some day switch to TUF, it is worthwhile to +moderately improve the security of our system *now*. + +The specified format for TUF target files is also not designed to be an +efficient mechanism for storing the index for a package repository like cargo. +It stores the data cargo would store in its index in a single, large JSON file. +cargo's use of a git-based index was designed on the basis of its original +authors' experience with package managers like this - specifically, the issue +of monotonic, incremental updates, and of parsing only the information for the +specific packages needed in dependency resolution. For this reason, TUF would +likely need to be modified nontrivially to support crates.io, making it an even +larger task to switch to TUF. + +Given all of this, an incremental change like signing the repository git +commits seems advantageous given that we cannot perform a migration to TUF at +this time. + +# Prior art +[prior-art]: #prior-art + +The most important prior art is the [TUF][tuf] specification, described in the +previous section. It would be good to continue to take learnings from TUF as we +try to improve the security of our infrastructure. + +# Unresolved questions +[unresolved]: #unresolved-questions + +No major unresolved questions as of this time. + + +[tuf]: https://theupdateframework.github.io/ +[rfc4880]: https://tools.ietf.org/html/rfc4880 +[pbp]: https://github.com/withoutboats/pbp +[merkle]: https://en.wikipedia.org/wiki/Merkle_tree