Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Uncommitted Cargo.lock files prevent reproducing .crate files from source #9242

Open
Gaelan opened this issue Mar 6, 2021 · 2 comments
Open
Labels
A-reproducibility Area: reproducible / deterministic builds S-triage Status: This issue is waiting on initial triage.

Comments

@Gaelan
Copy link

Gaelan commented Mar 6, 2021

(Apologies if I've filed this in the wrong place; happy to move it wherever it makes sense.)

There's been some interest lately in ensuring that code uploaded to crates.io is the same as the code in the repository on GitHub.

Aside on why this is worth doing

Most people who are interested in a crate's code will look at GitHub (or similar) repository, not the code uploaded to crates.io (citation needed, but I know I do this and I assume most others do too). This means that the code that's actually running is looked at by comparatively fewer people, and authors of malicious crates can make their vulnerabilities less likely to be discovered. This has happened in practice with the event-stream NPM package. By comparing the published crate to the GitHub source, this ensures that any malicious code must be visible when people go looking for it.

This doesn't need to be done by cargo, of course, but Cargo's current method of generating crate files makes it difficult for any tool to do this.

Ideally, .crate files would be bit-for-bit reproducible. If that were the case, this would be as simple as downloading the .crate file, cloning the source, running cargo package, and comparing hashes. #8864 made it most of the way there, but it fails in practice (with at least the crates I tested, the latest versions of hyper and rand), because the Cargo.lock files in the uploaded crate differ from the newly generated one. The crates follow the official guidance to omit the file (because they're libraries), so my Cargo generates a new one on the fly, including any new versions of dependencies since the crate was uploaded. Therefore, there's a mismatch.

I see a few solutions here:

  • Use a more sophisticated method to do the comparisons, ignoring Cargo.lock. This would work, but I since this is a security-related feature, more complexity is more attack surface, and malicious code could potentially be hidden either by a bug in the comparison algorithm or if it's somehow possible to do something malicious with just a Cargo.lock file (maybe if it isn't actually TOML, but contains rust code or a binary?)
  • Change the guidance to suggest checking in Cargo.lock files, even for libraries. I think some libraries are doing this anyway, to avoid dependency updates breaking their CI at arbitrary times? But this would be a major departure from current guidance for a relatively small gain, and it would likely take a long time for maintainers to adapt.
  • Omit the Cargo.lock from .crate files for crates that only contain libraries. I assume this file is never actually read unless the crate in question was passed directly to cargo install? If so, this seems like the best way forward.
@ehuss
Copy link
Contributor

ehuss commented Mar 10, 2021

Omit the Cargo.lock from .crate files for crates that only contain libraries.

This is already how Cargo works. The lock is only included if the package has a binary or example. I suspect the projects you looked at might have had some examples?

@ehuss ehuss added the A-reproducibility Area: reproducible / deterministic builds label Mar 10, 2021
@Gaelan
Copy link
Author

Gaelan commented Mar 11, 2021

Aha, you're right, rand_core.crate, which has no examples, is reproducible. (Interestingly, libc.crate also doesn't have examples, but isn't reproducible because vxworks/mod.rs is marked as executable in the git repository but not the official crate file. Weird.)

That still leaves the question of how to handle this. The current practice of shipping non-version-controlled Cargo.lock files for libraries with examples isn't great; you might get lucky and get something meaningful if the developer did their packaging from the same checkout they'd been working in, but that goes out the window if the packaging is done on another machine (or CI). I think the options now are:

  • Document that libraries with examples should check in Cargo.lock
  • Have cargo leave out the Cargo.lock, even if examples are present, if it is listed in .gitignore

@epage epage added the S-triage Status: This issue is waiting on initial triage. label Nov 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-reproducibility Area: reproducible / deterministic builds S-triage Status: This issue is waiting on initial triage.
Projects
None yet
Development

No branches or pull requests

3 participants