Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow .cargo-checksum.json to be missing #11063

Open
leighmcculloch opened this issue Sep 8, 2022 · 10 comments
Open

allow .cargo-checksum.json to be missing #11063

leighmcculloch opened this issue Sep 8, 2022 · 10 comments
Labels
C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` S-triage Status: This issue is waiting on initial triage.

Comments

@leighmcculloch
Copy link

leighmcculloch commented Sep 8, 2022

Problem

Cargo requires .cargo-checksum.json to be present in all crates vendored in the vendor/ directory but provides no convenient means for generating the file when manually adding a crate to the directory.

Adding crates to the vendor/ directory manually is sometimes necessary when building local vendor directories to be used for source replacement, but having to manually create a .cargo-checksum.json is difficult.

It is worth noting that cargo allows the .cargo-checksum.json file to contain no checksums, i.e. {"files":{}} and so the convenient way to add crates to the vendor directory without actually building the file is with:

echo '{"files":{}}' > vendor/.../.cargo-checksum.json

Other than this one file building custom vendor/ directories for source replacement is very convenient and easy thanks to the fact that .crate files can be simply extracted into the vendor/ directory. It should be convenient and simple to do this without writing mostly empty .json files into the tree.

Proposed Solution

Allow the .cargo-checksum.json file to be missing. If it is present use its checksums. If it is not present, continue. This is the same behavior as what happens if a checksum isn't found in the file, and so this is not a breaking change and barely any new functionality.

Notes

Discussion on Zulip: https://rust-lang.zulipchat.com/#narrow/stream/246057-t-cargo/topic/cargo.20vendor.20for.20multi-crate.20workspace.20package.20verification/near/297866366

@leighmcculloch leighmcculloch added the C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` label Sep 8, 2022
@weihanglo
Copy link
Member

On the other hand, I feel like directory source (a.k.a. vendored source) is not meant to be under any manual modification. The purpose of .cargo-checksum.json verification layer is to make alternative sources more reproducible and prevent from accidental modification. You can find the notion in the commit message of 7fd5243.

Also worth mentioning that personally I see what cargo-vendor does as a blackbox, since re-running cargo vendor might delete whatever you just manually added.

Adding crates to the vendor/ directory manually is sometimes necessary when building local vendor directories to be used for source replacement.

I am more curious about the use case here. Would you like to elaborate more on what verification you want to achieve?

@leighmcculloch
Copy link
Author

leighmcculloch commented Sep 23, 2022

The use case is to use the vendor/ directory for broad source replacement. It is by far the simplest way to perform source replacement across a large number of crates. The specific flow for why I'm using it is for do package verification of multiple crates in a workspace together before publishing. The full flow is here: #1169 (comment).

Maybe cargo vendor should have an add subcommand that adds a .crate file, and creates the checksum file for you, but it seems overkill given the otherwise simple nature of the vendor directory and the fact that cargo vendor doesn't require the checksum file to contain any hashes.

Maybe cargo vendor shouldn't be the one adding this file and instead the file should be included in all .crate files at the point of packaging. It is somewhat odd that it is tacked on only when building the vendor directory.

@weihanglo
Copy link
Member

Thank you for the information! Let me simplify what I understand so far: You want a cargo publish --dry-run that works well with a workspace which member depends on each other. If it is true, I would say improving cargo publish might be a more pertinent approach to this issue.

The Cargo Team are working on improving experience of publishing. For example, #11062 is an attempt to wait for a publish propagaing to crates.io index. There is also an external tool cargo-release I would recommend giving it a try. @epage is an expert on this topic and may have more experiences and ideas to share for publishing a workspace.

@leighmcculloch
Copy link
Author

leighmcculloch commented Sep 25, 2022

Improving cargo publish to support workspaces as first class would be a wonderful but #1169 has been open for 8 years so I think it's a little more complex of an issue.

However, this issue is primarily about the .cargo-checksum.json file, and the fact that it's a bit of an oddity:

  • cargo doesn't actually check all of the crates files, only those listed in the json
  • cargo errors when the file is missing but is okay with it being empty
  • it's only present in vendored sources no where else so no where else can benefit from it
  • other similar tools, like go mod, address the same issue of preventing accidental editing of files by making the files read only

After thinking about this file more it feels like it's being generated in the wrong place. Shouldn't a checksum file for a crate be available at all times in the .crate file at the time of packaging?

If packaging generated the file it would:

  • allow anyone in possession of a crate file to validate the files
  • make the vendor directory just a directory of crates with no other bespoke functionality

@weihanglo
Copy link
Member

Thank you for the thorough investigation! It does point out the discrepancy between different sources on handling checksums, which has rooms to improve.

So far, I see several proposals around checksums, and each of them does a bit differently. Before we proceed, I would like to take a step back to make sure we are on the same page. Here are a few questions we need to clarify:

  • What is the problem Cargo users bump into today with .cargo-checksum.json?
    • If it is a new feature, what does it actually improve in terms of user experience?
    • If it is a refactor, How does it help Cargo contributors or third party extensions developers?
  • Is there any alternative approach today to solve this problem without touching Cargo internals?
    • How much does it cost? Is it painful?

I am sorry for throwing a bunch of questions here, but it is important to frame the question in a correct scope before we diverge. It also helps we focus on what we care about instead of going into yak shaving.


Below are some responses for "the oddity" of .cargo-checksum.json, which I suggest you not looking into them at this moment before we figure out general questions above, but feel free to do so if you want.

Expand
  • cargo doesn't actually check all of the crates files, only those listed in the json
  • cargo errors when the file is missing but is okay with it being empty

I would say it is not an issue for the current implementation of cargo-vendor, since the checksum file is, as you mentioned, a bespoke format for cargo-vendor (or more precisely, for DirectorySource). It serves as what it does unless someone is hacking on it.

  • it's only present in vendored sources no where else so no where else can benefit from it

That's true. However, finish_download also does that for remote registry source. I currently haven't seen any benefit for creating the other checksum file for remote registry sources. Maybe it's like a stronger validation for filesystem corruption?

  • other similar tools, like go mod, address the same issue of preventing accidental editing of files by making the files read only

That's really fascinating! I believe this is a good direction for vendor crates. There is also a previous work on making registry sources read-only though some difficulties emerged. I would probably say we can do experiments on this!

  • allow anyone in possession of a crate file to validate the files

I doubt that this is the responsibility of cargo-vendor. Maybe some other command should take the place? Either way, reusing the logic is not a bad idea in fact.

BTW, "validation" here needs a clearer definition.

  • make the vendor directory just a directory of crates with no other bespoke functionality

This probably doesn't really solve the problem. As an end user of cargo-vendor I tell no difference. As a Cargo contributor we cannot ditch the bespoke checksum validation process, since most of the crates uploaded today have no such file.

@ds-cbo
Copy link

ds-cbo commented Nov 15, 2023

Got sent here by https://thomask.sdf.org/blog/2023/11/14/rust-without-crates-io.html
That article has some more information about this issue and how the workaround is (ab)used to resolve packaging issues.

@leighmcculloch
Copy link
Author

It's a wonderful work around. Given that the contents of the file is already optional, is it controversial to make the file optional?

@leighmcculloch
Copy link
Author

leighmcculloch commented Nov 17, 2023

  • What is the problem Cargo users bump into today with .cargo-checksum.json?

The problem is even though checksum verification of files in a DirectorySource folder is optional, developers are required to write mostly empty files into the folder. Verification of files in a DirectorySource folder is optional because Cargo does not require that all or any of the files located in the vendor crate directories to be included.

  • If it is a new feature, what does it actually improve in terms of user experience?

It allows developers to build a DirectorySource without needing to write one empty checksum file per crate.

  • How much does it cost? Is it painful?

Cost is minimal. The change only requires checking if the file exists before reading it, and using an empty default value instead of a loaded value. The change would be limited to one code location.

@juleskers
Copy link

juleskers commented Nov 18, 2023

Got sent here by https://thomask.sdf.org/blog/2023/11/14/rust-without-crates-io.html That article has some more information about this issue and how the workaround is (ab)used to resolve packaging issues.

Tl;Dr of the linked blog post, so we have all pertinent information in-thread:
Making the checksum file optional would remove a wart (creating an empty placeholder checksum file) for Linux distribution packagers (e.g. Debian).

Those distribution have policies that everything in the ecosystem must be buildable without internet access, so they do ecosystem wide patching to replace crates.io with a (single, os-wide shared) vendor folder.

Given that empty contents are currently accepted, making the entire file optional is not a functional change. instead it is merely taking the optionality of checksums all the way to its logical conclusion.

I am certain packagers would appreciate a way to generate a useful checksum file (e.g. via the suggested cargo vendor add <somecrate>.crate), but that is a different, larger feature.

@juleskers
Copy link

I did some digging to find the relevant parts of the code.

Checksum file is opened here:

let cksum_file = path.join(".cargo-checksum.json");
let cksum = paths::read(&path.join(cksum_file)).with_context(|| {
format!(
"failed to load checksum `.cargo-checksum.json` \
of {} v{}",
pkg.package_id().name(),
pkg.package_id().version()
)
})?;

DirectorySource seems pretty specific that checksums are required, making it weird that empty content is allowed.

/// * There must be a [`Checksum`] file `.cargo-checksum.json` file at the same
/// level of `Cargo.toml` to ensure the integrity when a directory source was
/// created (usually by `cargo vendor`). A failure to find or parse a single
/// checksum results in a denial of loading any package in this source.

From other comments in directory.rs, I get the impression that the empty-content allowed is an un-intentional edge-case of flexibly ignoring "other stuff" in the directory (e.g. Readmes, VCS folders, etc.)


Other random, relevant tidbits gleaned during my expedition:

  • Checksum technicals
    /// The checksum file to ensure the integrity of a package in a directory source.
    ///
    /// The file name is simply `.cargo-checksum.json`. The checksum algorithm as
    /// of now is SHA256.
    #[derive(Deserialize)]
    struct Checksum {
    /// Checksum of the package. Normally it is computed from the `.crate` file.
    package: Option<String>,
    /// Checksums of each source file.
    files: HashMap<String, String>,
    }
    • current checksum algorithm is SHA256
    • checksum for .crate file is intentionally optional (logical, given that .crate file doesn't exist if unpacked)
    • checksums for source-files seem to expected to be present.
  • I haven't found code that actually generates the checksum file
    • despite searching the entire rust-lang org for the filename.
  • Recalculating checksums seems to be a widely desired feature. On Zulip I found that Android, Debian and Fedora run into troubles with this checksum file, when patching rustc
    # The configure macro will modify some autoconf-related files, which upsets
    # cargo when it tries to verify checksums in those files.  If we just truncate
    # that file list, cargo won't have anything to complain about.
    find vendor -name .cargo-checksum.json \
       | -exec sed -i.uncheck -e 's/"files":{[^}]*}/"files":{ }/' '{}' '+'

@epage epage added the S-triage Status: This issue is waiting on initial triage. label Nov 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` S-triage Status: This issue is waiting on initial triage.
Projects
None yet
Development

No branches or pull requests

5 participants