Consider making the src cache read-only. #9455

Open
Tracked by #84
ehuss opened this issue May 3, 2021 · 44 comments
Labels
A-caching (Area: caching of dependencies, repositories, and build artifacts) · A-filesystem (Area: issues with filesystems) · S-needs-design (Status: needs someone to work further on the design for the feature or fix; NOT YET accepted)

Comments

@ehuss (Contributor) commented May 3, 2021

Registry dependencies get extracted into cargo home (in the src directory) with whatever metadata is in the tar file. One issue with this is that the files are usually writable. This can cause a problem if the user accidentally modifies these files, which breaks cargo's assumption that they are immutable and reusable. One way this can happen is that some editors, when there is an error or warning in a dependency, open its files to display the diagnostic (particularly with macros). The user may not realize the file comes from a remote location and may not understand the consequences of making changes.

We may want to consider making those files read-only when extracting them. This would help with confusing situations where the src cache is inadvertently corrupted.

This would not protect against general filesystem corruption, which is a separate concern. Git dependencies have the same problem, and it may be more difficult to adjust permissions on them.

There is some risk that this would introduce new problems, for example if people are using tools to clean the src directory and those tools have trouble with read-only files.
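As a rough illustration of the proposal (a hypothetical sketch, not Cargo's actual extraction code), marking everything under an extracted package read-only can be done with nothing but std::fs:

use std::fs;
use std::io;
use std::path::Path;

// Hypothetical sketch: walk an extracted package directory and clear the
// write bits on every file after extraction.
fn make_readonly_recursive(dir: &Path) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        if entry.file_type()?.is_dir() {
            make_readonly_recursive(&path)?;
        } else {
            let mut perms = fs::metadata(&path)?.permissions();
            // Clears the write bits on Unix; sets FILE_ATTRIBUTE_READONLY on Windows.
            perms.set_readonly(true);
            fs::set_permissions(&path, perms)?;
        }
    }
    Ok(())
}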

@ehuss added the A-caching label on May 3, 2021
@Eh2406 (Contributor) commented Oct 13, 2021

@matthiaskrgr would making the files read-only cause any problems for cargo-cache?

@matthiaskrgr (Member) commented

Oh, that's an interesting idea!
While I fully agree that it makes sense to make the cache (or at least .cargo/registry/src and .cargo/git/checkouts) read-only, it does indeed prevent cargo-cache from cleaning up anything:

$ cargo cache --autoclean
Clearing cache...

Warning: failed to recursively remove directory "/home/matthias/.cargo/registry/src".
error: Os { code: 13, kind: PermissionDenied, message: "Permission denied" }
Warning: failed to recursively remove directory "/home/matthias/.cargo/git/checkouts".
error: Os { code: 13, kind: PermissionDenied, message: "Permission denied" }

BUT I am already happy that it does not panic :D
If it turned out that cargo-cache modified the cache in any way beyond just removing whole files, I would indeed consider that a bug.

Right now I am just running remove_dir_all::remove_dir_all() on directories without doing any checks whatsoever.
https://crates.io/crates/remove_dir_all
I never had a read-only cache in mind when writing cargo-cache, but adding a little extra logic that changes file permissions back to read-write before removing files does not sound too difficult, assuming there is a crate around that handles file permissions on Linux, macOS, and Windows.
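For illustration, a minimal sketch of that extra logic using only std APIs (hypothetical, not cargo-cache's actual code): if removal fails with a permission error, clear the read-only bit and retry.

use std::fs;
use std::io;
use std::path::Path;

// Hypothetical fallback for a cache cleaner: if removing a file fails with
// PermissionDenied (e.g. a read-only file on Windows), flip the read-only
// bit and retry once. Read-only directories would need the same treatment
// before their contents can be removed.
fn remove_file_force(path: &Path) -> io::Result<()> {
    match fs::remove_file(path) {
        Err(e) if e.kind() == io::ErrorKind::PermissionDenied => {
            let mut perms = fs::metadata(path)?.permissions();
            perms.set_readonly(false);
            fs::set_permissions(path, perms)?;
            fs::remove_file(path)
        }
        other => other,
    }
}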

@eminence (Contributor) commented

I'm onboarding a new Rust user on my team, and I found that they ended up editing some files in $CARGO_HOME (which is actually quite easy to do with VSCode+RA) to add some debugging messages. A read-only src cache might have given them a sufficient hint that what they were doing was not the right approach.

@weihanglo (Member) commented

I just did a PoC and benchmarked it with rust-lang/cargo itself. It turns out there is a slight performance hit (~5%) on macOS. That's expected, as there are extra permission operations on every file the first time a crate is checked out/unpacked (Windows might do better, since file permissions can be inherited, though).

However, it does introduce new problems 🤯. When I removed the sources under ~/.cargo/registry/src and re-ran the build, I got a permission-denied error.

$ ./target/release/cargo b
$ rm -rf ~/.cargo/registry/src
$ ./target/release/cargo b
error: failed to run custom build command for `curl-sys v0.4.55+curl-7.83.1`

Caused by:
  process didn't exit successfully: `/Users/myuser/projects/cargo/target/debug/build/curl-sys-f4af01668b6e1680/build-script-build` (exit status: 101)
  --- stdout
  ...
  --- stderr
  fatal: not a git repository (or any of the parent directories): .git
  thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 13, kind: PermissionDenied, message: "Permission denied" }', /Users/myuser/.cargo/registry/src/github.com-1ecc6299db9ec823/curl-sys-0.4.55+curl-7.83.1/build.rs:93:10
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I guess it's curl-sys's build script that uses fs::copy to copy contents from registry/src into the build script's OUT_DIR, but unfortunately fs::copy also copies permission bits. Any rebuild that triggers a re-run of the same build-script binary might fail because it cannot fs::copy from the cache onto existing read-only files under the target dir. I am afraid more cases in the wild would break if registry/src becomes read-only. Here is the search result for build.rs files that contain fs::copy across GitHub.
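As a sketch of the failure mode and one possible workaround in a build script (a hypothetical helper, not curl-sys's actual code): make an existing read-only destination writable before overwriting it.

use std::fs;
use std::io;
use std::path::Path;

// std::fs::copy copies the source's permission bits, so copying a read-only
// file from registry/src into OUT_DIR leaves a read-only destination; the
// next copy over that destination then fails with PermissionDenied.
fn copy_overwriting(src: &Path, dst: &Path) -> io::Result<u64> {
    if let Ok(meta) = fs::metadata(dst) {
        // Make an existing read-only destination writable before overwriting.
        let mut perms = meta.permissions();
        perms.set_readonly(false);
        fs::set_permissions(dst, perms)?;
    }
    fs::copy(src, dst)
}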

Personally, I would love to see this enhancement land, but after this analysis it seems a bit risky to me.

@ehuss (Contributor, Author) commented Jun 1, 2022

That's really unfortunate, I hadn't thought about that.

I'm wondering if a compromise would be to only adjust permissions on packages that don't have a build script? I don't know if it would be easy to detect that (that might require parsing Cargo.toml, which might be too much overhead). I also don't know what % of packages that would cover.

It's also unfortunate that the tar API doesn't allow us to override the permissions directly. Another option is to make the files readonly when creating the .crate file. That would only help with newly published packages, but might be an option to consider.

@nagisa (Member) commented Jun 5, 2022

Given that most of these cases would be caught by crater, maybe it's worthwhile to just make the change and make sure the affected crates are fixed up?

Alternatively: maybe only apply this for crates published after a certain timestamp?

@ehuss (Contributor, Author) commented Jun 10, 2022

@weihanglo asked how to do a crater run to test the impact of changing the read-only status. The rough steps are:

  1. In your clone of cargo, make the changes to incorporate the new behavior.
  2. Get a clone of https://github.com/rust-lang/rust/
  3. Check out a branch to add your changes.
  4. Modify .gitmodules to point to your clone and branch of cargo with the changes you want to test. For example:
    git submodule set-url src/tools/cargo https://github.com/ehuss/cargo.git
    git submodule set-branch --branch my-awesome-feature src/tools/cargo
    git submodule update --remote src/tools/cargo
    git add .gitmodules src/tools/cargo
    git commit
  5. Push the changes to GitHub and make a PR. Clearly label the PR as an experiment, and assign yourself or @ghost.
  6. Make a "try" build. I think all cargo members have this permission. Write a comment @bors try.
  7. After the try build finishes (an hour? I forget), ask someone to make a crater run. The Cargo team does not have that permission, so just ask someone on Zulip.

I think doing a "check" run should be sufficient? I'm concerned that by default it will not cover dev-dependencies, so maybe add the --all-targets flag? I'm not sure if that's worth it. There's a bunch of documentation on crater commands at https://github.com/rust-lang/crater/blob/master/docs/bot-usage.md. You may also consider doing a top-100 run first just to make sure the experiment is configured properly.


I still feel that it might be good to ease into the change, such as only marking read-only if there are no build scripts. We discussed some other options to consider:

  • Adding some kind of option so that build-scripts can say "don't mark me read-only" or "don't mark these directories read-only".
  • Make all packages read-only on an edition boundary. This wouldn't be able to support automatic migrations, which conflicts with the spirit of Editions being easy to transition to.

Doing a crater run can maybe give us a sense of the scale of the problem.

@kornelski (Contributor) commented

This could be limited to only *.rs files, leaving other data files with the writable bit set.

@bjorn3 (Member) commented Jun 13, 2022

Cargo.{toml,lock} should probably also be read-only.

bors added a commit that referenced this issue Jan 24, 2023
…ources-dir, r=epage

Improve CI caching by skipping mtime checks for paths in $CARGO_HOME

Skip mtime checks for paths pointing into `$CARGO_HOME` to avoid rebuilds when only caching $CARGO_HOME/registry/{index, cache} and $CARGO_HOME/git/db and some of the dependencies have `rerun-if-changed=directory` in their `build.rs`

I considered skipping the mtime checks only for `$CARGO_HOME/registry/src`, but it looks like other functionality (like downloading a newer version of a dependency) is unaffected either way, and this way we also cover the same issue for git-based dependencies (except the part where `cargo` is forced to re-fetch submodules if `$CARGO_HOME/git/checkouts` is missing), which is more in line with the discussion in #9455.

Fix #11083

Credit `@weihanglo` for the test (I added a case checking that a dependency update still triggers a rebuild, but they are the original author of the rest of the test).
@torhovland (Contributor) commented

As suggested here, a possible alternative is to compute a hash when downloading a dependency, and to verify that hash whenever Cargo needs to access that dependency.
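A minimal sketch of that idea, assuming the sha2 crate for hashing (Cargo would likely use its own hashing utilities):

use sha2::{Digest, Sha256};
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical "hash on download, verify on access" helpers.
fn sha256_hex(path: &Path) -> io::Result<String> {
    let bytes = fs::read(path)?;
    let digest = Sha256::digest(&bytes);
    Ok(digest.iter().map(|b| format!("{b:02x}")).collect())
}

// On download: record sha256_hex(...) next to the cache entry.
// On later access: recompute and compare; re-extract the crate on a mismatch.
fn is_unmodified(path: &Path, recorded_hex: &str) -> io::Result<bool> {
    Ok(sha256_hex(path)? == recorded_hex)
}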

@matklad (Member) commented Aug 23, 2023

Somewhat related, we might consider sanitizing the executable bit as well, if we consider distribution of compiled executables via crate tarballs an unsupported use-case.
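A minimal Unix-only sketch of sanitizing the executable bit at extraction time (hypothetical; Windows has no comparable mode bit to clear):

// Drop u+x, g+x, and o+x from a file's mode.
#[cfg(unix)]
fn clear_exec_bits(path: &std::path::Path) -> std::io::Result<()> {
    use std::os::unix::fs::PermissionsExt;
    let mut perms = std::fs::metadata(path)?.permissions();
    perms.set_mode(perms.mode() & !0o111);
    std::fs::set_permissions(path, perms)
}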

@wizeguyy commented Nov 1, 2023

If these files are made read-only, what would the process be for updating dependencies in the cache? I.e., if cargo update detects a new version of a crate in the cache, it will need write permissions to update that crate.

Will cargo temporarily modify the file permissions before performing an update? On Unix systems it might make more sense for cargo to run as its own user: other users get read-only access, but the cargo user has write access. Thoughts?

@bjorn3 (Member) commented Dec 4, 2023

The umask is process-global, but someone may be creating files concurrently with cargo when using cargo as a library. Also, for directories, setting the umask would, I believe, still require using chmod anyway to make them writable again while creating files inside them. And on Unix I'm not too worried about a single extra chmod per file and directory. On Windows I would be more concerned, but the Windows kernel doesn't have anything like umask afaik. The CRT supports the function, but I wouldn't be surprised if it internally calls the equivalent of chmod after every file creation.

@kornelski (Contributor) commented

This could be opt-in per crate. Cargo currently makes tarballs with -rw-r--r-- permissions (ignores original files' permissions). Cargo could be taught to make tarballs with -r--r--r-- permissions, and preserve that when unarchiving.
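For illustration, a minimal packaging-time sketch using the tar crate's Builder (an assumption about how this could look, not Cargo's packaging code):

use std::io::Write;
use tar::{Builder, Header};

// Hypothetical sketch: write every archive entry with a read-only mode
// instead of the source file's own mode.
fn append_readonly<W: Write>(
    builder: &mut Builder<W>,
    name: &str,
    data: &[u8],
) -> std::io::Result<()> {
    let mut header = Header::new_gnu();
    header.set_size(data.len() as u64);
    header.set_mode(0o444); // -r--r--r-- rather than the current -rw-r--r--
    header.set_cksum();
    builder.append_data(&mut header, name, data)
}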

@sanmai-NL commented Dec 5, 2023

Just an idea off the top of my head, but why extract source archives at all? Can't we get their contents as needed, in-memory? That may sound costly, but I wonder how often that really happens and whether extracting them amortizes well (the extra cost being the filesystem operations, which notably result in syscalls). It would reduce the number of files to be created and changed, so even per-file metadata like an immutable flag could be set at a cost of $O(n)$ instead of $O(nm)$ time in the average case, where $n$ is the number of crates (tar archives) and $m$ is the number of files per crate.

@sanmai-NL commented

This could be opt-in per crate. Cargo currently makes tarballs with -rw-r--r-- permissions (ignores original files' permissions). Cargo could be taught to make tarballs with -r--r--r-- permissions, and preserve that when unarchiving.

@kornelski

https://docs.rs/tar/latest/tar/struct.Entry.html#method.set_mask
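A minimal unpack-time sketch built on that set_mask API, assuming the tar and flate2 crates (not Cargo's actual extraction code):

use std::fs::File;
use std::io;
use std::path::Path;

use flate2::read::GzDecoder;
use tar::Archive;

// Hypothetical sketch: strip the write bits from every entry's mode while
// extracting a .crate tarball.
fn unpack_readonly(crate_tarball: &Path, dst: &Path) -> io::Result<()> {
    let gz = GzDecoder::new(File::open(crate_tarball)?);
    let mut archive = Archive::new(gz);
    for entry in archive.entries()? {
        let mut entry = entry?;
        // Works like a umask: the masked bits (all write bits here) are cleared.
        entry.set_mask(0o222);
        entry.unpack_in(dst)?;
    }
    Ok(())
}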

@bjorn3 (Member) commented Dec 5, 2023

Can't we get their contents as needed, in-memory?

Rustc currently doesn't support that. We could add it, but build scripts and proc macros would still need the sources extracted somewhere on the filesystem.

@mathstuf (Contributor) commented Dec 5, 2023

Can't we get their contents as needed, in-memory?

Tools communicate by filesystem paths, so how do you propose to send an in-memory "path" to sccache for caching? Or coverage tools that render the coverage data with the source code? That's even if we do manage to ignore build.rs scripts that do…whatever they do. Without some cross-platform FUSE-like solution that exposes such paths to arbitrary other tools, this would restrict the current set of use cases. Maybe that's fine and can be opt-in, but opt-out is, I believe, only going to continue finding more and more corner cases.

@kornelski (Contributor) commented

The invention of a not-a-tarball crate storage system is irrelevant here, because the problem is crates relying on files existing in the file system, and existing with specific permissions. If you succeed in removing the files from the src dir, you'll break them even worse.

@sanmai-NL commented

Can't we get their contents as needed, in-memory?

Tools communicate by filesystem paths, so how do you propose to send an in-memory "path" to sccache for caching? Or coverage tools that render the coverage data with the source code? That's even if we do manage to ignore build.rs scripts that do…whatever they do. Without some cross-platform FUSE-like solution that exposes such paths to arbitrary other tools, this would restrict the current set of use cases. Maybe that's fine and can be opt-in, but opt-out is, I believe, only going to continue finding more and more corner cases.

Does sccache cache source code and would that be important for Rust? The challenge would be more to let rustc compile tar archives, no?

@sanmai-NL commented

The invention of a not-a-tarball crate storage system is irrelevant here, because the problem is crates relying on files existing in the file system, and existing with specific permissions. If you succeed in removing the files from the src dir, you'll break them even worse.

What is your opinion on this breakage? Would it preclude moving forward with a more performant design in any case?

@bjorn3 (Member) commented Dec 5, 2023

Sccache doesn't cache source code. It reads the .d file emitted by rustc to know when it needs to recompile. If rustc were to compile tarballs, this .d file would simply mention the tarball and thus sccache would work as expected. For something entirely in-memory, passed in through e.g. an fd or a Unix socket, that wouldn't work though.

@sanmai-NL commented Dec 5, 2023

Let's also discuss differentiation between crates. Crates without build scripts were mentioned, but binary-only vs. library-only is also relevant. A binary-only crate's source at a specific version needs to be read only very infrequently compared to libraries, especially popular libraries. So for binary-only, non-build-script crates you'd most want to avoid extracting the tar archive at all (even though they may be comparatively smaller), and perhaps also avoid other filesystem operations such as setting permissions.

@sanmai-NL commented

Sccache doesn't cache source code. It reads the .d file emitted by rustc to know when it needs to recompile. If rustc were to compile tarballs, this .d file would simply mention the tarball and thus sccache would work as expected. For something entirely in-memory, passed in through e.g. an fd or a Unix socket, that wouldn't work though.

Yes, and the latter would probably require much more significant architecture reengineering than supporting tar-like archives directly.

@sanmai-NL commented

Let's also discuss differentiation between crates. Crates without build scripts were mentioned, but binary-only vs. library-only is also relevant. A binary-only crate's source at a specific version needs to be read only very infrequently compared to libraries, especially popular libraries. So for binary-only, non-build-script crates you'd most want to avoid extracting the tar archive at all (even though they may be comparatively smaller), and perhaps also avoid other filesystem operations such as setting permissions.

And this is assuming that no incremental compilation cache is available. In my experience one sometimes cleans that cache, but not too often, I suppose.

@kornelski (Contributor) commented

Can this be done for 2024 edition crates?

Cargo.toml is usually at the very beginning of the tarball, so it's relatively cheap to check the edition of a .crate file before actually decompressing it to the file system.

@epage (Contributor) commented May 4, 2024

Independent of the question of whether this can be tied to an Edition, they asked for Edition 2024 changes to have their PRs posted by May 1st. There is a slim chance something small can be rushed in late but members of the Cargo team have already been exhausting themselves to meet the May 1st date with what was already committed.

@HKalbasi (Member) commented

An alternative to making the registry read-only is to detect changes (for example by looking at mtimes), warn the user, and download/unpack the crate again. Would that solve the problem?

@bjorn3 (Member) commented Jan 17, 2025

That would be slower, as you would have to check on every build rather than setting the read-only flag once, right?

@HKalbasi (Member) commented

I think we can just check the mtime of the root directory, and check the mtimes of individual crates only when the root one is wrong. So it would be an O(1) check in the happy path.

@bjorn3 (Member) commented Jan 17, 2025

The mtime of the root directory is not updated when a file inside a subdirectory changes.

@weihanglo (Member) commented

I think we can just check the mtime of the root directory,

Also note that Cargo doesn't pack directories into the .crate tarball. As a result, directories are always recreated when unpacking.

@kornelski (Contributor) commented

The check could be asynchronous, running in parallel with the build.

@epage (Contributor) commented Jan 17, 2025

Going back a bit to the discussion of the PoC:

I'm wondering if a compromise would be to only adjust permissions on packages that don't have a build script? I don't know if it would be easy to detect that (that might require parsing Cargo.toml, which might be too much overhead). I also don't know what % of packages that would cover.

With #13713, we can parse Cargo.toml, check whether package.build = false, and mark the package read-only in that case. We don't even need to do a full manifest parse (including target discovery); we could just use cargo-util-schemas, or even a one-off type for accessing that field (which carries the risk of getting out of sync with the rest of the code).

Considering this is an extraction-time cost, I suspect the overhead for that minimal parse would be acceptable. If it is a high cost, we could put it in the Index.

This won't help with packages published with a Cargo older than 1.80, but over time the coverage will improve. We'd already be adopting a limited approach anyway, as packages with build scripts would be skipped, though #14948 has ways to help reduce the presence of those.
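A minimal sketch of that check using serde and the toml crate (a one-off type as described above; cargo-util-schemas would be the more robust choice):

use serde::Deserialize;

// Hypothetical one-off types: deserialize only the `package.build` field.
// Serde ignores unknown fields by default, so the rest of the manifest is skipped.
#[derive(Deserialize)]
struct MinimalManifest {
    package: Option<MinimalPackage>,
}

#[derive(Deserialize)]
struct MinimalPackage {
    // `build` may be a path string, `false`, or absent; only an explicit
    // `false` (normalized in by Cargo 1.80+) guarantees "no build script".
    build: Option<toml::Value>,
}

fn has_no_build_script(manifest_src: &str) -> bool {
    let Ok(manifest) = toml::from_str::<MinimalManifest>(manifest_src) else {
        return false;
    };
    matches!(
        manifest.package.and_then(|p| p.build),
        Some(toml::Value::Boolean(false))
    )
}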

@epage (Contributor) commented Jan 17, 2025

If we add a check, I would expect it would be in a health check command, like #9341.

@weihanglo (Member) commented

Not sure it makes sense. Since we have -Zchecksum-freshness already, and we also plan to deal with build-script checksum freshness, I wonder whether this registry-cache check and the build-script checksums could share a pretty similar mechanism/implementation.

Also, this check mechanism could be an opt-in option, either under the Cargo configuration or in the [lints.cargo] table. Performance is always a concern, and we would need an option to disable it for CI builds if we decide to do this.
