Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Renaming the project root dir invalidates compilation cache #3273

Closed
alubbe opened this issue Nov 8, 2016 · 16 comments · Fixed by #4788
Closed

Renaming the project root dir invalidates compilation cache #3273

alubbe opened this issue Nov 8, 2016 · 16 comments · Fixed by #4788
Labels
A-cargo-api Area: cargo-the-library API and internal code issues A-rebuild-detection Area: rebuild detection and fingerprinting C-bug Category: bug Command-build

Comments

@alubbe
Copy link

alubbe commented Nov 8, 2016

I was surprised to learn that renaming the project root dir invalidates the compilation cache. This is something that happens during every CI and deployment run, making it run quite a bit slower when none of the rust code was actually changed. Is this intentional / is there a way to keep the cache fresh when renaming the directory?

The behaviour can be reproduced as follows (stable 1.12.1):

/$ cargo new rrrrrr
     Created library `rrrrrr` project
/$ cd rrrrrr/
/rrrrrr$ cargo build
   Compiling rrrrrr v0.1.0 (file:///rrrrrr)                            <-- Code gets compiled
    Finished debug [unoptimized + debuginfo] target(s) in 0.11 secs 
/rrrrrr$ cargo build
    Finished debug [unoptimized + debuginfo] target(s) in 0.0 secs     <-- Code does not re-compile
/rrrrrr$ cargo build
    Finished debug [unoptimized + debuginfo] target(s) in 0.0 secs     <-- Code does not re-compile
/rrrrrr$ cd ..
/$ mv rrrrrr/ rrrrrr2/
/$ cd rrrrrr2/
/rrrrrr2$ cargo build
   Compiling rrrrrr v0.1.0 (file:///rrrrrr2)                           <-- Code re-compiles
    Finished debug [unoptimized + debuginfo] target(s) in 0.11 secs
@alexcrichton
Copy link
Member

I believe Cargo stores absolute path names in various places of its crate cache, which is likely leading to this.

@alubbe
Copy link
Author

alubbe commented Nov 8, 2016

That sounds like a likely source. In what cases wouldn't it be preferable to work with relative paths (to the directory containing Cargo.toml)?

@alexcrichton
Copy link
Member

Ideally, yeah, although I don't think this would be easy to do at all in Cargo, the absolute paths are plumbed pretty far throughout the whole system. If possible I'd recommend getting the same directory working on CI (like Travis)

@KalitaAlexey
Copy link
Contributor

Dear @alexcrichton,
Is incremental compilation data for the project sources stored inside the project dir?
I think it should use relative paths for these cases.
The cache also will be invalidated if you move the project dir to another place. I don't like it.
Is a fix hard enough?

@alexcrichton
Copy link
Member

The problem here is that we store everything as an absolute path in memory, and then that gets hashed and/or serialized to disk. Changing the in-memory representation is likely to be difficult, so is changing how it's serialized/hashed. Unfortunately the fix here will probably be hard.

@alubbe
Copy link
Author

alubbe commented Nov 9, 2016

Isn't it a matter of changing the places that pass the absolute path to the hashing function, so that they pass the relative path instead? Are there a lot of those?

@alexcrichton
Copy link
Member

Patches are of course always welcome! I probably don't personally have time to work on this, but I can help out if needed.

@alubbe
Copy link
Author

alubbe commented Nov 9, 2016

I might not have time this week, either, but I'd love to have a deeper understanding of what changes are involved and how much work it would be. I've done some searching around the repo to figure out where this hash is generated - if you could point in the right direction of the relevant files/code bits, that would super helpful!

@alexcrichton
Copy link
Member

A PackageId ends up being hashed a bunch of places, which transitively contains a SourceId, containing a Url, representing a Path. That's probably absolute.

@alubbe
Copy link
Author

alubbe commented Nov 13, 2016

I took a look at the source code and I believe these are the relevant places. I tested it locally by hard-coding fixed strings/paths at these places, but a real implementation would replace these absolute paths with relative ones that do not change when the whole directory is moved.
https://github.com/alubbe/cargo/blob/master/src/cargo/core/source.rs#L441
https://github.com/alubbe/cargo/blob/master/src/cargo/core/package.rs#L59
https://github.com/alubbe/cargo/blob/master/src/cargo/core/manifest.rs#L183

However, that only solves the first fingerprint. cargo will then check the fingerprint of src/lib.rs, but it expects to find it at the old, absolute path. Could someone help me find out where that happens and also, how to convert these all these absolute paths to relative ones?

@alexcrichton
Copy link
Member

I believe Target::src_path is relative (or we could easily make it relative), Package::manifest_path AFAIK doesn't make its way to fingerprints, and then SourceId::url is the hard one to deal with.

Fingerprints being expected at absolute paths happens here -- https://github.com/alubbe/cargo/blob/f6500c6b6480c89492773bff6d41c4abf6c5ee15/src/cargo/ops/cargo_rustc/fingerprint.rs#L672-L687

Which... makes me think that this may almost not ever work?

@matklad
Copy link
Member

matklad commented Nov 16, 2016

In intelliJ there's also a problem of project local paths. It's solved in a hackish, but simple and transparent way: in memory, all paths are absolute. During serialization, any string which contains the absolute path to the project is replaced with %PROJECT_DIR%, and during deserialization it is restored back to the current project directory.

@alexcrichton
Copy link
Member

@matklad that seems like a great solution!

@alubbe
Copy link
Author

alubbe commented Nov 21, 2016

IIUC we can re-use that approach so that not only is the absolute path replaced before serialization, but also before hashing (in the places I mentioned before).
@matklad what are the chances you could contribute your solution as a PR to cargo?

@matklad
Copy link
Member

matklad commented Nov 21, 2016

@alubbe I think I'll be able to tackle this eventually (~ by the end of the year?), but at present I don't have that much time, and for me personally #3221 and #3194 are of higher priority :)

@whitequark
Copy link
Member

One related issue is that build scripts may capture an absolute path, and are not currently scheduled to re-run if the project root directory changes. I've just hit this.

@carols10cents carols10cents added A-cargo-api Area: cargo-the-library API and internal code issues A-rebuild-detection Area: rebuild detection and fingerprinting C-bug Category: bug Command-build labels Sep 29, 2017
bors added a commit that referenced this issue Dec 12, 2017
Avoid rebuilding a project when cwd changes

This commit is targeted at solving a use case which typically comes up during CI
builds -- the `target` directory is cached between builds but the cwd of the
build changes over time. For example the following scenario can happen:

1. A project is compiled at `/projects/a`.
2. The `target` directory is cached.
3. A new build is started in `/projects/b`.
4. The previous `target` directory is restored to `/projects/b`.
5. The build start, and Cargo rebuilds everything.

The last piece of behavior is indeed unfortunate! Cargo's internal hashing
currently isn't that resilient to changing cwd and this PR aims to help improve
the situation!

The first point of too-much-hashing came up with `Target::src_path`. Each
`Target` was hashed and stored for all compilations, and the `src_path` field
was an absolute path on the filesystem to the file that needed to be compiled.
This path then changed over time when cwd changed, but otherwise everything else
remained the same!

This commit updates the handling of the `src_path` field to simply ignore it
when hashing. Instead the path we actually pass to rustc is later calculated and
then passed to the fingerprint calculation.

The next problem this fixes is that the dep info files were augmented after
creation to have the cwd of the compiler at the time to find the files at a
later date. This, unfortunately, would cause issues if the cwd itself changed.
Instead the cwd is now left out of dep-info files (they're no longer augmented)
and instead the cwd is recalculated when parsing the dep info later.

The final problem that this commit fixes is actually an existing issue in Cargo
today. Right now you can actually execute `cargo build` from anywhere in a
project and Cargo will execute the build. Unfortunately though the argument to
rustc was actually different depending on what directory you were in (the
compiler was invoked with a path relative to cwd). This path ends up being used
for metadata like debuginfo which means that different directories would cause
different artifacts to be created, but Cargo wouldn't rerun the compiler!

To fix this issue the matter of cwd is now entirely excluded from compilation
command lines. Instead rustc is unconditionally invoked with a relative path
*if* the path is underneath the workspace root, and otherwise it's invoked as an
absolute path (in which case the cwd doesn't matter).

Once all these fixes were added up it means that now we can have projects where
if you move the entire directory Cargo won't rebuild the original source!

Note that this may be a bit of a breaking change, however. This means that the
paths in error messages for cargo will no longer be unconditionally relative to
the current working directory, but rather relative to the root of the workspace
itself. Unfortunately this is moreso of a feature right now rather than a bug,
so it may be one that we just have to stomach.

Closes #3273
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-cargo-api Area: cargo-the-library API and internal code issues A-rebuild-detection Area: rebuild detection and fingerprinting C-bug Category: bug Command-build
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants