Failure to clean up incremental compilation artifacts should not be a hard error #57958

Open
Manishearth opened this issue Jan 28, 2019 · 18 comments
Labels: A-incr-comp (Area: Incremental compilation), T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue)

Comments

@Manishearth
Member

Manishearth commented Jan 28, 2019

error: Failed to delete invalidated or incompatible incremental compilation session directory contents `/home/manishearth/mozilla/servo/target/debug/incremental/script-23kpbvowy6y9i/s-f8z0p73x3w-17kitl0-working/dep-graph.bin`: No such file or directory (os error 2).

This is basically rustc attempting to delete a file that no longer exists. This breaks the build; perhaps it should be a warning instead?

(I think you can get this kind of error if you keep target dirs around across a rustup update.)

@Manishearth Manishearth added the A-incr-comp Area: Incremental compilation label Jan 28, 2019
@Centril Centril added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jan 28, 2019
@Centril
Contributor

Centril commented Jan 28, 2019

r? @Zoxc

@Zoxc
Contributor

Zoxc commented Jan 30, 2019

cc @michaelwoerister

@michaelwoerister
Member

Yeah, that could be handled more robustly, especially if the error indicates that the file does not exist.

However, this is a strange case. The only code path triggering this just tries to delete all files in the directory:

pub fn delete_all_session_dir_contents(sess: &Session) -> io::Result<()> {
    let sess_dir_iterator = sess.incr_comp_session_dir().read_dir()?;
    for entry in sess_dir_iterator {
        let entry = entry?;
        safe_remove_file(&entry.path())?
    }
    Ok(())
}

And the code even double-checks that the file exists before trying to delete it. Is there some other process messing with the incr. comp. dir?
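
One way this could still happen despite the double-check is a race: the file can vanish between the existence check and the actual remove_file call. A minimal sketch of a deletion helper that tolerates that, using plain std and a hypothetical helper name (not the actual safe_remove_file in rustc):

use std::fs;
use std::io::{self, ErrorKind};
use std::path::Path;

// Delete a single file, treating "already gone" as success rather than a
// hard error, so a concurrent deletion between the check and the removal
// is benign.
fn remove_file_if_present(path: &Path) -> io::Result<()> {
    match fs::remove_file(path) {
        Ok(()) => Ok(()),
        Err(ref e) if e.kind() == ErrorKind::NotFound => Ok(()),
        Err(e) => Err(e),
    }
}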

@Manishearth
Member Author

I don't think so; I've turned off RLS in my editor because it's too much overhead for Servo, so there shouldn't be.

I think the directory may not have existed; I don't recall what the precise setup was (and that target dir is long gone now).

@michaelwoerister
Member

How often does this occur? I've never seen it.

@Manishearth
Member Author

Manishearth commented Feb 1, 2019 via email

@cdetrio

cdetrio commented Feb 11, 2019

I ran into this while using rustup to switch rustc versions and rebuild a project. It was also on a dep-graph.bin file in a directory that didn't exist.

@Marwes
Contributor

Marwes commented Mar 26, 2019

Seeing this frequently when building inside Docker on Travis: https://travis-ci.org/gluon-lang/gluon-lang.org

Marwes added a commit to Marwes/rust that referenced this issue Mar 26, 2019
@mati865
Contributor

mati865 commented Mar 27, 2019

Which systems are you using?
Maybe https://doc.rust-lang.org/stable/std/fs/fn.canonicalize.html is broken on some systems?
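
As a quick sanity check on an affected machine, one could run something like the snippet below (the path is a placeholder; point it at the real incremental directory) and compare the output with the path rustc reports:

use std::fs;

fn main() {
    // Placeholder path: point this at the incremental dir on the machine
    // and filesystem where the error shows up.
    let path = "target/debug/incremental";
    match fs::canonicalize(path) {
        Ok(resolved) => println!("canonicalize({}) -> {}", path, resolved.display()),
        Err(e) => println!("canonicalize({}) failed: {}", path, e),
    }
}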

@Marwes
Contributor

Marwes commented Mar 27, 2019

Centril added a commit to Centril/rust that referenced this issue Mar 27, 2019
fix: Make incremental artifact deletion more robust

Should fix the intermittent errors reported in rust-lang#57958

cc rust-lang#48614
cuviper added a commit to cuviper/rust that referenced this issue Mar 28, 2019
fix: Make incremental artifact deletion more robust

Should fix the intermittent errors reported in rust-lang#57958

cc rust-lang#48614
@x46085

x46085 commented Nov 29, 2019

@Manishearth @michaelwoerister I see this issue is still open; did anyone ever find a workaround for this? This error consistently breaks my CI, and it seems related to the base image for my builds having several pre-built crates on it. I saw a few references to setting MAKE_JOBS=1 and MAKE_JOBS_SAFE=no, but I'm not sure exactly where/how I could set those before running cargo build. Appreciate any thoughts you may have.

@Manishearth
Member Author

I haven't seen the error in a while.

@michaelwoerister
Member

I thought that #59449 fixed this issue.

@hugwijst
Contributor

I think I see how this error could still be thrown. Working backwards:

  1. The call to delete_all_session_dir_contents returns a file not found error as the directory containing dep-graph.bin doesn't exist when trying to iterate over its entries.
  2. This would mean that open is called on a LoadResult::DataOutOfDate.
  3. The only way to get a LoadResult is by calling load_data.
  4. There is a delayed call to load_data in load_dep_graph, which bubbles up a DataOutOfDate result.
  5. load_dep_graph is called from the Query returned by dep_graph_future.
  6. That query gets executed in the Query returned by dep_graph, which directly calls open on the LoadResult.

Nowhere in this whole sequence is there a check that the directory containing dep-graph.bin still exists. I think filtering the directory iterator's output for FileNotFound errors would probably fix this issue, but I've not been able to reproduce it.

This leaves the question of why the directory would not exist in the first place...
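
A rough sketch of the filtering suggested above, using plain std types rather than the actual rustc code paths, where a NotFound from read_dir is treated as "nothing left to clean up" instead of a hard error:

use std::fs::DirEntry;
use std::io::{self, ErrorKind};
use std::path::Path;

// Collect the session directory's entries, treating a missing directory as
// an empty one so the cleanup pass simply has nothing to delete.
fn session_dir_entries(dir: &Path) -> io::Result<Vec<DirEntry>> {
    match dir.read_dir() {
        Ok(iter) => iter.collect(),
        // The directory is already gone (deleted by another process or an
        // external cleanup); there is nothing left to remove.
        Err(ref e) if e.kind() == ErrorKind::NotFound => Ok(Vec::new()),
        Err(e) => Err(e),
    }
}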

x46085 added a commit to m10io/rust that referenced this issue Jan 31, 2020
@xMAC94x

xMAC94x commented Jun 12, 2020

> @Manishearth @michaelwoerister I see this issue is still open; did anyone ever find a workaround for this? This error consistently breaks my CI, and it seems related to the base image for my builds having several pre-built crates on it. I saw a few references to setting MAKE_JOBS=1 and MAKE_JOBS_SAFE=no, but I'm not sure exactly where/how I could set those before running cargo build. Appreciate any thoughts you may have.

Hi Daniel,
interestingly I also have this error in my CI. Do you happen to know whether your CI runs on a server with ZFS as the filesystem?
For our CI it only seems to fail on those.

@x46085

x46085 commented Jun 12, 2020

> Hi Daniel,
> interestingly I also have this error in my CI. Do you happen to know whether your CI runs on a server with ZFS as the filesystem?
> For our CI it only seems to fail on those.

Doesn't appear so (we're on GCP), although we switched to Rook recently so perhaps our fs type changed with that:
[screenshot]

Other factors to compare: we are using Drone for CI, and Skaffold + Helm for build & deploy into GKE.

Also, we've not seen this issue since we made this patch, so I'm unsure if it's gone away:
m10io@556f9ac

P.S. We've been experimenting with Docker build bind mounts, which potentially remove the need for our pre-baked build image, and have confirmed this is not an issue if you go this route:

# syntax = docker/dockerfile:1-experimental
FROM rust:1.43-slim-buster as builder

RUN rm -f /etc/apt/apt.conf.d/docker-clean && \
    echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache
RUN --mount=type=cache,id=dvca,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,id=dvla,target=/var/lib/apt,sharing=locked \
    apt update && \
    apt install -y build-essential libssl-dev cmake pkg-config lld

Hope that is helpful.

@xMAC94x

xMAC94x commented Jun 15, 2020

Hi, indeed that is helpful.
We are running GitLab + GitLab Runners with a custom Docker image. The Docker image is the same on every runner, yet only 2 of my runners, which use Proxmox + ZFS + Docker in a VM, hit this error sporadically:
Example: https://gitlab.com/veloren/veloren/-/jobs/594860738

 Using docker image sha256:5ebf5da012391d1556e312fb199d7172c202c7d6b51bac71d423ff7d4a64645e for registry.gitlab.com/veloren/veloren-docker-ci:latest ...

ln -s /dockercache/cache-all target
$ cargo test
   Compiling veloren-common v0.6.0 (/builds/veloren/veloren/common)
   Compiling veloren-voxygen v0.6.0 (/builds/veloren/veloren/voxygen)
   Compiling veloren_network v0.1.0 (/builds/veloren/veloren/network)
   Compiling veloren-world v0.6.0 (/builds/veloren/veloren/world)
   Compiling veloren-client v0.6.0 (/builds/veloren/veloren/client)
   Compiling veloren-server v0.6.0 (/builds/veloren/veloren/server)
   Compiling veloren-chat-cli v0.6.0 (/builds/veloren/veloren/chat-cli)
error: Failed to delete invalidated or incompatible incremental compilation session directory contents `/dockercache/cache-all/debug/incremental/veloren_voxygen-29cc5aon4ib6e/s-foe120g0hi-17gfq9r-working/dep-graph.bin`: No such file or directory (os error 2).
error: aborting due to previous error
error: could not compile `veloren-voxygen`.
To learn more, run the command again with --verbose.
warning: build failed, waiting for other jobs to finish...
error: build failed

Interestingly, you can pull our Docker image and see that the file does not exist!
It's not there, yet the error only occurs sporadically, on 2 of 8 hosts, and even there only about 30% of the time.

We are linking our cache Docker container into the CI execution. Docker bind mounts might be feasible for us in the future; let's see.

@shebang89

Same experience here with Jenkins and building inside a Docker container... it happens totally randomly.
Had to put CARGO_INCREMENTAL=0 in front of cargo commands to disable incremental compilation. CI pipelines take forever now, but at least there are no surprises anymore.

Hopefully someone turns this into a warning or something like that. Soon.
