
Experiments with distributed compilation #249

Merged: 53 commits, Aug 14, 2018

Conversation

aidanhs (Contributor) commented on May 17, 2018:

An initial version that works and can actually be used to at least try it out.

aidanhs force-pushed the aphs-dist-sccache-1 branch 2 times, most recently from 22355ca to ef230a7 on July 30, 2018 at 22:27
Cargo.toml Outdated
@@ -45,7 +46,10 @@ rust-crypto = { version = "0.2.36", optional = true }
serde = "1.0"
serde_derive = "1.0"
serde_json = "1.0"
# TODO: publish to crates.io
sccache-dist = { path = "sccache-dist" }
aidanhs (Contributor, Author):

I don't know how you'd prefer to deal with this - I can publish and update this PR? Or you can do it next time you release.

Contributor (reviewer):

Is there a specific reason you're adding this code as a separate crate? It's not really usable standalone, right?

aidanhs (Contributor, Author):

The intention is to split things out into two separate binaries - the sccache-dist binary pulls in a number of dependencies (e.g. rouille) that sccache itself doesn't need to have compiled in.

It also makes the compile a bit more incremental. Are you thinking I could just have it be a separate binary within the same crate?

Contributor (reviewer):

Yeah, a separate binary within the same crate is what I had envisioned. I get the compile argument (although incremental compilation should help with that nowadays), but managing crates.io releases for two intertwined crates like that is a PITA. (I've dealt with this with another project of mine.)

// With -fprofile-generate line number information is important, so don't use -P.
if !parsed_args.profile_generate {
    cmd.arg("-P");
}
aidanhs (Contributor, Author):

With distributed compilation, line number information is always important. More generally, it's important if you want to use a preprocessed file as input.

Contributor (reviewer):

Hm. This will regress the change in #208. Would it be possible to change things so that when distributed compilation is not enabled we can continue to use this?

aidanhs (Contributor, Author) commented on Aug 2, 2018:

Oh, this is disappointing, I should've done a bit more digging.

I'm not sure if it's better to say "distributed compilation may happen, your cache is completely invalid (because it's based on non-line-number-annotated source files)", or to just preprocess a second time on demand when sending for remote compilation. Looking at a reasonably large C++ file I have to hand, it spends 10-20% of wall-clock time in preprocessing, so running the preprocessor twice isn't ideal.

What about this: if we might be distributing, do the full preprocess with line numbers, but strip the line numbers out with a regexp when computing the cache key?

Contributor (reviewer):

That sounds fine. Since users have to explicitly opt in to using distributed compilation, I don't think it's unreasonable for them to get this behavior in that case. Running the preprocessor twice seems like too much overhead just for this purpose. We can make a note somewhere, if necessary, that cache hits using different source paths are not possible with distributed compilation enabled.
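A minimal sketch of the regexp idea, assuming the regex crate (illustrative only, not necessarily what this PR ends up doing): GCC and Clang encode line information in preprocessor output as linemarkers of the form # 42 "src/foo.c", so stripping those lines before hashing keeps the cache key independent of source paths and line numbers, much as hashing -P output would.

// Illustrative sketch: remove preprocessor linemarkers before hashing for the cache key.
use regex::bytes::Regex;

fn strip_linemarkers(preprocessed: &[u8]) -> Vec<u8> {
    // Matches whole lines like: # 42 "src/foo.c" 2 3 4
    let re = Regex::new(r#"(?m)(?-u)^#\s*\d+\s+"[^"]*".*\n?"#).unwrap();
    re.replace_all(preprocessed, &b""[..]).into_owned()
}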

@@ -286,20 +309,110 @@ pub trait CompilerHasher<T>: fmt::Debug + Send + 'static
fn box_clone(&self) -> Box<CompilerHasher<T>>;
}

#[cfg(not(feature = "dist"))]
aidanhs (Contributor, Author):

This feature is disabled by default, and therefore so is distributed compilation.
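Since the gating is what keeps the default build unchanged, here is a tiny self-contained illustration of the pattern (function names are hypothetical): items behind cfg(feature = "dist") simply aren't compiled unless the feature is enabled, so a default build contains no distributed code paths.

// Sketch only: cfg selects one of the two bodies at compile time.
#[cfg(feature = "dist")]
fn dist_compiled_in() -> bool {
    true
}

#[cfg(not(feature = "dist"))]
fn dist_compiled_in() -> bool {
    false
}

fn main() {
    // Prints "false" for a default build, "true" when built with --features dist.
    println!("distributed compilation compiled in: {}", dist_compiled_in());
}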

aidanhs (Contributor, Author) commented on Aug 1, 2018:

@luser this is ready for review - it adds the basic structure of two binaries for distributed compilation, the way they communicate, and the initial support for C, C++ and Linux. Although this unconditionally shuffles some things around (e.g. generating compilation commands rather than executing them directly), distributed compilation is hidden behind a feature flag.

Follow-ups coming soon.


use errors::*;

// TODO: possibly shouldn't be public
Contributor (reviewer):

I'm fine with TODO comments, but I'd prefer if they either get removed or if you file follow-up issues and put the issue URL in the comment.

aidanhs (Contributor, Author):

I think all of the TODOs will be gone by the final PR.

Ok(hex(m.finish().as_ref()))
}

fn hex(bytes: &[u8]) -> String {
Contributor (reviewer):

This feels like something we could pull in from a crate, like the hex crate or similar.

aidanhs (Contributor, Author) commented on Aug 2, 2018:

I just stole it from src/util.rs, I can pull in the crate if you like.

Contributor (reviewer):

Oh, hah! If you move this code into the existing crate then you can just reuse that code, I guess. I'm certainly not going to hold you to a higher standard than the existing code. :)
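For reference, a helper like this is only a couple of lines either way; the sketch below is illustrative rather than the exact code in src/util.rs, and hex::encode(bytes) from the hex crate would be the drop-in replacement.

// Roughly what such a helper looks like.
fn hex(bytes: &[u8]) -> String {
    bytes.iter().map(|byte| format!("{:02x}", byte)).collect()
}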

.truncate(true)
.open("/tmp/toolchain_cache.tar")?;
create(file)?;
// TODO: after, if still exists, remove it
Contributor (reviewer):

This sounds like something that should definitely be fixed.

aidanhs (Contributor, Author):

Refined caching and scheduling are coming in a slightly later PR.

return Ok(strong_key)
}
debug!("Weak key {} appears to be new", weak_key);
// TODO: don't use this as a poor exclusive lock on this global file
Contributor (reviewer):

Does this need to be an exclusive lock? You've already got a mutex around the LRUDiskCache here. If not, using a tempfile would be much nicer.

aidanhs (Contributor, Author):

Added a tempfile (for now)
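A minimal sketch of the tempfile approach, assuming the tempfile crate and a hypothetical helper name: stage the archive in a temporary file and move it into place only once it is complete, instead of treating a fixed global path as a lock.

use std::io::Write;
use std::path::Path;
use tempfile::NamedTempFile;

fn stash_toolchain(cache_dir: &Path, archive_bytes: &[u8]) -> std::io::Result<()> {
    // Create the temp file in the cache directory so the final rename stays on one filesystem.
    let mut tmp = NamedTempFile::new_in(cache_dir)?;
    tmp.write_all(archive_bytes)?;
    tmp.persist(cache_dir.join("toolchain.tar"))?;
    Ok(())
}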

pub struct ClientToolchainCache {
    cache_dir: PathBuf,
    cache: Mutex<TcCache>,
    // Local machine mapping from 'weak' hashes to strong toolchain hashes
Contributor (reviewer):

Some exposition on what weak and strong hashes are would be helpful. I gather that the strong hash is a SHA512 of the toolchain tar file.

aidanhs (Contributor, Author):

Added the explanation that 'weak' is just what sccache uses as part of the cache key.
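To make the distinction concrete, an illustrative sketch (crate choice and names here are assumptions, not the PR's code): the 'weak' hash is the cheap compiler identifier sccache already computes as part of its cache key, the 'strong' hash is a digest of the packaged toolchain archive itself, and the client keeps a local map from one to the other.

use sha2::{Digest, Sha512};
use std::collections::HashMap;

// Local mapping: weak hash (cheap compiler id) -> strong hash (digest of the toolchain tar).
type WeakToStrong = HashMap<String, String>;

fn strong_key(toolchain_tar: &[u8]) -> String {
    Sha512::digest(toolchain_tar)
        .iter()
        .map(|byte| format!("{:02x}", byte))
        .collect()
}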

Some(_) => {
    warn!("Scheduler address configured but dist feature disabled, disabling distributed sccache");
    Arc::new(dist::NoopClient)
},
aidanhs (Contributor, Author):

This bit was added in light of the new preprocessing rules, to use -P in more cases where it's possible.

 // With -fprofile-generate line number information is important, so don't use -P.
-if !parsed_args.profile_generate {
+if !may_dist && !parsed_args.profile_generate {
     cmd.arg("-P");
 }
aidanhs (Contributor, Author):

Added this back in and explained why it's important for a distributed compile.

aidanhs (Contributor, Author) commented on Aug 4, 2018:

sccache-dist has been moved back in-crate as a separate binary, enabled by the dist-server feature.

Are you ok to wait for the rest of my PRs for the TODOs to be resolved?

luser (Contributor) commented on Aug 6, 2018:

> Are you ok to wait for the rest of my PRs for the TODOs to be resolved?

Yes. The separate crate part was the only thing I was really hung up on. I'll make sure I've given the rest of the code here a once-over today and we should be good to merge this.

Cargo.toml Outdated
[[bin]]
name = "sccache-dist"
required-features = ["dist-server"]
path = "sccache-dist/main.rs"
Contributor (reviewer):

It's a bit unconventional to have source in a path that's not under src/, isn't it?

@@ -202,6 +202,23 @@ impl LruDiskCache {
})
}

/// Add a file by calling `with` with the open `File` corresponding to the cache at path `key`.
pub fn insert_with<K: AsRef<OsStr>, F: FnOnce(File) -> io::Result<()>>(&mut self, key: K, with: F) -> Result<()> {
Contributor (reviewer):

I'm not quite sure of the precise difference between this and insert_by. Can this just call insert_by, passing a closure that opens the file and calls with(f)?
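A toy, self-contained illustration of that suggestion; the type and signatures here are hypothetical rather than lru-disk-cache's real API. The point is just that the File-taking entry point can delegate to the path-taking one, with the size read back afterwards.

use std::fs::File;
use std::io;
use std::path::{Path, PathBuf};

struct ToyCache {
    root: PathBuf,
}

impl ToyCache {
    // Path-based primitive: the closure populates the file at `path`.
    fn insert_by<F: FnOnce(&Path) -> io::Result<()>>(&mut self, key: &str, by: F) -> io::Result<()> {
        let path = self.root.join(key);
        by(&path)?;
        // The entry's size is only known after the closure runs, so LRU
        // bookkeeping would read it back from disk here.
        let _size = std::fs::metadata(&path)?.len();
        Ok(())
    }

    // File-based convenience: open the file and hand it to the caller's closure.
    fn insert_with<F: FnOnce(File) -> io::Result<()>>(&mut self, key: &str, with: F) -> io::Result<()> {
        self.insert_by(key, |path| with(File::create(path)?))
    }
}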

aidanhs (Contributor, Author) commented on Aug 8, 2018:

Added two new commits: one to put sccache-dist in a conventional location (and remove the explicit paths to the main.rs files), and one to simplify insert_with (this had some minor knock-on effects in insert_by, as we don't have the size immediately available).

I also took the opportunity to clean up a couple of things in add_file, as well as add a rel_to_abs_path helper that means we no longer need to store the absolute path in the LRU (since it can be derived from the key).
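The helper described is presumably along these lines (a sketch with the implementation assumed, not copied from the PR): deriving the absolute on-disk path from the cache root and the relative key means only the key needs to live in the LRU.

use std::path::{Path, PathBuf};

fn rel_to_abs_path(root: &Path, rel_path: &Path) -> PathBuf {
    root.join(rel_path)
}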
