consider caching file hashes in the sccache server #758

Open
froydnj opened this issue May 27, 2020 · 2 comments

Comments

@froydnj
Contributor

froydnj commented May 27, 2020

...so that, particularly for Rust compilations, we don't waste a bunch of time re-hashing the same files over and over again.

I don't know how much this really helps, because it probably hurts a little bit (?) on single-shot builds where the server doesn't stay up very long (e.g. Firefox automation builds)...though maybe not having to touch the disk (or the kernel's file cache, or whatever) is a win overall. Also unsure how easy it is to arrange things so you don't wind up badly serializing the whole process on your hash cache; maybe the locking overhead would really not be that large.

@luser
Contributor

luser commented May 28, 2020

From a cursory search there appear to be several concurrent hashtable crates out there; one of those might be suitable. Given that this is a cache, there might be some special-purpose data structure that would work better, I don't know. Would you need to cache by (filename, mtime) for this to be correct? I assume your concern is mostly the time spent hashing rlib / rmeta files, right? The existing cargo fingerprinting there probably makes that less of a concern.

A data structure that allowed lock-free reads ought to keep the fast path fast, and given that the file hashing code is already async, the write path could be something like:

  1. Attempt to insert a "pending" entry (I'd probably use the read end of a channel).
    2a. If a pending entry already exists (some other thread racing), await it.
    2b. If not, insert the pending entry and kick off the hash calculation.
  3. When hash calculation finishes, swap out the pending entry for an actual calculated hash entry, and resolve the pending entry so anyone awaiting it is unblocked.
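
The write path in the list above could be sketched roughly as follows. This is only an illustration under some assumptions: it uses OS threads and the standard library's `OnceLock` as the "pending" entry rather than the read end of an async channel, and it guards the map with a plain `Mutex` instead of a structure with lock-free reads. `HashCache`, `get_or_compute`, and `hash_file` are hypothetical names, not part of sccache.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical in-memory hash cache; not sccache's actual API.
#[derive(Default)]
pub struct HashCache {
    // An empty `OnceLock` plays the role of the "pending" entry;
    // it is filled exactly once with the finished digest.
    entries: Mutex<HashMap<PathBuf, Arc<OnceLock<String>>>>,
}

impl HashCache {
    /// Returns the digest for `path`, running `hash_file` only if no other
    /// caller has computed (or is currently computing) it.
    pub fn get_or_compute<F>(&self, path: &Path, hash_file: F) -> String
    where
        F: FnOnce(&Path) -> String,
    {
        // Find the existing slot or insert a pending one. The map lock is
        // held only for this lookup, never while hashing.
        let slot = {
            let mut entries = self.entries.lock().unwrap();
            entries.entry(path.to_path_buf()).or_default().clone()
        };
        // The first caller runs `hash_file`; racing callers block here
        // until the digest is available, then read the same value.
        slot.get_or_init(|| hash_file(path)).clone()
    }
}
```

Because the map lock is released before hashing starts, slow hashes of different files do not serialize on each other; only callers asking for the same file wait.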

Honestly if you get updated to the latest tokio and whatnot you could likely define a better threading model for sccache where most of the server code always runs on the main thread (you've already got a thread pool for the CPU-intensive stuff) and this could just be a standard HashMap that the server owns.
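
For the single-owner variant, a minimal sketch might look like the following, assuming the (filename, mtime) key floated earlier in this comment. `OwnedHashCache`, its `misses` counter, and the `hash_file` callback are hypothetical names for illustration; nothing here is taken from sccache's code.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Hypothetical cache owned by a single task, so it needs no locking.
#[derive(Default)]
pub struct OwnedHashCache {
    // Path -> (mtime seen when the digest was computed, digest).
    entries: HashMap<PathBuf, (SystemTime, String)>,
    /// Number of times the digest actually had to be computed.
    pub misses: usize,
}

impl OwnedHashCache {
    /// Returns the cached digest if the file's mtime is unchanged,
    /// otherwise recomputes it with `hash_file` and stores the result.
    pub fn get_or_compute(
        &mut self,
        path: &Path,
        hash_file: impl FnOnce(&Path) -> io::Result<String>,
    ) -> io::Result<String> {
        let mtime = fs::metadata(path)?.modified()?;
        if let Some((cached_mtime, digest)) = self.entries.get(path) {
            if *cached_mtime == mtime {
                return Ok(digest.clone());
            }
        }
        let digest = hash_file(path)?;
        self.misses += 1;
        self.entries
            .insert(path.to_path_buf(), (mtime, digest.clone()));
        Ok(digest)
    }
}
```

Note that a hit still costs one `metadata` call per lookup, so this avoids re-reading and re-hashing file contents but not touching the filesystem entirely.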

@the8472

the8472 commented Jan 16, 2021

> Would you need to cache by (filename, mtime) for this to be correct?

It can't be entirely correct, since there are ways to modify files without updating mtime. But yeah, on Windows that's probably a decent heuristic. On Unix systems, (dev, ino, mtime, size) could be used instead as a more compact alternative.
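
A rough sketch of what a (dev, ino, mtime, size) key could look like on Unix, using the standard library's `std::os::unix::fs::MetadataExt`. `FileKey` and `for_path` are hypothetical names, and splitting mtime into seconds and nanoseconds is an assumption made here to avoid losing sub-second precision.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Hypothetical cache key built from file metadata. Unix only.
/// It is a heuristic: a file can still change without changing this key.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct FileKey {
    dev: u64,        // ID of the device containing the file
    ino: u64,        // inode number
    mtime_sec: i64,  // last modification time, seconds part
    mtime_nsec: i64, // last modification time, nanoseconds part
    size: u64,       // file size in bytes
}

impl FileKey {
    /// Builds the key from a single `stat` of `path` (following symlinks).
    pub fn for_path(path: &Path) -> io::Result<FileKey> {
        let meta = fs::metadata(path)?;
        Ok(FileKey {
            dev: meta.dev(),
            ino: meta.ino(),
            mtime_sec: meta.mtime(),
            mtime_nsec: meta.mtime_nsec(),
            size: meta.size(),
        })
    }
}
```

Since the key derives `Hash` and `Eq`, it can be used directly as a `HashMap` key, and it is a fixed 40 bytes of fields no matter how long the path is.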
