consider caching file hashes in the sccache server #758

froydnj · 2020-05-27T20:07:18Z

...so that, particularly for Rust compilations, we don't waste a bunch of time re-hashing the same files over and over again.

I don't know how much this really helps, because it probably hurts a little bit (?) on single-shot builds where the server doesn't stay up very long (e.g. Firefox automation builds)...though maybe not having to touch the disk (or the kernel's file cache, or whatever) is a win overall. Also unsure how easy it is to arrange things so you don't wind up badly serializing the whole process on your hash cache; maybe the locking overhead would really not be that large.

luser · 2020-05-28T12:26:25Z

From a cursory search there appear to be several concurrent hashtable crates out there; one of those might be suitable. Given that this is a cache, there might be some special-purpose data structure that would work better, I don't know. Would you need to cache by (filename, mtime) for this to be correct? I assume your concern is mostly the time spent hashing rlib / rmeta files, right? The existing cargo fingerprinting there probably makes that less of a concern.

A data structure that allowed lock-free reads ought to keep the fast path fast, and given that the file hashing code is already async, the write path could be something like:

1. Attempt to insert a "pending" entry (I'd probably use the read end of a channel).
2a. If a pending entry already exists (some other thread racing), await it.
2b. If not, insert the pending entry and kick off the hash calculation.
3. When hash calculation finishes, swap out the pending entry for an actual calculated hash entry, and resolve the pending entry so anyone awaiting it is unblocked.
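
The pending-entry steps above could be sketched roughly as follows. This is only an illustration using the standard library: it swaps in a `Mutex`-guarded `HashMap` plus a `Condvar` for the lock-free map and channel suggested above, and blocks threads instead of awaiting. `HashCache`, `Slot`, and `get_or_compute` are hypothetical names, not anything that exists in sccache.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::{Arc, Condvar, Mutex};

/// A slot is "pending" while it holds `None` and "ready" once it holds `Some(hash)`.
/// (Hypothetical type, standing in for the channel-based pending entry.)
type Slot = Arc<(Mutex<Option<u64>>, Condvar)>;

#[derive(Default)]
pub struct HashCache {
    slots: Mutex<HashMap<PathBuf, Slot>>,
}

impl HashCache {
    /// Returns the cached hash for `path`, running `compute` at most once per path
    /// even if several threads ask for the same path at the same time.
    pub fn get_or_compute<F>(&self, path: &Path, compute: F) -> u64
    where
        F: FnOnce(&Path) -> u64,
    {
        // Steps 1, 2a, 2b: look for an existing entry, otherwise insert a pending one.
        // The map lock is held only for this lookup, not while hashing.
        let (slot, is_owner) = {
            let mut slots = self.slots.lock().unwrap();
            match slots.get(path) {
                Some(slot) => (Arc::clone(slot), false),
                None => {
                    let slot: Slot = Arc::new((Mutex::new(None), Condvar::new()));
                    slots.insert(path.to_path_buf(), Arc::clone(&slot));
                    (slot, true)
                }
            }
        };

        let (value, ready) = &*slot;
        if is_owner {
            // Step 3: compute the hash, publish it, and wake everyone waiting on it.
            let hash = compute(path);
            *value.lock().unwrap() = Some(hash);
            ready.notify_all();
            hash
        } else {
            // Step 2a: another thread owns the computation, so wait for its result.
            let mut guard = value.lock().unwrap();
            while guard.is_none() {
                guard = ready.wait(guard).unwrap();
            }
            (*guard).unwrap()
        }
    }
}
```

Calls for different paths hash in parallel, because the map lock is released before `compute` runs. The sketch ignores invalidation entirely, and if `compute` panics the waiters block forever, so a real version would need to handle both.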

Honestly if you get updated to the latest tokio and whatnot you could likely define a better threading model for sccache where most of the server code always runs on the main thread (you've already got a thread pool for the CPU-intensive stuff) and this could just be a standard HashMap that the server owns.
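
For comparison, the single-owner variant needs no synchronization at all. A minimal sketch, assuming the server calls into the cache from one thread and validates entries by mtime as discussed above; `ServerHashCache` and `get_or_insert_with` are hypothetical names, and the actual hashing is passed in as a closure:

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Hypothetical cache owned by the server's main thread: path -> (mtime, hash).
#[derive(Default)]
pub struct ServerHashCache {
    entries: HashMap<PathBuf, (SystemTime, u64)>,
}

impl ServerHashCache {
    /// Returns the cached hash if the stored mtime matches `mtime`;
    /// otherwise runs `compute` and replaces the stale or missing entry.
    pub fn get_or_insert_with<F>(&mut self, path: &Path, mtime: SystemTime, compute: F) -> u64
    where
        F: FnOnce() -> u64,
    {
        if let Some(&(cached_mtime, hash)) = self.entries.get(path) {
            if cached_mtime == mtime {
                return hash;
            }
        }
        let hash = compute();
        self.entries.insert(path.to_path_buf(), (mtime, hash));
        hash
    }
}
```

Taking `&mut self` is what makes this lock-free in the trivial sense: the borrow checker enforces that only one caller touches the map at a time.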

the8472 · 2021-01-16T10:11:57Z

> Would you need to cache by (filename, mtime) for this to be correct?

It can't be entirely correct since there are ways to modify files without updating mtime. But yeah, on Windows that's probably a decent heuristic. On Unix systems, (dev, ino, mtime, size) could be used instead as a more compact alternative.
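
As a rough illustration of the Unix key, the fields can be read through `std::os::unix::fs::MetadataExt`. `FileKey` is a hypothetical name, and the nanosecond part of mtime is an addition beyond the tuple named above. Like any mtime-based scheme, it remains a heuristic.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Hypothetical cache key built from file metadata instead of the file name.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct FileKey {
    dev: u64,
    ino: u64,
    mtime: i64,
    /// Assumption: sub-second precision is included to narrow the window
    /// in which two writes share the same mtime.
    mtime_nsec: i64,
    size: u64,
}

impl FileKey {
    /// Builds the key from a single metadata lookup (follows symlinks).
    pub fn from_path(path: &Path) -> io::Result<Self> {
        let meta = fs::metadata(path)?;
        Ok(FileKey {
            dev: meta.dev(),
            ino: meta.ino(),
            mtime: meta.mtime(),
            mtime_nsec: meta.mtime_nsec(),
            size: meta.size(),
        })
    }
}
```

Because the key derives `Hash` and `Eq`, it can be used directly as a `HashMap` key, and it is a fixed 40 bytes regardless of path length.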

luser mentioned this issue Feb 4, 2021

Support caching cc invocations with -fprofile-use #941

Open

michaelwoerister mentioned this issue Feb 11, 2021

Cache file hashes #953

Closed
