Ccache support #44
Conversation
This cuts down true incremental build times, but only slightly on Workers (~2m 46s => ~2m 25s). I noticed the following stats (counters zeroed before the build, but with a fully populated cache):
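For reference, the measurement cycle above can be sketched as follows (a minimal sketch assuming ccache is on `PATH`; the actual build invocation is elided):

```shell
# Zero ccache's statistics counters so the next run's numbers are clean.
ccache -z

# ... run the incremental build here (ekam invocation elided) ...

# Dump the hit/miss/error counters accumulated since the reset.
ccache -s
```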
I think the preprocessor errors are perhaps from Ekam invoking the compiler in an interim state? If so, that could be a large part of the cost, since compiler invocations can be expensive when the compiler is actually evaluating things. It's certainly in the right ballpark as an explanation (we have 121 executables vs 131 preprocessor errors, so maybe some executables take multiple invocations to resolve?). The "could not write to output file" errors might indicate there's some remaining bug within Ekam that's preventing ccache from materializing the cached file in the output location. I didn't see any errors in the ccache log & the Ekam log wasn't helpful either (it's a lot of log content though, so it's possible I missed something). Overall this does seem to indicate that Ekam itself is adding a noticeable amount of overhead to do its "no build files" magic. If that's the case, the benefit of caching (& distributed compilation) disappears pretty quickly. Or it could be a mix of magic discovery & interposition (they're too intertwined to provide a reasonable guess).
Actually, thinking about this more, the preprocessor errors wouldn't be linkage errors & we don't even call ccache for linking. I might need to investigate more thoroughly, via more ccache debug logging, what the preprocessor errors actually are. Are there scenarios where Ekam ends up calling the compiler in a way that can fail & requires calling it another time?
Does this actually work with Ekam's execution model? Consider that Ekam frequently speculatively executes compiler commands and kills them early if it decides that some other action has changed what the result would be. For example, Ekam might try to compile […]. Is ccache designed in such a way that being randomly killed won't result in cache corruption?
I think that's fine in terms of cache integrity. ccache being terminated unexpectedly isn't generally the source of their corruption issues (skimming their NEWS file, those have more to do with not playing nice with the compiler or with NFS). They also checksum the cache, so even if it does corrupt, in theory it shouldn't be a problem. Given the robustness & how widely ccache is used, I'm not concerned about this aspect.

I'm curious under what timing scenario a compiler will have encountered a missing file but still be running when the missing file is generated. The compiler should exit pretty quickly on any such error.

What you've described does explain why there's no benefit from ccache: all the speculative compilation is actually expensive. We could probably speed up Ekam significantly by doing something as simple as prioritizing some of the rules (e.g. run any source code gen rules first, since they should drastically reduce the number of erroneous compiles).
Yes, it's probably rare for compile steps, but if the code gen happened to run at the same time as the compile that needed it, it could happen, I think. The more common scenario is linking, where as soon as ekam sees an object file with a […]
Ekam doesn't wait for the entire set of missing symbols for the current linkage to be resolved before trying to link?

On the topic of this PR itself, this commit is harmless & fixes some issues within Ekam (e.g. stat64 wasn't wrapped). I think I'll try to tackle having Ekam do less work at some point, as it seems to be costing us ~2 minutes 20 seconds out of the ~2 minutes 50 seconds overall build, which seems a bit high (plus it'll also make the ccache/distcc support more meaningful).
No. The Ekam engine only knows: "When I last ran this action, it looked for all of these tags, and I gave this set of answers. If any of my answers change, I need to re-run the action." The rule implementation doesn't have any way to tell the engine "Don't bother running me again unless all of these tags are found."

Note that searching for a tag that doesn't exist isn't necessarily an error -- for example, the compiler commonly searches the include path for headers. For any system header that you include, the compiler will first look for the header in the local repo, resulting in Ekam saying "not found", after which it will look in the system include directory. So if we blocked re-running of actions with "missing dependencies", pretty much all compile actions would always be blocked from re-running.

It might be reasonable to de-prioritize actions that have known missing dependencies, as long as they do still run once there are no other actions left to run.
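The engine rule described above can be illustrated with a toy sketch (not Ekam's actual code — just a model of "re-run only if any recorded answer changed"):

```shell
# Toy model: an action's tag queries and the answers it received are
# recorded as "tag=answer" lines, one per query. The action needs to
# re-run iff any answer differs from the recorded set -- and a
# "not-found" answer is recorded like any other, so a tag appearing
# later changes an answer and triggers a re-run.
needs_rerun() {
  # $1 = file of recorded answers, $2 = file of current answers
  ! cmp -s "$1" "$2"
}
```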
> On the topic of this PR itself, this commit is harmless
Unfortunately, this isn't true -- this PR contains several bugs. If it isn't adding much real value, it might make sense to shelve it rather than iterate on it.
> it seems like it's costing us ~2 minutes 20 seconds out of ~2 minutes 50 seconds of the overall build
I think that's an exaggeration. Remember that the build does a lot of things other than compiling -- e.g. it links a lot of binaries and runs a lot of tests. ccache isn't eliminating those.
True, but given that the UI & htop seem to indicate that a non-trivial amount of time is spent compiling, you would expect ccache to show a non-trivial speedup. I didn't mean to state it as conclusive evidence. It could be that ccache support is still incomplete in some way that prevents it from realizing the gains you'd expect. It's hard to say for sure, but I think reducing Ekam's overbuilding is part of the strategy for getting clarity on that (otherwise it's too hard to spot issues in the ccache logs).
Did you find any other issue beyond the […]
Force-pushed from 948e621 to 9571f7b.
Distcc/ccache/sccache can wrap the compiler command line & speed up the build. Setting `CXX_WRAPPER` to the path of that binary lets Ekam invoke it correctly when compiling & doesn't break linking. See the discussion in capnproto#44 for more context.
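As a usage sketch (the wrapper path and compiler choice below are illustrative, not prescribed by this PR):

```shell
# Point Ekam at the wrapper binary. Per the commit message, the wrapper
# is applied to compile command lines without breaking linking.
export CXX_WRAPPER=/usr/bin/ccache
export CXX=clang++   # example compiler; use whatever you build with
# ...then run your normal ekam build command.
```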
This environment variable is a colon-separated list of paths, each with a trailing '/', for which remapping should be skipped. This is useful for tools like ccache, which try to access `~/.ccache` to write cache results. Since there's no actual guarantee that ccache is writing to `~/.ccache` (it could be overridden by a setting or a runtime environment variable), it's up to the end user to orchestrate this correctly.
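The matching described — colon-separated entries, each ending in '/', compared as path prefixes — might look roughly like this sketch (the variable name `SKIP_REMAP_PATHS` and the example entries are hypothetical; the real name is whatever the commit defines):

```shell
# Hypothetical skip list; entries must end in '/' per the description.
SKIP_REMAP_PATHS="$HOME/.ccache/:/var/tmp/"

path_is_skipped() {
  # Succeeds if $1 begins with any colon-separated entry in the list.
  local path="$1" entry
  local IFS=':'
  for entry in $SKIP_REMAP_PATHS; do
    case "$path" in
      "$entry"*) return 0 ;;
    esac
  done
  return 1
}
```

The trailing '/' requirement keeps the comparison a directory-prefix match, so `/var/tmp/` can't accidentally match a sibling like `/var/tmpfiles`.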
systemd recommends using `/var/run/<UID>/` for temporary directories (where `<UID>` is your uid) for security, & ccache makes use of it if available. Add this to the list of temporary paths that are allow-listed to bypass remapping.
Not sure why we hadn't hit this elsewhere, but ccache does resolve stat to stat64 when building.
There are a few things that were needed here:
- `/var/run/<UID>` as a temporary directory. This could be worked around with ccache environment variables, but since it's a systemd convention, I figured it makes more sense as part of ekam.