-
Notifications
You must be signed in to change notification settings - Fork 137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hitting multiple "global" module caches #575
Comments
Hrm interesting. I have heard in the past that these modules can be tied to some of those configurations, for example in the minimum OS case wouldn't you have to make sure to produce the cache with the lowest number? Or is that done automatically with the hashes in the filenames? |
Also how many module caches do you have that this causes a problem? |
Yeah I believe it does have to namespace material in module cache at the config level, I haven't looked at this in depth in a while. Case in point other apps seem to use LMK if your open to this change and I can put up a PR for it. Also, I think it might cause less indexer data to be written but I need to double check. |
Good point. Seems reasonable to me, @allevato might be able to provide more context on the original path decision tho |
ATM I've got one swift cache per OS version, per transition config, per bazel workspace |
In the two examples shown above, since the iOS minimum deployment target differs, then the implicit modules between the two builds would differ as well. Moving the cache location out to
you'll have this:
So you still won't actually be sharing anything between the two builds, because Clang still has to compile the modules differently when the minimum deployment target changes. Now, because Starlark transitions can affect Bazel settings that end up being a superset of what gets passed to the Clang invocation, you could theoretically end up with multiple configurations that could share modules that don't today—and you can detect this by looking at all the FWIW, the addition of this feature was before we started using any Starlark transitions, so the possibility of the above happening was zero; all the configuration bits that affect the part of the path before the Starlark transition hash (e.g., |
Ah thanks for clarifying @allevato Yep, my example isn't 100% comprehensive of where the current feature beaks down. In the example above, getting 1 module cache per OS version in that case would be hinged on having identical transitions per OS, which may not always happen in practice. Specifically, since Bazel's transition config hash is computed on other Bazel CLI args than OS version. There's some issues on github about this e.g. bazelbuild/bazel#12731 and rules_ios ( a ruleset that wraps rules_swift ) originally used multiple config hashes. I'd still like to have a global cache to:
For me the indexstore issue is enough of an improvement move the "global module cache" out of |
While you could share the same module cache location, just as the situation above, you won't actually be sharing any modules themselves—you'll end up with the same modules in two different hash subdirectories. There are two reasons for this:
These two points are why I had to add support for |
@allevato thanks for sheding some light on the clang/swift interop - looks like we might not be able share the modules from clang / swift and at a higher level might not make sense anyways. This being said, using the proposed global path still is still cutting my global swift indexstore size in half which makes this a useful to me. Also as we're guaranteed to get a module cache per transition hash - that puts it at O(N So atleast 2 main benefits of using a global cache are:
From the compilers' perspective only, it shouldn't cause issues given data is correctly namespaced in that global cache. In general, it seems worth considering since other apps seem OK with use a global cache. In my experience, I haven't seen any negative side effects of a global cache yet with Bazel or remote builds. Do you know of any cons to the proposed change? |
I can't think of any other problems off the top of my head. The implicit module cache is already non-hermetic because it's placed at the configuration-specific output root instead of being sandboxed to each target (which would of course defeat the purpose), so it's not like moving it higher up would make it "more non-hermetic". In the past, we've seen issues where teams would use the global module cache, sync, and then rebuild and the module cache wouldn't be invalidated properly causing the compiler to crash because the underlying header files changed. It happened nondeterministically enough and rarely enough that I was never able to track down the cause, but that happened even when the cache was restricted to the configuration subdirectory. I suppose there's a slim possibility that moving it higher could cause the likelihood of that to increase, but Clang should be isolating the modules well enough in the first place. But it also should have been doing it so that the crashes we saw would have never occurred... 🤷♂️ So if you see other concrete benefits from it, I think it should be fine. |
Thanks for taking the time out of your day to weigh in on this. I totally agree: re tracking down corruption of that cache is hard to reproduce. I've seen it sparsely too. If sourcekit is pointing at a global module cache it will have the same issue when completing/indexing. Some out band intervention e.g. "Remove derived data" when rebasing or cleaning state during project generation time has always seemed required - effectively treating every last module cache as ephemeral. Today the compilers still put the global state into the |
Doesn't seem like there's anything super clear to do here. I think the true way out at this point is explicit modules |
Apologies for restarting this old conversation but we pretty regularly hit compilation issues like one described here. Do ya'll still run into this or have you worked around it somehow? |
I'd like to use a global module cache to mitigate i/o from a bazel build. It looks like the feature "global module cache" is almost doing this with a non hermetic cache, but it ends up creating a per "configuration" cache.
Specifically:
rules_swift/swift/internal/compiling.bzl
Line 942 in 22a2472
This is an issue when the build has multiple OS versions, or transition configurations. It seems to creates module cache per config
To fix, we can take the first part of the
bin_dir
instead of the entire dir: it'd have the effect of pointingswiftc
atbazel-out/_swift_module_cache
. We could make the change to fix it rules_swift or just fix it externally 👍The text was updated successfully, but these errors were encountered: