Feature request: external dependency caching using remote gRPC protocol #2557
Comments
It's worth noting that we'd be willing to contribute time to implement it, but guidance would be much appreciated before we start :)
FWIW, getting the CAS interface unified for the repository feature and the remote cache would be good, as with the prototype of the on-disk cache, where I tried to share the on-disk cache for both of those. Indeed, sharing the implementation would be good; my 0.02$.
@ola-rozenfeld is there an upper limit to blob sizes in the CAS? I.e. if we had a large external dep tar-gzipped in there, would it exceed it?
@damienmg Would you guys accept upstream patches if we were to contribute them?
We're hitting the Git external checkout issues hard :( We have a rule of our own that downloads github.com git repos via an HTTPS archive link, but it doesn't work for transitive workspace deps (ones that come from external git repos).
@mwitkow: Sorry for the delay; apparently I missed your message. IIUC, what you propose doesn't seem like something we would want to integrate. You mean that you define a git_repository and the git repository rule figures out by magic that it points to a GitHub repository, so we download the tarball instead of cloning the repository? This is easily doable with a simple skylark macro.

For the second one, it would be: we download an archive that does not provide a shasum, but we compute it after the fact and store it in the CAS nonetheless. Since you won't specify the shasum, we won't ask the CAS anyway, so why would you want that?

What do you mean by "it doesn't work for transitive workspace deps"? Do you mean you are hitting #2757?
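For illustration, a minimal sketch of such a macro (the file and macro names are hypothetical; it assumes GitHub's `/archive/<commit>.tar.gz` URL scheme and the stock `http_archive` rule):

```python
# github.bzl (hypothetical): fetch a GitHub repository as a tarball via
# http_archive instead of cloning it with git_repository.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

def github_archive(name, org, repo, commit, sha256 = None):
    http_archive(
        name = name,
        urls = ["https://github.com/%s/%s/archive/%s.tar.gz" % (org, repo, commit)],
        # GitHub tarballs unpack into a "<repo>-<commit>" top-level directory.
        strip_prefix = "%s-%s" % (repo, commit),
        # With sha256 set, the download is content-addressed and can be
        # served from the repository cache on later fetches.
        sha256 = sha256,
    )
```

A WORKSPACE would then call `github_archive(...)` wherever it previously used `new_git_repository` for a GitHub-hosted dependency.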
Thanks for getting back :)

I agree that having a simple skylark rule would solve the problem for external deps that we define in our own workspace. However, we do source external rules, such as rules_scala. They usually define a `<something>_repositories()` macro; those are the things that "flatten" the WORKSPACE into a single file, circumventing the issue of #2757. Unfortunately, such `<something>_repositories()` macros can use `new_git_repository` themselves, which means we have no way of controlling how they download Git repos.

The proposal to add the GitHub-archive fetch is just a cheeky work-around to the original problem we have: checking out and clean-building a complicated bazel workspace spends *tons* of time on git clone, and there is no way of caching that. We have a dockerized "blessed, clean build environment" build pipeline, and we currently rely on an rclone hack to get an external directory populated (from an early-morning build). You can probably imagine how flaky that is ;) We'd love to go all in on the bazel-buildfarm approach, and once #1413 is fixed, the checkout of workspace external deps would be the only blocker for us.

An ideal solution IMHO (pardon my ignorance if incorrect) would be to treat the results of WORKSPACE-scoped Skylark rules as artifacts, similarly to how BUILD rules are treated. This way the result of a `new_git_repository` would be cached inside the CasService of the bazel-buildfarm exactly like partial build artifacts are.
We are thinking of adding a workspace overload feature; maybe it should also overload Skylark symbols, but we need to see how the native module will evolve.
Any updates here?
Hey, was wondering if this is still pending, or is there a way to have remote-cache'ability for the `external` directory?
@pl-misuw see https://github.com/buchgr/bazel-remote and #10622. In short, you should be able to do this today.
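For reference, a rough sketch of the client-side flags involved, assuming a bazel-remote instance at a hypothetical endpoint that serves both the gRPC cache and the Remote Asset API (exact flag names may vary across Bazel versions):

```sh
# Route both action/CAS caching and external repository downloads
# through the same remote cache instance (hypothetical address).
bazel build //... \
  --remote_cache=grpc://cache.internal:9092 \
  --experimental_remote_downloader=grpc://cache.internal:9092
```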
Description of the feature request:

As part of implementing a proof of concept of a distributed cache for Bazel builds (see mwitkow/bazel-distcache), it seems that the remote_protocol.proto `CASService` is not used for caching `<output_dir>/external` content.

The reason why we're interested in building `distcache` is that we run Dockerized bazel builds (using Jenkins and Concourse), and we'd rather not share the `output_dir` verbatim. Not having `external` cacheable is a massive problem for such a use case. We have quite a few git dependencies for rules_go, since there are tons of external dependencies in Go, and quite a few Skylark rules are referenced this way. Moreover, we have quite a few Maven deps which would be easily cacheable.

There already seems to be a `RepositoryCache` implementation in Bazel that allows `HttpDownloader` and `MavenDownloader` to cache on local disk (using `--experimental_repository_cache`), as part of #1752. I think it could rely on the `CASService`, as it even has the appropriate hashing methods.
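For context, this is roughly how that existing disk-level cache is enabled; the flag takes a directory path, and in later Bazel versions the same feature is available without the experimental_ prefix (as `--repository_cache`):

```sh
# Cache external repository downloads (keyed by their declared sha256)
# in a shared on-disk directory, reusable across workspaces and builds.
bazel build //... --experimental_repository_cache=/var/cache/bazel-repo
```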
The proposal:

(a): Extend `RepositoryCache` to be able to use the `CASService`, by reusing `RemoteActionCache`.

(b): Come up with a way to use `RepositoryCache` for Git repos, most likely by splitting them up into a ZIP that can be cached like `http_archive`. Most likely belongs in a separate ticket.
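Purely to illustrate (b), a hypothetical user-facing shape: if a pinned git repo were normalized into an archive, it could carry a content hash and become cacheable exactly like `http_archive`. The `sha256` attribute on `new_git_repository` below is invented for this sketch and does not exist:

```python
# Hypothetical WORKSPACE snippet: the repo name, commit pin, and the
# sha256 attribute on new_git_repository are illustrative only.
new_git_repository(
    name = "com_example_dep",
    remote = "https://github.com/example/dep.git",
    commit = "0123456789abcdef0123456789abcdef01234567",  # exact commit pin
    # Invented attribute: hash of the normalized archive of this checkout,
    # which would serve as the key into the CAS / repository cache.
    sha256 = "<hash of the normalized archive>",
)
```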
CCs:
@ola-rozenfeld - for RemoteCache
@jin - for RepositoryCache
@kchodorow - since they seem to be the main stakeholder in #1752
@dinowernli - since he probably wants to be on this ticket anyway ;)