cmd/go: cache link output binaries in the build cache #69290
How often do we think the cache would be used for a binary? Pretty much any change to the code of any of the inputs will require a relink. When would this save time in practice?
I think the main use case is Could we consider a total-space eviction policy rather than a time-based one? (Drop the oldest entry from the cache when adding something would push it over the limit.)
Yes, I think the main use would be for
Thanks. That at least raises the possibility of using the build cache for
One thing I'd like for this is if the package name showed up in the output of I think that means that we should store binaries like stringer on disk in a directory:
Change https://go.dev/cl/613095 mentions this issue:
I've put together a basic prototype (caches all binaries; doesn't use the package name or ExeName as the file name) at golang.org/cl/613095
Does this need a proposal? It seems to be a mere implementation detail that shouldn't affect the observable behavior, except quantitatively. (I suppose @ConradIrwin's point that "go run" processes might notice their argv[0] name has changed is a counterargument, but @matloob has an implementation that avoids that by using a subdirectory.)
ETXTBSY might not be an issue on Linux for too long: #22315 (comment)
There are already people who complain about the size of the build cache (e.g., #68872), so I do think this is more than an implementation detail.
The original

For the

That said, I would ideally like a longer cache expiry time for tools than 2 days. It's very common for people to take weekends off (and even long weekends happen occasionally). 5 days is actually quite good for the use case of "stop working on Thursday afternoon and pick it up again on Tuesday morning".

If we're changing to cache

To be clear, if we're not changing the caching behavior of
This proposal has been added to the active column of the proposals project |
@ConradIrwin said:
Please elaborate on these statements; I'm not sure I understand. The last part, about it being rare to actively work on multiple modules that use the same tool at the same version, caught my attention. My current project has several mock services written as standalone modules that are used by many other modules' test suites. The tests
@ChrisHines interesting. Are all your modules in one workspace? My experience has been using one module with a bunch of packages to get the same effect. How do you keep all your services' dependencies' versions in sync?
Each module is in a separate Git repository.
Excellent question. This project was late to adopt modules for reasons I wrote about last year in #60915. Prior to modules, our dependencies were managed by a tool that allowed depending on a branch head, and we used that for our internal dependencies. With that tool all of our "modules" referenced specific versions of third party libraries, but referenced the

We wanted to preserve the existing CI and developer workflows the project had for years as we transitioned to Go modules, and fortunately we were able to figure out a way to do that. The short and simplified explanation follows.

For any new commit to the
I think it makes sense to cache the build output of
I would not do
It seems less useful for
Yeah, I don't think we need to cache the binary for
For the practical purposes of this proposal we'd want to support
Based on the discussion above, this proposal seems like a likely accept. This proposal is for cmd/go to cache the binary outputs of link actions to the build cache for
There's actually one valid use case for that: sharding slow tests across many machines, with each running a different subset of tests. e.g. we have a test helper like:

```go
// testNum counts test executions in this process, for sharding.
var testNum atomic.Int32

// Shard skips t if the TS_TEST_SHARD test shard is set to "n/m" and this
// test's execution number in the process, mod m, is not equal to n-1.
// That is, to run with 4 shards, set TS_TEST_SHARD=1/4, ..., TS_TEST_SHARD=4/4
// for the four jobs.
func Shard(t testing.TB) {
	e := os.Getenv("TS_TEST_SHARD")
	a, b, ok := strings.Cut(e, "/")
	if !ok {
		return
	}
	wantShard, _ := strconv.ParseInt(a, 10, 32)
	shards, _ := strconv.ParseInt(b, 10, 32)
	if wantShard == 0 || shards == 0 {
		return
	}
	shard := ((testNum.Add(1) - 1) % int32(shards)) + 1
	if shard != int32(wantShard) {
		t.Skipf("skipping shard %d/%d (process has TS_TEST_SHARD=%q)", shard, shards, e)
	}
}
```

And then we configure GitHub to run our tests multiple times, each running a different subset of the tests:

```yaml
test:
  strategy:
    fail-fast: false # don't abort the entire matrix if one element fails
    matrix:
      include:
        - goarch: amd64
          coverflags: "-coverprofile=/tmp/coverage.out"
        - goarch: amd64
          buildflags: "-race"
          shard: '1/3'
        - goarch: amd64
          buildflags: "-race"
          shard: '2/3'
        - goarch: amd64
          buildflags: "-race"
          shard: '3/3'
  ...
    - name: integration tests as root
      run: PATH=$PWD/tool:$PATH /tmp/testwrapper -exec "sudo -E" -race ./tstest/integration/
      env:
        TS_TEST_SHARD: ${{ matrix.shard }}
```

So with
I don't want to let the perfect be the enemy of the good here. I think we have a solid and agreed-upon core for caching
No change in consensus, so accepted. 🎉

This proposal is for cmd/go to cache the binary outputs of link actions to the build cache for
Proposal Details
This proposal is for cmd/go to cache the binary outputs of link actions in the build cache. Binary outputs would be trimmed from the cache earlier than package outputs.
cmd/go currently caches the outputs of compiling a package (build actions) in the build cache, but does not cache the outputs of linking a binary (link actions):

go/src/cmd/go/internal/work/buildid.go, line 713 in 2707d42

The primary reasons binaries are not cached are that built binaries are much larger than individual package object files and that they are not reused as often. We would mitigate that by trimming binaries with a shorter trimLimit than we currently use for package objects in the cache: we currently remove a package output if it hasn't been used for five days, but we would perhaps choose two days for binaries. To make it easy to identify binaries for trimming, we would store them in a different location than package objects: perhaps instead of $GOCACHE// they would be stored in $GOCACHE/exe//.

We would also need to figure out what to do about the potential for ETXTBSY issues when trying to execute the built binaries: see #22220. If the go command writes to a binary and then executes it, we can get errors executing the binary. We'll have to figure out what to do about this, because we would need to write the build id into the binary and then execute it if we're doing a go run.

cc @rsc @samthanawalla