cmd/go: add GOEXPERIMENT=cacheprog to let a child process implement the internal action/output cache #59719

Closed
bradfitz opened this issue Apr 19, 2023 · 36 comments

Comments

@bradfitz
Contributor

bradfitz commented Apr 19, 2023

The cmd/go tool has great caching support. Unfortunately, it only supports filesystem-based caching.

I'd like to do things like hook into GitHub's native caching system at a lower level (instead of the inefficient thing people do now: untarring/tarring GOCACHE archives on every CI run, which is often slower than the CI action itself) and support things like a P2P cache gossip protocol between [trusted] coworkers within a company.

Clearly neither of those examples is realistic to add to cmd/go itself. So instead:

I propose that cmd/go support a GOCACHEPROG=/path/to/program environment variable (akin to GOCACHE=/path/to/dir) where the GOCACHEPROG is run as a child process and cmd/go speaks to it over stdin/stdout, translating the Go tool's internal cache interface, and then the GOCACHEPROG can do whatever caching mechanism/policy it wants.

I talked to @rsc about this once and he didn't seem opposed so I went off and implemented it and it's looking like it's going to be pretty awesome. (demo programs)

Thoughts, objections, etc?

(And preemptively: I have a soft spot for FUSE, but FUSE is not an answer. It doesn't work in enough environments, such as CI test runner environments, and it's finicky on basically every platform, Linux included.)


The protocol (from the code linked above) is currently:

// ProgCmd is a command that can be issued to a child process.
//
// If the interface needs to grow, we can add new commands or new versioned
// commands like "get2".
type ProgCmd string

const (
	cmdGet   = ProgCmd("get")
	cmdPut   = ProgCmd("put")
	cmdClose = ProgCmd("close")
)

// ProgRequest is the JSON-encoded message that's sent from cmd/go to
// the GOCACHEPROG child process over stdin. Each JSON object is on its
// own line. A ProgRequest with Command "put" and BodySize > 0 will be followed
// by a line containing a base64-encoded JSON string literal of the body.
type ProgRequest struct {
	// ID is a unique number per process across all requests.
	// It must be echoed in the ProgResponse from the child.
	ID int64

	// Command is the type of request.
	// The cmd/go tool will only send commands that were declared
	// as supported by the child.
	Command ProgCmd

	// ActionID is non-nil for "get" and "put" requests.
	ActionID []byte `json:",omitempty"` // or nil if not used

	// ObjectID is set for Command "put" and "output-file".
	ObjectID []byte `json:",omitempty"` // or nil if not used

	// Body is the body for "put" requests. It's sent after the JSON object
	// as a base64-encoded JSON string when BodySize is non-zero.
	// It's sent as a separate JSON value instead of being a struct field
	// sent in this JSON object so large values can be streamed in both directions.
	// The base64 string body of a ProgRequest will always be written
	// immediately after the JSON object and a newline.
	Body io.Reader `json:"-"`

	// BodySize is the number of bytes of Body. If zero, the body isn't written.
	BodySize int64 `json:",omitempty"`
}

// ProgResponse is the JSON response from the child process to cmd/go.
//
// With the exception of the first protocol message that the child writes to its
// stdout with ID==0 and KnownCommands populated, these are only sent in
// response to a ProgRequest from cmd/go.
//
// ProgResponses can be sent in any order. The ID must match the request they're
// replying to.
type ProgResponse struct {
	ID  int64  // that corresponds to ProgRequest; they can be answered out of order
	Err string `json:",omitempty"` // if non-empty, the error

	// KnownCommands is included in the first message that cache helper program
	// writes to stdout on startup (with ID==0). It includes the
	// ProgRequest.Command types that are supported by the program.
	//
	// This lets us extend the protocol gracefully over time (adding "get2", etc), or
	// fail gracefully when needed. It also lets us verify the program
	// wants to be a cache helper.
	KnownCommands []ProgCmd `json:",omitempty"`

	// For Get requests.

	Miss      bool   `json:",omitempty"` // cache miss
	OutputID  []byte `json:",omitempty"`
	Size      int64  `json:",omitempty"`
	TimeNanos int64  `json:",omitempty"` // TODO(bradfitz): document

	// DiskPath is the absolute path on disk of the ObjectID corresponding to
	// a "get" request's ActionID (on cache hit) or a "put" request's
	// provided ObjectID.
	DiskPath string `json:",omitempty"`
}
gopherbot added this to the Proposal milestone Apr 19, 2023
@bradfitz
Contributor Author

bradfitz commented Apr 19, 2023

A related approved proposal is #26232 for adding a similar helper process mechanism for go get auth.

bradfitz changed the title from "proposal: cmd/go: let external processes implement the internal action/output cache" to "proposal: cmd/go: let a child process implement the internal action/output cache" Apr 19, 2023
ianlancetaylor moved this to Incoming in Proposals Apr 19, 2023
@ianlancetaylor
Member

CC @bcmills @matloob

ianlancetaylor added the GoCommand cmd/go label Apr 19, 2023
@bcmills
Contributor

bcmills commented Apr 19, 2023

See previously #42785.

@bradfitz
Contributor Author

Right, thanks! I knew I'd seen something similar before.

Unlike that proposal, this one involves no network requests or protocol buffers or auth to figure out or changing Go caching or build semantics. Just JSON over stdin/stdout.

bradfitz added a commit to tailscale/go that referenced this issue Apr 20, 2023
Via setting GOCACHEPROG to a binary which speaks JSON over
stdin/stdout.

Updates golang#59719

Signed-off-by: Brad Fitzpatrick <bradfitz@golang.org>
Change-Id: I824ff04d5ebdf0ba4d1b5bc2e9fbaee26d34c80f
@gopherbot
Contributor

Change https://go.dev/cl/486715 mentions this issue: cmd/go: abstract build cache, support implementations via child process

@earthboundkid
Contributor

JSON over stdin is probably the easiest plug-in option. The other one worth ruling out is running a WASI interpreter. WASI has the advantage that you can sandbox it and then run plug-ins you only mostly trust.

@bcmills
Contributor

bcmills commented Apr 20, 2023

The main issue I foresee with JSON is that it is fairly inefficient for binary blobs, unless the idea is to use JSON to transmit filenames from the regular build cache..?

@bradfitz
Contributor Author

@bcmills, see the implementation for details, but in summary: for Gets, only the path on disk is returned. From the child process to cmd/go it's all JSON objects with no binary data. For Puts (from cmd/go to the child), the base64-encoded body is streamed to the child process after the JSON object metadata. So it's technically a bunch of JSON values (some objects, some strings), but it can be implemented efficiently without slurping it all into memory.

@bradfitz
Contributor Author

@carlmjohnson, if you want to run WASI or Node or C# or a JVM, you can pass a path to a program doing that. We're not going to bundle a WASI runtime into cmd/go. :)

@Xuanwo

Xuanwo commented Apr 24, 2023

Great proposal! Additionally, sccache (a ccache-like compiler caching tool with native cloud storage support) is also interested in integrating with this feature.

@bradfitz
Contributor Author

And I've posted example child process code at https://github.com/bradfitz/go-tool-cache.

@sluongng

I would appreciate it if this proposal included a specification of the Go cache program protocol:

  1. what are the commands
  2. what are the args to each command
  3. examples
  4. measures to ensure backward compatibility

It would help us separate the implementation from the specification.


An alternative implementation I have in mind is to leverage https://github.com/bazelbuild/remote-apis/ which many build tools have started to adopt (Bazel, Buck2, Pants2, recc etc...). For example: separation between fetching Action and fetching Object might be ideal.

@Xuanwo

Xuanwo commented Apr 27, 2023

Hello, @bradfitz. I am attempting to manually implement the Go cache protocol and have noticed that all implementations require similar concepts and logic surrounding actionID, outputID, objectID, and DiskCache.

Would it be possible to conceal these concepts within Go's built-in disk cache so that the implementation of cacher can be simplified as follows? Another benefit is that we don't have to transfer content in base64 JSON format through stdin.

For example:

// Request is the JSON-encoded message that's sent from cmd/go to
// the GOCACHEPROG child process over stdin. Each JSON object is on its
// own line. A Request of Type "put" with BodySize > 0 will be followed
// by a line containing a base64-encoded JSON string literal of the body.
type Request struct {
	// ID is a unique number per process across all requests.
	// It must be echoed in the Response from the child.
	ID int64

	// Command is the type of request.
	// The cmd/go tool will only send commands that were declared
	// as supported by the child.
	Command Cmd

	// ObjectID is set for Type "put" and "output-file".
	OutputID []byte `json:",omitempty"` // or nil if not used

	DiskPath string `json:",omitempty"`
}

// Response is the JSON response from the child process to cmd/go.
//
// With the exception of the first protocol message that the child writes to its
// stdout with ID==0 and KnownCommands populated, these are only sent in
// response to a Request from cmd/go.
//
// Responses can be sent in any order. The ID must match the request they're
// replying to.
type Response struct {
	ID  int64  // that corresponds to Request; they can be answered out of order
	Err string `json:",omitempty"` // if non-empty, the error

	// KnownCommands is included in the first message that cache helper program
	// writes to stdout on startup (with ID==0). It includes the
	// Request.Command types that are supported by the program.
	//
	// This lets us extend the protocol gracefully over time (adding "get2", etc), or
	// fail gracefully when needed. It also lets us verify the program
	// wants to be a cache helper.
	KnownCommands []Cmd `json:",omitempty"`

	// For Get requests.
	OutputID  []byte `json:",omitempty"`

	// DiskPath is the absolute path on disk of the ObjectID corresponding to
	// a "get" request's ActionID (on cache hit) or a "put" request's
	// provided ObjectID.
	DiskPath string `json:",omitempty"`
}

All cachers would then just need to handle requests and implement the following API:

// Load cache of given outputID and write into diskPath
func Get(ctx context.Context, outputID string, diskPath string) (err error)

// Read content from diskPath and store for outputID
func Put(ctx context.Context, diskPath string, outputID string) (err error)

@bradfitz
Contributor Author

bradfitz commented May 3, 2023

@Xuanwo, I don't want cmd/go to pick the path on disk, though. I want cacher programs to be able to return paths to FUSE filesystems back to cmd/go if they'd like.

@bradfitz
Contributor Author

bradfitz commented May 3, 2023

I've updated the top comment with the summary of the proposed protocol from the code review.

@rsc, what's the process for getting at least a GOEXPERIMENT? That would let me get more testing on this over a release cycle without carrying a big patch for a long time, and without getting Hyrum-locked into a particular API if there's a big mistake we haven't noticed. Ideally I'd love to get this into Go 1.21 (behind a compile-time GOEXPERIMENT), test it during the next 9 months, and maybe have it on by default in Go 1.22.

@bcmills
Contributor

bcmills commented Jun 16, 2023

No, the module cache remains separate. (It wasn't cleaned or populated in the same way as the build cache to begin with.)

seankhliao changed the title from "cmd/go: add GOEXPERIMENT=gocacheprog to let a child process implement the internal action/output cache" to "cmd/go: add GOEXPERIMENT=cacheprog to let a child process implement the internal action/output cache" Oct 8, 2023
@gopherbot
Contributor

Change https://go.dev/cl/556997 mentions this issue: cmd/go: add initial cacheprog integration tests

@aaomidi

aaomidi commented Nov 10, 2024

This sounds great. I wonder if a similar proposal could be made for the Go module cache? Specifically, I think some of the ways Go currently handles the module cache make it impossible for tools like Bazel to cache it.

adotkhan pushed a commit to Psiphon-Labs/utls that referenced this issue Dec 11, 2024
Via setting GOCACHEPROG to a binary which speaks JSON over
stdin/stdout.

For now, it requires GOEXPERIMENT=cacheprog.

Fixes golang/go#59719

Change-Id: I824ff04d5ebdf0ba4d1b5bc2e9fbaee26d34c80f
Reviewed-on: https://go-review.googlesource.com/c/go/+/486715
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
@gopherbot
Contributor

Change https://go.dev/cl/638567 mentions this issue: cmd/go: add "go help gocacheprog" topic

@gopherbot
Contributor

Change https://go.dev/cl/638566 mentions this issue: cmd/go: improve GOCACHEPROG types documentation

@gopherbot
Contributor

Change https://go.dev/cl/638996 mentions this issue: cmd/go/internal/cacheprog: drop redundant Prog prefixes

@gopherbot
Contributor

Change https://go.dev/cl/638995 mentions this issue: cmd/go: move GOCACHEPROG protocol types to their own package

@gopherbot
Contributor

Change https://go.dev/cl/638997 mentions this issue: cmd/go: document GOCACHEPROG in go help environment

gopherbot pushed a commit that referenced this issue Jan 2, 2025
This is in preparation for adding a "go help" topic for GOCACHEPROG.

Updates #71032
Updates #59719

Change-Id: I9dbbe56fa328dffe89207b5b41a0f37afd51e2b5
Reviewed-on: https://go-review.googlesource.com/c/go/+/638566
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Austin Clements <austin@google.com>
Reviewed-by: Michael Matloob <matloob@golang.org>
gopherbot pushed a commit that referenced this issue Jan 2, 2025
This is a step toward making it easy to point to them in
documentation. The other option is that we copy-paste all of these
type definitions wholesale, which seems ridiculous.

Updates #71032
Updates #59719

Change-Id: I7117e03308ae0adc721ed7a57792c33ba68ce827
Reviewed-on: https://go-review.googlesource.com/c/go/+/638995
Auto-Submit: Austin Clements <austin@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Matloob <matloob@golang.org>
gopherbot pushed a commit that referenced this issue Jan 2, 2025
Now that these types are in their own package, drop the unnecessary
Prog prefixes from everything.

Updates #71032
Updates #59719

Change-Id: Id54edf0473754e3b21a71beb72803fb5481206c1
Reviewed-on: https://go-review.googlesource.com/c/go/+/638996
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Michael Matloob <matloob@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Austin Clements <austin@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
gopherbot pushed a commit that referenced this issue Jan 2, 2025
This adds GOCACHEPROG to the list of environment variables in "go help
environment" and points to the cacheprog package documentation for
details of the protocol.

Fixes #71032
Updates #59719

Change-Id: Ib8f5804926a8fa59237661076d129c2852665ac3
Reviewed-on: https://go-review.googlesource.com/c/go/+/638997
Auto-Submit: Austin Clements <austin@google.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Michael Matloob <matloob@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>