
Add grpc proxy backend #680

Closed
AlessandroPatti wants to merge 17 commits from the apatti/grpcproxy branch

Conversation


AlessandroPatti (Contributor) commented Jul 24, 2023

This adds a proxy backend that uses the gRPC protocol. When the size of a blob is unknown (i.e. when bazel-remote is used with the REST API), the proxy relies on the Remote Asset API to get the blob size, so in order to use another bazel-remote instance as the backend, --experimental_remote_asset_api must be provided to the proxy.
Tested mostly with the added unit test and some local testing building bazel-remote itself. These are the steps used to test locally (each command executed in a different terminal):

$ rm -rf /tmp/proxy && bazel run //:bazel-remote -- --dir /tmp/proxy -max_size 5 -port 8080 -grpc_port 9090 --storage_mode uncompressed --experimental_remote_asset_api --tls_cert_file=/tmp/certs/server/cert.crt --tls_key_file=/tmp/certs/server/key.key --tls_ca_file=/tmp/certs/client/cert.crt
$ rm -rf /tmp/cache1 && bazel run //:bazel-remote -- --dir /tmp/cache1 --max_size 5 -port 8081 -grpc_port 9091 --storage_mode uncompressed --grpc_proxy.url localhost:9090 --grpc_proxy.cert_file=/tmp/certs/client/cert.crt --grpc_proxy.key_file=/tmp/certs/client/key.key
$ rm -rf /tmp/cache2 && bazel run //:bazel-remote -- --dir /tmp/cache2 --max_size 5 -port 8082 -grpc_port 9092 --storage_mode uncompressed --grpc_proxy.url localhost:9090 --grpc_proxy.cert_file=/tmp/certs/client/cert.crt --grpc_proxy.key_file=/tmp/certs/client/key.key
$ bazel clean && bazel build //:bazel-remote --disk_cache= --remote_cache=grpc://localhost:9091 --remote_upload_local_results && bazel clean && bazel build //:bazel-remote --disk_cache= --remote_cache=grpc://localhost:9092 --remote_upload_local_results --remote_accept_cached

AlessandroPatti force-pushed the apatti/grpcproxy branch 2 times, most recently from 27aebfc to 1bcd37a on July 24, 2023 21:51
mostynb (Collaborator) left a comment:

Thanks for starting on this feature- it has been on my wishlist for a while :)

I will try to find time for a thorough review tomorrow, but I have left some initial comments in the meantime (I think your PR is missing utils/backendproxy/ btw).

cache/grpcproxy/grpcproxy.go (resolved)
&cli.StringFlag{
Name: "grpc_proxy.key_file",
Value: "",
Usage: "Path to the key used to autheticate with the proxy backend. Enables mTLS.",
mostynb (Collaborator):

The key_file / cert_file help texts should each mention that using one requires the other.
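
If it helps, the pairing could be enforced in validateConfig along these lines (a hypothetical sketch with made-up field names, not the PR's actual code):

```go
// The client certificate and key only work as a pair, so reject
// configurations that set exactly one of them.
if (c.GRPCProxyCertFile == "") != (c.GRPCProxyKeyFile == "") {
	return errors.New("--grpc_proxy.cert_file and --grpc_proxy.key_file must be specified together")
}
```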

cache/grpcproxy/BUILD.bazel (resolved)
cache/grpcproxy/grpcproxy.go (resolved)
AlessandroPatti force-pushed the apatti/grpcproxy branch 6 times, most recently from 6481feb to 98b2066 on July 30, 2023 19:24
mostynb (Collaborator) left a comment:

Sorry it has taken me a while to get back to this.

I think the most important thing to do next is to look at my comment about investigating more direct gRPC->gRPC backend proxying.

README.md Outdated
@@ -243,6 +243,19 @@ OPTIONS:
Goroutines to process parallel uploads to backend. (default: 100)
[$BAZEL_REMOTE_NUM_UPLOADERS]

--grpc_proxy.url value The base URL to use for a grpc proxy backend.
mostynb (Collaborator):

I think we need an example URL here, so the form is obvious.

AlessandroPatti (Contributor Author):

Sure, added it


func New(clients *GrpcClients,
accessLogger cache.Logger, errorLogger cache.Logger,
numUploaders, maxQueuedUploads int) (cache.Proxy, error) {
mostynb (Collaborator):

This function only seems to return a nil error, do we need that return value?

On the other hand, maybe this should check the capabilities of the backend server, and possibly return an error from that?
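
A minimal sketch of what such a capabilities check could look like (illustrative names, not the PR's actual code; assumes conn is the backend's *grpc.ClientConn):

```go
import (
	"context"
	"fmt"

	pb "github.com/bazelbuild/remote-apis/build/bazel/remote/execution/v2"
	"google.golang.org/grpc"
)

// checkBackendCapabilities queries the backend's REAPI capabilities and
// fails if the backend does not advertise a cache at all.
func checkBackendCapabilities(ctx context.Context, conn *grpc.ClientConn) error {
	caps, err := pb.NewCapabilitiesClient(conn).GetCapabilities(ctx, &pb.GetCapabilitiesRequest{})
	if err != nil {
		return fmt.Errorf("failed to query proxy backend capabilities: %w", err)
	}
	if caps.CacheCapabilities == nil {
		return fmt.Errorf("proxy backend does not advertise cache capabilities")
	}
	return nil
}
```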

AlessandroPatti (Contributor Author):

Addressed both!

cache/grpcproxy/grpcproxy.go (resolved)
with the proxy backend using mTLS. If this flag is provided, then
grpc_proxy.key_file must also be specified.
[BAZEL_REMOTE_GRPC_PROXY_CERT_FILE]

mostynb (Collaborator):

How difficult would it be to add support for basic/password authentication? I think we should consider adding this, mostly to avoid documenting that it doesn't work, but also because it would probably make it easier to set up a system test.

AlessandroPatti (Contributor Author):

Password authentication is currently used only for clients to authenticate with bazel-remote, not for bazel-remote to authenticate with the proxy backend. As far as I can tell, none of the existing proxies support this right now.

mostynb (Collaborator):

The bazel-remote <-> proxy backend authentication is completely separate from the client <-> bazel-remote authentication though. If bazel-remote is the main intended use case for the grpc proxy backend, then I think we should support that. But this would be OK to land in a followup (I can help if needed).

AlessandroPatti (Contributor Author):

I did not realise that the http proxy actually supports basic auth, you just need to pass the credentials in the proxy url. I took a stab at doing the same for the grpc proxy.

cache/grpcproxy/grpcproxy.go (resolved)
config/config.go Outdated
@@ -348,6 +361,17 @@ func validateConfig(c *Config) error {
}
}

if c.GRPCBackend != nil {
if c.StorageMode != "uncompressed" {
return errors.New("--grpc_proxy does not yet support compressed storage mode")
mostynb (Collaborator):

Lacking support for this mode is unfortunate, it could make the gRPC proxy backend less efficient than HTTP.

AlessandroPatti (Contributor Author):

Yes, although it depends on whether clients use compression or not. I need to spend some time figuring out how to efficiently get the raw blob (including the header) from a proxied bazel-remote instance instead of having to recompress it on the fly, and didn't want to block this feature in the meantime. Is compression support a prerequisite for this to merge?

mostynb (Collaborator):

> Is compression support a prerequisite for this to merge?

It's not a prerequisite, but if we don't support it then I think we would need some documentation and/or example setups that explain the issue - in particular, comparing it to using an HTTP bazel-remote backend. (And maybe it's better to just land a version of this feature which works optimally, and skip that documentation?)

Ideally I would like things to work nicely when clients use compression, because that seems to be the best tradeoff for performance (CPU time vs download time on the client side, and minimal CPU usage + effectively larger cache size on the server side).

The best case scenario IMO is when the client, frontend bazel-remote and optional proxy backend server all support compression. The frontend bazel-remote instance needs to verify the sha256 of blobs from the client, but it can then store a compressed result from its disk cache (increasing its effective disk cache size) and serve blobs without re-verifying their hashes.

The presence of a proxy backend increases complexity- should the frontend trust results from the backend? If the answer is yes, then the network transfers from the backend to the client are most efficient (a compressed result can be sent from the backend, via the frontend, to the client without recompression). If the answer is no, then CPU load on the frontend increases, because it needs to verify results from the backend.

Circling back to your question of whether this PR should land without compression support: would this gRPC backend be better than using a bazel-remote proxy backend over HTTP with compression support? (Or do you have a different setup than I am imagining - is the proxy backend something other than bazel-remote? Could you describe your use-case for this feature?)

AlessandroPatti (Contributor Author):

@mostynb Sorry it took me this long to get back on this one. The setup I have in mind is using another bazel-remote instance as the backend, but without compression. In my experience, enabling compression on both client and server (and proxy?) was never better: the overhead from compressing and decompressing each blob on the client side was too high, and the network transfers were fast enough that we effectively saw slower builds with compression enabled. Enabling it only on the server side is also a possibility, which might reduce used storage and increase the effective cache size, but it's something I haven't tried as I suspect it would put too much additional load on the server for too little gain. I guess it all depends on the available resources (is the network fast? are there a lot of CPUs?) and the build profile (how compressible are the artifacts?).

Regardless, I've added support for zstd compression, but with a caveat: it does not work with a bazel-remote backend. The reason is that bazel-remote itself expects its backends to return blobs exactly as they were pushed by the server (it is actually a bit more subtle: it expects all blobs to start with the casblob header, though the content of the header can technically change, e.g. use different frame sizes). However, bazel-remote always serves blobs after stripping away the header, so it does not behave as a valid proxy backend. This is true regardless of the protocol used; as of now bazel-remote cannot be used as an http backend for another bazel-remote instance with compression enabled (http additionally suffers from #524).
I thought of a few ways to solve this, but none of them was great:

  1. Have the grpc (and http?) proxy implementation decompress and recompress all blobs coming from the backend and add the header, which would probably require writing to a file and reading it back. Overall it seems like too much overhead to be an acceptable solution.
  2. Instead of decompressing and recompressing, attempt to recreate the header directly from the compressed stream. The header is mostly used to quickly find the uncompressed offset in the compressed stream, which could be done by decoding each frame and accumulating the Frame_Content_Size field of each frame header (provided it is present). This would still require consuming the entire stream and reading it back from temporary storage, but would save recompressing, which is usually the most expensive part.
  3. Somewhat inspired by the above, drop the header altogether and find offsets on-request by parsing the frame headers. Not sure if there is anything currently done that wouldn't be possible this way.
  4. Have bazel-remote stream back the header as well, since it consists of valid zstd (skippable) frames (see the sketch after this comment). For requests where the offset is != 0, the header would need to be updated to match the offset of the truncated stream, but that might be fast enough and a rare enough case. The downside is that each request will stream additional bytes, which might be a significant relative increase for small blobs. Maybe this should be put behind a flag so that it can be enabled conditionally for bazel-remote instances that are meant to be used as backends.

Curious to know whether the above makes sense to you, or if I have completely misread how this works, or if you have a better solution in mind. Unless I am missing something obvious, this would need some thinking, and is maybe out of scope for this feature request since the grpc proxy would still have feature parity with http?
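
For context on option 4: a zstd skippable frame is a 4-byte magic number in the range 0x184D2A50..0x184D2A5F, a 4-byte little-endian length, and an opaque payload, which conforming decoders ignore. A minimal sketch of producing one (the function is illustrative, not bazel-remote's actual casblob code):

```go
import "encoding/binary"

// skippableFrame wraps payload in a zstd skippable frame. Decoders skip
// these frames without affecting the decompressed output, so a casblob
// header stored this way can ride along in the compressed stream.
func skippableFrame(payload []byte) []byte {
	buf := make([]byte, 8+len(payload))
	binary.LittleEndian.PutUint32(buf[0:4], 0x184D2A50) // skippable frame magic
	binary.LittleEndian.PutUint32(buf[4:8], uint32(len(payload)))
	copy(buf[8:], payload)
	return buf
}
```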

case cache.CAS:
// We don't know the size, so send a FetchBlob request first to get the digest
// TODO: consider passing the expected blob size to the proxy?
decoded, err := hex.DecodeString(hash)
mostynb (Collaborator):

Why decode the string, just to encode it again below?

AlessandroPatti (Contributor Author):

First we decode it into a byte array, then we encode it into a base64 string, just doing the opposite of what is done in https://github.com/buchgr/bazel-remote/blob/master/server/grpc_asset.go#L74-L81

}
return true, size
case cache.CAS:
decoded, err := hex.DecodeString(hash)
mostynb (Collaborator):

Why decode the string, just to encode it again below?

AlessandroPatti (Contributor Author):

Same as above

Name: "checksum.sri",
Value: fmt.Sprintf("sha256-%s", base64.StdEncoding.EncodeToString(decoded)),
}
freq := asset.FetchBlobRequest{
mostynb (Collaborator):

I find this a bit confusing, why is this using the remote asset API instead of BatchReadBlobs or the bytestream API?

AlessandroPatti (Contributor Author):

We need to know the size of the blob first. The alternative is passing it as a parameter to this function.
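
A rough sketch of the flow being described (illustrative names; assumes fetcher is an asset.FetchClient for the backend):

```go
import (
	"context"
	"encoding/base64"
	"encoding/hex"

	asset "github.com/bazelbuild/remote-apis/build/bazel/remote/asset/v1"
)

// resolveSize asks the backend's Remote Asset API for the digest (and
// therefore the size) of a CAS blob known only by its hex sha256 hash.
func resolveSize(ctx context.Context, fetcher asset.FetchClient, hash string) (int64, error) {
	// hex string -> raw bytes -> base64, as required by checksum.sri.
	decoded, err := hex.DecodeString(hash)
	if err != nil {
		return -1, err
	}
	req := asset.FetchBlobRequest{
		Qualifiers: []*asset.Qualifier{{
			Name:  "checksum.sri",
			Value: "sha256-" + base64.StdEncoding.EncodeToString(decoded),
		}},
	}
	res, err := fetcher.FetchBlob(ctx, &req)
	if err != nil {
		return -1, err
	}
	// With the full digest known, a normal bytestream read can follow.
	return res.BlobDigest.GetSizeBytes(), nil
}
```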

Qualifiers: []*asset.Qualifier{&q},
}

res, err := r.clients.asset.FetchBlob(ctx, &freq)
mostynb (Collaborator):

Why use the remote asset API instead of FindMissingBlobs?

AlessandroPatti (Contributor Author):

Same as above

config/config.go (resolved, outdated)
README.md Outdated (comment on lines 246 to 248)
--grpc_proxy.url value The base URL to use for a grpc proxy backend, e.g.
localhost:9090 or example.com:7070.
[$BAZEL_REMOTE_GRPC_PROXY_URL]
mostynb (Collaborator):

I think we should probably provide a way to use TLS when talking to the proxy backend without using mTLS. It looks like this PR uses TLS when authenticating with the proxy backend via mTLS (which makes sense), but otherwise TLS is not used at all when talking to the proxy backend?
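
For illustration, grpc-go supports one-way TLS without a client key pair; something along these lines could work (a sketch, assuming the flags map to these parameters; not the PR's actual wiring):

```go
import (
	"crypto/tls"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
	"google.golang.org/grpc/credentials/insecure"
)

// dialBackend dials the gRPC proxy backend. One-way TLS needs no client
// key pair; supplying cert/key upgrades the connection to mTLS.
func dialBackend(addr, certFile, keyFile string, useTLS bool) (*grpc.ClientConn, error) {
	if !useTLS {
		return grpc.Dial(addr, grpc.WithTransportCredentials(insecure.NewCredentials()))
	}
	cfg := &tls.Config{}
	if certFile != "" && keyFile != "" {
		// mTLS: additionally present a client certificate.
		cert, err := tls.LoadX509KeyPair(certFile, keyFile)
		if err != nil {
			return nil, err
		}
		cfg.Certificates = []tls.Certificate{cert}
	}
	return grpc.Dial(addr, grpc.WithTransportCredentials(credentials.NewTLS(cfg)))
}
```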

AlessandroPatti force-pushed the apatti/grpcproxy branch 2 times, most recently from 69e7284 to 054cfca on August 17, 2023 21:45
AlessandroPatti force-pushed the apatti/grpcproxy branch 2 times, most recently from fdffcd5 to cf04071 on August 17, 2023 22:09
@@ -123,6 +124,23 @@ func getLogin(ctx context.Context) (username, password string, err error) {

return username, password, nil
}

if k == "authorization" && len(v) > 0 && strings.HasPrefix(v[0], "Basic ") {
AlessandroPatti (Contributor Author):

I tried to get it working with the :authority header like the above, but couldn't:

  • Passing <user>:<pass>@address directly to grpc.Dial fails with "too many colons in address"
  • Setting the header explicitly as metadata is simply overridden
  • Using grpc.WithAuthority seems to work, but fails with "certificate is valid for localhost, not <user>:<pass>@localhost" when TLS is enabled.

Supporting authorization as a header wouldn't be too bad anyway; it's the recommended way to do basic auth.
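
One possible shape for that, using grpc-go's standard per-RPC credentials hook (a sketch; the type and its wiring are assumptions, not the PR's actual code):

```go
import (
	"context"
	"encoding/base64"

	"google.golang.org/grpc/credentials"
)

// basicAuth implements credentials.PerRPCCredentials, attaching an
// "authorization: Basic ..." header to every outgoing RPC.
type basicAuth struct {
	user, pass string
}

func (b basicAuth) GetRequestMetadata(ctx context.Context, uri ...string) (map[string]string, error) {
	token := base64.StdEncoding.EncodeToString([]byte(b.user + ":" + b.pass))
	return map[string]string{"authorization": "Basic " + token}, nil
}

// RequireTransportSecurity ensures credentials are never sent in plaintext.
func (b basicAuth) RequireTransportSecurity() bool { return true }

var _ credentials.PerRPCCredentials = basicAuth{}

// Usage when dialing:
//   conn, err := grpc.Dial(addr,
//       grpc.WithTransportCredentials(tlsCreds),
//       grpc.WithPerRPCCredentials(basicAuth{user, pass}))
```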


AlessandroPatti (Contributor Author) commented:

Hi @mostynb, any blocker on merging this?

mostynb (Collaborator) commented Aug 31, 2023

> Hi @mostynb, any blocker on merging this?

Hi, sorry this is taking me a while. I wanted to make a release before landing this, because I will probably want to make a few changes afterwards. That is done now (after a small hiccup in 2.4.2 and a rushed-out 2.4.3 release).

I will try to find time to look at this again tomorrow- thanks for persevering.

AlessandroPatti (Contributor Author) commented:

@mostynb any update on this one?

mostynb (Collaborator) commented Sep 25, 2023

I finally got back to reviewing this, sorry for the delay. I will continue tomorrow.

mostynb (Collaborator) commented Oct 1, 2023

I landed a squashed version of this- I think it will be easier to iterate on this on master with a few smaller PRs than to drag this one on any longer. Thanks again for being so patient.

mostynb closed this on Oct 1, 2023