Remove deprecated storage backends #176
We are one of those connecting Buildbarn to bazel-remote via HTTP. Such a Buildbarn setup works well for us today, and I would be happy to be able to continue using HTTP as well. bazel-remote also supports gRPC, but we get higher throughput and lower CPU usage with HTTP than with gRPC. I believe that is because HTTP allows using Linux's sendfile(2) to transfer data directly in kernel space, between file and socket. More concurrent TCP sessions also make it easier for us to achieve an even balance between multiple network interfaces when using link aggregation. That does not strictly require HTTP, but a typical HTTP setup uses more concurrent TCP sessions than a typical gRPC setup. Querying capabilities is not mandatory for our use cases. But I agree that we lack FindMissingBlobs with HTTP. However, the magnitude of that problem depends on the types of builds performed and artifacts produced. I think one of Buildbarn's main strengths is its great architecture that allows many types of blobstore implementations.
Though I can well imagine that using
What is special about bazel-remote that makes its CPU usage so high, or about your setup that makes it so I/O-intensive?
Even though I'm not denying that there are workloads for which it's necessary to open multiple connections to get decent performance, I do doubt whether that's necessary in most distributed setups. Given a sufficiently large number of workers (e.g., n > 10) that each open a gRPC channel to a storage server, won't these connections already be spread evenly across network interfaces?
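One way to reason about this question is to simulate per-flow hashing. Link aggregation typically pins each TCP connection to a single member link based on a hash of its addresses and ports, so one connection never uses more than one link. The sketch below uses SHA-256 over the 4-tuple as a stand-in for the bonding driver's transmit hash policy (an assumption; real drivers use their own, cheaper hashes), and the addresses and the port 8980 are made up.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
)

// flow identifies a TCP connection by its addresses and ports.
type flow struct {
	srcIP, dstIP     [4]byte
	srcPort, dstPort uint16
}

// linkFor picks one of n aggregated links for a flow. SHA-256 is only a
// stand-in for the bonding driver's real transmit hash policy.
func linkFor(f flow, n int) int {
	var buf [12]byte
	copy(buf[0:4], f.srcIP[:])
	copy(buf[4:8], f.dstIP[:])
	binary.BigEndian.PutUint16(buf[8:10], f.srcPort)
	binary.BigEndian.PutUint16(buf[10:12], f.dstPort)
	sum := sha256.Sum256(buf[:])
	return int(binary.BigEndian.Uint32(sum[:4]) % uint32(n))
}

// distribute counts how many flows end up on each link.
func distribute(flows []flow, links int) []int {
	counts := make([]int, links)
	for _, f := range flows {
		counts[linkFor(f, links)]++
	}
	return counts
}

// clientFlows models workers that each open connsPerWorker connections
// to one storage server, using consecutive ephemeral source ports.
func clientFlows(workers, connsPerWorker int) []flow {
	server := [4]byte{10, 0, 0, 1}
	var flows []flow
	for w := 0; w < workers; w++ {
		client := [4]byte{10, 0, 1, byte(w + 1)}
		for c := 0; c < connsPerWorker; c++ {
			flows = append(flows, flow{srcIP: client, dstIP: server, srcPort: uint16(40000 + c), dstPort: 8980})
		}
	}
	return flows
}
```

With one worker holding one connection, one link necessarily stays idle; with many workers (or many connections per worker), the flows spread over both links. Both arguments in the thread are consistent with this: what matters is the total number of flows, not which protocol creates them.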
It's not necessarily expensive to maintain, but over the last couple of years I did need to make changes to it to catch up with API changes to BlobAccess (most recently, the overhaul of digest functions and GetFromComposite()). Every time I want to make any changes along those lines, I need to patch up about 15-20 BlobAccess implementations, which is getting rather cumbersome.
Our org builds many artifacts, which tend to blow through the main CAS cache in O(days), but some teams would like select artifacts/builds persisted for O(months). Our team has been toying with setting up a PoC that uses bb-storage in front of a GCS bucket, along with tooling around bb_copy/bb_replicator, to let users selectively "archive" an entire build's output tree from the main instance to this GCS-backed instance. We could then use a properly configured bb_clientd to mount it back, adding caching layers on the server side in front of the bucket, on the client side, or both. We would only need to add some out-of-band recording/tagging of metadata for archived builds to allow for later discovery and cleanup; maybe we would even (ab)use this storage for distributing built binaries. The attractive piece that this removal would take away is the ability to get a CAS instance backed by essentially unbounded storage. Do you have recommendations for any alternatives? Is this an inherently flawed plan, even if it is not in a build's hot path and is aggressively cached?
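The archiving flow described above could look roughly like this: ask the archive which blobs of the output tree it already has, then copy only the missing ones from the main CAS. This is a hypothetical sketch with an in-memory map as the CAS and a flat list of digests standing in for the output tree; it is not how bb_copy or bb_replicator are implemented.

```go
package main

import (
	"fmt"
	"sort"
)

// store is a hypothetical minimal CAS: blob contents keyed by digest.
type store map[string][]byte

// findMissing returns the distinct digests absent from s, sorted so
// that copies and error messages are deterministic.
func (s store) findMissing(digests []string) []string {
	seen := map[string]bool{}
	var missing []string
	for _, d := range digests {
		if _, ok := s[d]; !ok && !seen[d] {
			seen[d] = true
			missing = append(missing, d)
		}
	}
	sort.Strings(missing)
	return missing
}

// archive copies the blobs of one build's output tree (flattened into a
// list of digests) from the primary CAS into the archive CAS, skipping
// blobs the archive already holds. It returns how many blobs it copied.
// It fails if the primary CAS has already evicted a blob, so archiving
// has to happen within the primary cache's retention window.
func archive(primary, archiveCAS store, digests []string) (int, error) {
	copied := 0
	for _, d := range archiveCAS.findMissing(digests) {
		data, ok := primary[d]
		if !ok {
			return copied, fmt.Errorf("blob %s is no longer in the primary CAS", d)
		}
		archiveCAS[d] = data
		copied++
	}
	return copied, nil
}
```

Running the same archive twice copies nothing the second time, which keeps repeated archiving of overlapping builds cheap.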
@EdSchouten: would you consider delaying the http backend removal for a bit, while we investigate the perf issues with bazel-remote/grpc?
I’m not saying the CPU load is particularly high or a limiting factor with bazel-remote, for either protocol. But it consumes less CPU with HTTP than with gRPC, and lower CPU usage gives more headroom, e.g. to compress blobs. More important than CPU is the higher throughput with HTTP. If I remember correctly, we got 30-40% higher throughput with HTTP than with gRPC when parallelism was limited to <= 64 concurrent blob transfers. The difference then decreased as parallelism increased, and at a parallelism of 2048 both HTTP and gRPC saturated the network at about the same throughput. I guess that might be explained by lower latency for HTTP, maybe thanks to sendfile(2). All of this was on a fast local network with 2 x 10 Gbit/s network interfaces between each host, with traffic from a load generator. I guess the difference might be less noticeable if masked by a slower network. I’m also not sure how it translates to real builds with Buildbarn. I assume sendfile(2) only helps when transferring data directly between socket and file (i.e. not when compressing on the fly, nor with SSL/TLS). What I want to say is that there are pros and cons to both HTTP and gRPC, and I think they both have a role to play. I guess in theory they could even be used in combination: calling FindMissingBlobs via gRPC and doing file transfers via HTTP.
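The numbers above depend heavily on how many transfers are in flight at once. A load generator that caps parallelism can be sketched in Go with a buffered channel used as a semaphore. `transferAll` and its `transfer` callback are hypothetical; in a real benchmark the callback would perform an HTTP GET or a gRPC ByteStream read.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// transferAll calls transfer for every item with at most limit calls in
// flight, and returns the highest concurrency actually observed together
// with one error slot per item.
func transferAll(items []string, limit int, transfer func(string) error) (int, []error) {
	if limit < 1 {
		limit = 1
	}
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	var inFlight, peak int64
	errs := make([]error, len(items))
	for i, item := range items {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit transfers are running
		go func(i int, item string) {
			defer wg.Done()
			defer func() { <-sem }()
			n := atomic.AddInt64(&inFlight, 1)
			// Record the peak with a compare-and-swap loop.
			for {
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}
			errs[i] = transfer(item)
			atomic.AddInt64(&inFlight, -1)
		}(i, item)
	}
	wg.Wait()
	return int(atomic.LoadInt64(&peak)), errs
}
```

Sweeping `limit` (for example from 64 up to 2048) is how one would reproduce the kind of parallelism curve described above.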
I don’t know. I guess:
It would have been nice if Buildbarn had a plugin system for separate BlobAccess implementations that do not fulfill the requirements to be officially supported, e.g. stored in a separate repository like bb-event-service, so that you don’t need to bother updating them when you make changes to BlobAccess. But I don’t know how to architect such a plugin system. Would it make sense from your perspective?
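One common Go pattern for out-of-tree extensions, used for example by database/sql drivers, is a registry that packages populate from `init()`. The sketch below is hypothetical (`Backend`, `Factory`, `Register` and `New` are not Buildbarn APIs) and only shows the mechanics.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Backend is a hypothetical, minimal interface used only to illustrate
// registration; it is not Buildbarn's BlobAccess.
type Backend interface {
	Name() string
}

// Factory builds a Backend from backend-specific configuration.
type Factory func(config map[string]string) (Backend, error)

var (
	registryMu sync.Mutex
	registry   = map[string]Factory{}
)

// Register is meant to be called from init() in an out-of-tree package.
func Register(kind string, f Factory) {
	registryMu.Lock()
	defer registryMu.Unlock()
	if _, dup := registry[kind]; dup {
		panic("backend registered twice: " + kind)
	}
	registry[kind] = f
}

// New instantiates a registered backend by kind.
func New(kind string, config map[string]string) (Backend, error) {
	registryMu.Lock()
	f, ok := registry[kind]
	registryMu.Unlock()
	if !ok {
		return nil, fmt.Errorf("unknown backend %q", kind)
	}
	return f(config)
}

// Kinds lists the registered backend kinds in sorted order.
func Kinds() []string {
	registryMu.Lock()
	defer registryMu.Unlock()
	kinds := make([]string, 0, len(registry))
	for k := range registry {
		kinds = append(kinds, k)
	}
	sort.Strings(kinds)
	return kinds
}

// httpBackend stands in for an implementation living in another repo.
type httpBackend struct{ url string }

func (b httpBackend) Name() string { return "http:" + b.url }

func init() {
	Register("http", func(c map[string]string) (Backend, error) {
		if c["url"] == "" {
			return nil, fmt.Errorf("http backend needs a url")
		}
		return httpBackend{url: c["url"]}, nil
	})
}
```

Note the limitation: the extension is still compiled and linked into the binary, so it still has to be updated whenever the interface changes. This pattern only moves that work into a different repository.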
Responding to all messages at once.
@minor-fixes
This is perfectly reasonable. I have never been against having an S3/GCS/... backend for the purpose of archiving. It's just that the version we had in the tree was intended to be used as a direct CAS for a remote caching/execution system, which is obviously a bad fit.
@mostynb
Sure! Would 2023-11-01 work for you? So remove SizeDistinguishing and Redis on 2023-10-01 as planned, and leave HTTP in the tree for one more month.
@ulrfa
Exactly. As soon as you want to do anything more advanced (compression and TLS, as you stated, but also on-the-fly checksum validation), it can no longer be used.
With other features such as TLS and compression enabled, the difference in performance between HTTP and gRPC ought to be negligible.
Go doesn't provide a lot of facilities for that. There's the
So that doesn't help.
Ever since we published ADR #2, which introduced LocalBlobAccess, ShardingBlobAccess and MirroredBlobAccess, the writing has been on the wall that there would no longer be any place for backends such as Redis and SizeDistinguishing. The main reason I kept these two backends around was that people in the community were still interested in seeing S3 support reappear. Because I have not received any serious proposal from members of the community to do this in a way that meets the standards for inclusion, I think we've now reached the point where we can assume no work is going to happen in this area in the short term.
Fixes: #175
Issue: #176
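ShardingBlobAccess spreads blobs over several backends based on their digest. As an illustration of the general idea only, here is weighted rendezvous (highest random weight) hashing in Go; Buildbarn's actual ShardingBlobAccess algorithm may differ, and `shard`, `pickShard` and the shard names are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"math"
)

// shard describes one storage backend with a relative weight.
type shard struct {
	name   string
	weight float64
}

// pickShard returns the index of the shard that stores a blob. Every
// shard scores the digest and the highest score wins, so the choice is
// deterministic and needs no shared state.
func pickShard(shards []shard, digestHash string) int {
	best, bestScore := -1, math.Inf(-1)
	for i, s := range shards {
		sum := sha256.Sum256([]byte(s.name + "/" + digestHash))
		// Map the top 53 bits of the hash to a uniform value in (0, 1).
		u := (float64(binary.BigEndian.Uint64(sum[:8])>>11) + 0.5) / (1 << 53)
		// Weighted score: -w/ln(u) is positive for w > 0, zero for w == 0.
		score := -s.weight / math.Log(u)
		if score > bestScore {
			best, bestScore = i, score
		}
	}
	return best
}
```

A weight of 0 drains a shard, and removing a shard only relocates the blobs that lived on it, which keeps rebalancing traffic proportional to the change.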
Branch updated from e483745 to def1579.
Redis and SizeDistinguishing have just been removed from the tree. As discussed above, HTTP will be removed on 2023-11-01.
The HTTP backend was mainly of use for talking to bazel-remote, but as far as I know, that also supports gRPC perfectly fine nowadays. There is absolutely no reason to prefer HTTP over gRPC. The latter supports bulk existence checks and querying capabilities, while the former does not.
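The difference in round trips for existence checks can be sketched as follows. A REv2 client can pack many digests into each FindMissingBlobs request, whereas an HTTP-only client without a bulk endpoint needs one request per blob. `batchDigests`, `roundTrips` and the `maxPerRequest` cap (for example, to stay below a gRPC message size limit) are hypothetical.

```go
package main

// batchDigests splits digests into groups of at most maxPerRequest, as a
// client might before sending several FindMissingBlobs requests.
func batchDigests(digests []string, maxPerRequest int) [][]string {
	if maxPerRequest < 1 {
		maxPerRequest = len(digests) // no cap: a single batch
	}
	var batches [][]string
	for len(digests) > maxPerRequest {
		batches = append(batches, digests[:maxPerRequest:maxPerRequest])
		digests = digests[maxPerRequest:]
	}
	if len(digests) > 0 {
		batches = append(batches, digests)
	}
	return batches
}

// roundTrips compares request counts for n existence checks: batched
// requests versus one request per blob.
func roundTrips(n, maxPerRequest int) (batched, perBlob int) {
	return (n + maxPerRequest - 1) / maxPerRequest, n
}
```

For large output trees with thousands of inputs, the batched variant turns thousands of requests into a handful, which is the practical weight behind the "bulk existence checks" argument.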
Branch updated from def1579 to cb5d9df.
NOTE: My plan is to merge this change on 2023-10-01, assuming no compelling use cases for them are presented.