use REAPI batch API for small blob writes #12537
Conversation
LGTM
One thought - in terms of latency reduction, we probably want to spawn a thread doing a GetCapabilities in the background when a remote Store/ProcessExecutor is new'd up, rather than waiting on the first use. Or if not, we may want to race the first file upload using ByteStream against the GetCapabilities call.
Looking forward to seeing the batching!
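The suggestion above — fetch `GetCapabilities` eagerly when the remote store is constructed rather than on first use — could be sketched roughly as follows. This is a minimal illustration with hypothetical names (`Capabilities`, `RemoteStore`); the thread body stands in for the real RPC.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for the REAPI ServerCapabilities message.
struct Capabilities {
    max_batch_total_size_bytes: u64,
}

struct RemoteStore {
    caps_rx: mpsc::Receiver<Capabilities>,
    caps: Option<Capabilities>,
}

impl RemoteStore {
    fn new() -> Self {
        let (tx, rx) = mpsc::channel();
        // Kick off the capabilities fetch as soon as the store is
        // constructed, instead of waiting for the first blob write.
        thread::spawn(move || {
            // Simulated GetCapabilities RPC result.
            let caps = Capabilities {
                max_batch_total_size_bytes: 4 * 1024 * 1024,
            };
            let _ = tx.send(caps);
        });
        RemoteStore { caps_rx: rx, caps: None }
    }

    // The first caller blocks only for whatever remains of the
    // in-flight fetch; later callers hit the cached value.
    fn capabilities(&mut self) -> &Capabilities {
        if self.caps.is_none() {
            self.caps = Some(self.caps_rx.recv().expect("capabilities fetch failed"));
        }
        self.caps.as_ref().unwrap()
    }
}
```

The alternative in the comment (racing the first `ByteStream` upload against the capabilities call) would need async select machinery instead of a background thread, but the idea is the same: don't serialize the first write behind the capabilities round trip.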
Awesome
There may be runs where the local cache is hit entirely, and thus there would be no need to always send the `GetCapabilities` call. For example, maybe Pants could remember the capabilities between runs (via pantsd or whatever) and just avoid making the call if it already has a recently-obtained result.
Does gRPC impose a maximum message size?
I think the default in gRPC servers is a max message size of 4MB, but that's technically configurable to be embiggened. I'd say we can trust that if a server has a limit, it will return it in its capabilities. hyperium/tonic#264 suggests tonic itself doesn't introduce its own limits by default, so I don't think we need to worry about it on the client side.
Future Work
In #12537, Pants learned how to send small blob writes to a remote store via the `BatchUpdateBlobs` API of the REAPI instead of the `ByteStream.Write` streaming API. This PR similarly teaches Pants how to use the `BatchReadBlobs` API for small blob reads instead of the `ByteStream.Read` API. The goal is to reduce the overhead of small blob reads by avoiding the multiple message exchanges required by the streaming API. Note: This PR does not batch multiple small blob reads together, which is an approach that could be worthwhile to evaluate in the future.
Problem
Pants currently uses the `ByteStream` API exposed by REAPI servers to write all blobs, including small blobs. This incurs unnecessary network overhead when writing small blobs, which could have been inlined into a single request if `BatchUpdateBlobs` had been used.
Solution
Add a `--remote-store-batch-api-size-limit` option which controls the size threshold for switching from `BatchUpdateBlobs` to `ByteStream.Write`. If both the client option (`--remote-store-batch-api-size-limit`) and the REAPI server's capabilities (i.e., `max_batch_total_size_bytes`) allow a write via the batch API, then write via the batch API; otherwise continue to use `ByteStream.Write`.
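The decision described above could be sketched as a small helper. This is an illustrative sketch, not the actual Pants implementation; the parameter names are hypothetical, and it assumes the REAPI convention that a `max_batch_total_size_bytes` of 0 means the server advertises no limit.

```rust
// Hypothetical helper: decide whether a blob write should go through
// `BatchUpdateBlobs` or fall back to `ByteStream.Write`.
//
// `client_limit` models `--remote-store-batch-api-size-limit`;
// `server_limit` models the server's `max_batch_total_size_bytes`
// (0 means the server did not advertise a limit).
fn use_batch_api(blob_size: u64, client_limit: u64, server_limit: u64) -> bool {
    let effective_limit = if server_limit == 0 {
        client_limit
    } else {
        // Both the client option and the server capability must allow it.
        client_limit.min(server_limit)
    };
    blob_size <= effective_limit
}
```

Blobs at or under the effective limit go through the batch API; anything larger continues to use the streaming `ByteStream.Write` path.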