The remoting client currently buffers all fetched blobs in memory before storing to LMDB. #17065
Comments
One option would be to store into a temporary file, and then copy into LMDB with:
pants/src/rust/engine/fs/store/src/local.rs
Lines 317 to 331 in abd3bfe
But that would use more passes over the data than we strictly need. We do need to re-compute and validate the |
One other much more fundamental idea would be to actually begin to "size split" our That would have advantages for this codepath, but it would also have advantages for #17282, because if we chose our heuristics well, we could symlink directly from a large-file store, rather than first copying a large file out of the store and into a real file. cc @thejcannon
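To make the "size split" idea concrete, here is a minimal sketch of a size-based routing decision. Everything in it is an assumption for illustration: the `Destination` enum, the `choose_destination` function, and especially the 512 KiB cutoff are hypothetical, since the thread does not name a heuristic.

```rust
/// Where a blob would be stored under a size-split scheme.
/// Hypothetical type, not taken from the pants codebase.
#[derive(Debug, PartialEq, Eq)]
enum Destination {
    /// Small blobs stay in LMDB.
    Lmdb,
    /// Large blobs become real files that could be symlinked directly.
    LargeFileStore,
}

/// Hypothetical cutoff; the thread leaves the heuristic open.
const LARGE_FILE_THRESHOLD_BYTES: u64 = 512 * 1024;

/// Route a blob purely by its size in bytes.
fn choose_destination(size_bytes: u64) -> Destination {
    if size_bytes >= LARGE_FILE_THRESHOLD_BYTES {
        Destination::LargeFileStore
    } else {
        Destination::Lmdb
    }
}
```

Because a digest already carries the blob's size, a decision like this could in principle be made before any bytes are fetched.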
This is a no-functionality-change refactoring of `store::remote::ByteStore::load_bytes_with` that's arguably cleaner and also a step towards #11149. In particular:
1. that method doesn't need to take a closure any more, and thus is refactored to just be the "simplest": `load_bytes(...) -> Result<Option<Bytes>, String>`
2. that method previously didn't retry, and thus users had to do the retries themselves: this moves the retries to be fully within the `load_bytes` method itself, which is both easier to use, and keeps implementation details like gRPC (previously exposed as the `ByteStoreError::Grpc`/`tonic::Status` error variant) entirely contained to `store::remote::ByteStore`
3. to emphasise that last point, the `ByteStoreError` enum can thus become private, because it's an implementation detail of `store::remote::ByteStore`, no longer exposed in the public API

Step 1 resolves (and removes) a TODO comment. That TODO references #17065, but this patch _doesn't_ fix that issue.
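The shape described in points 2 and 3 can be sketched roughly as follows. This is not the actual pants implementation: `FetchError` is a hypothetical stand-in for the private error enum, `Vec<u8>` stands in for `Bytes`, the code is synchronous rather than async gRPC, and the linear backoff is an assumption.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical stand-in for a private, implementation-detail error enum.
#[derive(Debug)]
enum FetchError {
    /// A failure worth retrying (for example, the server was unavailable).
    Transient(String),
    /// A failure that retrying will not fix.
    Permanent(String),
}

/// Call `attempt` up to `max_attempts` times, retrying only transient errors.
/// `Ok(None)` means "blob not found" and is returned without retrying.
/// Callers only ever see a `String` error, never `FetchError`.
fn load_bytes_with_retry<F>(max_attempts: u32, mut attempt: F) -> Result<Option<Vec<u8>>, String>
where
    F: FnMut() -> Result<Option<Vec<u8>>, FetchError>,
{
    let mut last_error = String::from("no attempts were made");
    for n in 0..max_attempts {
        match attempt() {
            Ok(found) => return Ok(found),
            Err(FetchError::Permanent(msg)) => return Err(msg),
            Err(FetchError::Transient(msg)) => {
                last_error = msg;
                // Assumed policy: short linear backoff between attempts.
                if n + 1 < max_attempts {
                    sleep(Duration::from_millis(10 * u64::from(n + 1)));
                }
            }
        }
    }
    Err(format!("gave up after {} attempts: {}", max_attempts, last_error))
}
```

Keeping the retry loop inside the loading function means the transport-specific error type never has to appear in the public signature.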
Opened #18048 for this idea.
I've opened #18231 for this, since #18054 solves the basic "avoid buffering into memory" issue, without optimising the hashing.
This fixes #17065 by allowing remote cache loads to be streamed to disk. In essence, the remote store now has a `load_file` method in addition to `load_bytes`, and thus the caller can decide to download to a file instead.

This doesn't make progress towards #18048 (this PR doesn't touch the local store at all), but I think it will help with integrating the remote store with that code: in theory the `File` could be provided in a way that can be part of the "large file pool" directly (and indeed, the decision about whether to download to a file or into memory ties into that).

This also does a theoretically unnecessary extra pass over the data (as discussed in #18231) to verify the digest, but I think it'd make sense to do that as a future optimisation, since it'll require refactoring more deeply (down into `sharded_lmdb` and `hashing`, I think) and is best to build on #18153 once that lands.
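As a rough illustration of the difference between buffering and streaming, the sketch below writes each chunk to a destination as it arrives, so only one chunk is held in memory at a time. It is an assumption-laden simplification, not the PR's `load_file`: `stream_to_writer` is a hypothetical name, the chunk source is a plain iterator rather than an async gRPC stream, and only the length is checked (the digest verification pass the PR describes is omitted).

```rust
use std::io::{self, Write};

/// Write each chunk to `dest` as soon as it arrives, then check the total
/// length against `expected_len`. Returns the number of bytes written.
/// Hash verification is deliberately left out of this sketch.
fn stream_to_writer<I, W>(chunks: I, dest: &mut W, expected_len: u64) -> io::Result<u64>
where
    I: IntoIterator<Item = io::Result<Vec<u8>>>,
    W: Write,
{
    let mut written: u64 = 0;
    for chunk in chunks {
        // Propagate any error from the chunk source.
        let chunk = chunk?;
        dest.write_all(&chunk)?;
        written += chunk.len() as u64;
    }
    dest.flush()?;
    if written != expected_len {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("expected {} bytes, received {}", expected_len, written),
        ));
    }
    Ok(written)
}
```

Since `dest` is any `Write`, the same function works with a `std::fs::File` for large blobs or a `Vec<u8>` when buffering in memory is acceptable.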
That's here:
pants/src/rust/engine/fs/store/src/remote.rs
Lines 383 to 402 in d8fba9a
For things like ~GB PyTorch wheel zips, that could be problematic.