feat: update Storage.createFrom(BlobInfo, Path) to have 150% higher throughput #2059
ITRetryConformanceTests are currently failing due to googleapis/storage-testbench#510
Branch updated from d29afd4 to 0a6f7c7.
When uploading a file where we are able to rewind to an arbitrary offset, we can be more optimistic in the way we send requests to GCS.

Add new middleware that allows PUTing an entire file to GCS in a single request, and querying the resumable session to recover from the specific offset in the case of a retryable error.

## TODO

1. Add failure scenario: send more bytes than are specified in the Content-Range header
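The recovery strategy can be sketched with an in-memory stand-in for GCS. This is a minimal illustration under stated assumptions, not the code added in this PR: `FakeSession`, `put`, `queryRangeHeader`, and `upload` are hypothetical names. It shows the idea of sending the entire payload in one request and, after a retryable failure, asking the session how many bytes were persisted and rewinding the seekable source to exactly that offset. The `bytes=0-N` value mirrors the `Range` header GCS returns on a 308 response to a resumable-session status query, where a missing header means nothing has been persisted yet.

```java
import java.util.Arrays;

public class OptimisticUpload {

  /** Hypothetical in-memory stand-in for a GCS resumable session (not a real library class). */
  static final class FakeSession {
    private final byte[] stored;
    private int persisted = 0;
    private int failuresRemaining;

    FakeSession(int totalSize, int failures) {
      this.stored = new byte[totalSize];
      this.failuresRemaining = failures;
    }

    /** Accepts bytes from offset onward; while failures remain, persists only half and fails. */
    boolean put(byte[] src, int offset) {
      if (offset != persisted) throw new IllegalStateException("offset mismatch: " + offset);
      int remaining = src.length - offset;
      int accepted = failuresRemaining > 0 ? remaining / 2 : remaining;
      System.arraycopy(src, offset, stored, offset, accepted);
      persisted += accepted;
      if (failuresRemaining > 0) {
        failuresRemaining--;
        return false; // simulated retryable error, e.g. a reset connection
      }
      return true;
    }

    /** Range header value a status query would report, or null if nothing is persisted. */
    String queryRangeHeader() {
      return persisted == 0 ? null : "bytes=0-" + (persisted - 1);
    }

    byte[] contents() {
      return Arrays.copyOf(stored, persisted);
    }
  }

  /** Converts "bytes=0-N" into a persisted size of N + 1; a missing header means 0. */
  static long persistedSize(String rangeHeader) {
    if (rangeHeader == null) return 0;
    String end = rangeHeader.substring(rangeHeader.indexOf('-') + 1);
    return Long.parseLong(end) + 1;
  }

  /** Sends everything in one request; on failure, resumes from the server-reported offset. */
  static int upload(FakeSession session, byte[] src, int maxAttempts) {
    int offset = 0;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (session.put(src, offset)) return attempt;
      // Rewind the (seekable) source to exactly what the server has persisted.
      offset = (int) persistedSize(session.queryRangeHeader());
    }
    throw new IllegalStateException("retries exhausted");
  }

  public static void main(String[] args) {
    byte[] data = new byte[1000];
    for (int i = 0; i < data.length; i++) data[i] = (byte) i;
    FakeSession session = new FakeSession(data.length, 2);
    int attempts = upload(session, data, 5);
    System.out.println("attempts=" + attempts + " intact=" + Arrays.equals(data, session.contents()));
  }
}
```

With two injected failures, `main` prints `attempts=3 intact=true`: each retry starts at the offset the previous attempt left persisted rather than restarting from zero.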
Resolved review threads:

- `google-cloud-storage/src/main/java/com/google/cloud/storage/ResumableSession.java` (outdated)
- `...loud-storage/src/main/java/com/google/cloud/storage/JsonResumableSessionFailureScenario.java`
- `google-cloud-storage/src/test/java/com/google/cloud/storage/FakeHttpServer.java`
- `...-cloud-storage/src/test/java/com/google/cloud/storage/ITJsonResumableSessionPutTaskTest.java` (6 threads, 1 outdated)
Had one comment on a TODO; otherwise LGTM.
```java
@ParametersAreNonnullByDefault
enum JsonResumableSessionFailureScenario {
  // TODO: send more bytes than are in the Content-Range header
  // ... (rest of the enum elided in this excerpt)
}
```
Wouldn't the server just stop reading bytes once the Content-Range is satisfied? I'm guessing this would be more of a data-loss scenario on the client side?
I spent some time thinking about this, and this would actually be a client-side bug. Since the client computes both Content-Range and Content-Length, if those don't match up we have a bug, not a server response failure we need to handle. I'll remove this TODO in the follow-up PR I have queued up on my workstation.
A client-side issue makes sense; if anything, tests would serve to protect against client changes that introduce such a bug.
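To illustrate why a Content-Range/Content-Length mismatch would be a client-side bug: if both headers are derived from the same offset, length, and total, they cannot disagree unless that derivation is broken, and a test can pin the invariant down. The `UploadHeaders` helper below is a hypothetical sketch, not the library's code. It assumes the total object size is known (as it is when uploading a whole file) and uses the `bytes START-END/TOTAL` form with an inclusive END, plus `bytes */TOTAL` for an empty body.

```java
public final class UploadHeaders {
  final String contentRange;
  final long contentLength;

  private UploadHeaders(String contentRange, long contentLength) {
    this.contentRange = contentRange;
    this.contentLength = contentLength;
  }

  /** Derives both headers from one offset/length/total triple so they always agree. */
  static UploadHeaders forChunk(long offset, long length, long total) {
    if (offset < 0 || length < 0 || offset + length > total) {
      throw new IllegalArgumentException("chunk exceeds object size");
    }
    if (length == 0) {
      // Empty body: only the total size is reported.
      return new UploadHeaders("bytes */" + total, 0);
    }
    long lastByte = offset + length - 1; // Content-Range end offset is inclusive
    return new UploadHeaders("bytes " + offset + "-" + lastByte + "/" + total, length);
  }

  /** Number of body bytes a Content-Range header promises, for consistency checks in tests. */
  static long declaredLength(String contentRange) {
    String spec = contentRange.substring("bytes ".length(), contentRange.indexOf('/'));
    if (spec.equals("*")) return 0;
    int dash = spec.indexOf('-');
    long first = Long.parseLong(spec.substring(0, dash));
    long last = Long.parseLong(spec.substring(dash + 1));
    return last - first + 1;
  }

  public static void main(String[] args) {
    UploadHeaders h = forChunk(500, 500, 1000);
    System.out.println(h.contentRange + " / Content-Length: " + h.contentLength);
    // Writing more body bytes than declaredLength(h.contentRange) would be a client-side bug.
    System.out.println(declaredLength(h.contentRange) == h.contentLength);
  }
}
```

Here `main` prints `bytes 500-999/1000 / Content-Length: 500` followed by `true`. Centralizing the derivation in one place is what makes the "more bytes than Content-Range" scenario a client bug to test against rather than a server response to handle.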
## Benchmark Results
### Methodology
Generate a random file on disk of size `128KiB..2GiB` from `/dev/urandom`, then upload the generated file using `Storage.createFrom(BlobInfo, Path)`. Perform each 4096 times.

Run on a c2-standard-60 instance in us-central1 against a regional bucket located in us-central1.
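The methodology can be sketched as a small timing harness. This is an illustration under stated assumptions, not the benchmark code used for this PR: `Uploader` is a hypothetical hook where a real run would call `Storage.createFrom(BlobInfo, Path)`, random bytes come from `java.util.Random` rather than `/dev/urandom`, and throughput is reported in MiB/s (1 MiB = 1024 × 1024 bytes). It also shows how a relative improvement such as the "150% higher throughput" in the title relates the two measurements.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

public class ThroughputBench {

  /** Hypothetical hook; a real benchmark would call storage.createFrom(blobInfo, path) here. */
  @FunctionalInterface
  interface Uploader {
    void upload(Path path) throws IOException;
  }

  /** Writes sizeBytes of pseudo-random data to a temp file (the PR used /dev/urandom). */
  static Path randomFile(int sizeBytes, Random rng) throws IOException {
    byte[] bytes = new byte[sizeBytes];
    rng.nextBytes(bytes);
    Path p = Files.createTempFile("bench", ".bin");
    Files.write(p, bytes);
    return p;
  }

  /** Throughput in MiB/s for moving sizeBytes in elapsedNanos. */
  static double mibPerSecond(long sizeBytes, long elapsedNanos) {
    double mib = sizeBytes / (1024.0 * 1024.0);
    return mib / (elapsedNanos / 1e9);
  }

  /** Percent improvement of candidate over baseline; 150 means 2.5x the baseline. */
  static double percentImprovement(double baseline, double candidate) {
    return (candidate / baseline - 1.0) * 100.0;
  }

  /** Times `iterations` uploads of one file and returns the mean per-upload MiB/s. */
  static double measure(Uploader uploader, Path file, int iterations) throws IOException {
    long size = Files.size(file);
    double sum = 0;
    for (int i = 0; i < iterations; i++) {
      long start = System.nanoTime();
      uploader.upload(file);
      sum += mibPerSecond(size, System.nanoTime() - start);
    }
    return sum / iterations;
  }

  public static void main(String[] args) throws IOException {
    Path file = randomFile(128 * 1024, new Random(42));
    // A local file read stands in for a real upload just to exercise the harness.
    double mibs = measure(p -> Files.readAllBytes(p), file, 16);
    System.out.printf("%.1f MiB/s%n", mibs);
    Files.delete(file);
  }
}
```

Averaging per-upload throughput weights every run equally; dividing total bytes by total time instead would weight slower runs more heavily, so it is worth stating which one a results table uses.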
### Results
The following is a summary of throughput in MiB/s as observed for the existing implementation and the new implementation proposed in this PR.
### Comparison
When comparing the new implementation to the existing implementation, we get the following improvement in throughput (higher is better):