-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add retry wrapper to emulator #37
feat: add retry wrapper to emulator #37
Conversation
Codecov Report
@@ Coverage Diff @@
## main #37 +/- ##
==========================================
+ Coverage 97.06% 97.10% +0.03%
==========================================
Files 28 28
Lines 2935 3003 +68
==========================================
+ Hits 2849 2916 +67
- Misses 86 87 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
The emulator wraps each RPC with a "retry test", a mechanism to inject failures and exercise the retry path in the client libraries. This change adds the wrapper to the existing RPCs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice tests! Thanks for adding them!
* ci: disable codecov uploads (#3) * impl: add bidi streaming read patches (#1) Add the bi-di streaming read RPCs as patches. In follow up PRs we can implement the bidi functions. These patches may need updates from time to time, as the line number hints change. * chore: update patches to meet C++ requirements (#4) I would prefer if we use the same patches across all the SDKs. C++ has more strict requirements to generate the GAPICs, gRPC, and proto libraries. * chore: update proto patches circa 2024-06-13 (#6) * chore: update patches c.20240612 * generate storage/v2 files * generate google/ files * Revert "generate google/ files" This reverts commit 51217d21f933892bfca252d6c3c6294f68f53d2e. * Revert "generate storage/v2 files" This reverts commit c57c13a4bb8a9f4b8bd8f2e54232b4befe2de742. * fix venv and update protos * update readme * feat: BidiRead initial scaffolding n ranges in 1 stream (#7) * feat: BidiRead initial scaffolding n ranges in 1 stream * attempt using a priorityQueue with bidireads * feat: add crc32c checksum to BidiRead (#9) * test: BidiRead object not found error handling (#10) * feat: BidiReadObjectError part 1 object not found * use grpc.StatusCode.value * for discussion, return in error details * object resolution errors, not found and precondition failed * remove extra line change * feat: pack BidiReadObjectError in error details for out of range (#11) * feat: pack BidiReadObjectError in error details for out of range * review comments * fix(grpc_server): Higher fidelity BidiReadObject. (#12) * fix(grpc_server): Higher fidelity BidiReadObject. The emulator behaved slightly differently than the real implementation. This brings its behavior on first message closer to the spec. * feat(appendable): Appendable object proto support (#13) Patch the BidiWriteObject protos to support appendable object BidiWriteObject RPCs. While I'm here, update `setup.py` to pin the correct version of grpcio-tools for proto compilation, and have `update-protos.sh` use the already `.gitignore`'d directory `.googleapis` as the base if none is specified. * fix: Correctly handle aborts with multiple ranges. (#16) We never polled the `responses` queue after an `"abort"` message was dequeued. That can stop us from ever joining `gather_thread` if another range enqueues a response first, because the queue capacity is fixed. We want to keep the queue capacity fixed so that we can increase the likelihood of interleaving responses from multiple ranges. Fix by draining the queue in the `finally` block before reraising. Also, correct `abort` to `abort_with_status`: if the error is a `grpc.Status`, we're meant to call `context.abort_with_status` rather than `context.abort`. * feat(grpc_retry): support retry bidi read object for conformance test. (#15) * pull last changes * add retry for bidi-read * remove venv * add abort * fix: Correctly handle aborts with multiple ranges. (#16) We never polled the `responses` queue after an `"abort"` message was dequeued. That can stop us from ever joining `gather_thread` if another range enqueues a response first, because the queue capacity is fixed. We want to keep the queue capacity fixed so that we can increase the likelihood of interleaving responses from multiple ranges. Fix by draining the queue in the `finally` block before reraising. Also, correct `abort` to `abort_with_status`: if the error is a `grpc.Status`, we're meant to call `context.abort_with_status` rather than `context.abort`. * push chris patch * modify tests * pull last changes * remove venv * remove lint issues * remove lint issues1 * remove lint issues2 * bring broken_stream down for better readability * remove venv --------- Co-authored-by: Chris Carlon <carlon@google.com> * fix: Support client cancellation in BidiReadObject (#17) Client cancellation surfaces as an exception in `next(request_iterator)`. That wasn't handled before, so gather_thread simply failed and never put the `"terminate"` action on the response queue. * feat: Handle appendable objects in BidiWriteObject (#18) Appendable objects extend BidiWriteObject calls. This is a refactor and a mostly-accurate implementation of appendable objects in the testbench. There's a key difference - this simulates true appendable using resumable uploads, which are not visible in ListObjects or similar calls in the same way that appendable objects are. That's sufficient for current testing. * fix: Correct BidiReadObject early termination. (#19) When a stream ends with early termination, but we still return some bytes, the response should be consistent with the actual bytes returned. Specifically, we adjust the read_limit to match the actual range returned, and adjust the range_end bool to indicate that we know there is more data to request. * fix: Don't raise a bare string (#21) This was a bug - I should have raised a RuntimeError. It doesn't come up because it's meant to be an impossible codepath, but it's nicer to raise a real error in that case. * fix: Allow BidiWriteObjectRequest with no data. (#22) BidiWriteObjectRequest is allowed to contain no checksummed_data. * build: update cloudbuild for private images (#23) * chore: merge from public circa 2024-10-24 (#25) * chore: merge from public circa 2024-10-24 * chore: remove Notification, HMAC, SA grpc support (#693) * chore: remove Notification, HMAC, SA grpc support * fix lint * feat: add write stall support (#684) * add code for write stall * fix test * remove unnecessary files * remove unnecessary files * write test * undo test changes to remove unnecessary changes * Update test_testbench_retry.py * add test * remove .idea files * write stall changes * remove .idea file * Update test_testbench_retry.py * test * stall once for identiacal req * add comment * remove .idea file * fix unit test * fix unit test * test changes * test changes * review comments * remove .idea files * lint fixes * lint fixes * lint fixes * lint fixes * lint fixes * code patch fix * support full uploads * remove unnecessary things * remove unnecessary things * remove unnecessary things * adding comment * lint fix * lint fix * stall should not happen if uploaded less amount of data then stall size * stall should not happen if uploaded less amount of data then stall size * remove last two commit changes * remove env files * lint fix * lint fix * review comment and adding scenario where upload size is less then stall byte size in single shot * lint fix * lint fix * chore: remove Notification, HMAC, SA grpc support (#693) --------- Co-authored-by: Mend Renovate <bot@renovateapp.com> Co-authored-by: Tulsi Shah <46474643+Tulsishah@users.noreply.github.com> * test: enable GCS bucket logs (#26) * use gcs for logs instead * enable both gcs bucket and monitoring * feat(append): Import new proto patch. (#29) Internally generated the public proto as if all Bidi APIs were external, and regenerated this patch against .googleapis. Kept everything in one patch for simplicity. * feat: Only RAPID supports appendable objects. (#30) * feat: Only RAPID supports appendable objects. This restriction is subject to change. * address lint issues * feat(append): Support instructions for redirect. (#31) Appendable objects support a redirection protocol. Support instructions to force redirection errors. The redirection string itself is opaque, so for caller convenience we allow any hyphen-separated string of lowercase alphabetic characters. * feat(append): Redirect-with-handle instructions. (#32) Redirection may or may not include a write_handle. This supports instructions for returning a redirect that _does_ include the write_handle. * feat(append): Update Object with finalize_time. (#33) finalize_time can be used to identify unfinalized objects, i.e. appendable objects. * feat(append): Appendable object creation. (#34) Insert appendable objects into metadata immediately upon creation. This is not quite full appendable object semantics, but it's sufficient to test preconditions. * fix: Correct finalize condition. (#35) Uploads sometimes have nonzero metadata, e.g. when a test upload specifies generation -1. We really wanted to finalize appendable uploads, so fix the condition to be explicit. * feat(append): Add tests for finalizing appends. (#36) This surfaced a bug when checking that the live version is finalized before allowing an overwrite: I had incorrectly applied De Morgan's Law. * fix: signal terminate in BidiReadObject (#37) * docs: how to merge from public upstream (#40) * docs: how to merge from public upstream * update contents * feat: Support error code for single shot upload (#699) (#39) * support error for single shot upload * remove .idea files * lint fix * review comments * review comments * review comments * lint fix * lint fix * lint fix Co-authored-by: Tulsi Shah <46474643+Tulsishah@users.noreply.github.com> * chore: Timeout with cancellation in BidiReadObject (#42) We want to limit the total amount of time any one request can monopolize a thread on the gRPC server, so we have a 10 second timer to terminate the request. However, the existing code ends the stream with OK, which is misleading to clients. By switching to cancellation, we no longer need to set a timeout on the gather thread join (since it observes an error from next(request_iterator)) and we return a CANCELLED error which correctly indicates to client code that they probably wrote a bug. Now, tests with bugs that keep the stream open for more than 10 seconds look like: ``` === RUN TestRetryConformance/grpc-1-[return-reset-connection_return-reset-connection]-storage.objects.get-1 retry_conformance_test.go:696: want success, got rpc error: code = Canceled desc = CANCELLED ``` Tests pass when code correctly cleans up channels/RPCs. N.B. if we ever need longer than 10 second RPCs and we _don't_ want them to end in a CANCELLED error, we will have to revisit this. * fix: Disable pipelining in BidiReadObject. (#43) The real implementation of BidiReadObject may pipeline requests, concurrently serving reads issued in separate messages. Unfortunately, as best we can tell, gRPC in synchronous Python makes it hard to abort the next(request_iterator) call when there's an error with a concurrent read. So we go back to a previous iteration of this code, which handled messages in batches. We no longer need a response queue or a thread pool. * chore: Update bidi proto patch. (#44) Deprecated fields have been removed from the proto we will upstream, and I renamed read_limit in ReadRange to read_length (since it's a length!). Verified with testbench unit tests and Go prelaunch SDK emulator tests. * Revert "docs: how to merge from public upstream (#40)" This reverts commit e96974d. * Revert "test: enable GCS bucket logs (#26)" This reverts commit cc704ab. * Revert "build: update cloudbuild for private images (#23)" This reverts commit 86db6be. * Revert "feat: Only RAPID supports appendable objects. (#30)" This reverts commit 47c63fa. * update for public merge --------- Co-authored-by: Carlos O'Ryan <coryan@google.com> Co-authored-by: Chris Carlon <carlon@google.com> Co-authored-by: shubham-diwakar <hastagmadrista@gmail.com> Co-authored-by: Frank Natividad <franknatividad@google.com> Co-authored-by: Mend Renovate <bot@renovateapp.com> Co-authored-by: Tulsi Shah <46474643+Tulsishah@users.noreply.github.com> Co-authored-by: Frank Natividad <frankyn@users.noreply.github.com>
The emulator wraps each RPC with a "retry test", a mechanism to inject
failures and exercise the retry path in the client libraries. This
change adds the wrapper to the existing RPCs.
Part of the work for #24