Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add write stall support #684

Merged
merged 45 commits into from
Oct 24, 2024
Merged

feat: add write stall support #684

merged 45 commits into from
Oct 24, 2024

Conversation

Tulsishah
Copy link
Contributor

@Tulsishah Tulsishah commented Oct 10, 2024

Fixes #686

It's a good idea to open an issue first for discussion.

  • Tests pass
  • Appropriate changes to README are included in PR

Copy link

🤖 I detect that the PR title and the commit message differ and there's only one commit. To use the PR title for the commit history, you can use Github's automerge feature with squashing, or use automerge label. Good luck human!

-- conventional-commit-lint bot
https://conventionalcommits.org/

testbench/common.py Outdated Show resolved Hide resolved
tests/test_testbench_retry.py Show resolved Hide resolved
tests/test_testbench_retry.py Show resolved Hide resolved
testbench/rest_server.py Show resolved Hide resolved
testbench/common.py Show resolved Hide resolved
tests/test_testbench_retry.py Show resolved Hide resolved
Copy link
Contributor

@tritone tritone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few suggestions, generally looking good!

testbench/common.py Show resolved Hide resolved
testbench/rest_server.py Outdated Show resolved Hide resolved
testbench/rest_server.py Show resolved Hide resolved
tests/test_testbench_retry.py Show resolved Hide resolved
@Tulsishah Tulsishah changed the title Write stall support feat: add write stall support Oct 18, 2024
Copy link

codecov bot commented Oct 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.75%. Comparing base (61bce3c) to head (7b7a2fe).
Report is 50 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #684      +/-   ##
==========================================
+ Coverage   98.69%   98.75%   +0.06%     
==========================================
  Files          50       50              
  Lines        8492     8869     +377     
==========================================
+ Hits         8381     8759     +378     
+ Misses        111      110       -1     
Flag Coverage Δ
unittests 98.75% <100.00%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Tulsishah Tulsishah marked this pull request as ready for review October 18, 2024 04:39
@Tulsishah Tulsishah requested a review from a team as a code owner October 18, 2024 04:39
@Tulsishah
Copy link
Contributor Author

Few suggestions, generally looking good!

Done. PTAL

@Tulsishah Tulsishah requested a review from tritone October 21, 2024 05:23
Copy link
Contributor

@tritone tritone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good!

testbench/rest_server.py Show resolved Hide resolved
tests/test_testbench_retry.py Outdated Show resolved Hide resolved
@tritone tritone merged commit dc200a3 into googleapis:main Oct 24, 2024
13 checks passed
cojenco added a commit that referenced this pull request Jan 6, 2025
* chore: merge from public circa 2024-10-24

* chore: remove Notification, HMAC, SA grpc support (#693)

* chore: remove Notification, HMAC, SA grpc support

* fix lint

* feat: add write stall support  (#684)

* add code for write stall

* fix test

* remove unnecessary files

* remove unnecessary files

* write test

* undo test changes to remove unnecessary changes

* Update test_testbench_retry.py

* add test

* remove .idea files

* write stall changes

* remove .idea file

* Update test_testbench_retry.py

* test

* stall once for identiacal req

* add comment

* remove .idea file

* fix unit test

* fix unit test

* test changes

* test changes

* review comments

* remove .idea files

* lint fixes

* lint fixes

* lint fixes

* lint fixes

* lint fixes

* code patch fix

* support full uploads

* remove unnecessary things

* remove unnecessary things

* remove unnecessary things

* adding comment

* lint fix

* lint fix

* stall should not happen if uploaded less amount of data then stall size

* stall should not happen if uploaded less amount of data then stall size

* remove last two commit changes

* remove env files

* lint fix

* lint fix

* review comment and adding scenario where upload size is less then stall byte size in single shot

* lint fix

* lint fix

* chore: remove Notification, HMAC, SA grpc support (#693)

---------

Co-authored-by: Mend Renovate <bot@renovateapp.com>
Co-authored-by: Tulsi Shah <46474643+Tulsishah@users.noreply.github.com>
danielduhh pushed a commit that referenced this pull request Jan 7, 2025
* ci: disable codecov uploads (#3)

* impl: add bidi streaming read patches (#1)

Add the bi-di streaming read RPCs as patches. In follow up PRs we can
implement the bidi functions. These patches may need updates from time
to time, as the line number hints change.

* chore: update patches to meet C++ requirements (#4)

I would prefer if we use the same patches across all the SDKs. C++ has
more strict requirements to generate the GAPICs, gRPC, and proto
libraries.

* chore: update proto patches circa 2024-06-13 (#6)

* chore: update patches c.20240612

* generate storage/v2 files

* generate google/ files

* Revert "generate google/ files"

This reverts commit 51217d21f933892bfca252d6c3c6294f68f53d2e.

* Revert "generate storage/v2 files"

This reverts commit c57c13a4bb8a9f4b8bd8f2e54232b4befe2de742.

* fix venv and update protos

* update readme

* feat: BidiRead initial scaffolding n ranges in 1 stream (#7)

* feat: BidiRead initial scaffolding n ranges in 1 stream

* attempt using a priorityQueue with bidireads

* feat: add crc32c checksum to BidiRead (#9)

* test: BidiRead object not found error handling (#10)

* feat: BidiReadObjectError part 1 object not found

* use grpc.StatusCode.value

* for discussion, return in error details

* object resolution errors, not found and precondition failed

* remove extra line change

* feat: pack BidiReadObjectError in error details for out of range (#11)

* feat: pack BidiReadObjectError in error details for out of range

* review comments

* fix(grpc_server): Higher fidelity BidiReadObject. (#12)

* fix(grpc_server): Higher fidelity BidiReadObject.

The emulator behaved slightly differently than the real implementation.
This brings its behavior on first message closer to the spec.

* feat(appendable): Appendable object proto support (#13)

Patch the BidiWriteObject protos to support appendable object
BidiWriteObject RPCs.

While I'm here, update `setup.py` to pin the correct version of
grpcio-tools for proto compilation, and have `update-protos.sh` use the
already `.gitignore`'d directory `.googleapis` as the base if none is
specified.

* fix: Correctly handle aborts with multiple ranges. (#16)

We never polled the `responses` queue after an `"abort"` message was
dequeued. That can stop us from ever joining `gather_thread` if another
range enqueues a response first, because the queue capacity is fixed. We
want to keep the queue capacity fixed so that we can increase the
likelihood of interleaving responses from multiple ranges.

Fix by draining the queue in the `finally` block before reraising.

Also, correct `abort` to `abort_with_status`: if the error is a `grpc.Status`,
we're meant to call `context.abort_with_status` rather than
`context.abort`.

* feat(grpc_retry): support retry bidi read object for conformance test. (#15)

* pull last changes

* add retry for bidi-read

* remove venv

* add abort

* fix: Correctly handle aborts with multiple ranges. (#16)

We never polled the `responses` queue after an `"abort"` message was
dequeued. That can stop us from ever joining `gather_thread` if another
range enqueues a response first, because the queue capacity is fixed. We
want to keep the queue capacity fixed so that we can increase the
likelihood of interleaving responses from multiple ranges.

Fix by draining the queue in the `finally` block before reraising.

Also, correct `abort` to `abort_with_status`: if the error is a `grpc.Status`,
we're meant to call `context.abort_with_status` rather than
`context.abort`.

* push chris patch

* modify tests

* pull last changes

* remove venv

* remove lint issues

* remove lint issues1

* remove lint issues2

* bring broken_stream down for better readability

* remove venv

---------

Co-authored-by: Chris Carlon <carlon@google.com>

* fix: Support client cancellation in BidiReadObject (#17)

Client cancellation surfaces as an exception in
`next(request_iterator)`. That wasn't handled before, so gather_thread
simply failed and never put the `"terminate"` action on the response
queue.

* feat: Handle appendable objects in BidiWriteObject (#18)

Appendable objects extend BidiWriteObject calls. This is a refactor and
a mostly-accurate implementation of appendable objects in the testbench.

There's a key difference - this simulates true appendable using
resumable uploads, which are not visible in ListObjects or similar calls
in the same way that appendable objects are. That's sufficient for
current testing.

* fix: Correct BidiReadObject early termination. (#19)

When a stream ends with early termination, but we still return some
bytes, the response should be consistent with the actual bytes returned.

Specifically, we adjust the read_limit to match the actual range
returned, and adjust the range_end bool to indicate that we know there
is more data to request.

* fix: Don't raise a bare string (#21)

This was a bug - I should have raised a RuntimeError. It doesn't come up
because it's meant to be an impossible codepath, but it's nicer to raise
a real error in that case.

* fix: Allow BidiWriteObjectRequest with no data. (#22)

BidiWriteObjectRequest is allowed to contain no checksummed_data.

* build: update cloudbuild for private images (#23)

* chore: merge from public circa 2024-10-24 (#25)

* chore: merge from public circa 2024-10-24

* chore: remove Notification, HMAC, SA grpc support (#693)

* chore: remove Notification, HMAC, SA grpc support

* fix lint

* feat: add write stall support  (#684)

* add code for write stall

* fix test

* remove unnecessary files

* remove unnecessary files

* write test

* undo test changes to remove unnecessary changes

* Update test_testbench_retry.py

* add test

* remove .idea files

* write stall changes

* remove .idea file

* Update test_testbench_retry.py

* test

* stall once for identiacal req

* add comment

* remove .idea file

* fix unit test

* fix unit test

* test changes

* test changes

* review comments

* remove .idea files

* lint fixes

* lint fixes

* lint fixes

* lint fixes

* lint fixes

* code patch fix

* support full uploads

* remove unnecessary things

* remove unnecessary things

* remove unnecessary things

* adding comment

* lint fix

* lint fix

* stall should not happen if uploaded less amount of data then stall size

* stall should not happen if uploaded less amount of data then stall size

* remove last two commit changes

* remove env files

* lint fix

* lint fix

* review comment and adding scenario where upload size is less then stall byte size in single shot

* lint fix

* lint fix

* chore: remove Notification, HMAC, SA grpc support (#693)

---------

Co-authored-by: Mend Renovate <bot@renovateapp.com>
Co-authored-by: Tulsi Shah <46474643+Tulsishah@users.noreply.github.com>

* test: enable GCS bucket logs (#26)

* use gcs for logs instead

* enable both gcs bucket and monitoring

* feat(append): Import new proto patch. (#29)

Internally generated the public proto as if all Bidi APIs were external,
and regenerated this patch against .googleapis.

Kept everything in one patch for simplicity.

* feat: Only RAPID supports appendable objects. (#30)

* feat: Only RAPID supports appendable objects.

This restriction is subject to change.

* address lint issues

* feat(append): Support instructions for redirect. (#31)

Appendable objects support a redirection protocol. Support instructions
to force redirection errors. The redirection string itself is opaque, so
for caller convenience we allow any hyphen-separated string of lowercase
alphabetic characters.

* feat(append): Redirect-with-handle instructions. (#32)

Redirection may or may not include a write_handle. This supports
instructions for returning a redirect that _does_ include the
write_handle.

* feat(append): Update Object with finalize_time. (#33)

finalize_time can be used to identify unfinalized objects, i.e.
appendable objects.

* feat(append): Appendable object creation. (#34)

Insert appendable objects into metadata immediately upon creation.

This is not quite full appendable object semantics, but it's sufficient
to test preconditions.

* fix: Correct finalize condition. (#35)

Uploads sometimes have nonzero metadata, e.g. when a test upload
specifies generation -1. We really wanted to finalize appendable
uploads, so fix the condition to be explicit.

* feat(append): Add tests for finalizing appends. (#36)

This surfaced a bug when checking that the live version is finalized
before allowing an overwrite: I had incorrectly applied De Morgan's Law.

* fix: signal terminate in BidiReadObject (#37)

* docs: how to merge from public upstream (#40)

* docs: how to merge from public upstream

* update contents

* feat: Support error code for single shot upload (#699) (#39)

* support error for single shot upload

* remove .idea files

* lint fix

* review comments

* review comments

* review comments

* lint fix

* lint fix

* lint fix

Co-authored-by: Tulsi Shah <46474643+Tulsishah@users.noreply.github.com>

* chore: Timeout with cancellation in BidiReadObject (#42)

We want to limit the total amount of time any one request can monopolize
a thread on the gRPC server, so we have a 10 second timer to terminate
the request. However, the existing code ends the stream with OK, which
is misleading to clients.

By switching to cancellation, we no longer need to set a timeout on the
gather thread join (since it observes an error from
next(request_iterator)) and we return a CANCELLED error which correctly
indicates to client code that they probably wrote a bug.

Now, tests with bugs that keep the stream open for more than 10 seconds
look like:

```
=== RUN   TestRetryConformance/grpc-1-[return-reset-connection_return-reset-connection]-storage.objects.get-1
    retry_conformance_test.go:696: want success, got rpc error: code = Canceled desc = CANCELLED
```

Tests pass when code correctly cleans up channels/RPCs.

N.B. if we ever need longer than 10 second RPCs and we _don't_ want them
to end in a CANCELLED error, we will have to revisit this.

* fix: Disable pipelining in BidiReadObject. (#43)

The real implementation of BidiReadObject may pipeline requests,
concurrently serving reads issued in separate messages. Unfortunately,
as best we can tell, gRPC in synchronous Python makes it hard to abort
the next(request_iterator) call when there's an error with a concurrent
read.

So we go back to a previous iteration of this code, which handled
messages in batches. We no longer need a response queue or a thread
pool.

* chore: Update bidi proto patch. (#44)

Deprecated fields have been removed from the proto we will upstream, and
I renamed read_limit in ReadRange to read_length (since it's a length!).

Verified with testbench unit tests and Go prelaunch SDK emulator tests.

* Revert "docs: how to merge from public upstream (#40)"

This reverts commit e96974d.

* Revert "test: enable GCS bucket logs (#26)"

This reverts commit cc704ab.

* Revert "build: update cloudbuild for private images (#23)"

This reverts commit 86db6be.

* Revert "feat: Only RAPID supports appendable objects. (#30)"

This reverts commit 47c63fa.

* update for public merge

---------

Co-authored-by: Carlos O'Ryan <coryan@google.com>
Co-authored-by: Chris Carlon <carlon@google.com>
Co-authored-by: shubham-diwakar <hastagmadrista@gmail.com>
Co-authored-by: Frank Natividad <franknatividad@google.com>
Co-authored-by: Mend Renovate <bot@renovateapp.com>
Co-authored-by: Tulsi Shah <46474643+Tulsishah@users.noreply.github.com>
Co-authored-by: Frank Natividad <frankyn@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support stalls for resumable write
3 participants