Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add throttling counter in gcsio and refactor retrying #32428

Merged
merged 20 commits into from
Sep 18, 2024

Conversation

shunping
Copy link
Contributor

@shunping shunping commented Sep 11, 2024

Implement an on_error callback where throttling counter is incremented when GCS throttling happens.

  • A new retry object is needed for invoking a custom on_error callback. We derive this retry object based on the default retry object from gcs client library.
  • Default retry object is replaced with the new retry object in all API calls.
  • Both copy_batch and delete_batch are reimplemented without client.batch() so retry will happen if a retriable exception is caught.
  • Two new pipeline options are introduced here: --no_gcsio_throttling_counter and --enable-gcsio-blob-generation.

This is a second attempt for this feature.

  • We use the on_error() callback here, because there is no other callback when a retry happens.
  • Additionally, when on_error() is invoked, only the raised exception is provided as an argument. To find out the exact time caused by GCS throttling (i.e. the sleep time due to throttling), we use inspect to get the sleep time from the caller of on_error().

The previous PR is at #31584.

@shunping shunping changed the title Gcsio throttling counter Add throttling counter in gcsio Sep 11, 2024
Copy link
Contributor

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @jrmccluskey for label python.
R: @ahmedabu98 for label io.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@shunping shunping marked this pull request as draft September 12, 2024 03:35
- Remove extra retries for copy, delete, _gcs_object.
- Remove the use of client.batch() as the function has no built-in
  retry.
@shunping
Copy link
Contributor Author

R: @andrewsg @Abacn

@shunping shunping marked this pull request as ready for review September 16, 2024 18:10
Copy link
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment assign set of reviewers

@shunping shunping changed the title Add throttling counter in gcsio Add throttling counter in gcsio and refactor retrying Sep 16, 2024
Copy link
Contributor

@Abacn Abacn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the uploader / downloader part look good to me. Had an efficiency concern on removing batch operations. Should be easy to resolve if you agree.

sdks/python/apache_beam/io/gcp/gcsio.py Outdated Show resolved Hide resolved
sdks/python/apache_beam/io/gcp/gcsio_retry.py Show resolved Hide resolved
Additionally, the variable name for the new retry object is changed.

Add a new pipeline option to enable the use of blob generation to
mitigate race conditions (at the expense of more http requests)
Copy link
Contributor

@Abacn Abacn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, just had a minor comment

sdks/python/apache_beam/io/gcp/gcsio.py Outdated Show resolved Hide resolved
sdks/python/apache_beam/io/gcp/gcsio.py Outdated Show resolved Hide resolved
sdks/python/apache_beam/io/gcp/gcsio.py Show resolved Hide resolved
Copy link

codecov bot commented Sep 18, 2024

Codecov Report

Attention: Patch coverage is 80.00000% with 11 lines in your changes missing coverage. Please review.

Project coverage is 58.88%. Comparing base (e259f47) to head (2316d59).
Report is 41 commits behind head on master.

Files with missing lines Patch % Lines
sdks/python/apache_beam/io/gcp/gcsio.py 72.72% 6 Missing ⚠️
sdks/python/apache_beam/io/gcp/gcsio_retry.py 83.87% 5 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #32428      +/-   ##
============================================
+ Coverage     58.86%   58.88%   +0.02%     
  Complexity     3072     3072              
============================================
  Files          1128     1129       +1     
  Lines        173921   173968      +47     
  Branches       3328     3328              
============================================
+ Hits         102374   102442      +68     
+ Misses        68204    68183      -21     
  Partials       3343     3343              
Flag Coverage Δ
python 81.57% <80.00%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@shunping
Copy link
Contributor Author

The failed YAML Xlang Direct test is unrelated to the code change here.

@Abacn Abacn merged commit eb8639b into apache:master Sep 18, 2024
92 of 93 checks passed
reeba212 pushed a commit to reeba212/beam that referenced this pull request Dec 4, 2024
* Add retry instance that records throttling metric.

* Use retry with throttling counters by default. Add pipeline option.

* Fix lint

* Fix broken tests.

* Retrieve a more accurate throttling time from the caller frame.

* Apply yapf and linter

* Refactoring copy and delete

- Remove extra retries for copy, delete, _gcs_object.
- Remove the use of client.batch() as the function has no built-in
  retry.

* Fix a typo and apply yapf

* Use counter instead of counters in pipeline option.

Additionally, the variable name for the new retry object is changed.

Add a new pipeline option to enable the use of blob generation to
mitigate race conditions (at the expense of more http requests)

* Parameterize existing tests for the new pipeline options.

* Apply yapf

* Fix a typo.

* Revert the change of copy_batch and delete_batch and add warning in their docstring.

* Fix lint

* Minor change according to code review.

* Restore the previous tox.ini that got accidentally changed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants