-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[6.3.0] Use failure_rate instead of failure count for circuit breaker #18559
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Copy of bazelbuild#18120: I accidentally closed bazelbuild#18120 during rebase and doesn't have permission to reopen. We have noticed that any problems with the remote cache have a detrimental effect on build times. On investigation we found that the interface for the circuit breaker was left unimplemented. To address this issue, implemented a failure circuit breaker, which includes three new Bazel flags: 1) experimental_circuitbreaker_strategy, 2) experimental_remote_failure_threshold, and 3) experimental_emote_failure_window. In this implementation, I have implemented failure strategy for circuit breaker and used failure count to trip the circuit. Reasoning behind using failure count instead of failure rate : To measure failure rate I also need the success count. While both the failure and success count need to be an AtomicInteger as both will be modified concurrently by multiple threads. Even though getAndIncrement is very light weight operation, at very high request it might contribute to latency. Reasoning behind using failure circuit breaker : A new instance of Retrier.CircuitBreaker is created for each build. Therefore, if the circuit breaker trips during a build, the remote cache will be disabled for that build. However, it will be enabled again for the next build as a new instance of Retrier.CircuitBreaker will be created. If needed in the future we may add cool down strategy also. e.g. failure_and_cool_down_startegy. closes bazelbuild#18136 Closes bazelbuild#18359. PiperOrigin-RevId: 536349954 Change-Id: I5e1c57d4ad0ce07ddc4808bf1f327bc5df6ce704
iancha1992
added
awaiting-review
PR is awaiting review from an assigned reviewer
team-Remote-Exec
Issues and PRs for the Execution (Remote) team
labels
Jun 8, 2023
coeuvre
approved these changes
Jun 13, 2023
iancha1992
removed
the
awaiting-review
PR is awaiting review from an assigned reviewer
label
Jun 13, 2023
copybara-service bot
pushed a commit
that referenced
this pull request
Jul 24, 2023
Baseline: 758b44d Release Notes: + Automatic code cleanup. (#18417) + Update CODEOWNERS for 6.3.0 (#18369) + Overrides specified by non-root modules no longer cause an error, and are silently ignored instead. They were originally treated as an error to allow for the future possibility of overrides in the transitive dependency graph working together; but we've deemed that infeasible (and even if it was, it'd be so complicated and confusing to users that it would not be a good addition). (#18388) + Add implementation deps support for Objective-C (#18372) + Update release notes scripts (#18400) + Prevent CredentialHelperEnvironment crash when invoking Bazel outside of a workspace. (#18430) + Use wall-time for credential helper invalidation (#18413) + blaze_util_posix: handle killpg failures (#18403) + Pass version to java_runtimes created by local_java_repository (#18415) + Add jsonproto option to query --output flag (#18438) + Don't eagerly flatten a `NestedSet` in `RepoMappingManifestAction` (#18419) + rules_go & rules_python are failing in Downstream CI with Bazel@HEAD (#18447) + Move credential helper setup into remote_helpers.sh so it can be reused by other shell tests. (#18453) + Wire credential helper to repository fetching. (#18429) + Updates/fixes to relnotes script (#18470) + Report percentual download progress in repository rules (#18471) + Support remote symlink outputs when building without the bytes. (#18476) + Enrich local BEP upload errors with file path and digest possible. (#18481) + Set `GTEST_SHARD_STATUS_FILE` in test setup (#18482) + Fix relnotes script (#18491) + Fix Xcode 14.3 compatibility (#18490) + Fix #18493. (#18514) + Extend the credential helper default timeout to 10s. (#18527) + Fix formatting of release notes (#18534) + Use extension rather than local names in ModuleExtensionMetadata (#18536) + [credentialhelper] Ignore all errors when writing stdin (#18540) + Improve error on invalid `-//foo` and `-@repo//foo` options (#18516) + Implement failure circuit breaker (#18541) + Actually check `TEST_SHARD_STATUS_FILE` has been touched (#18418) + Ignore hash string casing (#18414) + Error if repository name isn't supplied (#18425) + Track repo rule label attributes after the first non-existent one (#18412) + Add ServerCapabilities into RemoteExecutionClient (#18442) + RemoteExecutionService: support output_symlinks in ActionResult (#18441) + RemoteExecutionService: Action.Command to set output_paths (#18440) + Use local_termination_grace_seconds when testing LinuxSandbox availability (#18568) + Fix dangling string literal in `extension_metadata` docs (#18598) + Include actual MODULE.bazel location in stack traces (#18612) + Make cpp file extensions case sensitive again (#18552) + Fix error when script is run after the final tag is created. (#18638) + Fix WORKSPACE toolchain resolution with `--enable_bzlmod` (#18649) + Add `ActionExecutionMetadata` as a parameter to `ActionInputPrefetcher#prefetchFiles`. (#18656) + Use failure_rate instead of failure count for circuit breaker (#18559) + Update ignored_error logic for circuit_breaker (#18662) + Don't rewind the build if invocation id stays the same (#18670) + Fix potential memory leak in UI (#18659) + Test that a credential helper can supply credentials for bzlmod. (#18663) + Add flag --experimental_collect_code_coverage_for_generated_files. (#18664) + Options specified on the pseudo-command `common` in `.rc` files are now ignored by commands that do not support them as long as they are valid options for *any* Bazel command. Previously, commands that did not support all options given for `common` would fail to run. These previous semantics of `common` are now available via the new `always` pseudo-command. Closes #18130. (#18609) + Fix split post-processing of LLVM-based coverage (#18737) + Allow module extension usages to be isolated (#18727) + BEGIN_PUBLIC (#18729) + Declare credential helpers to be a stable feature. (#18752) + Add a new provider for injecting native libs in android_binary (#18753) + Properly handle invalid credential files (#18779) + The REPO.bazel and MODULE.bazel files are now also considered workspace boundary markers. (#18787) + Report remote execution messages as events (#18780) + Fail on isolated extension usages without imports (#18793) + Add changes to cc_shared_library from head to 6.3 (#18606) + Remove option to disable FJP. (#18791) + Update to latest turbine version (#18803) + None. None (#18808) + Wait for outputs downloads before emitting local BEP events that reference these outputs. (#18815) + Perform builtins injection for WORKSPACE-loaded bzl files. (#18819) + Fix non-declared symlink issue for local actions when BwoB. (#18817) + Make grep_includes optional inside cc_common.register_linkstamp_compile_action (#18823) + add feature on windows toolchain with right tag (#18654) + coverage_common.instrumented_files_info now has a metadata_files argument (#18838) + Download directory output for test actions (#18846) + Teach DexMapper to not separate synthetic classes from their context … (#18853) + **[Incompatible]** query --output=proto --order_output=deps now returns targets in topological order (previously there was no ordering). (#18870) + Revert "Don't eagerly flatten a `NestedSet` in `RepoMappingManifestAction` (#18419)" (#18886) + Additional source inputs can now be specified for compilation in cc_library targets using the additional_compiler_inputs attribute, and these inputs can be used in the $(location) function. Fixes #18766. (#18882) + Open-source Google test `ConvenienceSymlinkTest` (#18890) + Update Error Prone to 2.20.0 (#18885) + Check if json.gz files exist, not the gcov version. (#18889) + Lockfile updates (#18894) + handle exception instead of crashing (#18895) + Add a new provider for passing dex related artifacts in android_binary (#18899) + Prevent most side effects of yanked modules (#18908) + Restore the classic desugar tool in the Bazel 6.3.0 branch so that the Bazel Android tools can be built for 6.3.0 without breaking backwards compatibility (#18909) + Update java_tools to v12.5 (#18868) + Add ActionCacheStatistics to BEP (#18914) + Adjust --top_level_targets_for_symlinks (#18916) + Track dev/non-dev `use_extension` calls (#18918) + Overrides specified by non-root modules no longer cause an error, and are silently ignored instead. They were originally treated as an error to allow for the future possibility of overrides in the transitive dependency graph working together; but we've deemed that infeasible (and even if it was, it'd be so complicated and confusing to users that it would not be a good addition). (#18921) + Rollforward of https://github.com/bazelbuild/bazel/commit/482d2be27ab… (#18773) + Update Android tools to 0.27.2 for fixes to DexMapper for https://gith... (#18891) + Report dev/non-dev deps imported via non-dev/dev usages (#18922) + Add reverted 'isolate' changes (#18928) + Identify isolated extensions by exported name (#18923) + test-setup.sh: Attempt to raise the original signal once more (#18932) + Ignore broken classic desugar tests (#18933) + Disable UseCorrectAssertInTests by default (#18948) + Fix VS 2022 autodetection (#18960) + Fix absolute file paths showing up in lockfiles (#18993) + Add support for isolated extension usages to the lockfile (#19008) Acknowledgements: This release contains contributions from many people at Google, as well as amishra-u, Andreas Herrmann, Andy Hamon, andyrinne12, Benjamin Lee, Benjamin Peterson, Brentley Jones, Chirag Ramani, Christopher Rydell, Daniel Wagner-Hall, Ed Schouten, Fabian Brandstetter, Fabian Meumertzheim, Greg, Ivan Golub, Jon Landis, JY Lin, Kai Zhang, Keith Smiley, kotlaja, lripoche, oquenchil, Pavan Singh, Rasrack, Son Luong Ngoc, Takeo Sawada, Vertexwahn, Xùdōng Yáng, Yannic.
iancha1992
pushed a commit
that referenced
this pull request
Jul 24, 2023
Baseline: 758b44d Release Notes: + Automatic code cleanup. (#18417) + Update CODEOWNERS for 6.3.0 (#18369) + Overrides specified by non-root modules no longer cause an error, and are silently ignored instead. They were originally treated as an error to allow for the future possibility of overrides in the transitive dependency graph working together; but we've deemed that infeasible (and even if it was, it'd be so complicated and confusing to users that it would not be a good addition). (#18388) + Add implementation deps support for Objective-C (#18372) + Update release notes scripts (#18400) + Prevent CredentialHelperEnvironment crash when invoking Bazel outside of a workspace. (#18430) + Use wall-time for credential helper invalidation (#18413) + blaze_util_posix: handle killpg failures (#18403) + Pass version to java_runtimes created by local_java_repository (#18415) + Add jsonproto option to query --output flag (#18438) + Don't eagerly flatten a `NestedSet` in `RepoMappingManifestAction` (#18419) + rules_go & rules_python are failing in Downstream CI with Bazel@HEAD (#18447) + Move credential helper setup into remote_helpers.sh so it can be reused by other shell tests. (#18453) + Wire credential helper to repository fetching. (#18429) + Updates/fixes to relnotes script (#18470) + Report percentual download progress in repository rules (#18471) + Support remote symlink outputs when building without the bytes. (#18476) + Enrich local BEP upload errors with file path and digest possible. (#18481) + Set `GTEST_SHARD_STATUS_FILE` in test setup (#18482) + Fix relnotes script (#18491) + Fix Xcode 14.3 compatibility (#18490) + Fix #18493. (#18514) + Extend the credential helper default timeout to 10s. (#18527) + Fix formatting of release notes (#18534) + Use extension rather than local names in ModuleExtensionMetadata (#18536) + [credentialhelper] Ignore all errors when writing stdin (#18540) + Improve error on invalid `-//foo` and `-@repo//foo` options (#18516) + Implement failure circuit breaker (#18541) + Actually check `TEST_SHARD_STATUS_FILE` has been touched (#18418) + Ignore hash string casing (#18414) + Error if repository name isn't supplied (#18425) + Track repo rule label attributes after the first non-existent one (#18412) + Add ServerCapabilities into RemoteExecutionClient (#18442) + RemoteExecutionService: support output_symlinks in ActionResult (#18441) + RemoteExecutionService: Action.Command to set output_paths (#18440) + Use local_termination_grace_seconds when testing LinuxSandbox availability (#18568) + Fix dangling string literal in `extension_metadata` docs (#18598) + Include actual MODULE.bazel location in stack traces (#18612) + Make cpp file extensions case sensitive again (#18552) + Fix error when script is run after the final tag is created. (#18638) + Fix WORKSPACE toolchain resolution with `--enable_bzlmod` (#18649) + Add `ActionExecutionMetadata` as a parameter to `ActionInputPrefetcher#prefetchFiles`. (#18656) + Use failure_rate instead of failure count for circuit breaker (#18559) + Update ignored_error logic for circuit_breaker (#18662) + Don't rewind the build if invocation id stays the same (#18670) + Fix potential memory leak in UI (#18659) + Test that a credential helper can supply credentials for bzlmod. (#18663) + Add flag --experimental_collect_code_coverage_for_generated_files. (#18664) + Options specified on the pseudo-command `common` in `.rc` files are now ignored by commands that do not support them as long as they are valid options for *any* Bazel command. Previously, commands that did not support all options given for `common` would fail to run. These previous semantics of `common` are now available via the new `always` pseudo-command. Closes #18130. (#18609) + Fix split post-processing of LLVM-based coverage (#18737) + Allow module extension usages to be isolated (#18727) + BEGIN_PUBLIC (#18729) + Declare credential helpers to be a stable feature. (#18752) + Add a new provider for injecting native libs in android_binary (#18753) + Properly handle invalid credential files (#18779) + The REPO.bazel and MODULE.bazel files are now also considered workspace boundary markers. (#18787) + Report remote execution messages as events (#18780) + Fail on isolated extension usages without imports (#18793) + Add changes to cc_shared_library from head to 6.3 (#18606) + Remove option to disable FJP. (#18791) + Update to latest turbine version (#18803) + None. None (#18808) + Wait for outputs downloads before emitting local BEP events that reference these outputs. (#18815) + Perform builtins injection for WORKSPACE-loaded bzl files. (#18819) + Fix non-declared symlink issue for local actions when BwoB. (#18817) + Make grep_includes optional inside cc_common.register_linkstamp_compile_action (#18823) + add feature on windows toolchain with right tag (#18654) + coverage_common.instrumented_files_info now has a metadata_files argument (#18838) + Download directory output for test actions (#18846) + Teach DexMapper to not separate synthetic classes from their context … (#18853) + **[Incompatible]** query --output=proto --order_output=deps now returns targets in topological order (previously there was no ordering). (#18870) + Revert "Don't eagerly flatten a `NestedSet` in `RepoMappingManifestAction` (#18419)" (#18886) + Additional source inputs can now be specified for compilation in cc_library targets using the additional_compiler_inputs attribute, and these inputs can be used in the $(location) function. Fixes #18766. (#18882) + Open-source Google test `ConvenienceSymlinkTest` (#18890) + Update Error Prone to 2.20.0 (#18885) + Check if json.gz files exist, not the gcov version. (#18889) + Lockfile updates (#18894) + handle exception instead of crashing (#18895) + Add a new provider for passing dex related artifacts in android_binary (#18899) + Prevent most side effects of yanked modules (#18908) + Restore the classic desugar tool in the Bazel 6.3.0 branch so that the Bazel Android tools can be built for 6.3.0 without breaking backwards compatibility (#18909) + Update java_tools to v12.5 (#18868) + Add ActionCacheStatistics to BEP (#18914) + Adjust --top_level_targets_for_symlinks (#18916) + Track dev/non-dev `use_extension` calls (#18918) + Overrides specified by non-root modules no longer cause an error, and are silently ignored instead. They were originally treated as an error to allow for the future possibility of overrides in the transitive dependency graph working together; but we've deemed that infeasible (and even if it was, it'd be so complicated and confusing to users that it would not be a good addition). (#18921) + Rollforward of https://github.com/bazelbuild/bazel/commit/482d2be27ab… (#18773) + Update Android tools to 0.27.2 for fixes to DexMapper for https://gith... (#18891) + Report dev/non-dev deps imported via non-dev/dev usages (#18922) + Add reverted 'isolate' changes (#18928) + Identify isolated extensions by exported name (#18923) + test-setup.sh: Attempt to raise the original signal once more (#18932) + Ignore broken classic desugar tests (#18933) + Disable UseCorrectAssertInTests by default (#18948) + Fix VS 2022 autodetection (#18960) + Fix absolute file paths showing up in lockfiles (#18993) + Add support for isolated extension usages to the lockfile (#19008) Acknowledgements: This release contains contributions from many people at Google, as well as amishra-u, Andreas Herrmann, Andy Hamon, andyrinne12, Benjamin Lee, Benjamin Peterson, Brentley Jones, Chirag Ramani, Christopher Rydell, Daniel Wagner-Hall, Ed Schouten, Fabian Brandstetter, Fabian Meumertzheim, Greg, Ivan Golub, Jon Landis, JY Lin, Kai Zhang, Keith Smiley, kotlaja, lripoche, oquenchil, Pavan Singh, Rasrack, Son Luong Ngoc, Takeo Sawada, Vertexwahn, Xùdōng Yáng, Yannic.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Continuation of #18359
I ran multiple experiment and tried to find optimal failure threshold and failure window interval with different remote_timeout, for healthy remote cache, semi-healthy (overloaded) remote cache and unhealthy remote cache.
As I described here even with healthy remote cache there was 5-10% circuit trip and we were not getting the best result.
Issue related to the failure count:
Finding a configuration that works well for both healthy and unhealthy remote caches was not feasible. Therefore, changed the approach to use the failure rate, and easily found a configuration that worked effectively in both scenarios.
Closes #18539
commit 10fb5f6