-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minor Update: Add out_of_range to ignored failure list for circuit_breaker #18583
Conversation
@coeuvre Can you please review? Minor update in circuit_breaker logic. |
src/main/java/com/google/devtools/build/lib/remote/circuitbreaker/CircuitBreakerFactory.java
Outdated
Show resolved
Hide resolved
|
||
public static final ImmutableSet<Class<? extends Exception>> DEFAULT_IGNORED_ERRORS = | ||
ImmutableSet.of(CacheNotFoundException.class); | ||
public static final Predicate<? super Exception> DEFAULT_IGNORED_ERRORS = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fundamentally the same as before. Since we already have the logic to check whether Bazel should retry on certain errors in Retrier
, we want to reuse that instead of repeating the logic here.
Currently, we always call circuitBreaker.recordFailure(e)
if Bazel encountered an error. If we move the call after the shouldRetry check, we know errors are not ignored when they are arrived in recordFailure
.
This makes CircuitBreaker no longer need to check whether it should ignore certain errors and makes it possible to use the CircuitBreaker for HTTP payload.
Sorry if I didn't make it clear before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback.
You mean call recordFailure if error is not one of these CANCELLED, UNKNOWN, DEADLINE_EXCEEDED, ABORTED, INTERNAL, UNAVAILABLE, RESOURCE_EXHAUSTED
.
If I take diff with available grpc errors then leftover grpc errors are INVALID_ARGUMENT
, NOT_FOUND
, ALREADY_EXISTS
, PERMISSION_DENIED
, UNAUTHENTICATED
, FAILED_PRECONDITION
OUT_OF_RANGE
, and UNIMPLEMENTED
At highlevel it seems that left over errors can be treated as ignored error. But I believe being explicit could be better. And here I only want NOT_FOUND
and OUT_OF_RANGE
to marked as ignored errror.
Also ignored errors are marked as success call for circuit breaker logic. code. The diff is not yet merged so I had to create pull request against master:HEAD.
I had two options then
- Add boolean argument
isErrorIgnored
as argument for recordFailure. - Or call
recordSuccess
on ignored errors.
What do you say?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved the logic to ignore failure to Retrier.
…dvanced/performance/iteration-speed PiperOrigin-RevId: 538270596 Change-Id: Id8c88674133417357efb4d2c34c661a017427744
…y need it. `OutputGroupInfo` always uses `Location.BUILTIN`. This change saves 8 bytes per `OutputGroupInfo` instance. There are likely more classes that don't need a `location` field (see existing `TODO` in `NativeInfo`), but I'm just focusing on `OutputGroupInfo` for now. PiperOrigin-RevId: 538327560 Change-Id: I6b10e72ae70a621cfe5bebf31d12ae4b88564d62
PiperOrigin-RevId: 538400959 Change-Id: I11d46083e6a161b96f8bb8588191a3ff99418883
PiperOrigin-RevId: 538401036 Change-Id: I244bfb99bd21c21367ac88ca197a916c1d034088
Keeping only import paths directories in correct order. Those consist only of "binfiles"/"genfiles", each "_virtual_imports" directory (if the proto_library using it is needed) and ".". Listing every .proto file on the command line had performance implications. PiperOrigin-RevId: 538456587 Change-Id: Iefe48db0d4efd1e7f8c0266595fd12aef5f71a7c
For an exported rule `foo_library`, `repr(foo_library)` now evaluates to `<rule foo_library>`, not `<rule>`, matching the behavior of native rules more closely. Fixes bazelbuild#17483 Closes bazelbuild#18392. PiperOrigin-RevId: 538458291 Change-Id: I331955655756a3c235be0a93f1394716ddf9849d
Also update all references in builtins to the symbol exported in this file. PiperOrigin-RevId: 538467997 Change-Id: I3eb93d8a8fe90691595207998322597836ec05c5
See: bazelbuild/examples#201 Closes bazelbuild#18581. PiperOrigin-RevId: 538469631 Change-Id: Ib0cfc6de5d9f0d2cc4fc045b47c2d8ce5a6b358e
…ule actions. The motivation behind this change is to eventually promote the warning to an error in the output case; this avoids surprising behavior such as described in bazelbuild#18579. I'm less confident that the input case (source directories) can be similarly promoted to an error, but I'm still extending the warning to all actions for symmetry. PiperOrigin-RevId: 538470433 Change-Id: Ic6c8a4cb353dda5f6397d286fe2e6faee28f896c
Update all references in builtins to the symbol exported in this file. PiperOrigin-RevId: 538479229 Change-Id: If4dc4eb4cb6943632c1b7e5908fe9ed61679bc09
… try) Rolling forward bazelbuild@975866a (which was rollbacked at bazelbuild@d51c75f) with fixes. - Introduce a `rules_java_builtin` repo in WORKSPACE prefix to avoid conflict with user defined rules_java. - `@bazel_tools//tools/jdk/*.bzl` loads from `rules_java_builtin` through repo-mappings. - `@local_jdk` was overridden in `jdk.WORKSPACE` to add repo_mapping for `rules_java`. - `jdk.WORKSPACE` explicitly loads from `rules_java_builtin` for JDK definitions and java toolchain definitions. - Allow using `__SKIP_WORKSPACE_PREFIX__` and `__SKIP_WORKSPACE_SUFFIX__` in WORKSPACE comment. - Fixed many tests by adjusting the WORKSPACE file content. - Re-export more symbols from `@bazel_tools` to be backward-compatible. Fixes bazelbuild#18551 Related: - bazelbuild#18373 - bazelbuild/rules_java#110 - bazelbuild#18423 Closes bazelbuild#18558. PiperOrigin-RevId: 538483417 Change-Id: I5223eec2c4b10131fc8c5b342237385ff2f56413
… case. This flag can set a by-mnemonic list of actions to allow locally, `--strategy_regexp` doesn't match against mnemonic but against the message. PiperOrigin-RevId: 538484992 Change-Id: Ib80302d38212515bb2e5dbd09ffa884fdaa12776
…eCommand. PiperOrigin-RevId: 538498468 Change-Id: I88cb9114bb370f42ecdeaac5b592b3a4e9ab4f08
…ency is copied correctly. Without this, toolchain dependencies do not get the same execution platform as the original target, and toolchain's exec dependencies are not necessarily executable in the original target's actions. PiperOrigin-RevId: 538541526 Change-Id: Icdcaa75c84d3a0f3f952aa491026e72ef88be099
Continuation of bazelbuild#18359 I ran multiple experiment and tried to find optimal failure threshold and failure window interval with different remote_timeout, for healthy remote cache, semi-healthy (overloaded) remote cache and unhealthy remote cache. As I described [here](bazelbuild#18359 (comment)) even with healthy remote cache there was 5-10% circuit trip and we were not getting the best result. Issue related to the failure count: 1. When the remote cache is healthy, builds are fast, and Bazel makes a high number of calls to the buildfarm. As a result, even with a moderate failure rate, the failure count may exceed the threshold. 2. Additionally, write calls, which have a higher probability of failure compared to other calls, are batched immediately after the completion of an action's build. This further increases the chances of breaching the failure threshold within the defined window interval. 3. On the other hand, when the remote cache is unhealthy or semi-healthy, builds are significantly slowed down, and Bazel makes fewer calls to the remote cache. Finding a configuration that works well for both healthy and unhealthy remote caches was not feasible. Therefore, changed the approach to use the failure rate, and easily found a configuration that worked effectively in both scenarios. Closes bazelbuild#18539. PiperOrigin-RevId: 538588379 Change-Id: I64a49eeeb32846d41d54ca3b637ded3085588528
When the digest size exceeds the max configured digest size by remote-cache, an "out_of_range" error is returned. These errors should not be considered as API failures for the circuit breaker logic, as they do not indicate any issues with the remote-cache service.
Related PRs
#18359
#18539