Allow remote retry max delay to be user configurable #16058
@tjgq could I trouble you for a quick review of this configuration option? 🙇 🙏
Sorry about the delay. This PR seems reasonable to me, I have just a small comment.
@@ -154,6 +154,7 @@ public static class ExponentialBackoff implements Backoff {
      */
     ExponentialBackoff(Duration initial, Duration max, double multiplier, double jitter,
         int maxAttempts) {
+      Preconditions.checkArgument(max.compareTo(initial) > 0, "max must be > initial");
As a rule, Bazel should not crash due to malformed input. To avoid additional validation logic, how about we use `max(initial, remoteRetryMaxDelay)` as the argument to `ExponentialBackoff` below?
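A minimal sketch of the clamping idea in this suggestion: rather than rejecting a maximum delay that is smaller than the initial delay, take the larger of the two before building the backoff. The class name `ClampMaxDelay`, the helper `max`, and the 100 ms initial delay are assumptions made for illustration, not code from the PR.

```java
import java.time.Duration;

public class ClampMaxDelay {
  /** Returns the larger of two durations (hypothetical helper). */
  static Duration max(Duration a, Duration b) {
    return a.compareTo(b) >= 0 ? a : b;
  }

  public static void main(String[] args) {
    // Assumed initial delay, for illustration only.
    Duration initial = Duration.ofMillis(100);
    // A user-supplied maximum that is smaller than the initial delay.
    Duration remoteRetryMaxDelay = Duration.ofMillis(10);

    // Clamp instead of throwing: the effective maximum never drops below initial.
    Duration effectiveMax = max(initial, remoteRetryMaxDelay);
    System.out.println("effective max delay (ms): " + effectiveMax.toMillis());
  }
}
```

With this approach a too-small value silently behaves like a backoff capped at the initial delay, so no precondition check is needed in the constructor.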
Thanks for checking this out, @tjgq! I made your suggested change, preventing Bazel from crashing on an invalid value here.
Friendly ping, @tjgq
Sorry for the delay, importing it now.
@bazel-io flag
@bazel-io fork 6.2.0
This introduces a new option, `--remote_retry_max_delay`, that can be used to change the maximum exponential backoff interval used when retrying remote requests. Before this change, this maximum was hardcoded to `5s`.

Rationale

`remote_retries` is useful for masking temporary disruptions to a remote cluster. If a cluster experiences temporary downtime, it is useful to let Bazel clients wait for a period of time for the cluster to recover before giving up. If users cannot configure the maximum exponential backoff delay, they must set a large value for `remote_retries`, with each retry eventually waiting up to 5s, so that the Bazel client waits a reasonable amount of time for the cluster to recover.

The problem is that under certain cluster failure modes, requests are not handled and failed quickly; instead, they may wait until `remote_timeout` before failing. A large `remote_timeout` combined with a large `remote_retries` could lead to waiting a very long time before finally giving up on a given action.

If users can raise `remote_retry_max_delay`, they can tune the retry waiting behavior to their own needs.

Closes bazelbuild#16058.

PiperOrigin-RevId: 523680725
Change-Id: I21daba78b91d3157362ca85bb7b1cbbef8a94bb3
* Allow remote retry max delay to be user configurable

  This introduces a new option, `--remote_retry_max_delay`, that can be used to change the maximum exponential backoff interval used when retrying remote requests. Before this change, this maximum was hardcoded to `5s`.

  Rationale

  `remote_retries` is useful for masking temporary disruptions to a remote cluster. If a cluster experiences temporary downtime, it is useful to let Bazel clients wait for a period of time for the cluster to recover before giving up. If users cannot configure the maximum exponential backoff delay, they must set a large value for `remote_retries`, with each retry eventually waiting up to 5s, so that the Bazel client waits a reasonable amount of time for the cluster to recover.

  The problem is that under certain cluster failure modes, requests are not handled and failed quickly; instead, they may wait until `remote_timeout` before failing. A large `remote_timeout` combined with a large `remote_retries` could lead to waiting a very long time before finally giving up on a given action.

  If users can raise `remote_retry_max_delay`, they can tune the retry waiting behavior to their own needs.

  Closes #16058.

  PiperOrigin-RevId: 523680725
  Change-Id: I21daba78b91d3157362ca85bb7b1cbbef8a94bb3

* Replace RemoteDurationConverter with RemoteTimeoutConverter

Co-authored-by: Joel Jeske <joel.jeske@robinhood.com>
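The trade-off described in the rationale can be made concrete with a little arithmetic. The sketch below sums capped exponential backoff delays (ignoring jitter) and then adds the time spent if every attempt hangs until the request timeout. Only the `5s` cap comes from the description; the 100 ms initial delay, the multiplier of 2, the 60 s timeout, and all class and method names are assumptions chosen for illustration.

```java
import java.time.Duration;

public class RetryWaitEstimate {
  /** Sum of the backoff delays over the given number of retries, ignoring jitter. */
  static Duration totalBackoff(Duration initial, Duration max, double multiplier, int retries) {
    Duration total = Duration.ZERO;
    double delayMillis = initial.toMillis();
    for (int i = 0; i < retries; i++) {
      // Each delay grows geometrically but is capped at the maximum.
      long capped = (long) Math.min(delayMillis, max.toMillis());
      total = total.plusMillis(capped);
      delayMillis *= multiplier;
    }
    return total;
  }

  /** Worst case: the first attempt and every retry each hang for the full timeout. */
  static Duration worstCase(Duration timeout, Duration backoff, int retries) {
    return timeout.multipliedBy(retries + 1L).plus(backoff);
  }

  public static void main(String[] args) {
    Duration initial = Duration.ofMillis(100); // assumed
    double multiplier = 2.0;                   // assumed
    int retries = 10;

    Duration capped5s = totalBackoff(initial, Duration.ofSeconds(5), multiplier, retries);
    Duration capped60s = totalBackoff(initial, Duration.ofSeconds(60), multiplier, retries);
    System.out.println("backoff with 5s cap:  " + capped5s.toMillis() + " ms");
    System.out.println("backoff with 60s cap: " + capped60s.toMillis() + " ms");

    // If each attempt waits out an assumed 60s timeout, the timeouts dominate.
    Duration worst = worstCase(Duration.ofSeconds(60), capped5s, retries);
    System.out.println("worst case with 5s cap: " + worst.toMillis() + " ms");
  }
}
```

Under these assumed numbers, ten retries with a 5s cap give the cluster about 26 seconds of backoff to recover, while raising the cap to 60s gives about 102 seconds with the same retry count. Raising the cap rather than the retry count keeps the number of attempts, and therefore the number of possible full timeouts, from growing.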