-
Notifications
You must be signed in to change notification settings - Fork 3.9k
[automatic failover] Add dual thresholds (min num of failures + failure rate) capabililty to circuit breaker #4295
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[automatic failover] Add dual thresholds (min num of failures + failure rate) capabililty to circuit breaker #4295
Conversation
…aiters()' in 'TrackingConnectionPool' (redis#4270) - remove the check for number of waitiers in TrackingConnectionPool
…redis#4268) - set maxtotal connections for echoStrategy
…cuitBreakerFailoverBase.clusterFailover' (redis#4275) * - replace CircuitBreaker with Cluster for CircuitBreakerFailoverBase.clusterFailover - improve thread safety with provider initialization * - formatting
* - minor optimizations on fail fast * - volatile failfast
* - replace minConsecutiveSuccessCount with numberOfRetries - add retries into healtCheckImpl - apply changes to strategy implementations config classes - fix unit tests * - fix typo * - fix failing tests * - add tests for retry logic * - formatting * - format * - revisit numRetries for healthCheck ,replace with numProbes and implement built in policies - new types probecontext, ProbePolicy, HealthProbeContext - add delayer executor pool to healthcheckımpl - adjustments on worker pool of healthCheckImpl for shared use of workers * - format * - expand comment with example case * - drop pooled executor for delays * - polish * - fix tests * - formatting * - checking failing tests * - fix test * - fix flaky tests * - fix flaky test * - add tests for builtin probing policies * - fix flaky test
* - move failover provider to mcf * - make iterateActiveCluster package private
redis#4291) * User-provided ssl config for lag-aware health check * ssl scenario test for lag-aware healthcheck * format * format * address review comments - use getters instead of fields
…#4293) * - implement max failover attempt - add tests * - fix user receive the intended exception * -clean+format * - java doc for exceptions * format * - more tests on excaption types in max failover attempts mechanism * format * fix failing timing in test * disable health checks * rename to switchToHealthyCluster * format
…t breaker executor - Map config to resilience4j via CircuitBreakerThresholdsAdapter - clean up/simplfy config: drop slow-call and window type - Add thresholdMinNumOfFailures; update some of the defaults - Update provider to use thresholds adapter - Update docs; align examples with new defaults - Add tests for 0% rate, edge thresholds
|
Hi, I’m Jit, a friendly security platform designed to help developers build secure applications from day zero with an MVS (Minimal viable security) mindset. In case there are security findings, they will be communicated to you as a comment inside the PR. Hope you’ll enjoy using Jit. Questions? Comments? Want to learn more? Get in touch with us. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds dual-threshold failover capability to Jedis multi-cluster circuit breaker functionality. The circuit breaker now requires both a minimum number of failures AND a failure rate threshold to be exceeded before triggering failover, providing more robust failover control.
Key changes:
- Introduces dual threshold logic requiring both minimum failures and failure rate criteria to be met
- Simplifies configuration by removing deprecated circuit breaker settings and updating defaults
- Adds new adapter class to handle threshold mapping to resilience4j configuration
Reviewed Changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
MultiClusterClientConfig.java |
Updated configuration with new dual threshold properties and simplified circuit breaker settings |
CircuitBreakerThresholdsAdapter.java |
New adapter class for mapping configuration to resilience4j circuit breaker settings |
CircuitBreakerCommandExecutor.java |
Added dual threshold checking logic before triggering failover |
CircuitBreakerFailoverBase.java |
Base class with shared threshold validation logic |
JedisFailoverThresholdsExceededException.java |
New exception type for threshold-based failover |
| Various test files | Updated tests to remove deprecated configuration options and add threshold testing |
docs/failover.md |
Updated documentation with new defaults and simplified configuration |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
src/main/java/redis/clients/jedis/mcf/CircuitBreakerThresholdsAdapter.java
Outdated
Show resolved
Hide resolved
src/main/java/redis/clients/jedis/mcf/CircuitBreakerThresholdsAdapter.java
Outdated
Show resolved
Hide resolved
src/main/java/redis/clients/jedis/mcf/CircuitBreakerFailoverBase.java
Outdated
Show resolved
Hide resolved
src/main/java/redis/clients/jedis/mcf/CircuitBreakerFailoverConnectionProvider.java
Outdated
Show resolved
Hide resolved
src/main/java/redis/clients/jedis/mcf/CircuitBreakerCommandExecutor.java
Outdated
Show resolved
Hide resolved
…Adapter.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- simplfy executer and cbfailoverconnprovider - adjust config getters - fix failing tests due to COUNT_BASED -> TIME_BASED - new tests for thresholds calculations and impact on circuit state transitions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 1 comment.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
src/test/java/redis/clients/jedis/failover/FailoverIntegrationTest.java
Outdated
Show resolved
Hide resolved
…o ali/aa-failover-cbthresholds
- add more test on threshold calculations - enable command line arg for overwriting surefire.excludedGroups
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
src/main/java/redis/clients/jedis/mcf/MultiClusterPooledConnectionProvider.java
Show resolved
Hide resolved
…components (redis#4298) * - set & test default values * - format * - fix tests failing due to changing defaults
ggivo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still going trough the complete review, have some initial questions
src/main/java/redis/clients/jedis/mcf/CircuitBreakerThresholdsAdapter.java
Outdated
Show resolved
Hide resolved
src/main/java/redis/clients/jedis/mcf/CircuitBreakerThresholdsAdapter.java
Outdated
Show resolved
Hide resolved
src/main/java/redis/clients/jedis/mcf/CircuitBreakerThresholdsAdapter.java
Outdated
Show resolved
Hide resolved
src/main/java/redis/clients/jedis/mcf/CircuitBreakerThresholdsAdapter.java
Outdated
Show resolved
Hide resolved
src/main/java/redis/clients/jedis/mcf/CircuitBreakerThresholdsAdapter.java
Outdated
Show resolved
Hide resolved
src/main/java/redis/clients/jedis/mcf/MultiClusterPooledConnectionProvider.java
Show resolved
Hide resolved
src/main/java/redis/clients/jedis/mcf/CircuitBreakerThresholdsAdapter.java
Outdated
Show resolved
Hide resolved
src/main/java/redis/clients/jedis/mcf/CircuitBreakerThresholdsAdapter.java
Outdated
Show resolved
Hide resolved
src/main/java/redis/clients/jedis/mcf/CircuitBreakerCommandExecutor.java
Outdated
Show resolved
Hide resolved
| Metrics metrics = cluster.getCircuitBreaker().getMetrics(); | ||
| // ATTENTION: this is to increment fails in regard to the current call that is failing, | ||
| // DO NOT remove the increment, it will change the behaviour in case of initial requests to | ||
| // cluster fail | ||
| int fails = metrics.getNumberOfFailedCalls() + (lastFailRecorded ? 0 : 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I get this one.
In which case, metrics are not increased yet?
src/test/java/redis/clients/jedis/mcf/CircuitBreakerThresholdsTest.java
Outdated
Show resolved
Hide resolved
src/test/java/redis/clients/jedis/mcf/CircuitBreakerThresholdsTest.java
Outdated
Show resolved
Hide resolved
src/test/java/redis/clients/jedis/mcf/CircuitBreakerThresholdsTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the new properties need to be renamed to align the naming with the design:
circuitBreakerMinNumOfFailures(int) -> minNumOfFailuresThreshold
circuitBreakerFailureRateThreshold(float) -> failureRateThreshold
circuitBreakerSlidingWindowSize(int) -> failureDetectionWindowSize
We need to drop also circuitBreaker prefix from others.
Probably with follow-up PR?
8bbfd71 to
e22cd06
Compare
ggivo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…4306) * [automatic failover] Set and test default values for failover config&components (#4298) * - set & test default values * - format * - fix tests failing due to changing defaults * [automatic failover] Add dual thresholds (min num of failures + failure rate) capabililty to circuit breaker (#4295) * [automatic failover] Remove the check for 'GenericObjectPool.getNumWaiters()' in 'TrackingConnectionPool' (#4270) - remove the check for number of waitiers in TrackingConnectionPool * [automatic failover] Configure max total connections for EchoStrategy (#4268) - set maxtotal connections for echoStrategy * [automatic failover] Replace 'CircuitBreaker' with 'Cluster' for 'CircuitBreakerFailoverBase.clusterFailover' (#4275) * - replace CircuitBreaker with Cluster for CircuitBreakerFailoverBase.clusterFailover - improve thread safety with provider initialization * - formatting * [automatic failover] Minor optimizations on fast failover (#4277) * - minor optimizations on fail fast * - volatile failfast * [automatic failover] Implement health check retries (#4273) * - replace minConsecutiveSuccessCount with numberOfRetries - add retries into healtCheckImpl - apply changes to strategy implementations config classes - fix unit tests * - fix typo * - fix failing tests * - add tests for retry logic * - formatting * - format * - revisit numRetries for healthCheck ,replace with numProbes and implement built in policies - new types probecontext, ProbePolicy, HealthProbeContext - add delayer executor pool to healthcheckımpl - adjustments on worker pool of healthCheckImpl for shared use of workers * - format * - expand comment with example case * - drop pooled executor for delays * - polish * - fix tests * - formatting * - checking failing tests * - fix test * - fix flaky tests * - fix flaky test * - add tests for builtin probing policies * - fix flaky test * [automatic failover] Move failover provider to mcf (#4294) * - move failover provider to mcf * - make iterateActiveCluster package private * [automatic failover] Add SSL configuration support to LagAwareStrategy (#4291) * User-provided ssl config for lag-aware health check * ssl scenario test for lag-aware healthcheck * format * format * address review comments - use getters instead of fields * [automatic failover] Implement max number of failover attempts (#4293) * - implement max failover attempt - add tests * - fix user receive the intended exception * -clean+format * - java doc for exceptions * format * - more tests on excaption types in max failover attempts mechanism * format * fix failing timing in test * disable health checks * rename to switchToHealthyCluster * format * - Add dual-threshold (min failures + failure rate) failover to circuit breaker executor - Map config to resilience4j via CircuitBreakerThresholdsAdapter - clean up/simplfy config: drop slow-call and window type - Add thresholdMinNumOfFailures; update some of the defaults - Update provider to use thresholds adapter - Update docs; align examples with new defaults - Add tests for 0% rate, edge thresholds * polish * Update src/main/java/redis/clients/jedis/mcf/CircuitBreakerThresholdsAdapter.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * - fix typo * - fix min total calls calculation * format * - merge issues fixed * fix javadoc ref * - move threshold evaluations to failoverbase - simplfy executer and cbfailoverconnprovider - adjust config getters - fix failing tests due to COUNT_BASED -> TIME_BASED - new tests for thresholds calculations and impact on circuit state transitions * - avoid facilitating actual CBConfig type in tests * Update src/test/java/redis/clients/jedis/failover/FailoverIntegrationTest.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Trigger workflows * - evaluate only in failure recorded and failover immediately - add more test on threshold calculations - enable command line arg for overwriting surefire.excludedGroups * format * check pom * - fix error prone test * [automatic failover] Set and test default values for failover config&components (#4298) * - set & test default values * - format * - fix tests failing due to changing defaults * - fix flaky test * - remove unnecessary checks for failover attempt * - clean and trim adapter class - add docs and more explanantion * fix javadoc issue * - switch to all_succes to fix flaky timing * - fix issue in CircuitBreakerFailoverConnectionProvider * introduce ReflectionTestUtil --------- Co-authored-by: Ivo Gaydazhiev <ivo.gaydazhiev@redis.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * [automatic failover] feat: Add MultiDbClient with multi-endpoint failover and circuit breaker support (#4300) * feat: introduce ResilientRedisClient with multi-endpoint failover support Add ResilientRedisClient extending UnifiedJedis with automatic failover capabilities across multiple weighted Redis endpoints. Includes circuit breaker pattern, health monitoring, and configurable retry logic for high-availability Redis deployments. * format * mark ResilientRedisClientTest as integration one * fix test - make sure endpoint is healthy before activating it * Rename ResilientClient to align with design - ResilientClient -> MultiDbClient (builder, tests, etc) * Rename setActiveEndpoint to setActiveDatabaseEndpoint * Rename clusterSwitchListener to databaseSwitchListener * Rename multiClusterConfig to multiDbConfig * fix api doc's error * fix compilation error after rebase * format * fix example in javadoc * Update ActiveActiveFailoverTest scenariou test to use builder's # Conflicts: # src/test/java/redis/clients/jedis/scenario/ActiveActiveFailoverTest.java * rename setActiveDatabaseEndpoint -. setActiveDatabase * is healthy throw exception if cluster does not exists * format * [automatic failover]Use Endpoint interface instead HostAndPort in multi db (#4302) [clean up] Use Endpoint interface where possible * - fix variable name type * fix typo in variable name * - fix flaky test --------- Co-authored-by: Ivo Gaydazhiev <ivo.gaydazhiev@redis.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ure rate) capabililty to circuit breaker (#4295) * [automatic failover] Remove the check for 'GenericObjectPool.getNumWaiters()' in 'TrackingConnectionPool' (#4270) - remove the check for number of waitiers in TrackingConnectionPool * [automatic failover] Configure max total connections for EchoStrategy (#4268) - set maxtotal connections for echoStrategy * [automatic failover] Replace 'CircuitBreaker' with 'Cluster' for 'CircuitBreakerFailoverBase.clusterFailover' (#4275) * - replace CircuitBreaker with Cluster for CircuitBreakerFailoverBase.clusterFailover - improve thread safety with provider initialization * - formatting * [automatic failover] Minor optimizations on fast failover (#4277) * - minor optimizations on fail fast * - volatile failfast * [automatic failover] Implement health check retries (#4273) * - replace minConsecutiveSuccessCount with numberOfRetries - add retries into healtCheckImpl - apply changes to strategy implementations config classes - fix unit tests * - fix typo * - fix failing tests * - add tests for retry logic * - formatting * - format * - revisit numRetries for healthCheck ,replace with numProbes and implement built in policies - new types probecontext, ProbePolicy, HealthProbeContext - add delayer executor pool to healthcheckımpl - adjustments on worker pool of healthCheckImpl for shared use of workers * - format * - expand comment with example case * - drop pooled executor for delays * - polish * - fix tests * - formatting * - checking failing tests * - fix test * - fix flaky tests * - fix flaky test * - add tests for builtin probing policies * - fix flaky test * [automatic failover] Move failover provider to mcf (#4294) * - move failover provider to mcf * - make iterateActiveCluster package private * [automatic failover] Add SSL configuration support to LagAwareStrategy (#4291) * User-provided ssl config for lag-aware health check * ssl scenario test for lag-aware healthcheck * format * format * address review comments - use getters instead of fields * [automatic failover] Implement max number of failover attempts (#4293) * - implement max failover attempt - add tests * - fix user receive the intended exception * -clean+format * - java doc for exceptions * format * - more tests on excaption types in max failover attempts mechanism * format * fix failing timing in test * disable health checks * rename to switchToHealthyCluster * format * - Add dual-threshold (min failures + failure rate) failover to circuit breaker executor - Map config to resilience4j via CircuitBreakerThresholdsAdapter - clean up/simplfy config: drop slow-call and window type - Add thresholdMinNumOfFailures; update some of the defaults - Update provider to use thresholds adapter - Update docs; align examples with new defaults - Add tests for 0% rate, edge thresholds * polish * Update src/main/java/redis/clients/jedis/mcf/CircuitBreakerThresholdsAdapter.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * - fix typo * - fix min total calls calculation * format * - merge issues fixed * fix javadoc ref * - move threshold evaluations to failoverbase - simplfy executer and cbfailoverconnprovider - adjust config getters - fix failing tests due to COUNT_BASED -> TIME_BASED - new tests for thresholds calculations and impact on circuit state transitions * - avoid facilitating actual CBConfig type in tests * Update src/test/java/redis/clients/jedis/failover/FailoverIntegrationTest.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Trigger workflows * - evaluate only in failure recorded and failover immediately - add more test on threshold calculations - enable command line arg for overwriting surefire.excludedGroups * format * check pom * - fix error prone test * [automatic failover] Set and test default values for failover config&components (#4298) * - set & test default values * - format * - fix tests failing due to changing defaults * - fix flaky test * - remove unnecessary checks for failover attempt * - clean and trim adapter class - add docs and more explanantion * fix javadoc issue * - switch to all_succes to fix flaky timing * - fix issue in CircuitBreakerFailoverConnectionProvider * introduce ReflectionTestUtil --------- Co-authored-by: Ivo Gaydazhiev <ivo.gaydazhiev@redis.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Summary
Introduce dual-threshold circuit-breaker failover for Jedis multi‑cluster, simplify configuration, and align docs/tests. Failover now occurs only when BOTH of the following are exceeded: minimum number of failures and failure‑rate percentage.
Key changes
Behavior
Failover is triggered when BOTH conditions hold:
Notes
Migration
Use these settings:
Removed settings (no longer applicable):
If you previously relied on slow-call based opening, migrate to failure-based thresholds or external performance monitors.
Validation