@atakavci atakavci commented Sep 26, 2025

Summary

Introduce dual-threshold circuit-breaker failover for Jedis multi‑cluster, simplify configuration, and align docs/tests. Failover now occurs only when BOTH thresholds are reached: the minimum number of failures and the failure‑rate percentage.

Key changes

  • Dual-threshold enforcement
    • New evaluation in CircuitBreakerFailoverBase: evaluateThresholds(cluster) checks metrics and opens the CB when thresholds are exceeded; clusterFailover(...) then forces open and switches clusters.
    • CircuitBreakerCommandExecutor and CircuitBreakerFailoverConnectionProvider now call evaluateThresholds before execution and on tracked exceptions to keep decisions consistent and deterministic.
  • New CircuitBreakerThresholdsAdapter
    • Maps MultiClusterClientConfig → resilience4j CircuitBreakerConfig.
    • Special handling:
      • failureRateThreshold = 0.0f maps to 100.0f (resilience4j limitation).
      • minimumNumberOfCalls is derived from configured minFailures and rate; when rate == 0.0f, sets Integer.MAX_VALUE to prevent the CB from auto-opening on its own.
      • Uses TIME_BASED window; size comes from config.
  • MultiClusterClientConfig simplification and new defaults
    • Added circuitBreakerMinNumOfFailures (default 1000) and validation that rejects a configuration in which minFailures == 0 and rate == 0 at the same time.
    • Defaults updated: failureRateThreshold=10.0f, slidingWindowSize=2.
    • Removed slow-call knobs (duration and rate) and sliding-window type/min-calls from the public API.
  • MultiClusterPooledConnectionProvider
    • Builds the CircuitBreaker using CircuitBreakerThresholdsAdapter.
    • Exposes cluster getters for getCircuitBreakerMinNumOfFailures() and getCircuitBreakerFailureRateThreshold().
  • Documentation
    • Updated failover.md to the new defaults and simplified table; removed slow-call and sliding window type/min-calls; clarified the dual-threshold model.
  • Tests
    • New CircuitBreakerFailoverBaseTests: matrix coverage of minFailures × failure-rate using real CB metrics with mocked wiring.
    • New/updated CircuitBreakerThresholdsTest: exercises real provider + real executor + real CB/Retry with only the connection pool mocked (no network), validating threshold behavior and failover switching.
    • Integration tests (FailoverIntegrationTest, ActiveActiveLocalFailoverTest) adjusted to the new configuration style and defaults.

Behavior

Failover is triggered when BOTH conditions hold:

  • failures >= circuitBreakerMinNumOfFailures
  • (failed / (failed + successful)) × 100 >= circuitBreakerFailureRateThreshold
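
As a rough illustration of the two conditions above, the check can be sketched in plain Java. The class and method names are hypothetical and are not the Jedis API; treating an empty window (no recorded calls) as "no failover" is also an assumption of this sketch, since the rate is undefined in that case.

```java
// Illustrative sketch only: names are hypothetical, not the Jedis API.
final class DualThresholdSketch {

    /** Returns true only when BOTH the failure count and the failure rate reach their thresholds. */
    public static boolean thresholdsReached(int failed, int successful,
                                            int minNumOfFailures, float failureRateThreshold) {
        int total = failed + successful;
        if (total == 0) {
            return false; // assumption: with no recorded calls the rate is undefined, so do nothing
        }
        float failureRate = failed * 100.0f / total;
        return failed >= minNumOfFailures && failureRate >= failureRateThreshold;
    }

    public static void main(String[] args) {
        // 1200 failures in 10000 calls: 12% >= 10% and 1200 >= 1000
        System.out.println(thresholdsReached(1200, 8800, 1000, 10.0f)); // true
        // 50 failures in 100 calls: rate is 50%, but only 50 failures
        System.out.println(thresholdsReached(50, 50, 1000, 10.0f)); // false
        // 1500 failures in 100000 calls: enough failures, but rate is 1.5%
        System.out.println(thresholdsReached(1500, 98500, 1000, 10.0f)); // false
    }
}
```

The second and third cases show why both thresholds matter: a high rate over a tiny sample, or many failures diluted in heavy traffic, does not trigger failover on its own.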

Notes

  • With failureRateThreshold=0.0f, adapter maps to 100.0f and sets minimumNumberOfCalls=Integer.MAX_VALUE so resilience4j does not auto-open; Jedis still enforces the dual-threshold policy in the executor/base.
  • Automatic transition from OPEN to HALF_OPEN remains disabled; failback is handled by existing health‑check/failback mechanisms.
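
The special-case mapping described in the first note can be sketched as follows. This is not the actual CircuitBreakerThresholdsAdapter: the class, the `Mapped` holder, and the method name are hypothetical, and no resilience4j types are used. The description says minimumNumberOfCalls is derived from minFailures and rate but does not give the formula, so the non-zero branch below uses minNumOfFailures itself as a stand-in lower bound (fewer calls than that can never contain enough failures). That choice is an assumption of this sketch.

```java
// Illustrative sketch only: hypothetical names, no resilience4j dependency.
final class ThresholdsMappingSketch {

    /** Plain holder for the values that would be handed to the circuit breaker config. */
    static final class Mapped {
        final float failureRateThreshold;
        final int minimumNumberOfCalls;
        final int slidingWindowSizeSeconds; // TIME_BASED window, size in seconds

        Mapped(float failureRateThreshold, int minimumNumberOfCalls, int slidingWindowSizeSeconds) {
            this.failureRateThreshold = failureRateThreshold;
            this.minimumNumberOfCalls = minimumNumberOfCalls;
            this.slidingWindowSizeSeconds = slidingWindowSizeSeconds;
        }
    }

    public static Mapped map(int minNumOfFailures, float failureRateThreshold, int windowSizeSeconds) {
        // The PR places this validation in MultiClusterClientConfig; it is repeated
        // here only to keep the sketch self-contained.
        if (minNumOfFailures == 0 && failureRateThreshold == 0.0f) {
            throw new IllegalArgumentException("minFailures and failure rate cannot both be 0");
        }
        if (failureRateThreshold == 0.0f) {
            // A rate of 0 maps to 100 and the call minimum is set to Integer.MAX_VALUE,
            // so the underlying breaker never opens by itself.
            return new Mapped(100.0f, Integer.MAX_VALUE, windowSizeSeconds);
        }
        // ASSUMPTION: stand-in for the real derivation, which the description does not spell out.
        int minCalls = Math.max(1, minNumOfFailures);
        return new Mapped(failureRateThreshold, minCalls, windowSizeSeconds);
    }

    public static void main(String[] args) {
        Mapped zeroRate = map(1000, 0.0f, 2);
        System.out.println(zeroRate.failureRateThreshold + " " + zeroRate.minimumNumberOfCalls);
    }
}
```

With a rate of 0 the failure-rate condition always holds, so in that configuration the decision to fail over effectively depends on the failure count alone.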

Migration

Use these settings:

  • circuitBreakerMinNumOfFailures(int)
  • circuitBreakerFailureRateThreshold(float)
  • circuitBreakerSlidingWindowSize(int) // time-based seconds

Removed settings (no longer applicable):

  • slidingWindowType, slidingWindowMinCalls
  • slowCallDurationThreshold, slowCallRateThreshold

If you previously relied on slow-call based opening, migrate to failure-based thresholds or external performance monitors.

Validation

  • Unit/parameterized tests cover edge thresholds (0%, 100%), small/large sample sizes.
  • Integration tests updated to the new API; docs reflect defaults and examples.
  • CircuitBreakerThresholdsTest now validates end‑to‑end interaction among provider + executor + circuit breaker without using network calls.

atakavci and others added 10 commits September 3, 2025 14:03
…aiters()' in 'TrackingConnectionPool' (redis#4270)

- remove the check for number of waiters in TrackingConnectionPool
…cuitBreakerFailoverBase.clusterFailover' (redis#4275)

* - replace CircuitBreaker with Cluster for CircuitBreakerFailoverBase.clusterFailover
- improve thread safety with provider initialization

* - formatting
* - minor optimizations on fail fast

* -  volatile failfast
* - replace minConsecutiveSuccessCount with numberOfRetries
- add retries into healthCheckImpl
- apply changes to strategy implementations config classes
- fix unit tests

* - fix typo

* - fix failing tests

* - add tests for retry logic

* - formatting

* - format

* - revisit numRetries for healthCheck, replace with numProbes and implement built-in policies
- new types probecontext, ProbePolicy, HealthProbeContext
- add delayer executor pool to healthCheckImpl
- adjustments on worker pool of healthCheckImpl for shared use of workers

* - format

* - expand comment with example case

* - drop pooled executor for delays

* - polish

* - fix tests

* - formatting

* - checking failing tests

* - fix test

* - fix flaky tests

* - fix flaky test

* - add tests for builtin probing policies

* - fix flaky test
* - move failover provider to mcf

* - make iterateActiveCluster package private
redis#4291)

* User-provided ssl config for lag-aware health check

* ssl scenario test for lag-aware healthcheck

* format

* format

* address review comments

  - use getters instead of fields
…#4293)

* - implement max failover attempt
- add tests

* - fix user receive the intended exception

* -clean+format

* - java doc for exceptions

* format

* - more tests on exception types in max failover attempts mechanism

* format

* fix failing timing in test

* disable health checks

* rename to switchToHealthyCluster

* format
…t breaker executor

- Map config to resilience4j via CircuitBreakerThresholdsAdapter
- clean up/simplify config: drop slow-call and window type
- Add thresholdMinNumOfFailures; update some of the defaults
- Update provider to use thresholds adapter
- Update docs; align examples with new defaults
- Add tests for 0% rate, edge thresholds
@atakavci atakavci self-assigned this Sep 26, 2025
Copilot AI left a comment


Pull Request Overview

This PR adds dual-threshold failover capability to Jedis multi-cluster circuit breaker functionality. The circuit breaker now requires both a minimum number of failures AND a failure rate threshold to be exceeded before triggering failover, providing more robust failover control.

Key changes:

  • Introduces dual threshold logic requiring both minimum failures and failure rate criteria to be met
  • Simplifies configuration by removing deprecated circuit breaker settings and updating defaults
  • Adds new adapter class to handle threshold mapping to resilience4j configuration

Reviewed Changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| MultiClusterClientConfig.java | Updated configuration with new dual threshold properties and simplified circuit breaker settings |
| CircuitBreakerThresholdsAdapter.java | New adapter class for mapping configuration to resilience4j circuit breaker settings |
| CircuitBreakerCommandExecutor.java | Added dual threshold checking logic before triggering failover |
| CircuitBreakerFailoverBase.java | Base class with shared threshold validation logic |
| JedisFailoverThresholdsExceededException.java | New exception type for threshold-based failover |
| Various test files | Updated tests to remove deprecated configuration options and add threshold testing |
| docs/failover.md | Updated documentation with new defaults and simplified configuration |


atakavci and others added 2 commits September 26, 2025 20:26
…Adapter.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@atakavci atakavci added the breakingchange label Sep 26, 2025
- simplify executor and cbfailoverconnprovider
- adjust config getters
- fix failing tests due to COUNT_BASED -> TIME_BASED
- new tests for thresholds calculations and impact on circuit state transitions
@atakavci atakavci requested a review from Copilot September 29, 2025 21:38
Copilot AI left a comment


Pull Request Overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 1 comment.



@atakavci atakavci changed the base branch from feature/automatic-failover-2 to feature/automatic-failover-3 September 30, 2025 07:57
@atakavci atakavci marked this pull request as draft September 30, 2025 11:10
@atakavci atakavci marked this pull request as ready for review September 30, 2025 11:11
- add more test on threshold calculations
- enable command line arg for overwriting surefire.excludedGroups
@atakavci atakavci requested a review from Copilot October 1, 2025 18:08
Copilot AI left a comment


Pull Request Overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.



@ggivo ggivo left a comment


Still going through the complete review; I have some initial questions.

Comment on lines +867 to +871
Metrics metrics = cluster.getCircuitBreaker().getMetrics();
// ATTENTION: this is to increment fails in regard to the current call that is failing,
// DO NOT remove the increment, it will change the behaviour in case of initial requests to
// cluster fail
int fails = metrics.getNumberOfFailedCalls() + (lastFailRecorded ? 0 : 1);

Not sure I get this one.
In which case have the metrics not been incremented yet?

@ggivo ggivo left a comment


Some of the new properties need to be renamed to align the naming with the design:

circuitBreakerMinNumOfFailures(int) -> minNumOfFailuresThreshold
circuitBreakerFailureRateThreshold(float) -> failureRateThreshold
circuitBreakerSlidingWindowSize(int) -> failureDetectionWindowSize

We also need to drop the circuitBreaker prefix from the others.
Probably in a follow-up PR?

@ggivo ggivo force-pushed the feature/automatic-failover-3 branch from 8bbfd71 to e22cd06 Compare October 3, 2025 09:04
@ggivo ggivo left a comment


LGTM

@atakavci atakavci merged commit 11308f0 into redis:feature/automatic-failover-3 Oct 3, 2025
11 checks passed
atakavci added a commit that referenced this pull request Oct 6, 2025
…4306)

* [automatic failover] Set and test default values for failover config&components (#4298)

* - set & test default values

* - format

* - fix tests failing due to changing defaults

* [automatic failover] Add dual thresholds (min num of failures + failure rate) capability to circuit breaker (#4295)

* [automatic failover] Remove the check for 'GenericObjectPool.getNumWaiters()' in 'TrackingConnectionPool' (#4270)

- remove the check for number of waiters in TrackingConnectionPool

* [automatic failover] Configure max total connections for EchoStrategy (#4268)

- set maxtotal connections for echoStrategy

* [automatic failover] Replace 'CircuitBreaker' with 'Cluster' for 'CircuitBreakerFailoverBase.clusterFailover' (#4275)

* - replace CircuitBreaker with Cluster for CircuitBreakerFailoverBase.clusterFailover
- improve thread safety with provider initialization

* - formatting

* [automatic failover] Minor optimizations on fast failover (#4277)

* - minor optimizations on fail fast

* -  volatile failfast

* [automatic failover] Implement health check retries (#4273)

* - replace minConsecutiveSuccessCount with numberOfRetries
- add retries into healthCheckImpl
- apply changes to strategy implementations config classes
- fix unit tests

* - fix typo

* - fix failing tests

* - add tests for retry logic

* - formatting

* - format

* - revisit numRetries for healthCheck, replace with numProbes and implement built-in policies
- new types probecontext, ProbePolicy, HealthProbeContext
- add delayer executor pool to healthCheckImpl
- adjustments on worker pool of healthCheckImpl for shared use of workers

* - format

* - expand comment with example case

* - drop pooled executor for delays

* - polish

* - fix tests

* - formatting

* - checking failing tests

* - fix test

* - fix flaky tests

* - fix flaky test

* - add tests for builtin probing policies

* - fix flaky test

* [automatic failover] Move failover provider to mcf (#4294)

* - move failover provider to mcf

* - make iterateActiveCluster package private

* [automatic failover]  Add SSL configuration support to LagAwareStrategy  (#4291)

* User-provided ssl config for lag-aware health check

* ssl scenario test for lag-aware healthcheck

* format

* format

* address review comments

  - use getters instead of fields

* [automatic failover] Implement max number of failover attempts (#4293)

* - implement max failover attempt
- add tests

* - fix user receive the intended exception

* -clean+format

* - java doc for exceptions

* format

* - more tests on exception types in max failover attempts mechanism

* format

* fix failing timing in test

* disable health checks

* rename to switchToHealthyCluster

* format

* - Add dual-threshold (min failures + failure rate) failover to circuit breaker executor
- Map config to resilience4j via CircuitBreakerThresholdsAdapter
- clean up/simplify config: drop slow-call and window type
- Add thresholdMinNumOfFailures; update some of the defaults
- Update provider to use thresholds adapter
- Update docs; align examples with new defaults
- Add tests for 0% rate, edge thresholds

* polish

* Update src/main/java/redis/clients/jedis/mcf/CircuitBreakerThresholdsAdapter.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* - fix typo

* - fix min total calls calculation

* format

* - merge issues fixed

* fix javadoc ref

* - move threshold evaluations to failoverbase
- simplify executor and cbfailoverconnprovider
- adjust config getters
- fix failing tests due to COUNT_BASED -> TIME_BASED
- new tests for thresholds calculations and impact on circuit state transitions

* - avoid facilitating actual CBConfig type in tests

* Update src/test/java/redis/clients/jedis/failover/FailoverIntegrationTest.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Trigger workflows

* - evaluate only in failure recorded and failover immediately
- add more test on threshold calculations
- enable command line arg for overwriting surefire.excludedGroups

* format

* check pom

* - fix error prone test

* [automatic failover] Set and test default values for failover config&components (#4298)

* - set & test default values

* - format

* - fix tests failing due to changing defaults

* - fix flaky test

* - remove unnecessary checks for failover attempt

* - clean and trim adapter class
- add docs and more explanation

* fix javadoc issue

* - switch to all_succes to fix flaky timing

* - fix issue in CircuitBreakerFailoverConnectionProvider

* introduce ReflectionTestUtil

---------

Co-authored-by: Ivo Gaydazhiev <ivo.gaydazhiev@redis.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* [automatic failover] feat: Add MultiDbClient with multi-endpoint failover and circuit breaker support (#4300)

* feat: introduce ResilientRedisClient with multi-endpoint failover support

Add ResilientRedisClient extending UnifiedJedis with automatic failover
capabilities across multiple weighted Redis endpoints. Includes circuit
breaker pattern, health monitoring, and configurable retry logic for
high-availability Redis deployments.

* format

* mark ResilientRedisClientTest as integration one

* fix test
  - make sure endpoint is healthy before activating it

* Rename ResilientClient to align with design

 - ResilientClient -> MultiDbClient (builder, tests, etc)

* Rename setActiveEndpoint to setActiveDatabaseEndpoint

* Rename clusterSwitchListener to databaseSwitchListener

* Rename multiClusterConfig to multiDbConfig

* fix api doc's error

* fix compilation error after rebase

* format

* fix example in javadoc

* Update ActiveActiveFailoverTest scenario test to use builders

# Conflicts:
#	src/test/java/redis/clients/jedis/scenario/ActiveActiveFailoverTest.java

* rename setActiveDatabaseEndpoint -> setActiveDatabase

* is healthy: throw exception if cluster does not exist

* format

* [automatic failover]Use Endpoint interface instead HostAndPort in multi db (#4302)

[clean up] Use Endpoint interface where possible

* - fix variable name type

* fix typo in variable name

* - fix flaky test

---------

Co-authored-by: Ivo Gaydazhiev <ivo.gaydazhiev@redis.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
ggivo added a commit that referenced this pull request Oct 6, 2025
…ure rate) capability to circuit breaker (#4295)


Labels

breakingchange: Pull request that has breaking changes. Must include the breaking behavior in release notes.
feature
