 | 1 | +---  | 
 | 2 | +sidebar_position: 6  | 
 | 3 | +title: Circuit Breaker User Guide  | 
 | 4 | +---  | 
 | 5 | + | 
 | 6 | +# Circuit Breaker User Guide  | 
 | 7 | + | 
 | 8 | +## What is a Circuit Breaker?  | 
 | 9 | + | 
 | 10 | +Circuit breaking is a critical pattern for building resilient microservice applications. It helps prevent cascading failures by limiting the impact of service failures, latency spikes, and other network issues.  | 
 | 11 | + | 
 | 12 | +In Kmesh, circuit breakers:  | 
 | 13 | +- Limit concurrent connections and requests to a service  | 
 | 14 | +- Automatically detect failing services  | 
 | 15 | +- Prevent overload by rejecting excess traffic  | 
 | 16 | +- Allow controlled recovery after failures  | 
 | 17 | + | 
 | 18 | +## When to Use Circuit Breakers  | 
 | 19 | + | 
 | 20 | +Consider implementing circuit breakers when:  | 
 | 21 | +- You need to protect services from being overwhelmed  | 
 | 22 | +- You want to improve system resilience during partial outages  | 
 | 23 | +- You need to control retry behavior and prevent cascading failures  | 
 | 24 | +- You want to implement graceful degradation  | 
 | 25 | + | 
 | 26 | +## Circuit Breaker Configuration  | 
 | 27 | + | 
 | 28 | +Kmesh circuit breakers can be configured using DestinationRules:  | 
 | 29 | + | 
 | 30 | +```yaml  | 
 | 31 | +apiVersion: networking.istio.io/v1alpha3  | 
 | 32 | +kind: DestinationRule  | 
 | 33 | +metadata:  | 
 | 34 | +  name: my-service-circuit-breaker  | 
 | 35 | +spec:  | 
 | 36 | +  host: my-service  | 
 | 37 | +  trafficPolicy:  | 
 | 38 | +    connectionPool:  | 
 | 39 | +      tcp:  | 
 | 40 | +        maxConnections: 100           # Maximum number of TCP connections  | 
 | 41 | +        connectTimeout: 30ms          # TCP connection timeout  | 
 | 42 | +      http:  | 
 | 43 | +        http1MaxPendingRequests: 1    # Maximum number of requests queued while waiting for a connection  | 
 | 44 | +        maxRequestsPerConnection: 1   # Maximum number of requests per connection  | 
 | 45 | +        maxRetries: 3                 # Maximum number of retries outstanding at a given time  | 
 | 46 | +    outlierDetection:  | 
 | 47 | +      consecutive5xxErrors: 5         # Consecutive 5xx errors before a host is ejected  | 
 | 48 | +      interval: 10s                   # Time between ejection analysis sweeps  | 
 | 49 | +      baseEjectionTime: 30s           # Minimum ejection duration  | 
 | 50 | +      maxEjectionPercent: 100         # Maximum percentage of instances to eject  | 
 | 51 | +```  | 
 | 52 | + | 
 | 53 | +## Key Configuration Parameters  | 
 | 54 | + | 
 | 55 | +### Connection Pool Settings  | 
 | 56 | + | 
 | 57 | +| Parameter | Description | Default |  | 
 | 58 | +|-----------|-------------|---------|  | 
 | 59 | +| `maxConnections` | Maximum number of TCP connections | 1024 |  | 
 | 60 | +| `connectTimeout` | TCP connection timeout | 10s |  | 
 | 61 | +| `http1MaxPendingRequests` | Maximum number of HTTP requests queued while waiting for a connection | 1024 |  | 
 | 62 | +| `maxRequestsPerConnection` | Maximum number of requests per connection | No limit |  | 
 | 63 | +| `maxRetries` | Maximum number of retries outstanding to all hosts at a given time | 3 |  | 
 | 64 | + | 
 | 65 | +### Outlier Detection Settings  | 
 | 66 | + | 
 | 67 | +| Parameter | Description | Default |  | 
 | 68 | +|-----------|-------------|---------|  | 
 | 69 | +| `consecutive5xxErrors` | Number of consecutive 5xx errors before a host is ejected | 5 |  | 
 | 70 | +| `interval` | Time between ejection analysis sweeps | 10s |  | 
 | 71 | +| `baseEjectionTime` | Minimum ejection duration | 30s |  | 
 | 72 | +| `maxEjectionPercent` | Maximum percentage of instances to eject | 10% |  | 
 | 73 | + | 
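The outlier detection fields interact with each other. Istio documents the ejection duration as `baseEjectionTime` multiplied by the number of times the host has already been ejected. The fragment below is a minimal sketch of how the four settings combine, assuming Kmesh applies the same semantics as Istio. The resource name and the values are illustrative, not recommendations.

```yaml
# Sketch only: hypothetical name and values, assuming Istio outlier detection semantics.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: my-service-outlier-detection
spec:
  host: my-service
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 5   # eject a host after 5 consecutive 5xx responses
      interval: 10s             # analyze hosts for ejection every 10 seconds
      baseEjectionTime: 30s     # first ejection lasts 30s, a repeat ejection lasts longer
      maxEjectionPercent: 50    # keep at least half of the hosts in rotation
```

Setting `maxEjectionPercent` below 100 trades faster isolation of bad hosts for a guarantee that some capacity always remains available.
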
 | 74 | +## Step-by-Step Implementation Guide  | 
 | 75 | + | 
 | 76 | +### 1. Identify Services Needing Protection  | 
 | 77 | + | 
 | 78 | +First, identify which services in your mesh need circuit breaker protection. Good candidates include:  | 
 | 79 | +- Services with limited capacity  | 
 | 80 | +- Critical services that must remain available  | 
 | 81 | +- Services with known stability issues  | 
 | 82 | + | 
 | 83 | +### 2. Create a Circuit Breaker Configuration  | 
 | 84 | + | 
 | 85 | +Create a YAML file with your circuit breaker configuration:  | 
 | 86 | + | 
 | 87 | +```yaml  | 
 | 88 | +# circuit-breaker.yaml  | 
 | 89 | +apiVersion: networking.istio.io/v1alpha3  | 
 | 90 | +kind: DestinationRule  | 
 | 91 | +metadata:  | 
 | 92 | +  name: my-service-circuit-breaker  | 
 | 93 | +spec:  | 
 | 94 | +  host: my-service  | 
 | 95 | +  trafficPolicy:  | 
 | 96 | +    connectionPool:  | 
 | 97 | +      tcp:  | 
 | 98 | +        maxConnections: 100  | 
 | 99 | +      http:  | 
 | 100 | +        http1MaxPendingRequests: 10  | 
 | 101 | +        maxRequestsPerConnection: 10  | 
 | 102 | +    outlierDetection:  | 
 | 103 | +      consecutive5xxErrors: 5  | 
 | 104 | +      interval: 10s  | 
 | 105 | +      baseEjectionTime: 30s  | 
 | 106 | +```  | 
 | 107 | + | 
 | 108 | +### 3. Apply the Configuration  | 
 | 109 | + | 
 | 110 | +Apply the configuration to your cluster:  | 
 | 111 | + | 
 | 112 | +```bash  | 
 | 113 | +kubectl apply -f circuit-breaker.yaml  | 
 | 114 | +```  | 
 | 115 | + | 
 | 116 | +### 4. Verify the Configuration  | 
 | 117 | + | 
 | 118 | +Check that your circuit breaker has been correctly applied:  | 
 | 119 | + | 
 | 120 | +```bash  | 
 | 121 | +kubectl get destinationrule my-service-circuit-breaker -o yaml  | 
 | 122 | +```  | 
 | 123 | + | 
 | 124 | +### 5. Test the Circuit Breaker  | 
 | 125 | + | 
 | 126 | +You can test your circuit breaker using a load testing tool like Fortio:  | 
 | 127 | + | 
 | 128 | +```bash  | 
 | 129 | +# Install Fortio if needed  | 
 | 130 | +kubectl apply -f https://raw.githubusercontent.com/kmesh-net/kmesh/main/samples/fortio.yaml  | 
 | 131 | + | 
 | 132 | +# Run a load test: 20 concurrent connections, no rate limit, 50 requests in total  | 
 | 133 | +kubectl exec -it "$(kubectl get pod -l app=fortio -o jsonpath='{.items[0].metadata.name}')" \  | 
 | 134 | +  -c fortio -- fortio load -c 20 -qps 0 -n 50 http://my-service:8000/  | 
 | 135 | +```  | 
 | 136 | + | 
 | 137 | +When the circuit breaker trips, you'll see HTTP 503 errors in the Fortio output.  | 
 | 138 | + | 
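With the limits from step 2 (100 connections, 10 pending requests), 20 concurrent Fortio connections may not exceed any threshold, in which case no 503 responses appear. One way to make the breaker trip during a test is to apply a deliberately strict rule first. The sketch below uses only fields shown earlier in this guide. The file name and the values are hypothetical and intended for a test environment only.

```yaml
# circuit-breaker-test.yaml (hypothetical file name, for testing only)
# Reuses the rule name from step 2 so that applying it replaces that rule.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: my-service-circuit-breaker
spec:
  host: my-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 1              # allow a single connection
      http:
        http1MaxPendingRequests: 1     # queue at most one request
        maxRequestsPerConnection: 1    # open a new connection for every request
```

Re-apply `circuit-breaker.yaml` after the test to restore the original limits.
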
 | 139 | +## Monitoring Circuit Breakers  | 
 | 140 | + | 
 | 141 | +### Using Prometheus and Grafana  | 
 | 142 | + | 
 | 143 | +If you have Prometheus and Grafana set up with Kmesh, you can monitor circuit breaker metrics:  | 
 | 144 | + | 
 | 145 | +1. Open the Grafana dashboard  | 
 | 146 | +2. Look for the following metrics:  | 
 | 147 | +   - `istio_requests_total` (filtered by `response_code="503"`)  | 
 | 148 | +   - `istio_request_duration_milliseconds` (to monitor latency)  | 
 | 149 | + | 
 | 150 | +### Using kubectl  | 
 | 151 | + | 
 | 152 | +You can also check circuit breaker status using kubectl:  | 
 | 153 | + | 
 | 154 | +```bash  | 
 | 155 | +# Get circuit breaker metrics from the Istio proxy (assumes the client pod has an istio-proxy container)  | 
 | 156 | +kubectl exec "$(kubectl get pod -l app=my-client -o jsonpath='{.items[0].metadata.name}')" \  | 
 | 157 | +  -c istio-proxy -- pilot-agent request GET stats | grep circuit_breakers  | 
 | 158 | +```  | 
 | 159 | + | 
 | 160 | +## Best Practices  | 
 | 161 | + | 
 | 162 | +1. **Start Conservative**: Begin with higher limits and gradually tighten them  | 
 | 163 | +2. **Test Thoroughly**: Test your circuit breakers under load before deploying to production  | 
 | 164 | +3. **Monitor Behavior**: Keep track of circuit breaker activations to fine-tune settings  | 
 | 165 | +4. **Layer Protection**: Use circuit breakers alongside retries, timeouts, and fallbacks  | 
 | 166 | +5. **Document Settings**: Keep a record of circuit breaker settings for each service  | 
 | 167 | + | 
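As an illustration of the "Start Conservative" practice, a first rollout might use generous connection pool limits and cap ejections so that outlier detection cannot remove every instance at once. This is a sketch under assumed traffic levels. All values are hypothetical and should be derived from the load you actually observe.

```yaml
# Sketch only: hypothetical starting values, to be tightened gradually.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: my-service-circuit-breaker
spec:
  host: my-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 1000           # well above expected peak concurrency
      http:
        http1MaxPendingRequests: 500   # generous queue before requests are rejected
    outlierDetection:
      consecutive5xxErrors: 10         # tolerate short error bursts
      interval: 10s
      baseEjectionTime: 30s
      maxEjectionPercent: 30           # never eject more than 30% of the hosts
```
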
 | 168 | +## Troubleshooting  | 
 | 169 | + | 
 | 170 | +| Problem | Possible Cause | Solution |  | 
 | 171 | +|---------|---------------|----------|  | 
 | 172 | +| Too many 503 errors | Circuit breaker thresholds too low | Increase connection pool limits |  | 
 | 173 | +| Circuit breaker not triggering | Thresholds too high | Lower connection pool limits |  | 
 | 174 | +| Intermittent failures | Outlier detection not configured properly | Adjust `consecutive5xxErrors` and `interval` |  | 
 | 175 | +| Slow recovery | Base ejection time too long | Reduce `baseEjectionTime` |  | 
 | 176 | + | 
 | 177 | +## Further Reading  | 
 | 178 | + | 
 | 179 | +- [Detailed Circuit Breaker Documentation](/docs/transpot-layer/circuit-breaker.md)  | 
 | 180 | +- [Kmesh Performance Monitoring](/docs/performance/monitoring.md)  | 
 | 181 | +- [Service Metrics with Grafana](/docs/transpot-layer/service-metrics.md)   | 