---
sidebar_position: 6
title: Circuit Breaker User Guide
---

# Circuit Breaker User Guide

## What is a Circuit Breaker?

Circuit breaking is a critical pattern for building resilient microservice applications. It helps prevent cascading failures by limiting the impact of service failures, latency spikes, and other network issues.

In Kmesh, circuit breakers:
- Limit concurrent connections and requests to a service
- Automatically detect and eject failing service instances
- Prevent overload by rejecting excess traffic
- Allow controlled recovery after failures

## When to Use Circuit Breakers

Consider implementing circuit breakers when:
- You need to protect services from being overwhelmed
- You want to improve system resilience during partial outages
- You need to control retry behavior and prevent cascading failures
- You want to implement graceful degradation

## Circuit Breaker Configuration

Kmesh circuit breakers can be configured with Istio `DestinationRule` resources:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: my-service-circuit-breaker
spec:
  host: my-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100           # Maximum number of TCP connections to a destination host
        connectTimeout: 30ms          # TCP connection timeout
      http:
        http1MaxPendingRequests: 1    # Maximum number of requests queued while waiting for a connection
        maxRequestsPerConnection: 1   # Maximum number of requests per connection
        maxRetries: 3                 # Maximum number of outstanding retries across all hosts
    outlierDetection:
      consecutive5xxErrors: 5         # Consecutive 5xx errors before a host is ejected
      interval: 10s                   # Interval between ejection analysis sweeps
      baseEjectionTime: 30s           # Minimum ejection duration
      maxEjectionPercent: 100         # Maximum percentage of hosts that can be ejected
```

## Key Configuration Parameters

### Connection Pool Settings

| Parameter | Description | Default |
|-----------|-------------|---------|
| `tcp.maxConnections` | Maximum number of TCP connections to a destination host | 1024 |
| `tcp.connectTimeout` | TCP connection timeout | 10s |
| `http.http1MaxPendingRequests` | Maximum number of requests queued while waiting for a connection | 1024 |
| `http.maxRequestsPerConnection` | Maximum number of requests per connection | No limit |
| `http.maxRetries` | Maximum number of outstanding retries across all hosts | 3 |

### Outlier Detection Settings

| Parameter | Description | Default |
|-----------|-------------|---------|
| `consecutive5xxErrors` | Number of consecutive 5xx errors before a host is ejected | 5 |
| `interval` | Interval between ejection analysis sweeps | 10s |
| `baseEjectionTime` | Minimum ejection duration; a host stays ejected for this value multiplied by the number of times it has been ejected | 30s |
| `maxEjectionPercent` | Maximum percentage of hosts in the pool that can be ejected | 10% |
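
The fragment below attaches the behavior each outlier detection setting produces to the setting itself. It is a sketch with illustrative values. The four-endpoint destination mentioned in the comments is a hypothetical assumption, not part of the example service.

```yaml
# Illustrative outlierDetection fragment (nested under spec.trafficPolicy of a DestinationRule).
# Assumes a hypothetical destination with 4 endpoints.
outlierDetection:
  consecutive5xxErrors: 5   # a host returning 5 consecutive 5xx responses is ejected
  interval: 10s             # ejection analysis sweeps run every 10s
  baseEjectionTime: 30s     # ejection time = 30s x times ejected: 30s, then 60s, then 90s, ...
  maxEjectionPercent: 50    # at most 50% of 4 endpoints, i.e. 2 hosts, can be ejected at once
```

The ejection time grows with each ejection. As a result, a host that keeps failing stays out of the pool for progressively longer. A host that recovers after its first ejection returns after roughly 30 seconds.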

## Step-by-Step Implementation Guide

### 1. Identify Services Needing Protection

First, identify which services in your mesh need circuit breaker protection. Good candidates include:
- Services with limited capacity
- Critical services that must remain available
- Services with known stability issues

### 2. Create a Circuit Breaker Configuration

Create a YAML file with your circuit breaker configuration:

```yaml
# circuit-breaker.yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: my-service-circuit-breaker
spec:
  host: my-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 10
        maxRequestsPerConnection: 10
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 10s
      baseEjectionTime: 30s
```
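
Istio interprets a short `host` name such as `my-service` relative to the namespace of the `DestinationRule`, not the namespace of the service. You can use the fully qualified domain name instead. This is necessary when the rule and the service live in different namespaces, and it avoids ambiguity in general. The sketch below makes two assumptions: the service runs in a hypothetical `default` namespace, and the cluster domain is the common `cluster.local`. Adjust both for your cluster.

```yaml
# circuit-breaker.yaml (variant with an explicit namespace and a fully qualified host)
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: my-service-circuit-breaker
  namespace: default                           # assumed namespace; change as needed
spec:
  host: my-service.default.svc.cluster.local   # <service>.<namespace>.svc.<cluster domain>
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 10
        maxRequestsPerConnection: 10
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 10s
      baseEjectionTime: 30s
```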

### 3. Apply the Configuration

Apply the configuration to your cluster:

```bash
kubectl apply -f circuit-breaker.yaml
```

### 4. Verify the Configuration

Check that your circuit breaker has been applied correctly:

```bash
kubectl get destinationrule my-service-circuit-breaker -o yaml
```

### 5. Test the Circuit Breaker

You can test your circuit breaker with a load testing tool such as Fortio:

```bash
# Install Fortio if needed
kubectl apply -f https://raw.githubusercontent.com/kmesh-net/kmesh/main/samples/fortio.yaml

# Run a load test: 20 concurrent connections (-c), no QPS limit (-qps 0), 50 requests in total (-n)
kubectl exec -it "$(kubectl get pod -l app=fortio -o jsonpath='{.items[0].metadata.name}')" \
  -c fortio -- fortio load -c 20 -qps 0 -n 50 http://my-service:8000/
```

When the circuit breaker trips, the Fortio output reports the rejected requests as HTTP 503 responses. This only happens when the load exceeds the configured limits. With `maxConnections: 100`, 20 concurrent connections will not reach the connection limit, so raise the concurrency (`-c`) or temporarily lower the limits.
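
For a quick functional test, you can apply deliberately tight limits so that even modest load trips the breaker. The variant below is a hypothetical test-only sketch based on the `my-service` example, not a recommended production setting. It allows one connection and one pending request, so many requests from 20 concurrent Fortio connections should be rejected with 503 responses.

```yaml
# Hypothetical test-only configuration: tight limits make the circuit breaker easy to trip.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: my-service-circuit-breaker
spec:
  host: my-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 1            # allow only one TCP connection per destination host
      http:
        http1MaxPendingRequests: 1   # queue at most one request while waiting for a connection
        maxRequestsPerConnection: 1  # close each connection after a single request
```

Apply it with `kubectl apply -f`, rerun the Fortio load test, and restore your normal configuration afterwards.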

## Monitoring Circuit Breakers

### Using Prometheus and Grafana

If you have Prometheus and Grafana set up with Kmesh, you can monitor circuit breaker metrics:

1. Open the Grafana dashboard.
2. Look for the following metrics:
   - `istio_requests_total`, filtered by response code 503 (`response_code="503"`)
   - `istio_request_duration_milliseconds`, to monitor latency
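
If you would rather be alerted than watch a dashboard, you can encode the 503 signal as a Prometheus alerting rule. The sketch below assumes that `istio_requests_total` carries the standard Istio `response_code` and `destination_service` labels in your setup. The group name, alert name, 5% threshold, and 5-minute windows are illustrative choices, not recommended values.

```yaml
# Hypothetical Prometheus rule file: alert when a service returns more than 5% 503 responses.
groups:
  - name: circuit-breaker                 # illustrative rule group name
    rules:
      - alert: HighServiceUnavailableRate # illustrative alert name
        expr: |
          sum by (destination_service) (rate(istio_requests_total{response_code="503"}[5m]))
            /
          sum by (destination_service) (rate(istio_requests_total[5m]))
            > 0.05
        for: 5m                           # condition must hold for 5 minutes before firing
        labels:
          severity: warning
        annotations:
          summary: "More than 5% of requests to {{ $labels.destination_service }} return 503"
```

The application itself can also return 503s. An alert like this therefore points to possible circuit breaker activity but does not prove it.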

### Using kubectl

If the client pod runs an Istio sidecar (the `istio-proxy` container), you can also inspect the proxy's circuit breaker statistics with kubectl:

```bash
# Get circuit breaker statistics from the Istio sidecar proxy
kubectl exec "$(kubectl get pod -l app=my-client -o jsonpath='{.items[0].metadata.name}')" \
  -c istio-proxy -- pilot-agent request GET stats | grep circuit_breakers
```

## Best Practices

1. **Start Conservative**: Begin with higher limits and gradually tighten them.
2. **Test Thoroughly**: Test your circuit breakers under load before deploying to production.
3. **Monitor Behavior**: Track circuit breaker activations to fine-tune settings.
4. **Layer Protection**: Use circuit breakers alongside retries, timeouts, and fallbacks.
5. **Document Settings**: Keep a record of the circuit breaker settings for each service.
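
You can express the **Layer Protection** practice in Istio configuration by pairing the `DestinationRule` with a `VirtualService` that sets a request timeout and a bounded retry policy. The sketch below uses standard Istio `VirtualService` fields. Whether Kmesh enforces each of them depends on your Kmesh version and data-plane mode, so verify this before relying on it. The resource name, timeout, and retry values are illustrative.

```yaml
# Hypothetical VirtualService that adds a timeout and retries in front of my-service.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-service-resilience          # illustrative name
spec:
  hosts:
    - my-service
  http:
    - route:
        - destination:
            host: my-service
      timeout: 2s                      # overall deadline for a request, including retries
      retries:
        attempts: 3                    # retry a failed request up to 3 times
        perTryTimeout: 500ms           # deadline for each individual attempt
        retryOn: 5xx,connect-failure   # conditions that trigger a retry
```

Keep `attempts` modest. Every retry adds load to a service that may already be struggling, and the `maxRetries` connection pool setting caps how many retries can be outstanding at once.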

## Troubleshooting

| Problem | Possible Cause | Solution |
|---------|---------------|----------|
| Too many 503 errors | Circuit breaker thresholds too low | Increase connection pool limits |
| Circuit breaker not triggering | Thresholds too high | Lower connection pool limits |
| Intermittent failures | Outlier detection not configured properly | Adjust `consecutive5xxErrors` |
| Slow recovery | Base ejection time too long | Reduce `baseEjectionTime` |

## Further Reading

- [Detailed Circuit Breaker Documentation](/docs/transpot-layer/circuit-breaker.md)
- [Kmesh Performance Monitoring](/docs/performance/monitoring.md)
- [Service Metrics with Grafana](/docs/transpot-layer/service-metrics.md)