Commit ead3c4a

author
Anya
committed
Add circuit breaker user guide
1 parent 86e7a0c commit ead3c4a

1 file changed

+181
-0
lines changed

@@ -0,0 +1,181 @@
---
sidebar_position: 6
title: Circuit Breaker User Guide
---

# Circuit Breaker User Guide

## What is a Circuit Breaker?

Circuit breaking is a resilience pattern for microservice applications. It prevents cascading failures by capping the load that clients can send to a struggling service, which contains the impact of service failures, latency spikes, and other network issues.

In Kmesh, circuit breakers:
- Limit concurrent connections and requests to a service
- Automatically detect failing services
- Prevent overload by rejecting excess traffic
- Allow controlled recovery after failures

## When to Use Circuit Breakers

Consider implementing circuit breakers when:
- You need to protect services from being overwhelmed
- You want to improve system resilience during partial outages
- You need to control retry behavior and prevent cascading failures
- You want to implement graceful degradation

## Circuit Breaker Configuration

Kmesh circuit breakers are configured with Istio `DestinationRule` resources:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: my-service-circuit-breaker
spec:
  host: my-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100 # Maximum number of TCP connections
        connectTimeout: 30ms # TCP connection timeout
      http:
        http1MaxPendingRequests: 1 # Maximum number of requests queued while waiting for a connection
        maxRequestsPerConnection: 1 # Maximum number of requests per connection (1 disables keep-alive)
        maxRetries: 3 # Maximum number of retries outstanding at any given time
    outlierDetection:
      consecutive5xxErrors: 5 # Number of consecutive 5xx errors before ejection
      interval: 10s # Interval between ejection analysis sweeps
      baseEjectionTime: 30s # Minimum ejection duration
      maxEjectionPercent: 100 # Maximum percentage of instances to eject
```

## Key Configuration Parameters

The defaults below are the ones documented for the Istio `DestinationRule` API.

### Connection Pool Settings

| Parameter | Description | Default |
|-----------|-------------|---------|
| `maxConnections` | Maximum number of TCP connections to a destination host | 2^32-1 |
| `connectTimeout` | TCP connection timeout | 10s |
| `http1MaxPendingRequests` | Maximum number of HTTP requests queued while waiting for a ready connection | 2^32-1 |
| `maxRequestsPerConnection` | Maximum number of requests per connection (1 disables keep-alive) | 0 (no limit) |
| `maxRetries` | Maximum number of retries outstanding to all hosts at a given time | 2^32-1 |

### Outlier Detection Settings

| Parameter | Description | Default |
|-----------|-------------|---------|
| `consecutive5xxErrors` | Number of consecutive 5xx errors before an instance is ejected | 5 |
| `interval` | Time between ejection analysis sweeps | 10s |
| `baseEjectionTime` | Minimum ejection duration | 30s |
| `maxEjectionPercent` | Maximum percentage of instances that can be ejected | 10% |
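
`baseEjectionTime` is a minimum because, in Envoy's documented outlier detection, the ejection duration grows with the number of times an instance has been ejected, up to a maximum ejection time (300s by default in Envoy). The sketch below works through that arithmetic. It assumes Kmesh follows the same Envoy-style semantics, which this guide does not confirm, and `ejection_seconds` is a hypothetical helper, not a Kmesh command.

```bash
# ejection_seconds BASE_SECONDS TIMES_EJECTED [MAX_SECONDS]
# Assumption: Envoy-style semantics, duration = base * times ejected, capped at max.
ejection_seconds() {
  local base="$1" count="$2" max="${3:-300}"
  local duration=$((base * count))
  if [ "$duration" -gt "$max" ]; then
    duration="$max"
  fi
  echo "$duration"
}

# Usage: third ejection of an instance with baseEjectionTime: 30s
#   ejection_seconds 30 3
```

Under these assumptions, an instance that keeps failing stays out of the pool for longer each time, so a large `baseEjectionTime` slows recovery more than the single value suggests.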

## Step-by-Step Implementation Guide

### 1. Identify Services Needing Protection

First, identify which services in your mesh need circuit breaker protection. Good candidates include:
- Services with limited capacity
- Critical services that must remain available
- Services with known stability issues

### 2. Create a Circuit Breaker Configuration

Create a YAML file with your circuit breaker configuration:

```yaml
# circuit-breaker.yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: my-service-circuit-breaker
spec:
  host: my-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 10
        maxRequestsPerConnection: 10
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 10s
      baseEjectionTime: 30s
```

### 3. Apply the Configuration

Apply the configuration to your cluster:

```bash
kubectl apply -f circuit-breaker.yaml
```
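
If several services need the same kind of rule, the manifest from step 2 can be generated instead of copied by hand. The following is a minimal sketch: `render_circuit_breaker` and its three arguments are hypothetical names chosen for illustration, and the outlier detection values are simply carried over from the example above.

```bash
# render_circuit_breaker SERVICE MAX_CONNECTIONS MAX_PENDING
# Prints a DestinationRule for SERVICE to stdout (hypothetical helper).
render_circuit_breaker() {
  local service="$1" max_conn="$2" max_pending="$3"
  cat <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: ${service}-circuit-breaker
spec:
  host: ${service}
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: ${max_conn}
      http:
        http1MaxPendingRequests: ${max_pending}
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 10s
      baseEjectionTime: 30s
EOF
}

# Usage (requires cluster access):
#   render_circuit_breaker my-service 100 10 | kubectl apply -f -
```

Piping the output to `kubectl apply -f -` keeps the generated manifest out of the working tree; redirect it to a file instead if you want to review or version it first.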

### 4. Verify the Configuration

Check that your circuit breaker has been correctly applied:

```bash
kubectl get destinationrule my-service-circuit-breaker -o yaml
```
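
Reading the full YAML works for a one-off check. For a scripted check, a single field can be extracted with a JSONPath expression and compared against the value you expect. This is a sketch only: `check_max_connections` is a hypothetical helper, and it assumes the rule lives in the current namespace.

```bash
# check_max_connections RULE_NAME EXPECTED
# Compares the applied tcp.maxConnections with EXPECTED (hypothetical helper).
check_max_connections() {
  local name="$1" expected="$2" actual
  actual="$(kubectl get destinationrule "$name" \
    -o jsonpath='{.spec.trafficPolicy.connectionPool.tcp.maxConnections}')"
  if [ "$actual" = "$expected" ]; then
    echo "ok: maxConnections=$actual"
  else
    echo "mismatch: expected $expected, got '$actual'" >&2
    return 1
  fi
}

# Usage (requires cluster access):
#   check_max_connections my-service-circuit-breaker 100
```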

### 5. Test the Circuit Breaker

You can test your circuit breaker using a load testing tool like Fortio:

```bash
# Install Fortio if needed
kubectl apply -f https://raw.githubusercontent.com/kmesh-net/kmesh/main/samples/fortio.yaml

# Run a load test: 20 concurrent connections, no rate limit, 50 requests in total
kubectl exec -it "$(kubectl get pod -l app=fortio -o jsonpath='{.items[0].metadata.name}')" \
  -c fortio -- fortio load -c 20 -qps 0 -n 50 http://my-service:8000/
```

When the circuit breaker trips, the excess requests are rejected and show up as HTTP 503 responses in the Fortio output.
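
Fortio ends its report with one summary line per response code, in the form `Code 503 : 20 (40.0 %)`. To turn those lines into a single figure, they can be summed with `awk`. The helper below is a sketch: `fortio_503_share` is a hypothetical name, and it relies only on that summary line format.

```bash
# fortio_503_share: read Fortio output on stdin and report the share of 503s.
# Relies on summary lines of the form "Code <status> : <count> (<pct> %)".
fortio_503_share() {
  awk '
    $1 == "Code" && $3 == ":" {
      total += $4
      if ($2 == "503") rejected += $4
    }
    END {
      if (total == 0) { print "no response codes found"; exit 1 }
      printf "%d/%d requests returned 503 (%.1f%%)\n", rejected, total, 100 * rejected / total
    }'
}

# Usage (requires cluster access):
#   kubectl exec "$(kubectl get pod -l app=fortio -o jsonpath='{.items[0].metadata.name}')" \
#     -c fortio -- fortio load -c 20 -qps 0 -n 50 http://my-service:8000/ 2>&1 | fortio_503_share
```

A share of 0% under a load that exceeds your limits suggests the rule is not being applied to that traffic.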

## Monitoring Circuit Breakers

### Using Prometheus and Grafana

If you have Prometheus and Grafana set up with Kmesh, you can monitor circuit breaker metrics:

1. Open your Grafana dashboard.
2. Look for the following metrics:
   - `istio_requests_total` (filtered by response code 503)
   - `istio_request_duration_milliseconds` (to monitor latency)
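
The same 503 filter can be run directly against the Prometheus HTTP API, which helps when no dashboard panel exists yet. The sketch below makes several assumptions: the label names `destination_service_name` and `response_code` follow Istio's standard metrics, the Prometheus address is a placeholder, and both function names are hypothetical.

```bash
# build_503_query SERVICE WINDOW
# Prints a PromQL expression for the per-second rate of 503 responses.
# Assumption: Istio standard metric labels are present on the series.
build_503_query() {
  printf 'sum(rate(istio_requests_total{destination_service_name="%s",response_code="503"}[%s]))\n' "$1" "$2"
}

# query_prometheus BASE_URL QUERY
# Runs an instant query through the Prometheus HTTP API.
query_prometheus() {
  curl -sG "$1/api/v1/query" --data-urlencode "query=$2"
}

# Usage (assumes Prometheus is reachable on localhost:9090, e.g. via port-forward):
#   query_prometheus http://localhost:9090 "$(build_503_query my-service 1m)"
```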

### Using kubectl

You can also check circuit breaker status using kubectl:

```bash
# Get circuit breaker metrics from the Istio proxy of a client pod
kubectl exec "$(kubectl get pod -l app=my-client -o jsonpath='{.items[0].metadata.name}')" \
  -c istio-proxy -- pilot-agent request GET stats | grep circuit_breakers
```
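
Envoy also keeps overflow counters such as `upstream_cx_overflow` and `upstream_rq_pending_overflow`, which count connections and requests rejected because a limit was reached. Assuming the proxy stats are in Envoy's plain `name: value` text format, those counters can be summed as sketched below. `overflow_total` is a hypothetical helper name.

```bash
# overflow_total: read Envoy stats on stdin and sum every "*_overflow" counter.
overflow_total() {
  awk -F': ' '/_overflow: / { sum += $2 } END { print sum + 0 }'
}

# Usage (requires cluster access):
#   kubectl exec "$(kubectl get pod -l app=my-client -o jsonpath='{.items[0].metadata.name}')" \
#     -c istio-proxy -- pilot-agent request GET stats | overflow_total
```

A total that increases between two runs of a load test indicates that the limits are actively rejecting traffic.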

## Best Practices

1. **Start Conservative**: Begin with higher limits and gradually tighten them
2. **Test Thoroughly**: Test your circuit breakers under load before deploying to production
3. **Monitor Behavior**: Keep track of circuit breaker activations to fine-tune settings
4. **Layer Protection**: Use circuit breakers alongside retries, timeouts, and fallbacks
5. **Document Settings**: Keep a record of circuit breaker settings for each service

## Troubleshooting

| Problem | Possible Cause | Solution |
|---------|----------------|----------|
| Too many 503 errors | Circuit breaker thresholds too low | Increase connection pool limits |
| Circuit breaker not triggering | Thresholds too high | Lower connection pool limits |
| Intermittent failures | Outlier detection not configured properly | Adjust consecutive error settings |
| Slow recovery | Base ejection time too long | Reduce `baseEjectionTime` |
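
Most of these fixes change a single field, so a merge patch can be quicker than editing and re-applying the full manifest. The sketch below builds the patch for the slow recovery case. `ejection_time_patch` is a hypothetical helper, and `10s` is an illustrative value rather than a recommendation.

```bash
# ejection_time_patch DURATION
# Prints a JSON merge patch that sets outlierDetection.baseEjectionTime.
ejection_time_patch() {
  printf '{"spec":{"trafficPolicy":{"outlierDetection":{"baseEjectionTime":"%s"}}}}\n' "$1"
}

# Usage (requires cluster access):
#   kubectl patch destinationrule my-service-circuit-breaker \
#     --type merge -p "$(ejection_time_patch 10s)"
```

If the rule is kept in version control, update the manifest as well so the next `kubectl apply` does not revert the patch.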

## Further Reading

- [Detailed Circuit Breaker Documentation](/docs/transpot-layer/circuit-breaker.md)
- [Kmesh Performance Monitoring](/docs/performance/monitoring.md)
- [Service Metrics with Grafana](/docs/transpot-layer/service-metrics.md)
