Bulkhead
The purpose of the bulkhead policy is to limit the resources consumable by the governed actions, such that a fault 'storm' cannot cause a cascading failure that also brings down other operations.
When a process begins to fault, it can build up a large number of requests, all potentially failing slowly in parallel. If unconstrained, these can chew up ever-greater resources (CPU/threads) in the host, degrading capability or eventually causing outright failure.
In a variant, a faulted downstream system can lead to a 'backing up' of large numbers of requests in its consumers. If ungoverned, these 'backed-up' calls can in turn consume all resources in the consumer, leading to a cascading failure of upstream services.
A bulkhead is a portion of a ship which can be isolated from others, such that if it fails (is holed), the whole ship does not sink.
Similarly, a bulkhead isolation policy assigns operations to constrained resource pools, such that one faulting channel of actions cannot swamp all resources (threads/CPU/whatever) in a system and bring down other operations with it. The impact of a faulting system is isolated to the resource pool to which it is limited; other threads/pools/capacity remain to continue serving other calls.
BulkheadPolicy bulkhead = Policy
.Bulkhead(int maxParallelization[, int maxQueuingActions][, Action<Context> onBulkheadRejected]);
BulkheadPolicy bulkhead = Policy
.BulkheadAsync(int maxParallelization[, int maxQueuingActions][, Func<Context, Task> onBulkheadRejectedAsync]);
Parameters:
- maxParallelization: the maximum parallelization of executions through the bulkhead
- maxQueuingActions (optional): the maximum number of actions that may be queuing (waiting to acquire an execution slot) at any time
- onBulkheadRejected/Async (optional): an action to run if the bulkhead rejects the execution
Throws:
- BulkheadRejectedException, when an execution is rejected due to bulkhead and queue capacity being exceeded
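For example, the syntax above might be configured as follows; the capacity figures are purely illustrative (see Configuration recommendations below):

// requires: using Polly; using Polly.Bulkhead;

// Synchronous bulkhead: at most twelve parallel executions, with up to two further actions queuing.
BulkheadPolicy bulkhead = Policy.Bulkhead(12, 2);

// Asynchronous equivalent.
var bulkheadAsync = Policy.BulkheadAsync(12, 2);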
A useful way to envisage the policy is that separate bulkheads place calls into separate thread pools of the defined size.
This is not the implementation (we take the view that we cannot better the thread-pool algorithms of the Base Class Library). Instead, as in the latest Hystrix, the bulkhead is implemented as a maximum-parallelization semaphore on actions through the bulkhead (leaving .NET's own algorithms to manage the allocation of executions to threads).
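Conceptually (a sketch of the max-parallelization-semaphore idea only, not Polly's actual source), the gating works something like this:

// Conceptual sketch only - not Polly's implementation.
// requires: using System; using System.Threading; using System.Threading.Tasks;

SemaphoreSlim executionSlots = new SemaphoreSlim(12, 12);   // at most 12 concurrent executions

async Task ExecuteGatedAsync(Func<Task> action)
{
    await executionSlots.WaitAsync();    // acquire an execution slot (or wait for one)
    try
    {
        await action();                  // .NET's own scheduler still assigns the work to threads
    }
    finally
    {
        executionSlots.Release();        // free the slot for the next caller
    }
}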
When a call is placed, the bulkhead policy:
- determines if there is an execution slot within the bulkhead, and executes immediately if so;
- if not, determines if there is still space in the queue, and queues the action to wait for a slot if so;
- if neither an execution slot nor queue space remains, a BulkheadRejectedException is thrown.
The policy itself does not place calls onto threads; it assumes upstream systems have already placed calls into threads, but limits their parallelization of execution.
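For example, a call site might place an execution and handle a rejection like this (using the bulkhead configured above; ProcessOrder() is a hypothetical governed action):

// requires: using Polly.Bulkhead;
try
{
    bulkhead.Execute(() => ProcessOrder());   // hypothetical action governed by the bulkhead
}
catch (BulkheadRejectedException)
{
    // Both the bulkhead and its queue were full: shed this call, or ask the caller to retry later.
}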
The bulkhead's primary goal is to act like a nightclub bouncer: to ensure that the maximum capacity of the 'club' inside is never exceeded. At the same time, just as for a nightclub bouncer, a secondary goal is to keep the inside of the club at maximum utilisation.
To achieve this, it makes sense to have a queue of 'ready punters' (on the pavement outside the nightclub, if you like), waiting to take execution slots within the bulkhead as soon as one becomes free. This is the maxQueuingActions.
For guidance on setting maxQueuingActions, see Configuration recommendations below.
An optional onBulkheadRejected / onBulkheadRejectedAsync delegate allows specific code to be executed (for example, for logging) when the bulkhead rejects an execution.
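For example (Console.WriteLine below stands in for whatever logging framework you use):

// requires: using System; using System.Threading.Tasks; using Polly; using Polly.Bulkhead;

BulkheadPolicy bulkhead = Policy
    .Bulkhead(12, 2, context =>
    {
        // Runs each time the synchronous bulkhead rejects an execution.
        Console.WriteLine("Bulkhead rejected an execution; consider scaling out.");
    });

var bulkheadAsync = Policy
    .BulkheadAsync(12, 2, context =>
    {
        // Asynchronous variant of the rejection delegate.
        Console.WriteLine("Bulkhead rejected an execution; consider scaling out.");
        return Task.CompletedTask;
    });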
Bulkhead policies expose two state properties for reporting/health-monitoring:
- BulkheadAvailableCount: the number of execution slots available in the bulkhead at present
- QueueAvailableCount: the number of spaces available in the queue for a bulkhead execution slot
Note: Code such as this is not necessary:
if (bulkhead.BulkheadAvailableCount + bulkhead.QueueAvailableCount > 0)
{
bulkhead.Execute(...); // place call
}
It is sufficient to place the call bulkhead.Execute(...), and the bulkhead will decide for itself whether the action can be executed or queued. In addition, users should be clear that the above code does not guarantee the bulkhead will not reject the execution: in a highly concurrent environment, state could change between evaluating the if condition and executing the action. However, a code pattern such as the above can be used to reduce the number of BulkheadRejectedExceptions thrown while the bulkhead is at capacity, if this is a performance concern.
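The properties are better consumed passively, for example by periodic health reporting. A sketch, assuming the bulkhead instance from the example above, with Console output standing in for your metrics or monitoring sink:

// requires: using System; using System.Threading;

// Report bulkhead utilisation every 30 seconds; substitute your metrics sink for Console.
var reportTimer = new Timer(_ =>
{
    Console.WriteLine(
        $"Bulkhead slots free: {bulkhead.BulkheadAvailableCount}; queue spaces free: {bulkhead.QueueAvailableCount}");
}, null, TimeSpan.Zero, TimeSpan.FromSeconds(30));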
A bulkhead policy acts both as an isolation unit, and (intentionally) as a load-shedder. To preserve the health of the underlying machine, the bulkhead intentionally sheds load when its capacity and queue are exhausted.
Bulkheads work best when used in combination with some kind of automated horizontal scaling. You need either to be tolerant of bulkhead rejections (asking users or processes to 'come back later'), or to use the bulkhead rejections (or a pattern of them) as a trigger for automated horizontal scaling.
The capacity to set for a bulkhead will depend critically on:
- whether it governs an I/O-bound or CPU-bound operation
- what other actions (which can also be expected to operate simultaneously at load) are supported by the underlying application or host hardware/VM
- what automated horizontal scaling you have available.
The recommended approach for configuring bulkhead capacity is to configure it based on load/saturation-testing in, or on a replica of, your production environment.
It is important, however, not to set the bulkhead capacity for an individual operation near the peak load the host could manage if that process were running in isolation. Setting the bulkhead capacity at that level would provide no protection for other processes running simultaneously: the first process would then be permitted precisely (when it faults) to saturate all available resources, degrading others.
If your orientation is for maximum resilience at high volumes, and adequate automated horizontal scaling to support this is available, an application running four customer-critical processes, all expected to run simultaneously at load, might, for example, allocate bulkheads restricting these to less than a quarter of the host's capacity for each process in isolation. Such a stability-orientation prefers to trigger horizontal scaling - to reduce overall latency for customers by preserving the health of underlying individual hosts.
Alternatively, you may have an orientation seeking to trade or contain the cost of horizontal scaling for slightly greater risk, by sharing a bulkhead across calls. Sharing a BulkheadPolicy instance across calls allows the group of calls to share the capacity: this can provide more flexibility (and more efficient use of resource) if the different calls can be expected, for example, to have different peak hours, at the expense that one stream of calls has more potential to degrade others.
Equally, you can assign relative priority to different actions by assigning relatively-sized bulkheads to different customer operations. For example, you might define greater bulkhead capacity for critical customer operations such as checking out and paying, than for (say) retrieving and showing other customers' recommendations - if these were all in the same app.
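For example (operation names and capacities here are illustrative only):

// requires: using Polly; using Polly.Bulkhead;

// Critical checkout/payment operations receive larger bulkheads...
BulkheadPolicy checkoutBulkhead        = Policy.Bulkhead(20, 2);
BulkheadPolicy paymentBulkhead         = Policy.Bulkhead(20, 2);

// ...while the lower-priority recommendations feature is constrained more tightly, and its
// various call sites share one instance (and therefore one pool of execution slots).
BulkheadPolicy recommendationsBulkhead = Policy.Bulkhead(4);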
Finally, remember also that partitioning at a software level within the application (as a BulkheadPolicy) is only one level at which isolation for stability may be provided. For instance, you might also partition your systems at the server level, reserving some servers purely for administrative functions, so that vital administrative functions remain available even if consumer load crashes the consumer-facing servers. See Michael Nygard: Release It! for further information.
For **CPU-bound** work (such as, say, resizing uploaded customer images), it makes sense to configure a bulkhead capacity (considering a call in isolation) in close proportion to the number of processors in the host system, just as you would for a Parallel.For. Limiting parallelism close to the number of processors prevents undue context-switching: there is usually a sweet spot for performance at or just above the number of processors in the host.
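A sketch of that CPU-bound case (the image-resizing operation is hypothetical; reduce the figure if other loads share the host):

// requires: using System; using Polly; using Polly.Bulkhead;

// Size a CPU-bound bulkhead in close proportion to the processor count
// (this considers the call in isolation).
int maxParallelization = Environment.ProcessorCount;
BulkheadPolicy imageResizeBulkhead = Policy.Bulkhead(maxParallelization, 1);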
For async operations governing I/O-bound work, the picture is more nuanced. A number of calls at any one time may be in the (non-thread/CPU-consuming) await phase of an async/await, and thus it is recommended to set bulkhead capacity (considering a call in isolation) at a significantly higher level than pure thread capacity. This allows .NET to optimise the use of threads between calls engaged and not engaged in actual activity. Optimum configuration will depend on the responsiveness of the governed calls (the amount of time any call spends in await); there is no real short-cut to performance-tuning for the characteristics of your individual system.
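As a sketch of that I/O-bound case (the capacity multiplier and the endpoint are illustrative assumptions; tune by measuring how long the governed calls spend awaiting):

// requires: using System; using System.Net.Http; using Polly;

var httpClient = new HttpClient();

// I/O-bound async work spends much of its time awaiting, holding no thread,
// so capacity can sit well above the thread count.
var downstreamBulkhead = Policy.BulkheadAsync(Environment.ProcessorCount * 8, 2);

string body = await downstreamBulkhead.ExecuteAsync(() =>
    httpClient.GetStringAsync("https://downstream.example/api/values"));   // hypothetical endpoint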
This is a feature where we expect users' individual configurations to vary according to the actions they are governing: to help other users, we would be interested to hear your stories!
maxQueuingActions provides flexibility by allowing you to limit parallelization without immediately rejecting executions. It also helps maximise throughput by providing for the next actions to be ready-and-waiting as soon as a bulkhead execution slot becomes available.
It is advisable, however, not to set maxQueuingActions high, for the following reasons:
In the (current) sync implementation, a queuing item will be blocking a thread. For this reason, we recommend setting maxQueuingActions only to 0 or 1 in the current synchronous case.
(Future releases of Polly may explore an alternative policy for scheduling synchronous work, such as a SchedulerPolicy which would schedule work on an underlying TaskScheduler. This would allow sync work to 'queue' outside a bulkhead without occupying a thread, but requires a new syntax which bridges the existing synchronous Execute() and async ExecuteAsync() overloads.)
Ideal configuration may depend on the characteristics of the governed calls, and should be established through experiment. For very low latency, high throughput calls, it may make sense to allow a slightly higher queue level such as 4, so that next actions are always immediately available to fill a bulkhead slot as it becomes available.
There is little point however in setting higher values: preference should be given to getting the primary capacity of the bulkhead right, and then rejecting calls so as to trigger horizontal scaling.
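A sketch summarising the two cases (figures illustrative):

// requires: using Polly; using Polly.Bulkhead;

// Synchronous: a queuing action blocks a thread, so keep the queue at 0 or 1.
BulkheadPolicy syncBulkhead = Policy.Bulkhead(12, 1);

// Asynchronous, very low-latency/high-throughput calls (where queuing does not block a thread):
// a slightly deeper queue, e.g. 4, keeps the next action ready the moment a slot frees.
var asyncBulkhead = Policy.BulkheadAsync(12, 4);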
The internal operation of BulkheadPolicy is thread-safe: multiple calls may safely be placed concurrently through a policy instance.
BulkheadPolicy instances may be re-used across multiple call sites.
When a bulkhead instance is re-used across call sites, the call sites share the capacity of the bulkhead (allowing you to group actions into bulkheads), rather than each receiving a bulkhead of the given capacity.
When reusing policies, use an ExecutionKey to distinguish different call-site usages within logging and metrics.
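For example (the operation names are hypothetical; the string passed to the Context constructor is the key that distinguishes the call sites in logging and metrics):

// requires: using Polly; using Polly.Bulkhead;

// One bulkhead instance shared by two call sites: both draw on the same pool of slots.
BulkheadPolicy sharedBulkhead = Policy.Bulkhead(10, 2);

// Call site A:
sharedBulkhead.Execute(context => LoadCustomerOrders(), new Context("LoadCustomerOrders"));

// Call site B, sharing the same instance and therefore the same capacity:
sharedBulkhead.Execute(context => LoadCustomerProfile(), new Context("LoadCustomerProfile"));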