Description
Is your feature request related to a problem? Please describe.
Since we shipped https://github.com/akkadotnet/Akka.Hosting/releases/tag/1.5.47-beta1, we've essentially moved Akka.NET's health check implementation from https://github.com/petabridge/akkadotnet-healthcheck to Akka.Hosting - a net positive change that greatly reduces the amount of configuration overhead and installed NuGet packages needed to power Akka.NET's health check system.
However, we are missing one beloved feature from Akka.HealthChecks - the Akka.Persistence checks, which were implemented using the infamous SuicideProbe.
We need to bring some semblance of this functionality back into the picture in Akka.Hosting's health check implementation.
Describe the solution you'd like
The SuicideProbe, despite its amusing and delightful name, was a bit problematic:
- Accidentally polluted journals / snapshot stores - took several rounds of bug-fixing to get right;
- Resulted in writeable / billable units for customers running on cloud providers - health checks should run as close to zero cost as possible;
- Was a rather complex and somewhat fragile piece of infrastructure; and
- Like most of the health checks in Akka.HealthCheck, it was too aggressive - a single persist / recover failure could trigger a liveness check failure. We tried adjusting this via how its parent, the AkkaPersistenceLivenessProbe, handled retries, but that proved to be a bit unwieldy too.
So what I'm proposing is we add a new virtual method to the AsyncWriteJournal and SnapshotStore base classes:
enum HealthCheckResult
{
    Healthy = 0,
    Degraded = 1, // transient failures
    Unhealthy = 2 // irrecoverable failures
}

public virtual Task<HealthCheckResult> CheckHealthAsync(CancellationToken ct = default);

I think the default base class implementation could just use the CircuitBreaker's Open/Closed status, since that would be a reliable method for determining whether or not the plugin was struggling to perform its work over a recent period of time - AND we could reset the health check status from Degraded --> Healthy when the CircuitBreaker resets.
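To make the idea concrete, here is a rough sketch of what a breaker-backed default might look like. It is not the actual Akka.NET code: `PluginBaseSketch`, `StubBreaker`, `BreakerState` and `LegacyPlugin` are hypothetical stand-ins for the real base classes and the real CircuitBreaker, and treating a half-open breaker as Degraded is my own assumption.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public enum HealthCheckResult
{
    Healthy = 0,
    Degraded = 1,  // transient failures
    Unhealthy = 2  // irrecoverable failures
}

// Hypothetical stand-in for the circuit breaker's state (not the Akka.NET type).
public enum BreakerState { Closed, HalfOpen, Open }

// Hypothetical stand-in for the breaker each persistence plugin already owns.
public sealed class StubBreaker
{
    public BreakerState State { get; set; } = BreakerState.Closed;
}

// Hypothetical stand-in for AsyncWriteJournal / SnapshotStore.
public abstract class PluginBaseSketch
{
    // Private, like the breaker fields mentioned in this issue: only the
    // base class can read it, so the default has to live here.
    private readonly StubBreaker _breaker;

    protected PluginBaseSketch(StubBreaker breaker) => _breaker = breaker;

    // Virtual rather than abstract: existing plugins inherit this default.
    public virtual Task<HealthCheckResult> CheckHealthAsync(CancellationToken ct = default)
    {
        // Closed means recent operations succeeded. Anything else is reported
        // as Degraded (assumption) and flips back to Healthy once the breaker
        // closes again. The default never reports Unhealthy.
        var result = _breaker.State == BreakerState.Closed
            ? HealthCheckResult.Healthy
            : HealthCheckResult.Degraded;
        return Task.FromResult(result);
    }
}

// A plugin written before the method existed: no override, still compiles.
public sealed class LegacyPlugin : PluginBaseSketch
{
    public LegacyPlugin(StubBreaker breaker) : base(breaker) { }
}
```

With this shape, a plugin that does nothing reports Degraded while its breaker is open and returns to Healthy when the breaker closes, without writing anything to the journal or snapshot store.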
Describe alternatives you've considered
In some specific Akka.Persistence plugins, such as Akka.Persistence.Sql, you could implement something akin to the EF Core health checks, which try to open a connection using the provided connection string.
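A connection-based check along those lines could look roughly like the sketch below. This is not code from Akka.Persistence.Sql or EF Core: `SqlConnectivityCheck` and the `connectionFactory` parameter are hypothetical names, and it only relies on the standard `DbConnection.OpenAsync` call from `System.Data.Common`. Reporting every failure as Unhealthy is also an assumption; a plugin could report Degraded for errors it knows to be transient.

```csharp
using System;
using System.Data.Common;
using System.Threading;
using System.Threading.Tasks;

public enum HealthCheckResult
{
    Healthy = 0,
    Degraded = 1,  // transient failures
    Unhealthy = 2  // irrecoverable failures
}

public static class SqlConnectivityCheck
{
    // connectionFactory is a hypothetical hook: it should return a new,
    // unopened connection built from the plugin's connection string.
    public static async Task<HealthCheckResult> CheckAsync(
        Func<DbConnection> connectionFactory,
        CancellationToken ct = default)
    {
        try
        {
            using var connection = connectionFactory();

            // Opening a connection is read-only: nothing is written to the
            // journal or snapshot tables.
            await connection.OpenAsync(ct).ConfigureAwait(false);
            return HealthCheckResult.Healthy;
        }
        catch (OperationCanceledException) when (ct.IsCancellationRequested)
        {
            // The caller gave up on the check; that is not a verdict on the database.
            throw;
        }
        catch (Exception)
        {
            // Assumption: any failure to create or open the connection is Unhealthy.
            return HealthCheckResult.Unhealthy;
        }
    }
}
```

A plugin would call something like this from its own `CheckHealthAsync` override, passing a factory for whichever ADO.NET provider it is configured with.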
Additional context
Having the base class implementation be virtual, rather than abstract, ensures that this won't be a breaking change that requires all plugins to be recompiled AND ensures that we can do something useful with the private CircuitBreaker fields.