feat: check database and erpc capabilities on health check #691
Merged
Introduces `Supavisor.Health`, which provides a function that runs health checks. Two checks are added:

- Acceptable ERPC latencies: fails if a node has high latency to all other nodes through `:erpc`. The check does not run in a 1- or 2-node cluster. It fails if every request either takes longer than 500ms or fails outright.
- Database reachable: fails if a simple query cannot be run against the database.

The health check endpoint calls this function and returns 503 if any health check is failing. If the condition persists for some time, the infrastructure should restart the instance.
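The decision logic described above can be sketched as follows. This is a language-neutral illustration in Python, not the Elixir code from this PR: the function names, the `measure_ms` and `run_query` callbacks, and the 200 status for the healthy case are all assumptions made for the example. Only the 500ms threshold, the small-cluster skip, and the 503 response come from the description.

```python
from typing import Callable, Dict, Iterable, List, Tuple

LATENCY_LIMIT_MS = 500  # threshold stated in the PR description


def erpc_latency_ok(peers: Iterable[str],
                    measure_ms: Callable[[str], float]) -> bool:
    """Pass unless every peer is slow or unreachable.

    `measure_ms` is a hypothetical callback that returns the round-trip
    latency to a peer in milliseconds, or raises if the call fails.
    """
    peers = list(peers)
    # A 1- or 2-node cluster has 0 or 1 peers: the check is skipped.
    if len(peers) < 2:
        return True
    for peer in peers:
        try:
            latency = measure_ms(peer)
        except Exception:
            continue  # a failed request counts the same as a slow one
        if latency <= LATENCY_LIMIT_MS:
            return True  # one responsive peer is enough to pass
    return False


def database_ok(run_query: Callable[[str], object]) -> bool:
    """Pass if a simple query runs without raising (hypothetical callback)."""
    try:
        run_query("SELECT 1")
        return True
    except Exception:
        return False


def health_status(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, List[str]]:
    """Return (HTTP status, names of failed checks).

    503 on any failure follows the PR description; 200 for the healthy
    case is an assumption of this sketch.
    """
    failed = [name for name, check in checks.items() if not check()]
    return (503 if failed else 200, failed)
```

Treating a failed request like a slow one means a node that is partitioned from all of its peers fails the check, while a single unreachable peer does not take an otherwise healthy node out of rotation.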
v0idpwn commented on Jul 1, 2025
chasers requested changes on Jul 1, 2025:
Otherwise looks great
chasers reviewed on Jul 1, 2025
chasers approved these changes on Jul 2, 2025
abc3 reviewed on Jul 2, 2025
abc3 reviewed on Jul 2, 2025:
🔥
v0idpwn added a commit that referenced this pull request on Jul 29, 2025
### Features

- **Authentication cleartext password support** - Added support for the cleartext password authentication method (#707)
- **Runtime-configurable connection retries** - Support for runtime configuration of connection retries and infinite retries (#705)
- **Enhanced health checks** - Check database and eRPC capabilities during health check operations (#691)
- **More consistency with postgres on auth errors** - Improves errors in some client libraries (#711)

### Performance Improvements

- **Optimized ranch usage** - Supavisor now uses a constant number of ranch instances for improved performance and resource management when hosting a large number of pools (#706)

### Monitoring

- **New OS memory metrics** - Gives a more accurate picture of memory usage (#704)
- **Add a promex plugin for cluster metrics** - For tracking latency and connection status (#690)
- **Client connection lifetime metrics** - Adds a metric for how long each connection stays connected (#688)
- **Process monitoring** - Logs when processes have large heaps or long message queues (#689)

### Bug Fixes

- **Client handler query cancellation** - Fixed handling of `:cancel_query` when state is `:idle` (#692)

### Migration Notes

- Instances running a small number of pools may see an increase in memory usage. This can be mitigated by changing the ranch shard or the acceptor counts.
- If any of the newly used ports conflict with your setup, you may need to change the defaults.
- Review monitoring dashboards and include the new metrics.