feat: check database and erpc capabilities on health check #691

v0idpwn · 2025-07-01T20:13:41Z

Introduces `Supavisor.Health`, which provides a function that runs health checks.

Added two checks:

- Acceptable ERPC latencies: fails if a node has high latency to all other nodes through `:erpc`, that is, if every request either fails or takes longer than 500ms. Doesn't run in a 1 or 2 node cluster.
- Database reachable: fails if a simple query can't be run against the database.

The health check endpoint now calls this function and returns 503 if health checks are failing. If the condition persists for some time, the infrastructure should restart the instance.
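To make the described behaviour concrete, here is a minimal sketch of how such checks could be combined. It is not the code from this PR: the module name `HealthSketch` and all function names are hypothetical, the database check is passed in as a plain function rather than a real query, and peers are pinged one at a time with `:erpc.call/5` for simplicity (the commits below mention a switch to `:erpc.multicall`). Only the 500ms threshold, the small-cluster skip, and the 503 response come from the description above.

```elixir
defmodule HealthSketch do
  @moduledoc """
  Illustrative sketch only; not the actual `Supavisor.Health` implementation.
  """

  # 500ms threshold from the PR description, in microseconds (the unit of :timer.tc/1).
  @max_latency_us 500_000

  @doc """
  Runs each named check and returns `:ok` or `{:error, failed_names}`.
  `checks` is a keyword list of zero-arity functions returning `:ok` or `{:error, reason}`.
  """
  def run(checks) do
    failed = for {name, check} <- checks, check.() != :ok, do: name
    if failed == [], do: :ok, else: {:error, failed}
  end

  @doc """
  Classifies per-peer results: a latency in microseconds, or `:error` for a failed call.
  With fewer than two peers (a 1 or 2 node cluster) the check is skipped and passes.
  Otherwise it fails only if every peer was slow or unreachable.
  """
  def classify_latencies(results) when length(results) < 2, do: :ok

  def classify_latencies(results) do
    if Enum.all?(results, &bad?/1), do: {:error, :high_erpc_latency}, else: :ok
  end

  defp bad?(:error), do: true
  defp bad?(latency_us) when is_integer(latency_us), do: latency_us > @max_latency_us

  @doc "Pings each peer through :erpc and times each call separately."
  def measure_peers(nodes \\ Node.list()) do
    Enum.map(nodes, fn node ->
      {time_us, result} =
        :timer.tc(fn ->
          try do
            # Asks the remote node for its own name; 1s timeout is an assumption.
            :erpc.call(node, :erlang, :node, [], 1_000)
          catch
            _kind, _reason -> :error
          end
        end)

      if result == node, do: time_us, else: :error
    end)
  end

  @doc "Maps the aggregated result to the HTTP status for the health endpoint."
  def http_status(:ok), do: 200
  def http_status({:error, _failed}), do: 503
end
```

A caller could wire it up as `HealthSketch.run(erpc: fn -> HealthSketch.classify_latencies(HealthSketch.measure_peers()) end, database: db_check)`, where `db_check` is whatever function runs the simple query, and then pass the result to `HealthSketch.http_status/1`.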


lib/supavisor/health.ex

chasers · 2025-07-01T21:27:22Z

Otherwise looks great

lib/supavisor_web/controllers/tenant_controller.ex

abc3 · 2025-07-02T15:26:25Z

🔥

### Features

- **Authentication cleartext password support** - Added support for cleartext password authentication method (#707)
- **Runtime-configurable connection retries** - Support for runtime configuration of connection retries and infinite retries (#705)
- **Enhanced health checks** - Check database and eRPC capabilities during health check operations (#691)
- **More consistency with postgres on auth errors** - Improves errors in some client libraries (#711)

### Performance Improvements

- **Optimized ranch usage** - Supavisor now uses a constant number of ranch instances for improved performance and resource management when hosting a large number of pools (#706)

### Monitoring

- **New OS memory metrics** - Gives a more accurate picture of memory usage (#704)
- **Add a promex plugin for cluster metrics** - For tracking latency and connection status (#690)
- **Client connection lifetime metrics** - Adds a metric for how long each connection stays connected (#688)
- **Process monitoring** - Logs when processes have large heaps or long message queues (#689)

### Bug Fixes

- **Client handler query cancellation** - Fixed handling of `:cancel_query` when state is `:idle` (#692)

### Migration Notes

- Instances running a small number of pools may see an increase in memory usage. This can be mitigated by changing the ranch shard or the acceptor counts.
- If any of the newly used ports conflict with your setup, you may need to change the defaults.
- Review monitoring dashboards and include the new metrics.

v0idpwn requested a review from a team as a code owner July 1, 2025 20:13

v0idpwn added 3 commits July 1, 2025 17:14

fix comment

4c081db

simplify code

70247e7

fix typespec

99e594c

v0idpwn commented Jul 1, 2025

lib/supavisor/health.ex (resolved)

github-advanced-security bot found potential problems Jul 1, 2025

lib/supavisor/health.ex (fixed)

make credo happy

891db37

chasers requested changes Jul 1, 2025

lib/supavisor/health.ex (outdated, resolved)

chasers reviewed Jul 1, 2025

lib/supavisor/health.ex (resolved)

chasers approved these changes Jul 2, 2025

abc3 reviewed Jul 2, 2025

lib/supavisor/health.ex (outdated, resolved)

abc3 reviewed Jul 2, 2025

lib/supavisor_web/controllers/tenant_controller.ex (outdated, resolved)

v0idpwn added 5 commits July 2, 2025 10:35

erpc:call -> erpc:multicall

ffa9b9f

feat: service unavailable page, showing failed checks

2f7e25e

fix: log failed erpc ping requests

647b02f

fix test name

3b142b3

styles: move comment, pipe

63a3ac5

github-advanced-security bot found potential problems Jul 2, 2025

lib/supavisor/health.ex (fixed)

reduce nesting (credo)

ab13cb6

v0idpwn merged commit 89db1b2 into main Jul 2, 2025
19 of 22 checks passed

v0idpwn deleted the feat/health-check branch July 2, 2025 15:27

v0idpwn mentioned this pull request Jul 28, 2025

chore: bump to v2.6.0 #712

Merged
