feature: Add health-check (liveness and readiness) probes for Zebra #8830

Open
gustavovalverde opened this issue Sep 2, 2024 · 2 comments
Labels
A-rpc Area: Remote Procedure Call interfaces C-enhancement Category: This is an improvement C-feature Category: New features I-usability Zebra is hard to understand or use P-High 🔥 S-needs-triage Status: A bug report needs triage

Comments

@gustavovalverde
Member

Motivation

If we deploy one or more Zebra nodes and need to continuously confirm that they are running and behaving as expected, we need a way to validate that the nodes are live and ready, defined as:

  • Readiness: Zebra is up, can connect to other nodes, and can respond to requests.
  • Liveness: The node is running and is either synced or making progress with the sync. Ideally, we should expose states for this:
    • Ready to sync
    • Syncing
    • Synced
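
As a rough illustration, the three states above could be derived from the node's current height and its estimate of the network tip. This is only a sketch under stated assumptions: `SyncState`, `classify_sync_state`, and `lag_tolerance` are hypothetical names that do not exist in Zebra, and treating "no tip estimate yet" as "ready to sync" is one possible interpretation, not a settled definition.

```python
from enum import Enum
from typing import Optional


class SyncState(Enum):
    """Hypothetical sync states, mirroring the list above."""
    READY_TO_SYNC = "ready_to_sync"
    SYNCING = "syncing"
    SYNCED = "synced"


def classify_sync_state(
    local_height: int,
    estimated_tip: Optional[int],
    lag_tolerance: int = 2,  # assumption: blocks of lag still counted as synced
) -> SyncState:
    """Map a local height and an estimated network tip to a sync state."""
    # Assumption: without a tip estimate, the node has not started syncing.
    if estimated_tip is None:
        return SyncState.READY_TO_SYNC
    # Within the tolerance of the estimated tip: treat the node as synced.
    if estimated_tip - local_height <= lag_tolerance:
        return SyncState.SYNCED
    return SyncState.SYNCING
```

A probe could then report healthy for `SYNCING` and `SYNCED` when checking progress, and only for `SYNCED` when deciding whether the node should serve traffic.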

This would make node management easier and more automated. Here is an explanation of how this would work with Kubernetes, but it applies to most container orchestration tools.

Specifications

Any solution should consider that these endpoints will be publicly available, so their security implications should be thoroughly evaluated.

Requirements and examples from other projects:

Complex Code or Requirements

  • If we use the RPC endpoints, we should enable them on our nodes and restrict access to our own infrastructure instead of allowing public access. For example, we might be able to ping getblockchaininfo (sync progress) or getinfo (liveness), and we could add more fields.
  • This wouldn't be a solution for everyone, but it would be a starting point.
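
The RPC-based approach above could be sketched as a small script that an orchestrator runs as a probe command. Everything here is an assumption for illustration: the RPC address `http://127.0.0.1:8232`, the `max_lag` threshold, the helper names, and the presence of `blocks` and `estimatedheight` fields in the `getblockchaininfo` result should all be checked against the node's actual configuration and responses.

```python
import json
import urllib.request
from typing import Optional

# Assumption: the RPC listener is enabled and bound to this local address.
RPC_URL = "http://127.0.0.1:8232"


def build_request(method: str, params: Optional[list] = None) -> bytes:
    """Encode a JSON-RPC request body for the given method."""
    body = {"jsonrpc": "2.0", "id": "health", "method": method, "params": params or []}
    return json.dumps(body).encode("utf-8")


def call_rpc(method: str, url: str = RPC_URL, timeout: float = 2.0) -> dict:
    """POST a JSON-RPC request and return the decoded response."""
    request = urllib.request.Request(
        url,
        data=build_request(method),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())


def sync_lag(response: dict) -> Optional[int]:
    """Return how many blocks the node is behind, or None if unknown.

    Assumption: the result carries integer `blocks` and `estimatedheight` fields.
    """
    result = response.get("result") or {}
    blocks = result.get("blocks")
    estimated = result.get("estimatedheight")
    if not isinstance(blocks, int) or not isinstance(estimated, int):
        return None
    return max(estimated - blocks, 0)


def probe(max_lag: int = 2) -> int:
    """Return 0 if the node answers and is near the tip, otherwise 1."""
    try:
        lag = sync_lag(call_rpc("getblockchaininfo"))
    except (OSError, ValueError):
        # Connection errors, timeouts, and malformed JSON all count as failure.
        return 1
    return 0 if lag is not None and lag <= max_lag else 1
```

Calling `sys.exit(probe())` from a wrapper would let a command-based probe treat exit code 0 as healthy. Because the script talks to a local address, the RPC port itself would not need to be exposed publicly.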

Testing

No response

Related Work

This was previously requested and partially done:

@mpguerra
Contributor

@gustavovalverde did we do anything about this in the end?

@gustavovalverde
Member Author

@mpguerra no, but we should, even more so considering that a few community members are running Kubernetes clusters with Zebra.
