Healthchecker for the guardians #1535

Open
xmonader opened this issue May 21, 2024 · 2 comments

Comments

xmonader (Contributor) commented May 21, 2024

A separate component, run by the guardians, that executes healthchecks / benchmarks on a VM along with ZOS checks on the node.

Spawner

Another component that launches (deploys) these healthcheck VMs, either on predefined farms or on randomly chosen farms. The deployment needs to happen on all nodes in the farm.
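A minimal sketch of the Spawner's target selection follows. It assumes the node list, with node and farm IDs, has already been fetched from the grid. The `Node` type, `select_targets`, and its `farm_ids`/`random_farms` parameters are all hypothetical. The sketch covers both modes described above. It returns every node of each chosen farm, because the check must run on all nodes in a farm.

```python
import random
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Node:
    # Hypothetical minimal node record; a real grid client returns much more.
    node_id: int
    farm_id: int


def select_targets(
    nodes: Iterable[Node],
    farm_ids: Optional[List[int]] = None,
    random_farms: int = 0,
    seed: Optional[int] = None,
) -> List[Node]:
    """Return every node that should get a healthcheck VM.

    If farm_ids is given, those predefined farms are used; otherwise
    `random_farms` farms are sampled from the farms present in `nodes`.
    """
    nodes = list(nodes)
    if farm_ids:
        chosen = set(farm_ids)
    else:
        all_farms = sorted({n.farm_id for n in nodes})
        rng = random.Random(seed)
        chosen = set(rng.sample(all_farms, min(random_farms, len(all_farms))))
    # The check has to cover the whole farm, so every node in a chosen farm is a target.
    return [n for n in nodes if n.farm_id in chosen]
```

For example, `select_targets(nodes, farm_ids=[10])` targets every node of farm 10. `select_targets(nodes, random_farms=2)` targets all nodes of two randomly picked farms.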

Healthchecker VM

Should execute benchmark tests for
1- CPU
2- Disk
3- Network

It should expose the results on an endpoint, or otherwise notify some other component.
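The sketch below shows roughly what the Healthchecker VM's CPU and disk benchmarks could look like, using only the Python standard library. The function names and workloads are assumptions, not a specified benchmark suite: SHA-256 hashing stands in for CPU and an fsync'd sequential write stands in for disk. The network test is left out because it needs a peer or endpoint to measure against. The returned dict is what the VM would expose on an endpoint or send to another component.

```python
import hashlib
import os
import tempfile
import time


def cpu_benchmark(rounds: int = 20_000) -> float:
    """Hash a 1 KiB buffer repeatedly; return SHA-256 hashes per second."""
    buf = b"\x00" * 1024
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.sha256(buf).digest()
    return rounds / (time.perf_counter() - start)


def disk_benchmark(size_mb: int = 16, directory=None) -> float:
    """Write size_mb MiB to a temp file, fsync it, and return MiB/s."""
    chunk = os.urandom(1024 * 1024)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(size_mb):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # make sure the data actually hit the disk
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return size_mb / elapsed


def run_benchmarks() -> dict:
    """Collect one report; the network test is omitted (needs a target endpoint)."""
    return {
        "timestamp": time.time(),
        "cpu_sha256_per_sec": cpu_benchmark(),
        "disk_write_mib_per_sec": disk_benchmark(),
    }
```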

Aggregator/Collector

Should be able to collect test results from the deployed VMs, either by pulling them or via a webhook.
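One way to keep both collection modes open is to route them through the same ingest path. The hypothetical in-memory `Collector` below does this. It assumes each report is JSON with a `node_id` field. A webhook handler would call `receive` with the request body. `poll` takes one fetch callable per VM, for example a wrapper around an HTTP GET.

```python
import json
from typing import Callable, Dict, Iterable


class Collector:
    """Hypothetical aggregator that keeps the latest report per node."""

    def __init__(self) -> None:
        self.latest: Dict[str, dict] = {}

    def receive(self, body: bytes) -> None:
        """Webhook path: the HTTP layer hands over the raw JSON body."""
        report = json.loads(body)
        self.latest[report["node_id"]] = report

    def poll(self, fetchers: Iterable[Callable[[], bytes]]) -> None:
        """Pull path: call each VM's fetch function and ingest what it returns."""
        for fetch in fetchers:
            try:
                self.receive(fetch())
            except (OSError, ValueError, KeyError, TypeError):
                # An unreachable VM or a malformed report must not stop the sweep.
                continue
```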

Syncing the results

The information from each guardian should be propagated to the other guardians. That could be done with etcd or some other component. It's not clear whether introducing a Tendermint cluster would be useful, given these results are collected every 3 minutes.
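If a full consensus layer turns out to be unnecessary, one lightweight option is a last-writer-wins merge. This is an assumption, not something decided here. Each guardian periodically sends its latest per-node results to the others, and each side keeps the newer report per node. The report shape, with its `timestamp` field, is hypothetical. A deterministic tie-breaker makes the merge commutative and idempotent, so guardians converge regardless of the order in which exchanges happen.

```python
import json
from typing import Dict, Tuple

# Hypothetical state: node_id -> report dict with at least a "timestamp" field.
State = Dict[str, dict]


def _version(report: dict) -> Tuple[float, str]:
    # Newer timestamp wins; equal timestamps fall back to a deterministic
    # comparison of the serialized report so every guardian picks the same one.
    return (report["timestamp"], json.dumps(report, sort_keys=True))


def merge(local: State, remote: State) -> State:
    """Last-writer-wins merge of another guardian's results into ours."""
    merged = dict(local)
    for node_id, report in remote.items():
        current = merged.get(node_id)
        merged[node_id] = report if current is None else max(current, report, key=_version)
    return merged
```

Under this scheme, duplicated or out-of-order exchanges are harmless. etcd would instead provide a single linearizable store, at the cost of operating a cluster.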

AbdelrahmanElawady (Contributor) commented

Some questions and notes regarding the structure of the various components, after a discussion with @ashraffouda:

VMs

We think designing the VMs to push benchmark results to an aggregator, instead of exposing them to a polling aggregator, will make things easier to handle. The aggregator won't need to maintain a list of IPs or some sort of service discovery. It just waits for requests carrying benchmark results.

This will of course need secure communication, so that no other actor can send invalid benchmark results.
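The sketch below shows one way to secure that channel with a shared-secret HMAC. It assumes each VM is provisioned with a key the aggregator also knows; how the key is distributed is out of scope here. The function names and the 300-second freshness window are hypothetical. The VM signs the exact bytes it sends. The aggregator recomputes the tag and compares it in constant time. It also rejects reports whose embedded `timestamp` is too old, which limits replay of captured reports. HMAC proves origin and integrity but does not encrypt, so TLS on the transport is still needed if the results are sensitive.

```python
import hashlib
import hmac
import json
import time
from typing import Tuple


def sign_report(report: dict, key: bytes) -> Tuple[bytes, str]:
    """VM side: serialize the report and compute an HMAC-SHA256 tag over those bytes."""
    body = json.dumps(report, sort_keys=True).encode()
    tag = hmac.new(key, body, hashlib.sha256).hexdigest()
    return body, tag


def verify_report(body: bytes, tag: str, key: bytes, max_age: float = 300.0) -> dict:
    """Aggregator side: reject forged, tampered, or stale reports."""
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("bad signature")
    report = json.loads(body)
    # Replay guard: a captured report is only accepted within max_age seconds.
    if abs(time.time() - report["timestamp"]) > max_age:
        raise ValueError("stale report")
    return report
```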

DB

It's not clear yet what data we will keep, and therefore which database we will use. Will we store all benchmarks over a period of time, time-series style? Or will we store just the latest result, or the last 3 results?
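For a sense of scale: at one result every 3 minutes, each node produces 24 × 60 / 3 = 480 results per day. Full history therefore grows quickly across a grid, while a fixed-size window per node stays constant. The hypothetical in-memory store below sketches the "last N results" option. Setting `history=1` gives latest-only storage. A time-series database would replace it if full history turns out to be needed.

```python
from collections import defaultdict, deque
from typing import Optional


class ResultStore:
    """Hypothetical store keeping the last `history` results per node."""

    def __init__(self, history: int = 3) -> None:
        self._results = defaultdict(lambda: deque(maxlen=history))

    def add(self, node_id: str, report: dict) -> None:
        # A bounded deque drops its oldest entry once it is full.
        self._results[node_id].append(report)

    def latest(self, node_id: str) -> Optional[dict]:
        results = self._results.get(node_id)
        return results[-1] if results else None

    def recent(self, node_id: str) -> list:
        # Oldest first; .get() avoids creating empty entries for unknown nodes.
        return list(self._results.get(node_id, ()))
```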

Spawner

Will it spawn a number of VMs and then exit, leaving the VMs running? Or will it spawn them, wait for the benchmarks to run once, and then delete the VMs?
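The second option, run once and then clean up, could be sketched as below. `deploy`, `delete`, and `fetch_result` are hypothetical callables standing in for the real deployment client. The key part is the `try`/`finally` in the context manager. It guarantees the VM is deleted even when the benchmark times out or raises. The first option would simply skip the teardown and leave the VMs reporting on their own schedule.

```python
import time
from contextlib import contextmanager
from typing import Callable, Iterator, Optional


@contextmanager
def ephemeral_vm(
    deploy: Callable[[int], str], delete: Callable[[str], None], node_id: int
) -> Iterator[str]:
    """Deploy a healthcheck VM on node_id and always delete it afterwards."""
    vm_id = deploy(node_id)
    try:
        yield vm_id
    finally:
        delete(vm_id)


def run_once(
    deploy: Callable[[int], str],
    delete: Callable[[str], None],
    fetch_result: Callable[[str], Optional[dict]],
    node_id: int,
    timeout: float = 600.0,
    interval: float = 5.0,
) -> Optional[dict]:
    """Spawn, wait for one benchmark result (or time out), then tear the VM down."""
    with ephemeral_vm(deploy, delete, node_id) as vm_id:
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            result = fetch_result(vm_id)
            if result is not None:
                return result  # the finally block still deletes the VM
            time.sleep(interval)
    return None
```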

xmonader (Contributor, Author) commented Aug 1, 2024

xmonader modified the milestones: 3.15, 3.16 (Sep 24, 2024)
Projects: None yet
Development: No branches or pull requests
3 participants