Healthchecker for the guardians #1535

xmonader · 2024-05-21T08:25:41Z

A separate component to be run by the guardians that executes healthchecks / benchmarks on a VM, along with ZOS checks on the node

Spawner

Another component that's supposed to launch (deploy) these healthcheck VMs on a predefined set of farms, or even on randomly chosen farms (needs to happen on all nodes in the farm)

Healthchecker VM

Should execute benchmark tests for
1- CPU
2- Disk
3- Network

And expose the results on an endpoint, or through some way of notifying another component
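
As a rough illustration of what a VM could produce, here is a minimal sketch of one benchmark and the result it would expose or push. Everything in it is an assumption made for illustration: the `BenchmarkResult` fields, the `cpu_benchmark` and `to_payload` helpers, and the use of SHA-256 hashing as a CPU score are not taken from the guardians_healthchecker repository or the benchmark flist.

```python
import hashlib
import json
import time
from dataclasses import asdict, dataclass


@dataclass
class BenchmarkResult:
    # Hypothetical result shape; field names are assumptions.
    node_id: int      # node hosting the healthchecker VM
    kind: str         # "cpu", "disk" or "network"
    value: float      # measured figure
    unit: str         # unit of `value`
    timestamp: float  # Unix time when the measurement finished


def cpu_benchmark(node_id: int, rounds: int = 20000) -> BenchmarkResult:
    """Time a fixed number of chained SHA-256 hashes as a crude CPU score."""
    data = b"x" * 1024
    start = time.perf_counter()
    for _ in range(rounds):
        data = hashlib.sha256(data).digest()
    # Guard against a zero interval on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return BenchmarkResult(node_id, "cpu", rounds / elapsed, "hashes/s", time.time())


def to_payload(result: BenchmarkResult) -> str:
    """Serialize a result as JSON, ready to serve on an endpoint or push."""
    return json.dumps(asdict(result), sort_keys=True)
```

Disk and network checks could return the same structure with a different `kind` and `unit`, which would keep the collector side uniform.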

Aggregator/Collector

Should be able to collect test results from the deployed VMs, either via pulling or via a webhook

Syncing the results

The information from each guardian should be propagated to the other guardians, possibly using etcd or some other component. Not sure if introducing a tendermint cluster would be useful, given these results are collected every 3 mins

AbdelrahmanElawady · 2024-06-25T13:21:44Z

Some questions and notes regarding the structure of the various components after discussion with @ashraffouda:

VMs

We think designing the VMs to push benchmark results to an aggregator, instead of exposing them to a polling aggregator, will make this easier to handle: we won't have to maintain a list of IPs in the aggregator or some sort of service discovery. The aggregator just waits for a request with benchmark results.

It will of course need secure communication so that no other actor can send invalid benchmark results.
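
One possible way to authenticate pushed results is an HMAC over the request body with a secret shared between the VM and the aggregator. The sketch below is only an assumption about how this could look; the `sign` and `verify` helpers and the idea of a per-deployment shared secret are hypothetical, and nothing here is decided in this issue.

```python
import hashlib
import hmac


def sign(payload: bytes, secret: bytes) -> str:
    """VM side: compute a hex HMAC-SHA256 tag to send alongside the payload."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify(payload: bytes, signature: str, secret: bytes) -> bool:
    """Aggregator side: recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(payload, secret), signature)


# Hypothetical usage: the secret would be injected into the VM at deploy time.
secret = b"example-shared-secret"
body = b'{"kind": "cpu", "node_id": 7, "value": 123.0}'
tag = sign(body, secret)
accepted = verify(body, tag, secret)
rejected = verify(body + b" ", tag, secret)
```

This only shows that a payload was produced by a holder of the secret. It does not stop a captured request from being replayed, so a timestamp or nonce inside the signed body, plus TLS on the transport, would still be needed.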

DB

It's not clear which type of data we will keep, and therefore which database we will use. Will we store all benchmarks over a period of time as a time series, or just the latest result, or the last 3 results? This is not decided yet.
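
To make the "last 3 results" option concrete, here is a minimal in-memory sketch of that retention policy. The `LatestResults` class and its methods are hypothetical and only illustrate the idea; a time-series store would be a different design, and no choice is implied here.

```python
from collections import deque


class LatestResults:
    """Keep only the most recent `keep` results per node (hypothetical helper)."""

    def __init__(self, keep: int = 3):
        self._keep = keep
        self._by_node = {}

    def add(self, node_id: int, result) -> None:
        # A bounded deque drops the oldest entry once it is full.
        bucket = self._by_node.setdefault(node_id, deque(maxlen=self._keep))
        bucket.append(result)

    def latest(self, node_id: int) -> list:
        # Oldest first; empty list for nodes that never reported.
        return list(self._by_node.get(node_id, ()))
```

With `keep=3`, adding five results for one node leaves only the last three, so memory stays bounded per node regardless of how often results arrive.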

Spawner

Will it spawn a number of VMs and then exit, leaving the VMs running? Or will it spawn them, wait for the benchmarks to run once, then delete the VMs?

xmonader · 2024-08-01T11:35:07Z

Benchmarking flist https://github.com/threefoldtech/tf-images/pull/272/files

https://github.com/threefoldtech/guardians_healthchecker

xmonader assigned ashraffouda May 21, 2024

xmonader added the type_story label May 21, 2024

xmonader added this to the 3.14 milestone May 21, 2024

xmonader mentioned this issue May 21, 2024

gather more info from 3nodes #1453

Closed

xmonader modified the milestones: 3.14, 3.15 Jun 3, 2024

ashraffouda mentioned this issue Jun 24, 2024

Healthchecker for the guardians threefoldtech/guardians_healthchecker#1

Open

AbdelrahmanElawady mentioned this issue Jul 1, 2024

Add benchmark flist threefoldtech/tf-images#272

Merged

xmonader modified the milestones: 3.15, 3.16 Sep 24, 2024
