This repository contains code that invokes the galileo shell to start experiments.
It also includes deployment files for necessary components and describes in detail which other services are
required to start an experiment.
Further, this project aims to unify different sub-projects of edgerun
and make deployment and experiment setup easy.
The main goals and functionalities are:
- A framework for distributed load testing experiments
- Fine-grained telemetry data collection
- HTTP Trace recording for any service
- A container orchestration adaption for ease of use
For everyone that wants to see resource usage and application performance with easy configurable workload creation. All components are tailored and suited to run on low performance devices (i.e., Raspberry Pi) but can run on default server VMs too.
Common questions that can be answered by performing and analyzing Galileo experiments:
- How much CPU usage does my application use?
- What is the average execution time of my application across the cluster?
- What are the differences in terms of resource usage between two nodes hosting the same application?
- What is the impact of having multiple applications running on one node?
All these questions can be easily answered and have a simple flow in common:
- Deploy base infrastructure
- Deploy my application
- Start requests
- Analyze in Jupyter Notebooks
In summary: simple profiling tasks. But, this framework also targets full end-to-end tests to evaluate important cluster components (i.e, load balancer, scheduler and scaling).
Therefore, experiments can be done to evaluate new implementations for the aforementioned components.
The cluster setup consists of the following main components:
- Kubernetes
- Galileo (for clients and experiment shell)
- Telemd
- Controller (i.e., the load balancer)
- MySQL (i.e., MariaDB)
- InfluxDB v2
- Etcd (Kubernetes requires an instance to run!)
Kubernetes is used to host the clients (galileo running in a Pod), telemetry agents (telemd), load balancer and the applications to test.
Redis is used as a pub/sub system through which all data is sent (i.e., telemetry) and recorded by the Galileo Shell (i.e., the program that prepares and executes an experiment).
The Galileo Shell persists data in MariaDB and InfluxDB.
The provided load balancer implementation uses etcd to watch for weights for the round-robin algorithm and galileo uses redis to provide the clients with routing rules (rtbl
).
The figure above depicts all components and also highlights important interactions. Those interactions are in short:
- Client nodes send HTTP requests to the Controller (load balancer), which forwards requests to the worker nodes which host the application pods.
- Client nodes report the results of each request (i.e., trace) via Redis.
- The Go-based load balancer implementation fetches weights and ip addresses from the etcd instance.
- The clients get the routing rules from the Redis instance (set via
rtbl
from Galileo) - Worker nodes report resource usage (i.e., telemd) via Redis, which is saved in InfluxDB
- The Galileo Shell starts the experiments and saves metadata (i.e., the cluster hosts, misc. data) in the MariaDB
This project provides deployment files for the following components:
- Galileo (for clients and experiment shell)
- Telemd
- Controller (i.e., the load balancer)
Which leaves the following components to be additionally deployed:
- Kubernetes
- MySQL (i.e., MariaDB)
- InfluxDB v2
- Etcd (Kubernetes requires an instance to run!)
Deployment files can be found in deployment/kubernetes
.
Note, that we use Kubernetes node labels to schedule the workers (i.e., clients).
On nodes that should act as clients execute the following command:
kubectl label node <node> node-role.kubernetes.io/client=true
The following label is used to identify nodes with hosting capabilities (i.e., workers):
kubectl label node <node> node-role.kubernetes.io/worker=true
If you have multiple zones (i.e., clusters) in which you want to have seperate clients, adapt the zone
arguments (default is main
).
You can easily group your nodes by labelling it with the following command:
kubectl label node <node> ether.edgerun.io/zone=main
Further, telemd
also offers support to monitor GPUs.
Therefore, you have to label your nodes accordingly:
kubectl label node <node> telemd.edgerun.io/mode=[cpu|gpu]
See more information for GPU monitoring in the GPU support branch.
The galileo workers run on each client node and is connected via Redis to receive commands and also the routing rules. Routing rules are simple key-value pairs, whereas the key represents your service name and the value is a list of hosts with the respective weight.
You can easily set these in your program via rtbl.set('service', ['127.0.0.1:8080], [1])
.
Galileo requires the following data components that are either deployed in the cluster or externally:
- Redis (pub/sub for telemetry and traces)
- MySQL (i.e., MariaDB) (persistent storage for experiment metadata)
- InfluxDB v2 (stores runtime data - telemetry and traces)
All connection parameters are set via environment variables.
The extension repository is meant to provide examples on how to implement and use the project to run experiments. It will be continually updated and include new services.
The following table shows all environment variables to be set.
For ease of use a .env
is included which includes all variables (under bin/.env
).
Variable | Default | Description |
---|---|---|
galileo_expdb_driver | mixed | Uses a SQL database to store experiment metadata and InfluxDB to store runtime data (i.e., traces) |
galileo_logging_level | DEBUG | Logger level (DEBUG, INFO, WARN, ERROR) |
galileo_expdb_mysql_host | localhost | MySQL host |
galileo_expdb_mysql_port | 3307 | MySQL port |
galileo_expdb_mysql_db | db | MySQL database |
galileo_expdb_mysql_user | user | MySQL user |
galileo_expdb_mysql_password | password | MySQL password |
galileo_expdb_influxdb_url | http://localhost:8086 | InfluxDB url |
galileo_expdb_influxdb_token | auth-token | InfluxDB authentication token |
galileo_expdb_influxdb_timeout | 10000 | InfluxDB timeout in ms |
galileo_expdb_influxdb_org | org | InfluxDB organization name |
galileo_expdb_influxdb_org_id | org-id | InfluxDB organization ID |
galileo_redis_host | localhost | Redis host |
galileo_redis_password | optional | Redis port |
KUBECONFIG | not set | Path to the kubeconfig |