This guide provides a step-by-step approach to setting up observability for the OPEA workload in a Kubernetes environment. We will cover the setup of Prometheus and Grafana, as well as the collection of metrics for Gaudi hardware, OPEA/chatqna including TGI, TEI-Embedding, TEI-Reranking and other microservices, and PCM.
To monitor Helm-installed OPEA applications, see the Helm monitoring option.
git clone https://github.com/opea-project/GenAIInfra.git
cd GenAIInfra/kubernetes-addons/Observability
Setting up Prometheus and Grafana is essential for monitoring and visualizing your workloads. Follow these steps to get started:
kubectl create ns monitoring
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus-stack prometheus-community/kube-prometheus-stack --version 55.5.1 -n monitoring
kubectl get pods -n monitoring
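Instead of re-running `kubectl get pods` until everything is up, the wait can be scripted. The following is a minimal sketch, not part of the repository: `wait_for_monitoring` is a hypothetical helper name and the five-minute default timeout is an assumed value.

```shell
# Hypothetical helper: block until every pod in a namespace reports Ready.
# Arguments: namespace (default "monitoring"), timeout (default "300s", assumed).
wait_for_monitoring() {
  ns="${1:-monitoring}"
  timeout="${2:-300s}"
  kubectl wait --for=condition=Ready pods --all -n "$ns" --timeout="$timeout"
}

# Usage (requires access to the cluster):
#   wait_for_monitoring monitoring 300s
```

Pods that belong to completed Jobs never become Ready, so the command can time out if any of those remain in the namespace.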
kubectl port-forward -n monitoring service/prometheus-stack-grafana 3000:80
Open your browser and navigate to http://localhost:3000. Log in with the username "admin" and the password "prom-operator".
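If you prefer to confirm from a terminal that the port-forward works before opening the browser, Grafana exposes a health endpoint at `/api/health`. A minimal sketch, assuming the port-forward to `localhost:3000` is active; `grafana_healthy` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if Grafana answers on /api/health
# and reports its database as "ok".
grafana_healthy() {
  url="${1:-http://localhost:3000}"
  # -f makes curl fail on HTTP errors; -sS hides progress but keeps error messages.
  curl -fsS "$url/api/health" | grep -Eq '"database": *"ok"'
}

# Usage (with the port-forward running in another terminal):
#   grafana_healthy && echo "Grafana is up"
```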
To monitor Gaudi hardware metrics, you can use the following steps:
kubectl create -f https://vault.habana.ai/artifactory/gaudi-metric-exporter/yaml/1.16.2/metric-exporter-daemonset.yaml
kubectl create -f https://vault.habana.ai/artifactory/gaudi-metric-exporter/yaml/1.16.2/metric-exporter-service.yaml
kubectl apply -f ./habana/metric-exporter-serviceMonitor.yaml
# Get the address of the first metric endpoint, for a quick test
habana_metric_url=$(kubectl -n monitoring get ep metric-exporter -o jsonpath="{.subsets[0].addresses[0].ip}:{.subsets[0].ports[0].port}")
# Fetch the metrics
curl "${habana_metric_url}/metrics"
# You will see Habana metric data like this:
process_resident_memory_bytes 2.9216768e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.71394960963e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 2.862641152e+09
# HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# TYPE process_virtual_memory_max_bytes gauge
process_virtual_memory_max_bytes 1.8446744073709552e+19
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 125
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
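The endpoint returns the Prometheus text format shown above: `# HELP` and `# TYPE` comment lines followed by `name value` samples. To pull a single value out in a script, a small filter is enough. This is a minimal sketch; `metric_value` is a hypothetical helper and it only matches samples whose first field equals the given name exactly, so unlabelled samples are the intended use.

```shell
# Hypothetical helper: read Prometheus text format on stdin and print the
# value of the sample whose first field equals the first argument.
# Comment lines start with "#", so they never match a metric name.
metric_value() {
  awk -v name="$1" '$1 == name { print $2 }'
}

# Usage with the endpoint resolved earlier:
#   curl -s "${habana_metric_url}/metrics" | metric_value process_resident_memory_bytes
```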
Manually import the Dashboard-Gaudi-HW.json file into Grafana.
To monitor OPEA application metrics, including TGI-gaudi, TEI, TEI-Reranking and other microservices, follow these steps:
Install Helm (version >= 3.15) first. Refer to the Helm Installation Guide for more information.
Install OPEA application as described in Helm charts README.
For example, to install ChatQnA, follow ChatQnA helm chart for instructions on deploying it to Kubernetes.
Make sure to enable Helm monitoring option.
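Once the application is installed, one way to check that monitoring was actually enabled is to list the ServiceMonitor objects in the cluster. This is a sketch under the assumption that the Helm monitoring option registers the services through ServiceMonitor objects, as the Gaudi and PCM steps in this guide do; `list_service_monitors` is a hypothetical helper name.

```shell
# Hypothetical helper: list ServiceMonitor objects, either in one namespace
# (first argument) or across all namespaces when no argument is given.
list_service_monitors() {
  if [ -n "${1:-}" ]; then
    kubectl get servicemonitors -n "$1"
  else
    kubectl get servicemonitors --all-namespaces
  fi
}

# Usage (requires access to the cluster):
#   list_service_monitors            # every namespace
#   list_service_monitors default    # only the "default" namespace
```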
Here are a few Grafana dashboards for monitoring different aspects of OPEA applications:
- queue_size_embedding_rerank_tgi.json: queue size of TGI-gaudi, TEI-Embedding, TEI-reranking
- tgi_grafana.json: tgi-gaudi text generation inferencing service utilization
- opea-scaling.json: scaling, request rates and failures for OPEA application megaservice, TEI-reranking, TEI-embedding, and TGI
You can either:
- Import them manually to Grafana,
- Use the update-dashboards.sh script to add them to Kubernetes as Grafana dashboard configMaps
  - (The script assumes Prometheus / Grafana were installed according to the instructions above)
- Or create your own dashboards based on them
Note: when a dashboard is imported to Grafana, you can save changes to it directly, but those dashboards go away if Grafana is removed or re-installed.
With dashboard configMaps, by contrast, Grafana saves changes to a selected file, and you need to remember to re-apply that file to Kubernetes / Grafana for your changes to be there when the dashboard is reloaded.
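To illustrate the configMap route, the sketch below publishes one dashboard JSON file as a labelled ConfigMap. It is not the repository's update-dashboards.sh, only an illustration of the mechanism. It assumes Grafana's dashboard sidecar is enabled and picks up ConfigMaps labelled `grafana_dashboard=1` in the `monitoring` namespace, which is the kube-prometheus-stack chart default as far as I know; `publish_dashboard` is a hypothetical helper name.

```shell
# Hypothetical helper: publish a dashboard JSON file as a labelled ConfigMap.
# Arguments: file, ConfigMap name (default derived from the file name),
# namespace (default "monitoring").
publish_dashboard() {
  file="$1"
  name="${2:-}"
  ns="${3:-monitoring}"
  if [ -z "$name" ]; then
    # ConfigMap names must be lowercase and may not contain underscores.
    name=$(basename "$file" .json | tr '_' '-' | tr '[:upper:]' '[:lower:]')
  fi
  # Assumed label: grafana_dashboard=1 (sidecar default in the chart).
  kubectl create configmap "$name" --from-file="$file" -n "$ns" &&
    kubectl label configmap "$name" grafana_dashboard=1 -n "$ns"
}

# Usage (requires access to the cluster):
#   publish_dashboard tgi_grafana.json
```

After editing a dashboard in the Grafana UI, export its JSON and publish it again, otherwise the sidecar reloads the older copy stored in the ConfigMap.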
Please refer to this repo to install Intel® PCM
Modify the pcm/pcm-service.yaml file to set the addresses.
kubectl apply -f pcm/pcm-service.yaml
kubectl apply -f pcm/pcm-serviceMonitor.yaml
Manually import the pcm-dashboard.json file into Grafana.
The GenAIEval repository includes additional dashboards.