Add a new command and/or a new section to cortex cluster info that aggregates the health of Cortex processes.
A user might have the perception that everything is okay with the cluster when in fact a specific component is failing silently. For example, if prometheus is not deployed correctly, neither the autoscaler nor grafana can work correctly.
Here are a few resources that can be scanned to determine overall cluster health.
verify that all of the critical Cortex pods are running
batch and task crons should be running as expected
operator
prometheus
grafana
autoscaler
cluster autoscaler
events in istio resources such as the service and loadbalancer
API autoscaler crons can be rolled into their respective API statuses.
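To make the aggregation idea concrete, here is a rough sketch of how per-component readiness could be rolled up into one cluster-level answer. It is only an illustration: the function names, the "cluster-autoscaler" key, the `(ready, desired)` replica tuples, and the status strings are all assumptions, not part of the Cortex CLI or operator. Gathering the replica counts from the cluster is left out on purpose.

```python
from typing import Dict, Tuple

# Components named in this issue. The exact keys are hypothetical.
REQUIRED_COMPONENTS = (
    "operator",
    "prometheus",
    "grafana",
    "autoscaler",
    "cluster-autoscaler",
)


def cluster_health(observed: Dict[str, Tuple[int, int]]) -> Dict[str, str]:
    """Map each required component to "healthy", "unhealthy", or "missing".

    `observed` maps a component name to (ready_replicas, desired_replicas).
    How those numbers are collected is outside the scope of this sketch.
    """
    report = {}
    for name in REQUIRED_COMPONENTS:
        if name not in observed:
            # A component that was never deployed is the silent failure case.
            report[name] = "missing"
            continue
        ready, desired = observed[name]
        if desired > 0 and ready >= desired:
            report[name] = "healthy"
        else:
            report[name] = "unhealthy"
    return report


def is_cluster_healthy(report: Dict[str, str]) -> bool:
    """The cluster is healthy only if every required component is healthy."""
    return all(status == "healthy" for status in report.values())


if __name__ == "__main__":
    # Example: prometheus has no ready replicas, cluster autoscaler is absent.
    example = {
        "operator": (1, 1),
        "prometheus": (0, 1),
        "grafana": (1, 1),
        "autoscaler": (1, 1),
    }
    report = cluster_health(example)
    for name, status in report.items():
        print(f"{name}: {status}")
    print("cluster healthy:", is_cluster_healthy(report))
```

Reporting a per-component status alongside the overall result is what would let a user see that, say, prometheus is the root cause rather than only seeing that the autoscaler misbehaves.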
One potential design can be: