Commit
fix: updates resources for OO and P-O
* fix: updates resources for OO and p-o

Problem: we pull the Prometheus Operator (p-o) deployment from the upstream repo as a dependency. However, this manifest sets a very low limit on the p-o resources, and that limit is easily hit when the operator is managing multiple Prometheus instances.

Solution: remove the current limits, run load tests on OO and p-o, and observe the resources they consume. Establish a baseline for both, then multiply that baseline by 3 and give some headroom.

Issue https://issues.redhat.com/browse/MON-2648

Closes #166

Co-authored-by: Sunil Thaha <sthaha@redhat.com>
1 parent 95fe81a commit 8658ccf
Showing 4 changed files with 119 additions and 0 deletions.
@@ -0,0 +1,42 @@

# Procedure to assess resources used by Observability Operator

1. Provision an OpenShift cluster.

2. Run `oc apply -f hack/olm/catalog-src.yaml` to install the Observability Operator (OO) catalogue.

3. Install OO using the UI.

4. Scale down the following deployments, so we can remove the currently set limits on OO:

```bash
# Scale down the cluster version operator
oc -n openshift-cluster-version scale deployment.apps/cluster-version-operator --replicas=0
# Scale down the OLM operator
oc -n openshift-operator-lifecycle-manager scale deployment.apps/olm-operator --replicas=0
```

5. Edit the OO and Prometheus Operator deployments to remove their limits with:

```bash
oc -n openshift-operators patch deployment.apps/observability-operator --type='json' -p='[{"op": "remove", "path": "/spec/template/spec/containers/0/resources/limits"}]'
oc -n openshift-operators patch deployment.apps/observability-operator-prometheus-operator --type='json' -p='[{"op": "remove", "path": "/spec/template/spec/containers/0/resources/limits"}]'
```

6. Run the load tests with `./hack/loadtest/test.sh`

7. In the OpenShift UI, switch to the Developer tab, navigate to Observe, and enter the following queries.
   1. For memory we should look at `container_memory_rss` as that is the metric used by kubelet to OOM kill the container
   2. For CPU we should look at `container_cpu_usage_seconds_total` as that is the metric used by kubelet

```bash
# PromQL for memory
container_memory_rss{container!~"|POD", namespace="openshift-operators"}
# PromQL for CPU
sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{namespace='openshift-operators'}) by (pod)
```

8. For both OO and Prometheus Operator, take measurements of their performance:
   1. Establish a baseline for both CPU and memory (the minimum they consume); those will be our `requests`
   2. Multiply that value by 3 and validate that it fits the range of values observed; those will be our `limits`
   3. Give some extra headroom to `limits` to anticipate feature growth
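
As an illustration of the arithmetic in step 8 (this is not part of the commit), the sketch below derives a limit from a baseline. The `compute_limit` helper, the 20% headroom, and the baseline numbers are all hypothetical placeholders; substitute the values actually observed during the load test.

```bash
#!/usr/bin/env bash
set -e -u -o pipefail

# compute_limit BASELINE MULTIPLIER HEADROOM_PERCENT
# Prints BASELINE * MULTIPLIER plus HEADROOM_PERCENT extra, rounded up.
# All inputs are integers; the unit (Mi, millicores, ...) is up to the caller.
compute_limit() {
  local baseline=$1 multiplier=$2 headroom_pct=$3
  local scaled=$(( baseline * multiplier ))
  # Integer ceiling of scaled * (100 + headroom_pct) / 100
  echo $(( (scaled * (100 + headroom_pct) + 99) / 100 ))
}

# Hypothetical baselines, NOT measurements from this commit
mem_request_mi=50
cpu_request_m=5

mem_limit_mi=$(compute_limit "$mem_request_mi" 3 20)
cpu_limit_m=$(compute_limit "$cpu_request_m" 3 20)

echo "memory: request=${mem_request_mi}Mi limit=${mem_limit_mi}Mi"
echo "cpu: request=${cpu_request_m}m limit=${cpu_limit_m}m"
```

With these placeholder inputs the memory limit comes out at 180Mi (50 x 3 = 150, plus 20%) and the CPU limit at 18m. The result should still be checked against the peaks seen in the queries from step 7, as sub-step 2 requires.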
@@ -0,0 +1,58 @@
#!/usr/bin/env bash

set -e -u -o pipefail
trap cleanup INT

# Given a number, creates a namespace and a
# monitoring stack in that namespace
create_monitoring_stack() {

  local stack_number=$1; shift
  local ms_name=stack-$stack_number
  local namespace=loadtest-$stack_number

  monitoring_stack=$(cat <<- EOF
apiVersion: monitoring.rhobs/v1alpha1
kind: MonitoringStack
metadata:
  name: ${ms_name}
  namespace: ${namespace}
  labels:
    load-test: test
spec:
  logLevel: debug
  retention: 15d
  resourceSelector:
    matchLabels:
      load-test-instance: ${ms_name}
EOF
)

  kubectl create namespace "$namespace"
  echo "$monitoring_stack" | kubectl -n "$namespace" apply -f -
}

cleanup() {
  echo "INFO: cleaning up all namespaces"
  kubectl delete ns loadtest-{1..10}
}

main() {
  # Goal: create 10 monitoring stack CRs, wait for OO to
  # reconcile and then clean-up

  echo "INFO: Running load test"
  for ((i=1; i<=10; i++)); do
    create_monitoring_stack "$i"
  done

  # Give some time for OO to reconcile all the MS
  # and create the necessary resources
  local timeout=180
  echo "INFO: sleeping for $timeout"
  sleep "$timeout"

  cleanup
}

main "$@"
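
The script hard-codes the stack count of 10 in both the creation loop and the `loadtest-{1..10}` brace expansion, and brace expansion cannot take a variable. A minimal sketch of one way to drive both from a single value follows. It is not part of the commit: `STACK_COUNT` and `loadtest_namespaces` are hypothetical names introduced here for illustration.

```bash
#!/usr/bin/env bash
set -e -u -o pipefail

# STACK_COUNT is a hypothetical environment variable; it defaults to the
# value of 10 that the committed script hard-codes.
stack_count=${STACK_COUNT:-10}

# Print one namespace name per line: loadtest-1 .. loadtest-N
loadtest_namespaces() {
  local count=$1 i
  for ((i = 1; i <= count; i++)); do
    echo "loadtest-$i"
  done
}

# Variant of cleanup() that deletes exactly the namespaces the loop would create.
# --ignore-not-found keeps the delete from failing on namespaces that were
# never created, for example after an interrupted run.
cleanup() {
  echo "INFO: cleaning up all namespaces"
  # Word splitting of the command substitution is intentional here.
  # shellcheck disable=SC2046
  kubectl delete ns --ignore-not-found $(loadtest_namespaces "$stack_count")
}
```

The creation loop in `main` could then iterate up to `$stack_count` as well, so a run such as `STACK_COUNT=25 ./hack/loadtest/test.sh` would create and remove a matching set of namespaces.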