
[issue#306] Add missing ClusterRoles #465

Merged
merged 8 commits into master from issue/306 on Sep 21, 2023
Conversation

elfiesmelfie
Collaborator

The cluster-monitoring-operator is required for STF to install. It creates the required alertmanager-main and prometheus-k8s ClusterRoles, and STF relies on these being present. These are not present when using CRC, so the ClusterRoles need to be created explicitly.

The names of the ClusterRoles have been changed to avoid a possible conflict if cluster-monitoring-operator is installed after STF.

This is a workaround for not having cluster-monitoring-operator installed: #306

resolves #306
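
For context, a minimal sketch of the kind of ClusterRole the operator now creates itself. The rules below are illustrative, modeled on the upstream prometheus-k8s role, and are not copied from this PR:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-stf
rules:
- apiGroups:
  - ""
  resources:
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
- nonResourceURLs:
  - /metrics
  verbs:
  - get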

@elfiesmelfie elfiesmelfie requested review from csibbitt, leifmadsen and vkmc and removed request for csibbitt September 14, 2023 15:52
@leifmadsen leifmadsen enabled auto-merge (squash) September 14, 2023 17:02
@elfiesmelfie
Collaborator Author

The Jenkins job timed out 😞

@leifmadsen
Member

The Jenkins job timed out 😞

There is a bug. The job times out during validation because not all components become operational. Here is the first error I found in the STO logs:

 TASK [Bind the local prometheus SA to prometheus cluster role (for oauth perms)] ******************************** 
fatal: [localhost]: FAILED! => {"changed": false, "error": 422, "msg": "Failed to patch object: b'{\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"ClusterRoleBinding.rbac.authorization.k8s.io \\\\\"prometheus-k8s-service-telemetry\\\\\" is invalid: roleRef: Invalid value: rbac.RoleRef{APIGroup:\\\\\"rbac.authorization.k8s.io\\\\\", Kind:\\\\\"ClusterRole\\\\\", Name:\\\\\"prometheus-stf\\\\\"}: cannot change roleRef\",\"reason\":\"Invalid\",\"details\":{\"name\":\"prometheus-k8s-service-telemetry\",\"group\":\"rbac.authorization.k8s.io\",\"kind\":\"ClusterRoleBinding\",\"causes\":[{\"reason\":\"FieldValueInvalid\",\"message\":\"Invalid value: rbac.RoleRef{APIGroup:\\\\\"rbac.authorization.k8s.io\\\\\", Kind:\\\\\"ClusterRole\\\\\", Name:\\\\\"prometheus-stf\\\\\"}: cannot change roleRef\",\"field\":\"roleRef\"}]},\"code\":422}\\n'", "reason": "Unprocessable Entity", "status": 422}
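
The underlying constraint is Kubernetes' roleRef immutability: once a ClusterRoleBinding exists, it cannot be patched to point at a different ClusterRole; it has to be deleted and recreated. Roughly, per the error above (the subject is illustrative, since the log only names the binding and the roleRef):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-k8s-service-telemetry
roleRef:
  # immutable after creation; changing it requires delete and recreate
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-stf
subjects:
# illustrative subject, not taken from the log
- kind: ServiceAccount
  name: prometheus-stf
  namespace: service-telemetry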

@leifmadsen leifmadsen self-requested a review September 15, 2023 14:56
@csibbitt csibbitt left a comment
Collaborator

Since we're mucking about in here, I'd like to take the opportunity to narrow this scope as much as possible. It looks pretty good right now, but I think there are still a few things that can move from the cluster scope to the namespaced scope.
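
For reference, the scope distinction at play: a Role grants access only within its own namespace, while a ClusterRole applies cluster-wide. A minimal contrast, with assumed names:

# namespaced: only grants access inside service-telemetry
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-stf
  namespace: service-telemetry
rules: []
---
# cluster-wide: applies to every namespace and to cluster-scoped resources
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-stf
rules: []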

@@ -22,6 +22,7 @@ rules:
- watch
- update
- patch
- delete
Collaborator

Noting that this should be transitional only and could be removed in a later version.

Member

@leifmadsen leifmadsen Sep 21, 2023

I was going to file an issue about this, but then I realized that if someone was somehow running STF 1.5.2 and moved to STF 1.5.4, they would need this.

Additionally, the latest changes I landed in 5d0210d will continually require the delete RBAC.

We could refine this, though, and break out the clusterroles and clusterrolebindings so that the delete verb applies only to the clusterrolebindings. I'm also curious whether we could further lock this down to just the resources we manage, rather than all ClusterRoles and ClusterRoleBindings.

I'm not going to do any of this at this point, but I'm certainly interested in doing another RBAC-focused review to see whether we can make the scope even more restrictive and controlled, covering only the objects we actually need to adjust and own.
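
As a sketch of the idea being floated here: RBAC rules can pin verbs to named objects via resourceNames (the name below is taken from the error log earlier in this thread; whether this covers everything the operator touches is untested). Note that create cannot be restricted this way, since the object name isn't known at authorization time.

rules:
- apiGroups:
  - rbac.authorization.k8s.io
  resources:
  - clusterrolebindings
  resourceNames:
  - prometheus-k8s-service-telemetry
  verbs:
  - get
  - patch
  - delete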

roles/servicetelemetry/tasks/component_prometheus.yml (review thread outdated, resolved)
roles/servicetelemetry/tasks/component_alertmanager.yml (review thread outdated, resolved)
roles/servicetelemetry/tasks/component_alertmanager.yml (review thread outdated, resolved)
@leifmadsen
Member

Requesting final review prior to merge!

@leifmadsen
Member

CI timed out again. I think the Jenkins-based CI systems are just overwhelmed.

@leifmadsen
Member

I'm skipping out on approving this since I landed a fair number of changes here, and it'd effectively be self-approving.

Deferring to Chris and Victoria on this one, but I think we've all probably looked at this enough that if CI passes, landing it isn't going to be a problem for anyone. Thanks everyone for the patience and the quick turnaround on the reviews!

elfiesmelfie and others added 7 commits September 21, 2023 08:38
The cluster-monitoring-operator is required for STF to install. It
creates the required alertmanager-main and prometheus-k8s ClusterRoles,
and STF relies on these being present. These are not present when using
CRC, so the ClusterRoles need to be created explicitly.

The names of the ClusterRoles have been updated, in case there is some
conflict when cluster-monitoring-operator is installed after STF.

This is a workaround for not having cluster-monitoring-operator
installed: #306

resolves #306
Fix up the RBAC changes to fully get prometheus-stf working and
decoupled from prometheus-k8s. Change to using a separate prometheus-stf
ClusterRole, ClusterRoleBinding, and ServiceAccount, along with a Role
and RoleBinding, all using prometheus-stf as the ServiceAccount. Also
update the Alertmanager configuration to use alertmanager-stf instead of
alertmanager-main.
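
A hedged sketch of how the pieces connect: the Prometheus custom resource selects the ServiceAccount, and the new RBAC objects all bind to that account. Object names here are assumptions based on the commit message, not copied from the code:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: default                        # illustrative name
  namespace: service-telemetry         # assumed STF namespace
spec:
  serviceAccountName: prometheus-stf   # ties the Prometheus pods to the prometheus-stf RBAC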
* Refactor smoketest script

Perform a bit of smoketest refactoring and fix up a few bugs.

* Update the alert trigger to use startsAt in order to potentially speed
  up delivery of the alerts. Failures in SNMP_WEBHOOK_STATUS seem to be
  primarily due to delayed alert notification through
  prometheus-snmp-webhook.
* Add an alert cleanup task as part of the cleanup logic at the end.
* Update openssl x509 to not use the -in flag, which seems unnecessary
  and causes a failure on some systems.
* Add a new SMOKETEST_VERBOSE boolean so local testing can skip massive
  amounts of information dumped to stdout.
* Remove the curl pod using a label selector for slightly cleaner output.
* Update the failure check to combine RET and SNMP_WEBHOOK_STATUS, since
  testing suggests the combined check is slightly more reliable.

* Show logs from curl
As part of the least-privilege work, remove the nodes/metrics permission,
as we're not scraping nodes for information. Everything in STF appears to
continue working without this permission.
Working on simplifying and reducing our access scope as much as
possible. Moving the SCC RBAC from a ClusterRole to a Role appears to
allow things to continue working with Prometheus. It's possible further
testing may reveal this needs to be reverted.
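
As a sketch of the namespaced SCC grant this describes (the SCC name is an assumption; the commit does not say which SCC Prometheus uses):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-stf
  namespace: service-telemetry
rules:
- apiGroups:
  - security.openshift.io
  resources:
  - securitycontextconstraints
  resourceNames:
  - nonroot      # assumed SCC
  verbs:
  - use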
Convert the alertmanager-stf Role to a ClusterRole, as the tokenreviews
and subjectaccessreviews resources need to be accessible at the cluster
scope.
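
Roughly what that cluster-scoped grant looks like; tokenreviews and subjectaccessreviews are the delegated authentication and authorization APIs used by the oauth-proxy sidecar (a sketch, not the exact rules from the commit):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: alertmanager-stf
rules:
- apiGroups:
  - authentication.k8s.io
  resources:
  - tokenreviews
  verbs:
  - create
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create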
* Create ClusterRoleBinding and Role for alertmanager

Create an appropriate ClusterRoleBinding and Role for alertmanager-stf,
breaking SCC out into a Role rather than a ClusterRole to keep things
aligned with the prometheus-stf RBAC setup.

* Adjust smoketest.sh for SNMP webhook test failures

Adjust the smoketest script to also fail when the SNMP webhook test has
failed. Add a wait condition for the curl pod to complete so logs can be
retrieved.

* Add *RoleBinding rescue capabilities

If the roleRef of a ClusterRoleBinding or RoleBinding changes, the API
server will not allow the object to be patched. Add block/rescue logic
to remove the existing ClusterRoleBinding or RoleBinding and recreate it
when patching fails.
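
A minimal sketch of the block/rescue pattern described above, assuming the kubernetes.core.k8s module and an illustrative crb_def manifest variable (the actual task layout in the role may differ):

- name: Apply the ClusterRoleBinding, recreating it if the patch is rejected
  block:
    - name: Apply the ClusterRoleBinding
      kubernetes.core.k8s:
        definition: "{{ crb_def }}"
  rescue:
    - name: Remove the existing ClusterRoleBinding (roleRef cannot be patched)
      kubernetes.core.k8s:
        state: absent
        api_version: rbac.authorization.k8s.io/v1
        kind: ClusterRoleBinding
        name: "{{ crb_def.metadata.name }}"
    - name: Recreate the ClusterRoleBinding
      kubernetes.core.k8s:
        definition: "{{ crb_def }}"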
@leifmadsen
Member

LET'S GO!!!!!!!

@leifmadsen leifmadsen merged commit 805ada4 into master Sep 21, 2023
11 checks passed
@leifmadsen leifmadsen deleted the issue/306 branch September 21, 2023 14:54
Successfully merging this pull request may close these issues:

STF won't install without cluster-monitoring-operator installed (workaround within)