fix openshift values to default with in-cluster prometheus #3721

mittal-ishaan · 2024-10-25T19:25:45Z

What does this PR change?

This PR updates the OpenShift values so that the defaults deploy Kubecost with the in-cluster Prometheus, since this is what most users want.
Adds a clarifying comment in the template where the cluster role is bound to the Kubecost service account.
Moves the in-cluster Prometheus service account namespace into values, so it is configurable for clusters where the namespace has a different name.
Adds a pre-install hook to the Security Context Constraints (SCC) that is installed when network costs is enabled in an OpenShift environment. This ensures the SCC is installed before the DaemonSet during helm install, preventing potential install failures, since the network costs DaemonSet depends on it.
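
As a rough illustration of the last item, a hook-annotated SCC might look like the sketch below. Only the `helm.sh/hook: pre-install` annotation comes from this PR's diff; the resource name, the permission fields, and the service account are assumptions made for the example and are not taken from the chart's template.

```yaml
# Sketch only: everything except the hook annotation is an illustrative assumption.
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: kubecost-network-costs        # hypothetical name
  annotations:
    helm.sh/hook: pre-install         # from the PR diff: create before the other resources
allowPrivilegedContainer: true        # assumed permissions for a host-level DaemonSet
allowHostNetwork: true
allowHostPorts: true
allowHostDirVolumePlugin: true
allowHostIPC: false
allowHostPID: false
readOnlyRootFilesystem: false
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: RunAsAny
fsGroup:
  type: RunAsAny
supplementalGroups:
  type: RunAsAny
volumes:
  - hostPath
  - secret
users:
  # hypothetical service account used by the network costs DaemonSet
  - system:serviceaccount:kubecost:kubecost-network-costs
```

Helm creates pre-install hook resources before the rest of the release's manifests, which is what would guarantee the ordering described above.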

Does this PR rely on any other PRs?

No

How does this PR impact users? (This is the kind of thing that goes in release notes!)

Added a standard configuration template for OpenShift that uses the openshift-monitoring Prometheus.

Links to Issues or tickets this PR addresses or fixes

What risks are associated with merging this PR? What is required to fully test this PR?

No risk. Testing is still needed to confirm that it deploys easily and successfully on an OpenShift cluster.

How was this PR tested?

Installing kubecost in openshift environment with network cost enabled without helm hook:

Installing with pre-install helm hook

Resulting kubecost diagnostics page

Have you made an update to documentation? If so, please provide the corresponding PR.

chipzoller

I'm wondering if this is the right approach (on by default) as it deviates from our standard architecture. How extensively has this been tested? Do you have any concerns here, @jessegoodier and @kwombach12 ?

jessegoodier · 2024-10-27T12:40:44Z

I'm wondering if this is the right approach (on by default) as it deviates from our standard architecture. How extensively has this been tested? Do you have any concerns here, @jessegoodier and @kwombach12 ?

This is what users are asking for, and I do think it is the best architecture.

A consistent configuration will reduce support concerns with OpenShift, compared to what we have seen with the various custom configs attempted.

Historically, issues with prometheus are due to misconfiguration, not version.

thomasvn · 2024-10-28T20:06:55Z

cost-analyzer/templates/cost-analyzer-cluster-role-binding-template.yaml

+  # Grant the kubecost service account the cluster-monitoring-view role to enable it to query OpenShift Prometheus.
+  # This is necessary for Kubecost to get access and query the in-cluster Prometheus instance using its service account token.
+  # https://docs.redhat.com/en/documentation/openshift_container_platform/4.2/html/monitoring/cluster-monitoring#monitoring-accessing-prometheus-alerting-ui-grafana-using-the-web-console_accessing-prometheus


Very helpful comment!
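
For readers who do not have the template open, the binding described in that comment could look roughly like the sketch below. `cluster-monitoring-view` is the OpenShift ClusterRole named in the comment; the binding name, service account name, and namespace are assumptions for illustration, not the values rendered by the chart.

```yaml
# Sketch only: binds a Kubecost service account to OpenShift's cluster-monitoring-view role.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kubecost-cluster-monitoring-view   # hypothetical binding name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-monitoring-view            # role named in the template comment
subjects:
  - kind: ServiceAccount
    name: kubecost-cost-analyzer           # assumed service account name
    namespace: kubecost                    # assumed release namespace
```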

thomasvn · 2024-10-28T20:15:49Z

cost-analyzer/values-openshift.yaml

Kubecost is always recommended to be run with the helm chart's bundled Prometheus. This is usually the best user experience and results in the fewest issues.

Although we should certainly document how to run with Openshift's pre-installed Prometheus, I don't think that we should default to that experience.

Additionally, when users choose not to deploy Kubecost's bundled Prometheus, they will not get the Kubecost-curated extraScrapeConfigs, which now include targets for NVIDIA's DCGM Exporter. I am willing to bet this discrepancy will be the root cause of many future support cases.

Can we create a ServiceMonitor with the needed scrape config for DCGM?
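
One possible shape for such a ServiceMonitor is sketched below. It assumes the Prometheus Operator CRDs are available in the cluster and that the DCGM Exporter is exposed through a Service; the names, namespace, label, and port name are all hypothetical and would need to match the actual exporter deployment.

```yaml
# Sketch only: every name, label, and port here is an assumption.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: dcgm-exporter                # hypothetical name
  namespace: gpu-operator            # hypothetical namespace of the exporter Service
spec:
  selector:
    matchLabels:
      app: nvidia-dcgm-exporter      # assumed label on the exporter Service
  endpoints:
    - port: metrics                  # assumed name of the Service port
      interval: 30s
```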

Perhaps we should have two value files to avoid requests asking for how to correctly implement using openshift prometheus?

Do we need additional tests to validate the non-bundled Prometheus? Is there anything today that diagnostics would miss?

thomasvn · 2024-10-28T20:18:06Z

Also any new configurations that you've added into values-openshift.yaml should also be in values.yaml. Helm will always use the values.yaml as the default configuration for the app (docs ref).

For example the following install command will use all default values from the values.yaml in the chart:

helm install kubecost cost-analyzer \
--repo https://kubecost.github.io/cost-analyzer/ \
--namespace kubecost --create-namespace

chipzoller · 2024-10-28T20:33:13Z

cost-analyzer/templates/cost-analyzer-networks-costs-ocp-scc.yaml

+  annotations:
+    helm.sh/hook: pre-install


Keep in mind that some vendor marketplaces and other delivery form factors do not support various Helm hooks and so this is more surface area that will have to be removed when these listings are updated.

ohh, okay understood. Will try to find a better alternative for this then

the hook here is simply fixing a warning during helm install. As long as the vendor marketplaces do not fail when they see the annotation, it should work as you have done it.

AWS Marketplace as one example will fail when it sees a chart with a Helm hook as they are not supported. See here for details. Just saying be mindful that when preparing the AWS Marketplace (possibly others) version, this is something that can't be present.

Helm hooks and the lookup function are not supported.

thank you. I'm good with removing the hook and having the warning. we can add a note in NOTES to ignore the warning.

removing the hook

chipzoller · 2024-10-29T11:55:13Z

cost-analyzer/values-openshift.yaml

+networkCosts:
+  enabled: true  # Enable network costs.


In my opinion, we shouldn't be enabling optional components by default in OpenShift that aren't enabled by default for other platforms. We should be consistent as far as which components are on by default and those which are not.

agree, my mistake here.

commenting out network costs configuration

chipzoller · 2024-10-29T11:56:03Z

cost-analyzer/values-openshift.yaml

-      # createMonitoringClusterRoleBinding: false  # Create a Cluster Role Binding to allow using in-cluster prometheus or thanos.
-      # createMonitoringResourceReaderRoleBinding: false  # Create a Role and Role Binding to allow in-cluster prometheus or thanos to list and watch resources. This will be necessary if you are not using bundled prometheus and need to add scrape config for resources.
-      # monitoringServiceAccountName: prometheus-k8s  # Name of the service account to bind to the Resource Reader Role Binding.
+      createMonitoringClusterRoleBinding: true  # Create a Cluster Role Binding to allow using in-cluster prometheus or thanos.


Suggested change

-      createMonitoringClusterRoleBinding: true  # Create a Cluster Role Binding to allow using in-cluster prometheus or thanos.
+      createMonitoringClusterRoleBinding: true  # Create a ClusterRoleBinding to allow using in-cluster Prometheus or Thanos.

Etc. elsewhere

chipzoller · 2024-10-29T12:17:42Z

cost-analyzer/values-openshift.yaml

-
+    kubeRBACProxy: true # If true, kubecost will use kube-rbac-proxy to authenticate with in cluster Prometheus for openshift
+  grafana:
+    enabled: false  # If false, Grafana will not be installed


Similarly here we're disabling Grafana by default just for OpenShift whereas it's enabled for all other platform types which creates an inconsistent deployment topology.

Will leave values-openshift.yaml with the existing defaults and create a new file with the values that Ishaan has here. Stay tuned.

Yeah, I think that's a better approach. Definitely glad to see this though! Looking forward to also seeing this in the docs referencing the new values file!

@jessegoodier I like that approach!
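
To make the agreed approach concrete, a separate overlay file could carry only the OpenShift in-cluster Prometheus settings, leaving values-openshift.yaml untouched. The sketch below reuses the keys that appear in this PR's diff (`kubeRBACProxy`, `grafana.enabled`, `createMonitoringClusterRoleBinding`, `monitoringServiceAccountName`); the parent keys they are nested under, the `enabled` flags, and the Prometheus address are assumptions and should be checked against the chart's values.yaml.

```yaml
# values-openshift-cluster-prometheus.yaml (sketch; nesting and fqdn are assumptions)
global:
  prometheus:
    enabled: false        # assumed key: skip the bundled Prometheus
    # assumed address of the openshift-monitoring Prometheus service
    fqdn: https://prometheus-k8s.openshift-monitoring.svc.cluster.local:9091
    kubeRBACProxy: true   # from the PR diff: authenticate through kube-rbac-proxy
  grafana:
    enabled: false        # from the PR diff: do not install Grafana
  platforms:
    openshift:
      enabled: true                              # assumed key
      createMonitoringClusterRoleBinding: true   # from the PR diff
      monitoringServiceAccountName: prometheus-k8s   # from the PR diff
```

A file like this would be passed to helm with the `-f` (or `--values`) flag, so it only overrides the listed keys and every other setting still comes from the chart's values.yaml.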

cost-analyzer/values-openshift-cluster-prometheus.yaml

thomasvn · 2024-10-30T21:44:43Z

cost-analyzer/values-openshift-cluster-prometheus.yaml

Please also remove any excess comments & unused configs which are present in this file.

mittal-ishaan added 4 commits October 26, 2024 00:42

fix openshift values to default with in-cluster prometheus

e474d46

enabling network cost as default

9b0a5a8

add pre-install helm hook annotation to SecurityContextConstraints

612513f

add pre-install helm hook small explanation comment

4091a6b

chipzoller reviewed Oct 25, 2024

jessegoodier added the v2.5 and enhancement labels Oct 27, 2024

thomasvn reviewed Oct 28, 2024

chipzoller reviewed Oct 28, 2024

chipzoller reviewed Oct 29, 2024

mittal-ishaan added 3 commits October 30, 2024 01:28

create a different values file

bf2d526

nit: fix comments

d5e7154

removing pre-hook

e684ad7

thomasvn reviewed Oct 30, 2024

thomasvn reviewed Oct 30, 2024

remove unneccessary commeennts and configs

3a7e135
