Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix openshift values to default with in-cluster prometheus #3721

Open
wants to merge 8 commits into
base: develop
Choose a base branch
from

Conversation

mittal-ishaan
Copy link
Contributor

@mittal-ishaan mittal-ishaan commented Oct 25, 2024

What does this PR change?

  • This PR updates the openshift values to have default properties set to deploy kubecost with in-cluster prometheus. (As this is what most users want)
  • Adds clarifying comment in the template while binding the cluster role to the kubecost service account
  • Moves in-cluster prometheus service account namespace name to values to make it more generalized and to cover the case where it can be of any other name.
  • Add a pre-install hook to the Security Context Constraints (SCC) which gets install when network costs is enabled in openshift environment. This is done to ensure that SSC gets installed before the daemonSet during helm install to prevent potential install failures as network costs daemonSet depends on it.

Does this PR rely on any other PRs?

No

How does this PR impact users? (This is the kind of thing that goes in release notes!)

Added standard configuration template for openshift to use the openshift-monitoring prometheus

Links to Issues or tickets this PR addresses or fixes

What risks are associated with merging this PR? What is required to fully test this PR?

No risk, Testing that it gets deployed on openshift cluster easily and successfully need to be done.

How was this PR tested?

Installing kubecost in openshift environment with network cost enabled without helm hook:
Screenshot from 2024-10-26 02-54-50
Installing with pre-install helm hook
Screenshot from 2024-10-26 02-59-27
Resulting kubecost diagnostics page
image

Have you made an update to documentation? If so, please provide the corresponding PR.

Copy link
Collaborator

@chipzoller chipzoller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this is the right approach (on by default) as it deviates from our standard architecture. How extensively has this been tested? Do you have any concerns here, @jessegoodier and @kwombach12 ?

@jessegoodier
Copy link
Collaborator

I'm wondering if this is the right approach (on by default) as it deviates from our standard architecture. How extensively has this been tested? Do you have any concerns here, @jessegoodier and @kwombach12 ?

This is what users are asking for, and I do think it is the best architecture.

A consistent configuration will reduce support concerns with openshift- in comparison to what we have seen with the various custom configs attempted.

Historically, issues with prometheus are due to misconfiguration, not version.

@jessegoodier jessegoodier added v2.5 enhancement New feature or request labels Oct 27, 2024
Comment on lines +46 to +48
# Grant the kubecost service account the cluster-monitoring-view role to enable it to query OpenShift Prometheus.
# This is necessary for Kubecost to get access and query the in-cluster Prometheus instance using its service account token.
# https://docs.redhat.com/en/documentation/openshift_container_platform/4.2/html/monitoring/cluster-monitoring#monitoring-accessing-prometheus-alerting-ui-grafana-using-the-web-console_accessing-prometheus
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very helpful comment!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kubecost is always recommended to be run with the helm chart's bundled Prometheus. This is usually the best user experience and results in the fewest issues.

Although we should certainly document how to run with Openshift's pre-installed Prometheus, I don't think that we should default to that experience.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, when users choose not to deploy Kubecost's bundled Prometheus they will not get the Kubecost-curated extraScrapeConfigs which, among them, newly include targets for NVIDIA's DCGM Exporter. I am willing to bet this discrepancy will be the root cause of many future support cases.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add create a service monitor with the needed scrape config for DCGM?

Perhaps we should have two value files to avoid requests asking for how to correctly implement using openshift prometheus?

Do we need additional tests to validate the non-bundled prometheus? Are we missing anything today that diagnostics would miss?

@thomasvn
Copy link
Member

Also any new configurations that you've added into values-openshift.yaml should also be in values.yaml. Helm will always use the values.yaml as the default configuration for the app (docs ref).

For example the following install command will use all default values from the values.yaml in the chart:

helm install kubecost cost-analyzer \
--repo https://kubecost.github.io/cost-analyzer/ \
--namespace kubecost --create-namespace

Comment on lines 10 to 11
annotations:
helm.sh/hook: pre-install
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep in mind that some vendor marketplaces and other delivery form factors do not support various Helm hooks and so this is more surface area that will have to be removed when these listings are updated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohh, okay understood. Will try to find a better alternative for this then

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the hook here is simply fixing a warning during helm install. As long as the vendor marketplaces do not fail when they see the annotation, it should work as you have done it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AWS Marketplace as one example will fail when it sees a chart with a Helm hook as they are not supported. See here for details. Just saying be mindful that when preparing the AWS Marketplace (possibly others) version, this is something that can't be present.

Helm hooks and the lookup function are not supported.

Copy link
Collaborator

@jessegoodier jessegoodier Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you. I'm good with removing the hood and having the warning. we can add a NOTES to ignore the warning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing the hook

Comment on lines 35 to 36
networkCosts:
enabled: true # Enable network costs.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, we shouldn't be enabling optional components by default in OpenShift that aren't enabled by default for other platforms. We should be consistent as far as which components are on by default and those which are not.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, my mistake here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commenting out network costs configuration

# createMonitoringClusterRoleBinding: false # Create a Cluster Role Binding to allow using in-cluster prometheus or thanos.
# createMonitoringResourceReaderRoleBinding: false # Create a Role and Role Binding to allow in-cluster prometheus or thanos to list and watch resources. This will be necessary if you are not using bundled prometheus and need to add scrape config for resources.
# monitoringServiceAccountName: prometheus-k8s # Name of the service account to bind to the Resource Reader Role Binding.
createMonitoringClusterRoleBinding: true # Create a Cluster Role Binding to allow using in-cluster prometheus or thanos.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
createMonitoringClusterRoleBinding: true # Create a Cluster Role Binding to allow using in-cluster prometheus or thanos.
createMonitoringClusterRoleBinding: true # Create a ClusterRoleBinding to allow using in-cluster Prometheus or Thanos.

Etc. elsewhere


kubeRBACProxy: true # If true, kubecost will use kube-rbac-proxy to authenticate with in cluster Prometheus for openshift
grafana:
enabled: false # If false, Grafana will not be installed
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly here we're disabling Grafana by default just for OpenShift whereas it's enabled for all other platform types which creates an inconsistent deployment topology.

Copy link
Collaborator

@jessegoodier jessegoodier Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will leave values-openshift.yaml with the existing defaults and create a new file with the values that Ishaan has here. Stay tuned.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think that's a better approach. Definitely glad to see this though! Looking forward to also seeing this in the docs referencing the new values file!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jessegoodier I like that approach!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also remove any excess comments & unused configs which are present in this file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request v2.5
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants