-
Notifications
You must be signed in to change notification settings - Fork 419
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix openshift values to default with in-cluster prometheus #3721
base: develop
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if this is the right approach (on by default) as it deviates from our standard architecture. How extensively has this been tested? Do you have any concerns here, @jessegoodier and @kwombach12 ?
This is what users are asking for, and I do think it is the best architecture. A consistent configuration will reduce support concerns with openshift- in comparison to what we have seen with the various custom configs attempted. Historically, issues with prometheus are due to misconfiguration, not version. |
# Grant the kubecost service account the cluster-monitoring-view role to enable it to query OpenShift Prometheus. | ||
# This is necessary for Kubecost to get access and query the in-cluster Prometheus instance using its service account token. | ||
# https://docs.redhat.com/en/documentation/openshift_container_platform/4.2/html/monitoring/cluster-monitoring#monitoring-accessing-prometheus-alerting-ui-grafana-using-the-web-console_accessing-prometheus |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very helpful comment!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kubecost is always recommended to be run with the helm chart's bundled Prometheus. This is usually the best user experience and results in the fewest issues.
Although we should certainly document how to run with Openshift's pre-installed Prometheus, I don't think that we should default to that experience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, when users choose not to deploy Kubecost's bundled Prometheus they will not get the Kubecost-curated extraScrapeConfigs which, among them, newly include targets for NVIDIA's DCGM Exporter. I am willing to bet this discrepancy will be the root cause of many future support cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add create a service monitor with the needed scrape config for DCGM?
Perhaps we should have two value files to avoid requests asking for how to correctly implement using openshift prometheus?
Do we need additional tests to validate the non-bundled prometheus? Are we missing anything today that diagnostics would miss?
Also any new configurations that you've added into For example the following install command will use all default values from the
|
annotations: | ||
helm.sh/hook: pre-install |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep in mind that some vendor marketplaces and other delivery form factors do not support various Helm hooks and so this is more surface area that will have to be removed when these listings are updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ohh, okay understood. Will try to find a better alternative for this then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the hook here is simply fixing a warning during helm install. As long as the vendor marketplaces do not fail when they see the annotation, it should work as you have done it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AWS Marketplace as one example will fail when it sees a chart with a Helm hook as they are not supported. See here for details. Just saying be mindful that when preparing the AWS Marketplace (possibly others) version, this is something that can't be present.
Helm hooks and the
lookup
function are not supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you. I'm good with removing the hood and having the warning. we can add a NOTES to ignore the warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removing the hook
cost-analyzer/values-openshift.yaml
Outdated
networkCosts: | ||
enabled: true # Enable network costs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion, we shouldn't be enabling optional components by default in OpenShift that aren't enabled by default for other platforms. We should be consistent as far as which components are on by default and those which are not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree, my mistake here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commenting out network costs configuration
cost-analyzer/values-openshift.yaml
Outdated
# createMonitoringClusterRoleBinding: false # Create a Cluster Role Binding to allow using in-cluster prometheus or thanos. | ||
# createMonitoringResourceReaderRoleBinding: false # Create a Role and Role Binding to allow in-cluster prometheus or thanos to list and watch resources. This will be necessary if you are not using bundled prometheus and need to add scrape config for resources. | ||
# monitoringServiceAccountName: prometheus-k8s # Name of the service account to bind to the Resource Reader Role Binding. | ||
createMonitoringClusterRoleBinding: true # Create a Cluster Role Binding to allow using in-cluster prometheus or thanos. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
createMonitoringClusterRoleBinding: true # Create a Cluster Role Binding to allow using in-cluster prometheus or thanos. | |
createMonitoringClusterRoleBinding: true # Create a ClusterRoleBinding to allow using in-cluster Prometheus or Thanos. |
Etc. elsewhere
cost-analyzer/values-openshift.yaml
Outdated
|
||
kubeRBACProxy: true # If true, kubecost will use kube-rbac-proxy to authenticate with in cluster Prometheus for openshift | ||
grafana: | ||
enabled: false # If false, Grafana will not be installed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly here we're disabling Grafana by default just for OpenShift whereas it's enabled for all other platform types which creates an inconsistent deployment topology.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will leave values-openshift.yaml with the existing defaults and create a new file with the values that Ishaan has here. Stay tuned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think that's a better approach. Definitely glad to see this though! Looking forward to also seeing this in the docs referencing the new values file!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jessegoodier I like that approach!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also remove any excess comments & unused configs which are present in this file.
What does this PR change?
Does this PR rely on any other PRs?
No
How does this PR impact users? (This is the kind of thing that goes in release notes!)
Added standard configuration template for openshift to use the openshift-monitoring prometheus
Links to Issues or tickets this PR addresses or fixes
What risks are associated with merging this PR? What is required to fully test this PR?
No risk, Testing that it gets deployed on openshift cluster easily and successfully need to be done.
How was this PR tested?
Installing kubecost in openshift environment with network cost enabled without helm hook:
Installing with pre-install helm hook
Resulting kubecost diagnostics page
Have you made an update to documentation? If so, please provide the corresponding PR.