[Bug] Cost analyzer pod in CrashLoopBackOff after enabling the readonly feature #3433

Open
2 tasks done
mariojuarezc opened this issue May 21, 2024 · 3 comments
Labels
bug Something isn't working

@mariojuarezc

Kubecost Helm Chart Version

2.2.5

Kubernetes Version

1.27

Kubernetes Platform

AKS

Description

I installed Kubecost with the Helm chart and enabled the readonly feature. The aggregator container in the kubecost-cost-analyzer pod then started failing, which left the pod in CrashLoopBackOff status and made the Kubecost UI inaccessible.

Steps to reproduce

  1. Install Kubecost with the readonly feature enabled
helm install kubecost cost-analyzer \
--repo https://kubecost.github.io/cost-analyzer/ \
--namespace kubecost --create-namespace \
--set kubecostToken="bWFyaW9qdWFyZXpjQGdtYWlsLmNvbQ==xm343yadf98" \
--set readonly=true
  2. Watch the aggregator logs (the pod name will differ in your cluster)
    kubectl -n kubecost logs kubecost-cost-analyzer-5dcbc54c48-hd2s7 -c aggregator -f
  3. The logs end with the following error: Error: listen tcp :9004: bind: address already in use
  4. Watch the kubecost pods; the kubecost-cost-analyzer pod is in CrashLoopBackOff status with many restarts
kubectl -n kubecost get pods -w
NAME                                          READY   STATUS             RESTARTS       AGE
kubecost-cost-analyzer-5dcbc54c48-hd2s7       3/4     CrashLoopBackOff   7 (5m1s ago)   16m
kubecost-forecasting-86c455686d-bpj2s         1/1     Running            0              16m
kubecost-grafana-8d47b4c64-klzw8              2/2     Running            0              16m
kubecost-prometheus-server-7474d45899-ch7xq   1/1     Running            0              16m
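
When scripting the pod check above, a small filter can pull the crash-looping pod names out of the `kubectl get pods` table. This is only a sketch: the function name `crashlooping_pods` is made up for illustration, and it assumes the default column layout shown in the output above (NAME, READY, STATUS, ...).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of kubectl or Kubecost): print the names of
# pods whose STATUS column reads CrashLoopBackOff.
# Expects the default `kubectl get pods` table on stdin.
crashlooping_pods() {
  # NR > 1 skips the header row; $1 is NAME and $3 is STATUS.
  awk 'NR > 1 && $3 == "CrashLoopBackOff" { print $1 }'
}

# Usage against a live cluster (not run here):
#   kubectl -n kubecost get pods | crashlooping_pods
```

The returned name can be passed to `kubectl logs`. `kubectl logs` also accepts a `deployment/<name>` target, which avoids copying the pod hash, assuming the Deployment is named kubecost-cost-analyzer as the pod name suggests.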

Expected behavior

Kubecost should run as usual, with updates from the frontend UI and via POST requests disabled, as described in the values.yaml file.

Impact

No response

Screenshots

No response

Logs

Aggregator container logs from the kubecost-cost-analyzer pod:

kubectl -n kubecost logs kubecost-cost-analyzer-5dcbc54c48-hd2s7  -c aggregator -f

2024/05/21 22:24:18 maxprocs: Leaving GOMAXPROCS=8: CPU quota undefined
2024-05-21T22:24:18.321191206Z ??? Log level set to info
2024-05-21T22:24:18.321223005Z INF tracing disabled
2024-05-21T22:24:18.34644253Z INF Starting Kubecost Aggregator version kcm-0f623d1ed0_core-c3cb2218df_oc-67e81e89ca (0f623d1e)
WARN: bun: 2024/05/21 22:24:18 query "TRUE" has [] args, but no placeholders
2024-05-21T22:24:18.347090025Z INF NAMESPACE: kubecost-readonly
WARN: bun: 2024/05/21 22:24:18 query "TRUE" has [] args, but no placeholders
2024-05-21T22:24:18.490151229Z INF Using default file store as data source
2024-05-21T22:24:18.490302428Z ERR entering state: run_ingestor, err: error creating static tables: %!s(<nil>)
2024-05-21T22:24:18.490329227Z ERR after event, current state: run_ingestor, err: error creating static tables: %!s(<nil>)
2024-05-21T22:24:18.490345727Z ERR error submitting event: error creating static tables: %!s(<nil>)
2024-05-21T22:24:18.490382627Z INF Thanos Pipeline: Stopped
2024-05-21T22:24:18.490418827Z INF NetworkInsight: Ingestor: Stopped
2024-05-21T22:24:18.490402527Z INF CloudCost: Ingestor: Stopped
2024-05-21T22:24:18.490455027Z INF Asset: Ingestor: Stopped
2024-05-21T22:24:18.490448227Z INF CustomCost: Ingestor: Stopped
2024-05-21T22:24:18.490447627Z INF AllocationIngestor: Stopped
WARN: bun: 2024/05/21 22:24:18 query "TRUE" has [] args, but no placeholders
2024-05-21T22:24:18.548086725Z INF Done waiting
2024-05-21T22:24:18.548641121Z INF Starting *v1.Namespace controller
2024-05-21T22:24:18.548904019Z INF Starting *v1.Node controller
2024-05-21T22:24:18.549084318Z INF Starting *v1.Pod controller
2024-05-21T22:24:18.549198017Z INF Starting *v1.Deployment controller
2024-05-21T22:24:18.549222117Z INF Starting *v1.DaemonSet controller
2024-05-21T22:24:18.549286117Z INF Starting *v1.StatefulSet controller
2024-05-21T22:24:18.549333116Z INF Starting *v1.Job controller
2024-05-21T22:24:18.549333116Z INF Starting *v1.Service controller
2024-05-21T22:24:18.549378216Z INF Starting *v1.PersistentVolume controller
2024-05-21T22:24:18.549415316Z INF Starting *v1.ConfigMap controller
2024-05-21T22:24:18.549424216Z INF Starting *v1.PersistentVolumeClaim controller
2024-05-21T22:24:18.549462616Z INF Starting *v1.StorageClass controller
2024-05-21T22:24:18.549470415Z INF Starting *v1.ReplicationController controller
2024-05-21T22:24:18.549159018Z INF Starting *v1.ReplicaSet controller
2024-05-21T22:24:18.549503315Z INF Starting *v1.PodDisruptionBudget controller
2024-05-21T22:24:18.553382488Z INF No product-configs configmap found at install time, using existing configs: configmaps "product-configs" not found
2024-05-21T22:24:18.558352854Z INF No saved-report-configs configmap found at install time, using existing configs: configmaps "saved-report-configs" not found
2024-05-21T22:24:18.56322332Z INF No asset-report-configs configmap found at install time, using existing configs: configmaps "asset-report-configs" not found
2024-05-21T22:24:18.606345719Z INF Skipping derivation because there is no new data to derive
2024-05-21T22:24:18.623271401Z ERR savings: cluster sizing: failed to get cluster properties: could not get properties for any cluster: 
2024-05-21T22:24:18.623320701Z WRN got error failed to get cluster properties: could not get properties for any cluster:  for metric clusterSizing%%Development, not adding to cache
WARN: bun: 2024/05/21 22:24:18 query "TRUE" has [] args, but no placeholders
WARN: bun: 2024/05/21 22:24:18 query "TRUE" has [] args, but no placeholders
WARN: bun: 2024/05/21 22:24:18 query "TRUE" has [] args, but no placeholders
2024-05-21T22:24:18.713741071Z ERR savings: cluster sizing: failed to get cluster properties: could not get properties for any cluster: 
2024-05-21T22:24:18.713785871Z WRN got error failed to get cluster properties: could not get properties for any cluster:  for metric clusterSizing%%Production, not adding to cache
WARN: bun: 2024/05/21 22:24:18 query "TRUE" has [] args, but no placeholders
WARN: bun: 2024/05/21 22:24:18 query "TRUE" has [] args, but no placeholders
2024-05-21T22:24:18.75399779Z INF No cloud-cost-report-configs configmap found at install time, using existing configs: configmaps "cloud-cost-report-configs" not found
WARN: bun: 2024/05/21 22:24:18 query "TRUE" has [] args, but no placeholders
2024-05-21T22:24:18.802644649Z ERR savings: cluster sizing: failed to get cluster properties: could not get properties for any cluster: 
2024-05-21T22:24:18.802693149Z WRN got error failed to get cluster properties: could not get properties for any cluster:  for metric clusterSizing%%High-Availability, not adding to cache
2024-05-21T22:24:18.953107695Z INF No recurring-budget-rule-configs configmap found at install time, using existing configs: configmaps "recurring-budget-rule-configs" not found
2024-05-21T22:24:19.153687489Z INF No budget-configs configmap found at install time, using existing configs: configmaps "budget-configs" not found
2024-05-21T22:24:19.353348391Z INF No account-mapping configmap found at install time, using existing configs: configmaps "account-mapping" not found
2024-05-21T22:24:19.554261083Z INF No group-filters configmap found at install time, using existing configs: configmaps "group-filters" not found
Error: listen tcp :9004: bind: address already in use
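
Most of the log above is informational; the `ERR` entries and the final `Error:` line are the ones that describe failures. A minimal sketch for pulling them out of a saved log follows. The function names `aggregator_errors` and `bind_conflict_port` are hypothetical, and the patterns assume the line format shown above. An "address already in use" error generally means something in the same network namespace is already listening on that port, and containers in one pod share a network namespace.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reading a saved aggregator log on stdin.

# Keep only failure lines: entries tagged " ERR " and the final
# "Error:" line printed when the process exits.
aggregator_errors() {
  grep -E ' ERR |^Error:'
}

# Print the TCP port from a "listen tcp :PORT: bind: address already in use"
# line, or nothing if the log has no such line.
bind_conflict_port() {
  sed -n 's/.*listen tcp :\([0-9][0-9]*\): bind: address already in use.*/\1/p'
}

# Usage with a saved log file (file name is an example):
#   aggregator_errors < aggregator.log
#   bind_conflict_port < aggregator.log
```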

Slack discussion

No response

Troubleshooting

  • I have read and followed the issue guidelines and this is a bug impacting only the Helm chart.
  • I have searched other issues in this repository and mine is not recorded.

mariojuarezc added the bug and needs-triage labels on May 21, 2024

@chipzoller
Collaborator

Confirmed on 2.2.5.

@chipzoller
Collaborator

cc @jessegoodier

@jessegoodier
Collaborator

jessegoodier commented May 22, 2024

Thanks. Triage: readonly works with the statefulset deployMethod but not with singlePod. I will log this, but I can't commit to timing. As a workaround, set:

kubecostAggregator:
  deployMethod: statefulset

internal issue: https://kubecost.atlassian.net/browse/BURNDOWN-434
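
A sketch of applying the statefulset setting from the triage comment to the release created in the reproduction steps. The two `--set` keys come from this thread; the use of `--reuse-values` (to keep the values from the original install) and the function name `apply_statefulset_workaround` are choices made for this example, and the command has not been verified against a cluster.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: upgrades the release from the reproduction steps
# with the aggregator switched to the statefulset deploy method.
apply_statefulset_workaround() {
  helm upgrade kubecost cost-analyzer \
    --repo https://kubecost.github.io/cost-analyzer/ \
    --namespace kubecost \
    --reuse-values \
    --set readonly=true \
    --set kubecostAggregator.deployMethod=statefulset
}

# Run against a live cluster with:
#   apply_statefulset_workaround
```

Without `--version`, `helm upgrade` resolves the latest chart in the repository, so pin the chart version if the release must stay on 2.2.5.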
