
Initializing Kalm - 3/4 modules fails on Prometheus #140

Open
donovanmuller opened this issue Sep 9, 2020 · 7 comments
@donovanmuller

donovanmuller commented Sep 9, 2020

Similar to #138, but Prometheus is failing with the following error:

$ kubectl get pods -A -w
NAMESPACE            NAME                                         READY   STATUS              RESTARTS   AGE
cert-manager         cert-manager-7cb75cf6b4-gbmfz                1/1     Running             0          2m11s
cert-manager         cert-manager-cainjector-759496659c-76tm4     1/1     Running             0          2m11s
cert-manager         cert-manager-webhook-7c75b89bf6-hkvzb        1/1     Running             0          2m11s
istio-operator       istio-operator-7c96dd898b-9t9dz              1/1     Running             0          2m10s
istio-system         istio-ingressgateway-7bf98d4db8-54sbf        1/1     Running             0          56s
istio-system         istiod-d474486d7-7mvdg                       1/1     Running             0          76s
istio-system         prometheus-5767f54db5-hl57v                  0/2     ContainerCreating   0          55s
istio-system         prometheus-7dcd44bbcf-wr88t                  0/2     ContainerCreating   0          54s
kalm-operator        kalm-operator-559c67b785-87cnj               2/2     Running             0          2m39s
kube-system          coredns-66bff467f8-wsmdr                     1/1     Running             0          3m33s
kube-system          coredns-66bff467f8-xv4b6                     1/1     Running             0          3m33s
kube-system          etcd-kalm-control-plane                      1/1     Running             0          3m48s
kube-system          kindnet-82fn9                                1/1     Running             0          3m17s
kube-system          kindnet-ckbhx                                1/1     Running             0          3m33s
kube-system          kindnet-j5xfx                                1/1     Running             2          3m16s
kube-system          kindnet-srtzq                                1/1     Running             0          3m17s
kube-system          kube-apiserver-kalm-control-plane            1/1     Running             0          3m48s
kube-system          kube-controller-manager-kalm-control-plane   1/1     Running             0          3m48s
kube-system          kube-proxy-5k7lp                             1/1     Running             0          3m17s
kube-system          kube-proxy-fbhcb                             1/1     Running             0          3m33s
kube-system          kube-proxy-jtdmx                             1/1     Running             0          3m17s
kube-system          kube-proxy-jzkfb                             1/1     Running             0          3m16s
kube-system          kube-scheduler-kalm-control-plane            1/1     Running             0          3m48s
local-path-storage   local-path-provisioner-bd4bb6b75-znm7d       1/1     Running             0          3m33s

$ kubectl logs -f prometheus-5767f54db5-hl57v -n istio-system -c prometheus
level=warn ts=2020-09-09T15:09:52.183Z caller=main.go:283 deprecation_notice="'storage.tsdb.retention' flag is deprecated use 'storage.tsdb.retention.time' instead."
level=info ts=2020-09-09T15:09:52.183Z caller=main.go:330 msg="Starting Prometheus" version="(version=2.15.1, branch=HEAD, revision=8744510c6391d3ef46d8294a7e1f46e57407ab13)"
level=info ts=2020-09-09T15:09:52.183Z caller=main.go:331 build_context="(go=go1.13.5, user=root@4b1e33c71b9d, date=20191225-01:04:15)"
level=info ts=2020-09-09T15:09:52.183Z caller=main.go:332 host_details="(Linux 4.19.76-linuxkit #1 SMP Tue May 26 11:42:35 UTC 2020 x86_64 prometheus-5767f54db5-hl57v (none))"
level=info ts=2020-09-09T15:09:52.183Z caller=main.go:333 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2020-09-09T15:09:52.183Z caller=main.go:334 vm_limits="(soft=unlimited, hard=unlimited)"
level=error ts=2020-09-09T15:09:52.183Z caller=query_logger.go:107 component=activeQueryTracker msg="Failed to create directory for logging active queries"
level=error ts=2020-09-09T15:09:52.184Z caller=query_logger.go:85 component=activeQueryTracker msg="Error opening query log file" file=data/queries.active err="open data/queries.active: no such file or directory"
panic: Unable to create mmap-ed active query log

goroutine 1 [running]:
github.com/prometheus/prometheus/promql.NewActiveQueryTracker(0x24dda5b, 0x5, 0x14, 0x2c62100, 0xc0006bf890, 0x2c62100)
	/app/promql/query_logger.go:115 +0x48c
main.main()
	/app/cmd/prometheus/main.go:362 +0x5229

I'm using a Kind cluster to install Kalm with:

$ cat kind.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    kubeadmConfigPatches:
      - |
        kind: InitConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            node-labels: "ingress-ready=true"
            authorization-mode: "AlwaysAllow"
    extraPortMappings:
      - containerPort: 80
        hostPort: 80
        protocol: TCP
      - containerPort: 443
        hostPort: 443
        protocol: TCP
  - role: worker
  - role: worker
  - role: worker

$ kind create cluster --name kalm --config kind.yaml
...

$ curl -sL https://get.kalm.dev | bash
Initializing Kalm - 3/4 modules ready:

✔ kalm-operator
✔ cert-manager
✔ istio-system
@donovanmuller
Author

donovanmuller commented Sep 9, 2020

$ kubectl describe po/prometheus-5767f54db5-hl57v -n istio-system
Name:         prometheus-5767f54db5-hl57v
Namespace:    istio-system
Priority:     0
Node:         kalm-worker/172.18.0.2
Start Time:   Wed, 09 Sep 2020 17:08:00 +0200
Labels:       app=prometheus
              pod-template-hash=5767f54db5
              release=istio
Annotations:  sidecar.istio.io/inject: false
Status:       Running
IP:           10.244.3.3
IPs:
  IP:           10.244.3.3
Controlled By:  ReplicaSet/prometheus-5767f54db5
Containers:
  prometheus:
    Container ID:  containerd://bfab377de3c826edcee3bc0836f62b4a16495311bf63349763f253e9e36262c7
    Image:         docker.io/prom/prometheus:v2.15.1
    Image ID:      docker.io/prom/prometheus@sha256:169b743ceb4452266915272f9c3409d36972e41cb52f3f28644e6c0609fc54e6
    Port:          9090/TCP
    Host Port:     0/TCP
    Args:
      --storage.tsdb.retention=6h
      --config.file=/etc/prometheus/prometheus.yml
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Wed, 09 Sep 2020 17:12:40 +0200
      Finished:     Wed, 09 Sep 2020 17:12:41 +0200
    Ready:          False
    Restart Count:  5
    Requests:
      cpu:        10m
    Liveness:     http-get http://:9090/-/healthy delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:    http-get http://:9090/-/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/istio-certs from istio-certs (rw)
      /etc/prometheus from config-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-token-zjqjg (ro)
  istio-proxy:
    Container ID:  containerd://9bf16037197564172d2c6eeda7b52aae72a1b0404bb3afe090f87520ec2b2bfa
    Image:         docker.io/istio/proxyv2:1.6.1
    Image ID:      docker.io/istio/proxyv2@sha256:84e3afe9b4404ca94fd2e6e0277c642eb29b8b37ca46deff49dbe1f5e1b7fdc3
    Port:          15090/TCP
    Host Port:     0/TCP
    Args:
      proxy
      sidecar
      --domain
      $(POD_NAMESPACE).svc.cluster.local
      istio-proxy-prometheus
      --proxyLogLevel=warning
      --proxyComponentLogLevel=misc:error
      --controlPlaneAuthPolicy
      NONE
      --trust-domain=cluster.local
    State:          Running
      Started:      Wed, 09 Sep 2020 17:09:35 +0200
    Ready:          True
    Restart Count:  0
    Readiness:      http-get http://:15020/healthz/ready delay=1s timeout=1s period=2s #success=1 #failure=30
    Environment:
      OUTPUT_CERTS:           /etc/istio-certs
      JWT_POLICY:             first-party-jwt
      PILOT_CERT_PROVIDER:    istiod
      CA_ADDR:                istiod.istio-system.svc:15012
      POD_NAME:               prometheus-5767f54db5-hl57v (v1:metadata.name)
      POD_NAMESPACE:          istio-system (v1:metadata.namespace)
      INSTANCE_IP:             (v1:status.podIP)
      SERVICE_ACCOUNT:         (v1:spec.serviceAccountName)
      HOST_IP:                 (v1:status.hostIP)
      ISTIO_META_MESH_ID:     cluster.local
      ISTIO_META_CLUSTER_ID:  Kubernetes
    Mounts:
      /etc/istio-certs/ from istio-certs (rw)
      /etc/istio/config from istio-config-volume (rw)
      /etc/istio/proxy from istio-envoy (rw)
      /var/run/secrets/istio from istiod-ca-cert (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-token-zjqjg (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  istio-config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      istio
    Optional:  true
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      prometheus
    Optional:  false
  istio-certs:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  istio-envoy:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  istiod-ca-cert:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      istio-ca-root-cert
    Optional:  false
  prometheus-token-zjqjg:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-token-zjqjg
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                    From                  Message
  ----     ------     ----                   ----                  -------
  Normal   Scheduled  5m11s                  default-scheduler     Successfully assigned istio-system/prometheus-5767f54db5-hl57v to kalm-worker
  Normal   Pulling    5m11s                  kubelet, kalm-worker  Pulling image "docker.io/prom/prometheus:v2.15.1"
  Normal   Pulled     4m33s                  kubelet, kalm-worker  Successfully pulled image "docker.io/prom/prometheus:v2.15.1"
  Normal   Pulling    4m32s                  kubelet, kalm-worker  Pulling image "docker.io/istio/proxyv2:1.6.1"
  Normal   Pulled     3m36s                  kubelet, kalm-worker  Successfully pulled image "docker.io/istio/proxyv2:1.6.1"
  Normal   Created    3m36s                  kubelet, kalm-worker  Created container istio-proxy
  Normal   Started    3m36s                  kubelet, kalm-worker  Started container istio-proxy
  Normal   Created    2m48s (x4 over 4m33s)  kubelet, kalm-worker  Created container prometheus
  Normal   Pulled     2m48s (x3 over 3m35s)  kubelet, kalm-worker  Container image "docker.io/prom/prometheus:v2.15.1" already present on machine
  Normal   Started    2m47s (x4 over 4m32s)  kubelet, kalm-worker  Started container prometheus
  Warning  BackOff    8s (x21 over 3m34s)    kubelet, kalm-worker  Back-off restarting failed container

@davidqhr
Member

davidqhr commented Sep 9, 2020

It seems the Prometheus process can't create a file on your filesystem. Can you check whether this solution works in your situation? aws/eks-charts#21 (comment)
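
For context, the panic shows a relative data/ path failing to be created in the container's working directory, which suggests the non-root user the Prometheus image runs as cannot write there. As a sketch (assuming Docker is available locally), you can inspect the user and working directory the image is built with:

$ docker image inspect prom/prometheus:v2.15.1 \
    --format 'user={{.Config.User}} workdir={{.Config.WorkingDir}}'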

@donovanmuller
Author

The data/ directory does not seem to be a mounted volume here though (it is in the comment linked above), but I will experiment a bit.

How would I go about adding these additional configurations to the Prometheus Deployment in the context of following the install guide (curl -sL https://get.kalm.dev | bash)?

@davidqhr
Member

The installation script just applies a set of YAML manifests quickly; there is no way to customize them during this process yet. You can modify the Prometheus Deployment after it is installed (but running abnormally) with this command:

kubectl edit deploy -n istio-system prometheus
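
After saving the edit, the Deployment rolls out new pods; you can watch the result with standard kubectl commands, for example:

$ kubectl rollout status deployment/prometheus -n istio-system
$ kubectl get pods -n istio-system -w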

@donovanmuller
Author

I added the following securityContext to the prometheus Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
...
  name: prometheus
  namespace: istio-system
spec:
  template:
...
    spec:
...
      securityContext:
        runAsGroup: 0
        runAsNonRoot: false
        runAsUser: 0
...

and the installation completed successfully.
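
For reference, a non-interactive way to apply the same change is a merge patch (a sketch of an equivalent kubectl patch; note the operator may revert manual changes to the Deployment if it reconciles it later):

$ kubectl patch deployment prometheus -n istio-system --type merge -p \
    '{"spec":{"template":{"spec":{"securityContext":{"runAsUser":0,"runAsGroup":0,"runAsNonRoot":false}}}}}'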

@davidqhr
Member

Nice. It seems you are using Kind to bootstrap your k8s cluster. Can you provide the cloud platform (or your PC system info if you are running it locally), so I can test against it and fix the missing Prometheus config?

@donovanmuller
Author

donovanmuller commented Sep 11, 2020

Mac: Catalina 10.15.6
Docker for Mac: 2.3.0.4
Kind: kind v0.8.1 go1.14.2 darwin/amd64

$ kind create cluster --name kalm --config kind.yaml
Creating cluster "kalm" ...
 ✓ Ensuring node image (kindest/node:v1.18.2)
...
