
[loki] add log dashboard #60

Closed
phac008 opened this issue Apr 11, 2024 · 17 comments

@phac008
Contributor

phac008 commented Apr 11, 2024

The automatically deployed dashboards are for Loki health only...

examples:
https://grafana.com/grafana/dashboards/?search=loki

@phac008
Contributor Author

phac008 commented Apr 12, 2024

7b03ff5

-> adding the dashboards and automatic reload (within Grafana) works

but...
there is still an issue with the "default" datasource on the default Loki dashboards: the "default" datasource does not contain any data, whereas "-- Grafana --" does ... investigating

@jkleinlercher jkleinlercher self-assigned this Apr 18, 2024
@phac008
Contributor Author

phac008 commented Apr 18, 2024

additional information:
grafana/loki#9273

This may be what should be tested.

@jkleinlercher
Contributor

jkleinlercher commented Apr 19, 2024

First step: what is our CLUSTER_NAME?

The dashboards require certain metric labels to display Kubernetes metrics. The best way to accomplish this is to install the kube-prometheus-stack Helm chart with the following values file, replacing CLUSTER_NAME with the name of your cluster. The cluster name is what you specify during the helm installation, so a cluster installed with the command helm install loki-cluster grafana/loki would be called loki-cluster.

===> This looks like it is actually the release name of the installed Loki, which in our case is "sx-loki", because the Helm release name matches the ArgoCD app name.
https://github.com/suxess-it/sx-cnp-oss/blob/063e90175ff126419010ed9f8301e91b38900e44/platform-apps/target-type/k3d/loki-app.yaml#L4C9-L4C16
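
For reference, a minimal sketch of such a kube-prometheus-stack values file, assuming the relabeling approach from the Loki docs (with our release the cluster value would be "sx-loki"):

    kubelet:
      serviceMonitor:
        cAdvisorRelabelings:
          # add the cluster label at scrape time so the Loki dashboards can filter on it
          - action: replace
            replacement: sx-loki
            targetLabel: cluster
    defaultRules:
      additionalRuleLabels:
        cluster: sx-loki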

jkleinlercher added a commit that referenced this issue Apr 19, 2024
Signed-off-by: Johannes Kleinlercher <johannes.kleinlercher@suxess-it.com>
@jkleinlercher
Contributor

cluster="sx-loki" label is jetzt bei jede menge metriken dabei, d.h. die Konfiguration von ea185bb scheint gezogen zu haben. Das Dashboard https://grafana-127-0-0-1.nip.io/d/logs/loki-logs?orgId=1&refresh=10s zeigt trtozdem noch keine daten an.

image

@jkleinlercher
Contributor

it seems I'm facing the same issue right now as in grafana/loki#9273 (comment)

"the loki_build_info metric seems empty on the Prometheus data source."

image

investigating ...

@jkleinlercher
Contributor

jkleinlercher commented Apr 19, 2024

My current suspicion is that the loki_build_info metric only exists when selfMonitoring is enabled.
It is currently set to false: https://github.com/suxess-it/sx-cnp-oss/blob/c41e934bb1b368fa9479fc9e03d8c6cbf27505ad/platform-apps/charts/loki/values-k3d.yaml#L99

Reason: additional CRDs would have to be installed for that. According to the comment on the attribute:

    # Self monitoring determines whether Loki should scrape its own logs.
    # This feature currently relies on the Grafana Agent Operator being installed,
    # which is installed by default using the grafana-agent-operator sub-chart.
    # It will create custom resources for GrafanaAgent, LogsInstance, and PodLogs to configure
    # scrape configs to scrape its own logs with the labels expected by the included dashboards.

To install the Grafana Agent Operator, the Helm value monitoring.selfMonitoring.grafanaAgent.installOperator has to be set to 'true', see
https://github.com/grafana/loki/blob/e39677f97b4ba27c90d9f8d2991441095e55b06e/production/helm/loki/Chart.yaml#L23

I have now tried that locally in the k3d cluster, but the next problem shows up while syncing:

CustomResourceDefinition.apiextensions.k8s.io "podlogs.monitoring.grafana.com" is invalid: status.storedVersions[0]: Invalid value: "v1alpha2": must appear in spec.versions

My suspicion:
a PodLogs CRD has already been installed via Alloy with version v1alpha2: https://github.com/grafana/alloy/blob/6585e5a0c2f658d7c1d5b4361313dcd30cb3309b/operations/helm/charts/alloy/charts/crds/crds/monitoring.grafana.com_podlogs.yaml#L21

and via Loki one with version v1alpha1 is supposed to be installed, which I suspect is why the error now occurs when syncing Loki.

k get crd podlogs.monitoring.grafana.com -n monitoring -o yaml

  names:
    categories:
    - grafana-alloy
    - alloy
    kind: PodLogs
    listKind: PodLogsList
    plural: podlogs
    singular: podlogs
  scope: Namespaced
  versions:
  - name: v1alpha2

@jkleinlercher
Contributor

jkleinlercher commented Apr 19, 2024

According to https://grafana.com/docs/agent/latest/flow/reference/components/loki.source.podlogs/ the Grafana Agent is deprecated, so we should use the PodLogs CRD from Alloy, which is also the newer one. Investigating how the PodLogs instance in Loki could work together with the PodLogs CRD from Alloy (v1alpha1 vs. v1alpha2)

image

@jkleinlercher
Contributor

You can specify the PodLogs API version in the Loki values file: https://github.com/grafana/loki/blob/4c563f7823f9f9b7e65026b52e9c16580d4d87f4/production/helm/loki/values.yaml#L3331
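
A hedged sketch of what that could look like in our Loki values (assuming the chart accepts the full group/version string at the value path linked above):

    monitoring:
      selfMonitoring:
        podLogs:
          # point Loki's PodLogs resource at the newer CRD version shipped by Alloy
          apiVersion: monitoring.grafana.com/v1alpha2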

@jkleinlercher
Contributor

and here again why selfMonitoring needs to be enabled to get the included dashboards to work:

Self monitoring is enabled by default. This will deploy a GrafanaAgent, LogsInstance, and PodLogs resource which will instruct the Grafana Agent Operator (installed separately) on how to scrape this Loki cluster’s logs and send them back to itself. Scraping this Loki cluster using the scrape config defined in the PodLogs resource is required for the included dashboards to work.

https://grafana.com/docs/loki/latest/setup/install/helm/monitor-and-alert/with-local-monitoring/

@jkleinlercher
Contributor

jkleinlercher commented Apr 19, 2024

Conclusion:

  • Included dashboards like "Loki logs" need selfMonitoring enabled. selfMonitoring needs the GrafanaAgent, and the GrafanaAgent PodLogs CRD is incompatible with Alloy's PodLogs CRD. Still investigating a scenario in which Alloy and Loki selfMonitoring can work together. -> Update loki helm chart to depend on alloy not grafana-agent-operator grafana/loki#12627
  • Why the "Kubernetes Logs" dashboard didn't work without modifying the datasource variable is still under investigation

@jkleinlercher
Contributor

jkleinlercher commented Apr 20, 2024

Conclusion:

Let's test whether the Loki dashboards and selfMonitoring work when Alloy is not installed. If they do, we should think about skipping Alloy until it is integrated into Loki in place of the Grafana Agent Operator.

Some info about the internal "-- Grafana --" datasource: this is just a datasource with test data! (https://grafana.com/docs/grafana/latest/datasources/#grafana)

@jkleinlercher
Contributor

jkleinlercher commented Apr 20, 2024

Conclusion:

  • Why the "Kubernetes Logs" dashboard didn't work without modifying the datasource variable is still under investigation

For this, let's take another look at the off-the-shelf dashboard https://grafana.com/grafana/dashboards/15141-kubernetes-service-logs/
and check how the datasource settings are configured there

@jkleinlercher
Contributor

jkleinlercher commented Apr 22, 2024

After uninstalling Alloy, the Loki app with selfMonitoring enabled installs successfully (GrafanaAgent included); however, no Loki metric shows up in the Prometheus datasource.
The ServiceMonitor sx-loki is also installed, which should scrape the Loki metrics (the definition in the ServiceMonitor looks correct, and I can scrape Prometheus metrics from the sx-loki-0 pod successfully by hand).

kubectl get servicemonitor -n monitoring sx-loki
NAME      AGE
sx-loki   2d22h
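
The manual scrape mentioned above can be reproduced roughly like this (a sketch assuming Loki's default HTTP port 3100 and the standard /metrics endpoint):

    # forward the Loki pod's HTTP port and check for the metric the dashboards need
    kubectl port-forward -n monitoring pod/sx-loki-0 3100:3100 &
    curl -s http://localhost:3100/metrics | grep loki_build_info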

However, the sx-loki ServiceMonitor doesn't show up on the service discovery page in Prometheus itself:

image

I think the reason is that the Prometheus selector labels are set like this:

kubectl get prometheus -n monitoring -o yaml | grep selector
    selector: app.kubernetes.io/instance=sx-kube-prometheus-stack-prometheus,app.kubernetes.io/managed-by=prometheus-operator,app.kubernetes.io/name=prometheus,operator.prometheus.io/name=sx-kube-prometheus-stack-prometheus,prometheus=sx-kube-prometheus-stack-prometheus

and the Loki ServiceMonitor doesn't have labels like these. Maybe we need to set
https://github.com/suxess-it/sx-cnp-oss/blob/a5b790b9b5bc5757e254c619f07bde9e256dbc2a/platform-apps/charts/kube-prometheus-stack/values-k3d.yaml#L3518C7-L3518C46
to false? (like in helm/charts#13196)
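
In the kube-prometheus-stack values that setting lives under prometheus.prometheusSpec, i.e. roughly:

    prometheus:
      prometheusSpec:
        # let Prometheus pick up ServiceMonitors from all releases, not only
        # those labeled by the kube-prometheus-stack release itself
        serviceMonitorSelectorNilUsesHelmValues: false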

Question: or should the grafana-agent-operator use this ServiceMonitor to scrape the metrics? If so, can the grafana-agent-operator push those metrics into the Prometheus datasource?

@jkleinlercher
Contributor

When setting serviceMonitorSelectorNilUsesHelmValues: false in the kube-prometheus-stack Helm chart, we now also see all the other ServiceMonitors on the Prometheus service discovery page:

image

and also the "Loki / logs" Dashboard is now working because the data gets scraped now via the loki serviceMonitor:

image

jkleinlercher added a commit that referenced this issue Apr 22, 2024
…loki helm chart and disable serviceMonitorSelectorNilUsesHelmValues so loki serviceMonitors get honored by kube-prometheus-stack, #60

Signed-off-by: Johannes Kleinlercher <johannes.kleinlercher@suxess-it.com>
jkleinlercher added a commit that referenced this issue Apr 22, 2024
Signed-off-by: Johannes Kleinlercher <johannes.kleinlercher@suxess-it.com>
jkleinlercher added a commit that referenced this issue Apr 22, 2024
Signed-off-by: Johannes Kleinlercher <johannes.kleinlercher@suxess-it.com>
@jkleinlercher
Contributor

Everything is as expected: when importing the dashboard from https://grafana.com/grafana/dashboards/15141-kubernetes-service-logs/ you are asked for the datasource uid. Since we add the dashboard automatically in the Helm chart, we need to set "loki" as the uid in the configmap as well: https://github.com/suxess-it/sx-cnp-oss/blob/4cfef24243bf187a48fa6c2a74ccfe22aa60aea5/platform-apps/charts/loki/templates/dashboard.yaml#L192
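
For context, a minimal sketch of how a Loki datasource with a fixed uid could be provisioned in Grafana (name, URL and namespace here are illustrative, not our actual config):

    apiVersion: 1
    datasources:
      - name: Loki
        type: loki
        # must match the datasource uid referenced in the dashboard JSON
        uid: loki
        access: proxy
        url: http://sx-loki-gateway.monitoring.svc.cluster.local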

@jkleinlercher
Contributor

jkleinlercher commented Apr 22, 2024

created a new k3d cluster and ran some tests --> looks good.

One last hint: some self-monitoring dashboards like "loki-reads resources" show no data.
The reason is that the query looks for pods named "loki-read", which don't exist in our environment:

sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~"$cluster", namespace=~"$namespace", container="loki", pod=~"(loki|enterprise-logs)-read.*"}[$__rate_interval]))

Maybe some of the dashboards only work for a certain deployment mode. Looks like grafana/loki#7657
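
For illustration only: in our single-binary setup the pods are named like sx-loki-0, so a hypothetical adaptation of the pod regex in that query could look like this:

    sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~"$cluster", namespace=~"$namespace", container="loki", pod=~"sx-loki-.*"}[$__rate_interval]))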
