
[loki] add log dashboard #60

Closed
phac008 opened this issue Apr 11, 2024 · 17 comments

@phac008
Contributor

phac008 commented Apr 11, 2024

The automatically deployed dashboards are for Loki health only...

examples:
https://grafana.com/grafana/dashboards/?search=loki

@phac008
Contributor Author

phac008 commented Apr 12, 2024

7b03ff5

-> adding the dashboards and automatic reload (within Grafana) works

but...
there is still an issue with the "default" datasource on the default Loki dashboards: the "default" datasource does not contain any data, whereas "-- Grafana --" does ... investigating

@jkleinlercher jkleinlercher self-assigned this Apr 18, 2024
@phac008
Contributor Author

phac008 commented Apr 18, 2024

additional information:
grafana/loki#9273

This may be what should be tested.

@jkleinlercher
Contributor

jkleinlercher commented Apr 19, 2024

First step: what is our CLUSTER_NAME?

The dashboards require certain metric labels to display Kubernetes metrics. The best way to accomplish this is to install the kube-prometheus-stack Helm chart with the following values file, replacing CLUSTER_NAME with the name of your cluster. The cluster name is what you specify during the helm installation, so a cluster installed with the command helm install loki-cluster grafana/loki would be called loki-cluster.

===> This looks like it is actually the release name of the installed Loki, which in our case is "sx-loki", because the Helm release name matches the ArgoCD app name.
https://github.com/suxess-it/sx-cnp-oss/blob/063e90175ff126419010ed9f8301e91b38900e44/platform-apps/target-type/k3d/loki-app.yaml#L4C9-L4C16
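
For reference, a minimal sketch of such a kube-prometheus-stack values file, assuming the relabeling approach from the Loki docs (with our release the cluster value would be "sx-loki"):

    kubelet:
      serviceMonitor:
        cAdvisorRelabelings:
          # add the cluster label at scrape time so the Loki dashboards can filter on it
          - action: replace
            replacement: sx-loki
            targetLabel: cluster
    defaultRules:
      additionalRuleLabels:
        cluster: sx-loki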

jkleinlercher added a commit that referenced this issue Apr 19, 2024
Signed-off-by: Johannes Kleinlercher <johannes.kleinlercher@suxess-it.com>
@jkleinlercher
Contributor

cluster="sx-loki" label is jetzt bei jede menge metriken dabei, d.h. die Konfiguration von ea185bb scheint gezogen zu haben. Das Dashboard https://grafana-127-0-0-1.nip.io/d/logs/loki-logs?orgId=1&refresh=10s zeigt trtozdem noch keine daten an.

image

@jkleinlercher
Contributor

it seems I'm facing the same issue right now as in grafana/loki#9273 (comment)

"the loki_build_info metric seems empty on the Prometheus data source."

image

investigating ...

@jkleinlercher
Contributor

jkleinlercher commented Apr 19, 2024

My current suspicion is that the loki_build_info metric only exists when selfMonitoring is enabled.
It is currently set to false: https://github.com/suxess-it/sx-cnp-oss/blob/c41e934bb1b368fa9479fc9e03d8c6cbf27505ad/platform-apps/charts/loki/values-k3d.yaml#L99

Reason: additional CRDs would have to be installed for that. According to the comment on the attribute:

    # Self monitoring determines whether Loki should scrape its own logs.
    # This feature currently relies on the Grafana Agent Operator being installed,
    # which is installed by default using the grafana-agent-operator sub-chart.
    # It will create custom resources for GrafanaAgent, LogsInstance, and PodLogs to configure
    # scrape configs to scrape its own logs with the labels expected by the included dashboards.

To install the Grafana Agent Operator, the Helm value monitoring.selfMonitoring.grafanaAgent.installOperator has to be set to 'true', see
https://github.com/grafana/loki/blob/e39677f97b4ba27c90d9f8d2991441095e55b06e/production/helm/loki/Chart.yaml#L23

I have now tried that locally in the k3d cluster, but the next problem shows up while syncing:

CustomResourceDefinition.apiextensions.k8s.io "podlogs.monitoring.grafana.com" is invalid: status.storedVersions[0]: Invalid value: "v1alpha2": must appear in spec.versions

My suspicion:
a PodLogs CRD has already been installed via Alloy with version v1alpha2: https://github.com/grafana/alloy/blob/6585e5a0c2f658d7c1d5b4361313dcd30cb3309b/operations/helm/charts/alloy/charts/crds/crds/monitoring.grafana.com_podlogs.yaml#L21

and via Loki one with version v1alpha1 is supposed to be installed, which I suspect is why the error now occurs when syncing Loki.

k get crd podlogs.monitoring.grafana.com -n monitoring -o yaml

  names:
    categories:
    - grafana-alloy
    - alloy
    kind: PodLogs
    listKind: PodLogsList
    plural: podlogs
    singular: podlogs
  scope: Namespaced
  versions:
  - name: v1alpha2

@jkleinlercher
Contributor

jkleinlercher commented Apr 19, 2024

According to https://grafana.com/docs/agent/latest/flow/reference/components/loki.source.podlogs/ the Grafana Agent is deprecated, so we should use the PodLogs CRD from Alloy, which is also the newer one. Investigating how the PodLogs instance in Loki could work together with the PodLogs CRD from Alloy (v1alpha1 vs. v1alpha2)

image

@jkleinlercher
Contributor

You can specify the PodLogs API version in the Loki values file: https://github.com/grafana/loki/blob/4c563f7823f9f9b7e65026b52e9c16580d4d87f4/production/helm/loki/values.yaml#L3331
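
A hedged sketch of what that could look like in our Loki values (assuming the chart accepts the full group/version string at the value path linked above):

    monitoring:
      selfMonitoring:
        podLogs:
          # point Loki's PodLogs resource at the newer CRD version shipped by Alloy
          apiVersion: monitoring.grafana.com/v1alpha2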

@jkleinlercher
Contributor

and here again why selfMonitoring needs to be enabled to get the included dashboards to work:

Self monitoring is enabled by default. This will deploy a GrafanaAgent, LogsInstance, and PodLogs resource which will instruct the Grafana Agent Operator (installed separately) on how to scrape this Loki cluster’s logs and send them back to itself. Scraping this Loki cluster using the scrape config defined in the PodLogs resource is required for the included dashboards to work.

https://grafana.com/docs/loki/latest/setup/install/helm/monitor-and-alert/with-local-monitoring/

@jkleinlercher
Contributor

jkleinlercher commented Apr 19, 2024

Conclusion:

  • Included dashboards like "Loki logs" need selfMonitoring enabled. selfMonitoring needs the GrafanaAgent, and the GrafanaAgent PodLogs CRD is incompatible with Alloy's PodLogs CRD. Still investigating a scenario in which Alloy and Loki selfMonitoring can work together. -> Update loki helm chart to depend on alloy not grafana-agent-operator grafana/loki#12627
  • Why the "Kubernetes Logs" dashboard didn't work without modifying the datasource variable is still under investigation

@jkleinlercher
Contributor

jkleinlercher commented Apr 20, 2024

Conclusion:

Let's test whether the Loki dashboards and selfMonitoring work when Alloy is not installed. If they do, we should think about skipping Alloy until it is integrated into Loki in place of the Grafana Agent Operator.

Some info about the internal "-- Grafana --" datasource: this is just a datasource with test data! (https://grafana.com/docs/grafana/latest/datasources/#grafana)

@jkleinlercher
Contributor

jkleinlercher commented Apr 20, 2024

Conclusion:

  • Why the "Kubernetes Logs" dashboard didn't work without modifying the datasource variable is still under investigation

For this, let's take another look at the off-the-shelf dashboard https://grafana.com/grafana/dashboards/15141-kubernetes-service-logs/
and check how the datasource settings are configured there

@jkleinlercher
Contributor

jkleinlercher commented Apr 22, 2024

After uninstalling Alloy, the Loki app with selfMonitoring enabled installs successfully (GrafanaAgent included); however, no Loki metric shows up in the Prometheus datasource.
The ServiceMonitor sx-loki is also installed, which should scrape the Loki metrics (the definition in the ServiceMonitor looks correct, and I can scrape Prometheus metrics from the sx-loki-0 pod successfully by hand).

kubectl get servicemonitor -n monitoring sx-loki
NAME      AGE
sx-loki   2d22h
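
The manual scrape mentioned above can be reproduced roughly like this (a sketch assuming Loki's default HTTP port 3100 and the standard /metrics endpoint):

    # forward the Loki pod's HTTP port and check for the metric the dashboards need
    kubectl port-forward -n monitoring pod/sx-loki-0 3100:3100 &
    curl -s http://localhost:3100/metrics | grep loki_build_info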

However, the sx-loki ServiceMonitor doesn't show up on the service discovery page in Prometheus itself:

image

I think the reason is that the Prometheus selector labels are set like this:

kubectl get prometheus -n monitoring -o yaml | grep selector
    selector: app.kubernetes.io/instance=sx-kube-prometheus-stack-prometheus,app.kubernetes.io/managed-by=prometheus-operator,app.kubernetes.io/name=prometheus,operator.prometheus.io/name=sx-kube-prometheus-stack-prometheus,prometheus=sx-kube-prometheus-stack-prometheus

and the Loki ServiceMonitor doesn't have labels like these. Maybe we need to set
https://github.com/suxess-it/sx-cnp-oss/blob/a5b790b9b5bc5757e254c619f07bde9e256dbc2a/platform-apps/charts/kube-prometheus-stack/values-k3d.yaml#L3518C7-L3518C46
to false? (like in helm/charts#13196)
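
In the kube-prometheus-stack values that setting lives under prometheus.prometheusSpec, i.e. roughly:

    prometheus:
      prometheusSpec:
        # let Prometheus pick up ServiceMonitors from all releases, not only
        # those labeled by the kube-prometheus-stack release itself
        serviceMonitorSelectorNilUsesHelmValues: false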

Question: or should the grafana-agent-operator use this ServiceMonitor to scrape the metrics? If so, can the grafana-agent-operator push those metrics into the Prometheus datasource?

@jkleinlercher
Contributor

When setting serviceMonitorSelectorNilUsesHelmValues: false in the kube-prometheus-stack Helm chart, we now also see all the other ServiceMonitors on the Prometheus service discovery page:

image

and also the "Loki / logs" Dashboard is now working because the data gets scraped now via the loki serviceMonitor:

image

jkleinlercher added a commit that referenced this issue Apr 22, 2024
…loki helm chart and disable serviceMonitorSelectorNilUsesHelmValues so loki serviceMonitors get honored by kube-prometheus-stack, #60

Signed-off-by: Johannes Kleinlercher <johannes.kleinlercher@suxess-it.com>
jkleinlercher added a commit that referenced this issue Apr 22, 2024
Signed-off-by: Johannes Kleinlercher <johannes.kleinlercher@suxess-it.com>
jkleinlercher added a commit that referenced this issue Apr 22, 2024
Signed-off-by: Johannes Kleinlercher <johannes.kleinlercher@suxess-it.com>
@jkleinlercher
Contributor

Everything is as expected: when importing the dashboard from https://grafana.com/grafana/dashboards/15141-kubernetes-service-logs/ you are asked for the datasource uid. Since we add the dashboard automatically in the Helm chart, we need to set "loki" as the uid in the configmap as well: https://github.com/suxess-it/sx-cnp-oss/blob/4cfef24243bf187a48fa6c2a74ccfe22aa60aea5/platform-apps/charts/loki/templates/dashboard.yaml#L192
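
For context, a minimal sketch of how a Loki datasource with a fixed uid could be provisioned in Grafana (name, URL and namespace here are illustrative, not our actual config):

    apiVersion: 1
    datasources:
      - name: Loki
        type: loki
        # must match the datasource uid referenced in the dashboard JSON
        uid: loki
        access: proxy
        url: http://sx-loki-gateway.monitoring.svc.cluster.local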

@jkleinlercher
Contributor

jkleinlercher commented Apr 22, 2024

created a new k3d cluster and ran some tests --> looks good.

One last hint: some self-monitoring dashboards like "loki-reads resources" show no data.
The reason is that the query looks for pods named "loki-read", which don't exist in our environment:

sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~"$cluster", namespace=~"$namespace", container="loki", pod=~"(loki|enterprise-logs)-read.*"}[$__rate_interval]))

Maybe some of the dashboards only work for a certain deployment mode. Looks like grafana/loki#7657
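
For illustration only: in our single-binary setup the pods are named like sx-loki-0, so a hypothetical adaptation of the pod regex in that query could look like this:

    sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~"$cluster", namespace=~"$namespace", container="loki", pod=~"sx-loki-.*"}[$__rate_interval]))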
