Series API endpoint unreliable for sparse logs, breaks label_values in Grafana #8034

DieterDP-ng · 2023-01-03T18:11:09Z

I have a Grafana dashboard that displays logs for several pods in a kubernetes environment. We ingest logs from both console and log files, and apply a source label to differentiate these. In the dashboard we allow selecting the Kubernetes namespace, Kubernetes pod and source using dropdowns (Grafana variables) so the user can select which logs he wants to see.

One of these pods only outputs a single log line every +- 15 minutes. For this pod, I noticed that the source dropdown will not get filled in, i.e. the Grafana label_values call fails for Loki data . This causes it to default to None, and applied to the LogQL query, results in no logs being found. In the end, it seems that there are no logs, while in fact there are.

For reference, the source variable is a Grafana configured as: label_values({namespace="$namespace", pod="$pod"}, source)

I tracked this problem down to the Loki series API, as this is the one used by label_values.

When using the /api/v1/series API, Loki returns no data.

curl -g 'http://localhost:42135/loki/api/v1/series?end=1672767401037029205&start=1672763401037029205' --data-urlencode 'match[]={namespace="devic1-shared",pod="yarn-resourcemanager-0-0"}'
{"status":"success","data":[]}

However, when using the /api/v1/query_range over the same time range, data is returned:

curl -G -s 'http://localhost:42135/loki/api/v1/query_range?' --data-urlencode 'end=1672767401037029205' --data-urlencode 'start=1672763401037029205' --data-urlencode 'query={namespace="devic1-shared",pod="yarn-resourcemanager-0-0"}' | jq
{
  "status": "success",
  "data": {
    "resultType": "streams",
    "result": [
      {
        "stream": {
          "component": "resourcemanager",
          "filename": "/var/lib/kubelet/pods/d59a01f9-8733-4560-b7cc-78cd0514c375/volumes/kubernetes.io~empty-dir/logs/gc.log",
          "job": "devic1-shared/yarn",
          "namespace": "devic1-shared",
          "node_name": "10.178.0.95",
          "pod": "yarn-resourcemanager-0-0",
          "source": "gc.log",
          "app": "yarn"
        },
        "values": [
          [
            "1672766971030608973",
            "[2023-01-03T17:29:30.927+0000][94592.430s][1] GC(152) Pause Young (Normal) (G1 Evacuation Pause) 125M->39M(145M) 1.925ms"
          ],
          [
            "1672766250984737472",
            "[2023-01-03T17:17:30.950+0000][93872.452s][1] GC(151) Pause Young (Normal) (G1 Evacuation Pause) 125M->39M(145M) 26.382ms"
          ],
          [
            "1672765471022969603",
            "[2023-01-03T17:04:30.927+0000][93092.430s][1] GC(150) Pause Young (Normal) (G1 Evacuation Pause) 125M->39M(145M) 1.888ms"
          ],
          [
            "1672764691019264949",
            "[2023-01-03T16:51:30.954+0000][92312.457s][1] GC(149) Pause Young (Normal) (G1 Evacuation Pause) 125M->39M(145M) 28.135ms"
          ],
          [
            "1672763970962902691",
            "[2023-01-03T16:39:30.944+0000][91592.446s][1] GC(148) Pause Young (Normal) (G1 Evacuation Pause) 125M->39M(145M) 19.950ms"
          ]
        ]
      }
    ],
    "stats": {
      "summary": {
        "bytesProcessedPerSecond": 231286,
        "linesProcessedPerSecond": 1644,
        "totalBytesProcessed": 703,
        "totalLinesProcessed": 5,
        "execTime": 0.00303952,
        "queueTime": 6.1705e-05,
        "subqueries": 1,
        "totalEntriesReturned": 5
      },
      "querier": {
        "store": {
          "totalChunksRef": 0,
          "totalChunksDownloaded": 0,
          "chunksDownloadTime": 0,
          "chunk": {
            "headChunkBytes": 0,
            "headChunkLines": 0,
            "decompressedBytes": 0,
            "decompressedLines": 0,
            "compressedBytes": 0,
            "totalDuplicates": 0
          }
        }
      },
      "ingester": {
        "totalReached": 1,
        "totalChunksMatched": 0,
        "totalBatches": 1,
        "totalLinesSent": 5,
        "store": {
          "totalChunksRef": 5,
          "totalChunksDownloaded": 5,
          "chunksDownloadTime": 521213,
          "chunk": {
            "headChunkBytes": 0,
            "headChunkLines": 0,
            "decompressedBytes": 703,
            "decompressedLines": 5,
            "compressedBytes": 778,
            "totalDuplicates": 0
          }
        }
      }
    }
  }
}

The strange thing is, the behavior of the first query depends on when exactly you execute it:

< 5 mins after a log entry was ingested: the query will return the series
> 5 mins, < X: the query will return empty (as above)
> X (at some point in the future, I am unsure what triggers this): the query will return the series

I am using Loki v2.6.1.

The text was updated successfully, but these errors were encountered:

KovalDS · 2023-02-16T12:47:52Z

Have the same issue using loki-stack helm chart on Loki 2.6.1 as well as 2.7.13 in single binary mode with filesystem storage. Series api doesn't return all label values that match query, even though those definitely exist - Logs query with those label values return some logs.

Personal observation - this happens for pods that doesn't write a lot of logs. For me period after which query returns series again from last log ingestion is about 2 and a half hours

@DieterDP-ng Did you come up with some solution or workaround for this?

DieterDP-ng · 2023-02-16T13:51:51Z

@KovalDS We already spent some time on it, but didn't find the specific cause or a workaround yet unfortunately.

KovalDS · 2023-02-16T14:04:20Z

@DieterDP-ng Well, that's unfortunate. Going to spend some time investigating as well. Thanks for your reply!

jiayi-1994 · 2023-03-31T06:29:01Z

Create a cronjob and create a pod every minute to reproduce the "series" interface. A large number of Pods cannot be queried, but log "queryRange" can search for the corresponding pod logs @DieterDP-ng

chaudum · 2023-06-19T07:37:24Z

@DieterDP-ng Thanks for the detailed report. Do you have a chance to reproduce it with the latest Loki version 2.8.2? Nevertheless, we will have a look at the issue.

DieterDP-ng · 2023-06-19T13:57:46Z

Tested on Loki 2.8.2. The pod from my opening post was no longer doing sparse output, but I reproduced on another pod. This pod executed a job that only logged a few lines, and nothing else once it was completed.

It seems there is still a difference between both API endpoints.

curl -g 'http://localhost:37883/loki/api/v1/series?end=1687181123&start=1687170323' --data-urlencode 'match[]={namespace="minikube-shared",pod="oozie-upload-sharelib-zx2kv"}' | jq
{
  "status": "success",
  "data": []
}


curl -G -s 'http://localhost:37883/loki/api/v1/query_range?' --data-urlencode 'end=1687181123037029205' --data-urlencode 'start=1687170323037029205' --data-urlencode 'query={namespace="minikube-shared",pod="oozie-upload-sharelib-zx2kv"}' | jq
{
  "status": "success",
  "data": {
    "resultType": "streams",
    "result": [
      {
        "stream": {
          "app": "oozie",
          "container": "pre-install-job",
          "filename": "/var/log/pods/minikube-shared_oozie-upload-sharelib-zx2kv_32529b27-cedb-4eda-9c9e-504c61230ffb/pre-install-job/0.log",
          "instance": "oozie",
          "job": "minikube-shared/oozie",
          "namespace": "minikube-shared",
          "node_name": "minikube",
          "pod": "oozie-upload-sharelib-zx2kv",
          "source": "console",
          "stream": "stdout"
        },
        "values": [
          [
            "1687177122247220308",
            "12:18:42,247  INFO Services:520 - Shutdown\n"
          ],
          [
            "1687177120156263988",
            "Copy tasks are done\n"
          ],
          [
            "1687177104615391461",
            "Running 453 copy tasks on 8 threads\n"
          ],
          ... (abbreviated the output)
          [
            "1687177103876489632",
            "12:18:23,875  INFO XLogService:520 - Log4j configuration file [oozie-log4j.properties]\n"
          ],
          [
            "1687177103875454822",
            "*******************************************************************************\n"
          ],
          [
            "1687177103875419866",
            "12:18:23,874  INFO XLogService:520 - \n ******************************************************************************* \n  STARTUP MSG: Oozie BUILD_VERSION [5.2.1-8-lily] compiled by [benny] on [${build.time}]\n  STARTUP MSG:       revision [${vc.revision}]@[${vc.url}]\n"
          ]
        ]
      },
      {
        "stream": {
          "app": "oozie",
          "container": "pre-install-job",
          "filename": "/var/log/pods/minikube-shared_oozie-upload-sharelib-zx2kv_32529b27-cedb-4eda-9c9e-504c61230ffb/pre-install-job/0.log",
          "instance": "oozie",
          "job": "minikube-shared/oozie",
          "namespace": "minikube-shared",
          "node_name": "minikube",
          "pod": "oozie-upload-sharelib-zx2kv",
          "source": "console",
          "stream": "stderr"
        },
        "values": [
          [
            "1687177101618548331",
            "SLF4J: Actual binding is of type [org.slf4j.impl.Reload4jLoggerFactory]\n"
          ],
          [
            "1687177101535446586",
            "SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.\n"
          ],
          [
            "1687177101535444826",
            "SLF4J: Found binding in [jar:file:/opt/oozie/embedded-oozie-server/webapp/WEB-INF/lib/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]\n"
          ],
          [
            "1687177101535441180",
            "SLF4J: Found binding in [jar:file:/opt/oozie/embedded-oozie-server/webapp/WEB-INF/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]\n"
          ],
          [
            "1687177101535408733",
            "SLF4J: Class path contains multiple SLF4J bindings.\n"
          ]
        ]
      }
    ],
    "stats": {
      "summary": {
        "bytesProcessedPerSecond": 226788,
        "linesProcessedPerSecond": 1881,
        "totalBytesProcessed": 5184,
        "totalLinesProcessed": 43,
        "execTime": 0.022858,
        "queueTime": 0.051464,
        "subqueries": 0,
        "totalEntriesReturned": 43,
        "splits": 13,
        "shards": 208
      },
      "querier": {
        "store": {
          "totalChunksRef": 0,
          "totalChunksDownloaded": 0,
          "chunksDownloadTime": 0,
          "chunk": {
            "headChunkBytes": 0,
            "headChunkLines": 0,
            "decompressedBytes": 0,
            "decompressedLines": 0,
            "compressedBytes": 0,
            "totalDuplicates": 0
          }
        }
      },
      "ingester": {
        "totalReached": 160,
        "totalChunksMatched": 0,
        "totalBatches": 2,
        "totalLinesSent": 43,
        "store": {
          "totalChunksRef": 2,
          "totalChunksDownloaded": 2,
          "chunksDownloadTime": 140806,
          "chunk": {
            "headChunkBytes": 0,
            "headChunkLines": 0,
            "decompressedBytes": 5184,
            "decompressedLines": 43,
            "compressedBytes": 1980,
            "totalDuplicates": 0
          }
        }
      },
      "cache": {
        "chunk": {
          "entriesFound": 0,
          "entriesRequested": 0,
          "entriesStored": 0,
          "bytesReceived": 0,
          "bytesSent": 0,
          "requests": 0,
          "downloadTime": 0
        },
        "index": {
          "entriesFound": 0,
          "entriesRequested": 0,
          "entriesStored": 0,
          "bytesReceived": 0,
          "bytesSent": 0,
          "requests": 0,
          "downloadTime": 0
        },
        "result": {
          "entriesFound": 0,
          "entriesRequested": 0,
          "entriesStored": 0,
          "bytesReceived": 0,
          "bytesSent": 0,
          "requests": 0,
          "downloadTime": 0
        }
      }
    }
  }
}

joao-aguilera-c · 2024-12-06T16:56:37Z

Is there some news on this? I'm still experiencing this in loki:2.9.2

This was referenced Mar 31, 2023

Series API requests give inconsistent and/or missing results #4547

Closed

Statefulset Series has not found #8867

Closed

Loki: Series API will return all series with no match or empty matcher #2254

Merged

Di-Is mentioned this issue Jun 19, 2023

In monolithic mode, the series contained in the Series API response is missing. #9729

Open

chaudum added the type/bug Somehing is not working as expected label Jun 19, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Series API endpoint unreliable for sparse logs, breaks label_values in Grafana #8034

Series API endpoint unreliable for sparse logs, breaks label_values in Grafana #8034

DieterDP-ng commented Jan 3, 2023

KovalDS commented Feb 16, 2023

DieterDP-ng commented Feb 16, 2023

KovalDS commented Feb 16, 2023

jiayi-1994 commented Mar 31, 2023

chaudum commented Jun 19, 2023

DieterDP-ng commented Jun 19, 2023

joao-aguilera-c commented Dec 6, 2024

Series API endpoint unreliable for sparse logs, breaks label_values in Grafana #8034

Series API endpoint unreliable for sparse logs, breaks label_values in Grafana #8034

Comments

DieterDP-ng commented Jan 3, 2023

KovalDS commented Feb 16, 2023

DieterDP-ng commented Feb 16, 2023

KovalDS commented Feb 16, 2023

jiayi-1994 commented Mar 31, 2023

chaudum commented Jun 19, 2023

DieterDP-ng commented Jun 19, 2023

joao-aguilera-c commented Dec 6, 2024