Metrics labels and values in loki logs #423

Closed · deathalt opened this issue Jan 9, 2024 · 23 comments

Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@deathalt
Contributor

deathalt commented Jan 9, 2024

Prometheus metrics ingested by qryn are duplicated into Loki log queries, together with their labels and values.

[screenshots]

It's not a problem when you can add a filter such as {job="loki", level=~""}, but if the logs don't have a level label you have to come up with other ways to filter logs from metrics.
And then the number of labels starts to grow for Loki as well.

@lmangani
Collaborator

A change is upcoming to avoid this side effect. @akvlad will provide updates once it's ready.

@lmangani added the enhancement and help wanted labels Jan 11, 2024
@akvlad
Collaborator

akvlad commented Jan 26, 2024

@deathalt the feature is in the branch https://github.com/metrico/qryn/tree/feat_423.
Merge and release will be done shortly.

lmangani pushed a commit that referenced this issue Jan 29, 2024
* alter tables;

* debug

* ingestion impl

* reading impl; reading test

* reading impl; reading test

* test if qryn works

* workflow fixes

* test fixes

* debug

* fix tests
@bcarlock-mycarrier

Just reporting in: I updated our qryn instance to 3.1.5-bun and we're still seeing this issue.
[screenshots]

@akvlad
Collaborator

akvlad commented Jan 29, 2024

Should be fixed in 3.2.0

@bcarlock-mycarrier

@akvlad I updated to 3.2.0 and we're still seeing metrics appear as logs.

@lmangani
Collaborator

Thanks for the report @bcarlock-mycarrier. Could you provide some examples for a better understanding?

@bcarlock-mycarrier

Sure, here's the deployment:
[screenshots]

@bcarlock-mycarrier

I'd be happy to provide more information if you can tell me what might be helpful. Right now we have two different qryn instances, one for metrics and one for logs, in order to avoid this problem.

@akvlad
Collaborator

akvlad commented Jan 30, 2024

@bcarlock-mycarrier @deathalt let me recap what was changed in 3.2.0.

An extra env config option was added to qryn: DISTINGUISH_LOGS_METRICS=1.
Once that environment variable is provided, newly ingested data is distinguished between logs and metrics.
All previously ingested data is still present in both datasources.
Please recheck the config env vars for qryn.

I believe it appeared somewhere in the new documentation; I will recheck and add it if needed.
Sorry for the inconvenience.
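
(For illustration, a minimal sketch of the addition in plain env-var form; deployment specifics vary, and the config example in the next comment shows the same key in key: value style.)

# enable separation of newly ingested logs and metrics (added in qryn 3.2.0)
DISTINGUISH_LOGS_METRICS=1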

@deathalt
Contributor Author

@akvlad

I have the same behavior.

  CLICKHOUSE_SERVER: chproxy.chproxy.svc
  CLICKHOUSE_PORT: 9090
  CLICKHOUSE_DB: qryn
  CLICKHOUSE_AUTH: default:
  CLICKHOUSE_PROTO: http
  CLICKHOUSE_TIMEFIELD: record_datetime
  CLUSTER_NAME: shard
  BULK_MAXAGE: 4000
  BULK_MAXSIZE: 10000000
  BULK_MAXCACHE: 100000
  LABELS_DAYS: 7
  SAMPLES_DAYS: 7
  HOST: 0.0.0.0
  PORT: 3100
  FASTIFY_BODYLIMIT: 5242880
  FASTIFY_REQUESTTIMEOUT: 30000
  FASTIFY_MAXREQUESTS: 0
  FASTIFY_METRICS: true
  NODE_OPTIONS: "--max-old-space-size=61440"
  DISTINGUISH_LOGS_METRICS: 1

Fresh metrics are still showing up in the logs.
[screenshot]

@bcarlock-mycarrier

DISTINGUISH_LOGS_METRICS=1 seems to have solved the problem for us. Which brings up the question: under what circumstances would you want this turned off? It doesn't seem like the sort of thing you'd usually put behind a feature flag.

@lmangani
Collaborator

@bcarlock-mycarrier I think it's safe to say that once we're past the testing confirmations it will become the default.

@bcarlock-mycarrier

Makes sense. I'm not sure if this is related or not, but I'm seeing somewhat reduced query performance after enabling that.

@lmangani
Collaborator

Interesting. @akvlad, is any performance impact expected, or might this be due to the "dual-stack" support of both old and new types?

@akvlad
Collaborator

akvlad commented Jan 30, 2024

The type support adds an extra IN clause to the SQL.
IN is incredibly fast in ClickHouse, on the order of 100M comparisons a second, but it still has its cost.
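
For illustration, this is roughly what that extra clause looks like; the example below is reformatted from the qryn-generated query visible in the log excerpt later in this thread (the concrete type values are internal to qryn):

SELECT DISTINCT key
FROM time_series_gin
WHERE date >= toDate(FROM_UNIXTIME(intDiv(1706612340000000000, 1000000000)))
  AND date <= toDate(FROM_UNIXTIME(intDiv(1706634000000000000, 1000000000)))
  AND type IN (0,0) -- the extra filter that distinguishes logs from metrics
FORMAT JSON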

@bcarlock-mycarrier

The most noticeable performance decrease is actually in populating searchable labels and cross-referencing them in Grafana. Is any caching possible? Here's a gif example. The gif below doesn't illustrate it perfectly due to frame rate, but it should give you some idea. The initial load of the Metrics Browser is much longer than it feels in the gif.
[gif]

@akvlad
Collaborator

akvlad commented Jan 30, 2024

@bcarlock-mycarrier I see.
How many time_series do you have?

SELECT count(distinct fingerprint) from time_series

It would also be great to get the /series latency from you.
Please find the /api/v1/series request in the logs:

{"level":30,"time":1706633214417,"pid":19,"hostname":"hromozeka-PC","name":"qryn","reqId":"req-e","req":{"method":"GET","url":"/api/v1/series?start=1706629615&end=1706633215&match%5B%5D=system_cpu_time_seconds_total%7B%7D","hostname":"localhost:3100","remoteAddress":"127.0.0.1","remotePort":38574},"msg":"incoming request"}
{"level":30,"time":1706633214469,"pid":19,"hostname":"hromozeka-PC","name":"qryn","reqId":"req-e","res":{"statusCode":200},"responseTime":51.44048300012946,"msg":"request completed"}

It would be nice to see the "responseTime" value (e.g. "responseTime":51.44048300012946) for the /api/v1/series requests.

Or if you have nginx or something in front, please share the timings.
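
For example, something along these lines can pull that out (a rough sketch, assuming the JSON logs are written to a file called qryn.log; adjust for your log pipeline):

# find the /api/v1/series request and note its reqId ("req-e" in the sample above)
grep '/api/v1/series' qryn.log
# then read the matching "request completed" line for that reqId to get responseTime
grep '"reqId":"req-e"' qryn.log | grep '"responseTime"'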

@bcarlock-mycarrier

bcarlock-mycarrier commented Jan 30, 2024

We just turned this back on yesterday. The current count is 232,913, so not a ton honestly.

{"level":30,"time":1706633940075,"pid":24,"hostname":"qryn-metrics-reader-9977db669-72jv9","name":"qryn","reqId":"req-1i","req":{"method":"GET","url":"/api/v1/label/__name__/values?start=1706612340&end=1706634000","hostname":"qryn-metrics-reader.monitoring:3100","remoteAddress":"10.2.2.247","remotePort":34502},"msg":"incoming request"}
{"level":20,"time":1706633940075,"pid":24,"hostname":"qryn-metrics-reader-9977db669-72jv9","name":"qryn","reqId":"req-1i","msg":"GET /api/prom/label/__name__/values"}
{"level":30,"time":1706633940779,"pid":24,"hostname":"qryn-metrics-reader-9977db669-72jv9","name":"qryn","reqId":"req-1i","res":{"statusCode":200},"responseTime":703.5259080082178,"msg":"request completed"}
{"level":30,"time":1706633942002,"pid":24,"hostname":"qryn-metrics-reader-9977db669-72jv9","name":"qryn","reqId":"req-1j","req":{"method":"GET","url":"/api/v1/labels?start=1706612340&end=1706634000","hostname":"qryn-metrics-reader.monitoring:3100","remoteAddress":"10.2.2.247","remotePort":34502},"msg":"incoming request"}
{"level":20,"time":1706633942003,"pid":24,"hostname":"qryn-metrics-reader-9977db669-72jv9","name":"qryn","reqId":"req-1j","msg":"GET /loki/api/v1/label"}
SELECT DISTINCT key FROM time_series_gin WHERE date >= toDate(FROM_UNIXTIME(intDiv(1706612340000000000, 1000000000))) AND date <= toDate(FROM_UNIXTIME(intDiv(1706634000000000000, 1000000000))) AND type IN (0,0) FORMAT JSON
{"level":30,"time":1706633942393,"pid":24,"hostname":"qryn-metrics-reader-9977db669-72jv9","name":"qryn","reqId":"req-1j","res":{"statusCode":200},"responseTime":390.80939300358295,"msg":"request completed"}

@bcarlock-mycarrier

I have a quick addition to this issue. Since updating and enabling DISTINGUISH_LOGS_METRICS we've also noticed some strangeness related to the Loki interface label correlation. The first label selected is always a label from logging. The second label is always what appears to be a metric tag. The third label tends to be a log label again.
[gif]

@lmangani
Collaborator

lmangani commented Feb 7, 2024

Thanks for the report @bcarlock-mycarrier 👍
@akvlad and the team will investigate and provide feedback

@lmangani
Collaborator

lmangani commented Feb 8, 2024

@bcarlock-mycarrier we cannot reproduce this condition. Please open a separate issue with all the case details. If you're running multiple instances or multiple readers/writers, make sure they all have the same settings.

@lmangani
Collaborator

This should be resolved in the latest releases. Please reopen if the issue is still relevant.

@mk-raven

The problem remains: metrics labels are still appearing in the logs.
