Metrics labels and values in loki logs #423

Closed · deathalt opened this issue Jan 9, 2024 · 23 comments

Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@deathalt
Contributor

deathalt commented Jan 9, 2024

Prometheus metrics ingested by qryn are duplicated into Loki log queries, together with their labels and values.

[screenshots]

It's not a problem when you can add a filter such as {job="loki", level=~""}, but if the logs don't have a level label you have to come up with other ways to filter logs from metrics.
And then the number of labels starts to grow for Loki as well.

@lmangani
Collaborator

A change is upcoming to avoid this side effect. @akvlad will provide updates once it's ready.

@lmangani added the enhancement and help wanted labels Jan 11, 2024
@akvlad
Collaborator

akvlad commented Jan 26, 2024

@deathalt the feature is in the branch https://github.com/metrico/qryn/tree/feat_423.
Merge and release will be done shortly.

lmangani pushed a commit that referenced this issue Jan 29, 2024
* alter tables;

* debug

* ingestion impl

* reading impl; reading test

* reading impl; reading test

* test if qryn works

* workflow fixes

* test fixes

* debug

* fix tests
@bcarlock-mycarrier

Just reporting in: I updated our qryn instance to 3.1.5-bun and we're still seeing this issue.
[screenshots]

@akvlad
Collaborator

akvlad commented Jan 29, 2024

Should be fixed in 3.2.0

@bcarlock-mycarrier

@akvlad I updated to 3.2.0 and we're still seeing metrics appear as logs.

@lmangani
Collaborator

Thanks for the report @bcarlock-mycarrier. Could you provide some examples for a better understanding?

@bcarlock-mycarrier

Sure, here's the deployment:
[screenshots]

@bcarlock-mycarrier

I'd be happy to provide more information if you can tell me what might be helpful. Right now we have two different qryn instances, one for metrics and one for logs, in order to avoid this problem.

@akvlad
Collaborator

akvlad commented Jan 30, 2024

@bcarlock-mycarrier @deathalt let me recap what was changed in 3.2.0.

An extra env config option was added to qryn: DISTINGUISH_LOGS_METRICS=1.
Once that environment variable is provided, newly ingested data is distinguished between logs and metrics.
All previously ingested data is still present in both datasources.
Please recheck the config env vars for qryn.

I believe it appeared somewhere in the new documentation; I will recheck and add it if needed.
Sorry for the inconvenience.
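
(For illustration, a minimal sketch of the addition in plain env-var form; deployment specifics vary, and the config example in the next comment shows the same key in key: value style.)

# enable separation of newly ingested logs and metrics (added in qryn 3.2.0)
DISTINGUISH_LOGS_METRICS=1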

@deathalt
Contributor Author

@akvlad

I have the same behavior.

  CLICKHOUSE_SERVER: chproxy.chproxy.svc
  CLICKHOUSE_PORT: 9090
  CLICKHOUSE_DB: qryn
  CLICKHOUSE_AUTH: default:
  CLICKHOUSE_PROTO: http
  CLICKHOUSE_TIMEFIELD: record_datetime
  CLUSTER_NAME: shard
  BULK_MAXAGE: 4000
  BULK_MAXSIZE: 10000000
  BULK_MAXCACHE: 100000
  LABELS_DAYS: 7
  SAMPLES_DAYS: 7
  HOST: 0.0.0.0
  PORT: 3100
  FASTIFY_BODYLIMIT: 5242880
  FASTIFY_REQUESTTIMEOUT: 30000
  FASTIFY_MAXREQUESTS: 0
  FASTIFY_METRICS: true
  NODE_OPTIONS: "--max-old-space-size=61440"
  DISTINGUISH_LOGS_METRICS: 1

Fresh metrics are still showing up in the logs.
[screenshot]

@bcarlock-mycarrier

DISTINGUISH_LOGS_METRICS=1 seems to have solved the problem for us. Which brings up the question: under what circumstances would you want this turned off? It doesn't seem like the sort of thing you'd usually put behind a feature flag.

@lmangani
Collaborator

@bcarlock-mycarrier I think it's safe to say that once we're past the testing confirmations it will become the default.

@bcarlock-mycarrier

Makes sense. I'm not sure if this is related or not, but I'm seeing somewhat reduced query performance after enabling that.

@lmangani
Collaborator

Interesting. @akvlad, is any performance impact expected, or might this be due to the "dual-stack" support of both old and new types?

@akvlad
Collaborator

akvlad commented Jan 30, 2024

The type support adds an extra IN clause to the SQL.
IN is incredibly fast in ClickHouse, on the order of 100M comparisons a second, but it still has its cost.
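
For illustration, this is roughly what that extra clause looks like; the example below is reformatted from the qryn-generated query visible in the log excerpt later in this thread (the concrete type values are internal to qryn):

SELECT DISTINCT key
FROM time_series_gin
WHERE date >= toDate(FROM_UNIXTIME(intDiv(1706612340000000000, 1000000000)))
  AND date <= toDate(FROM_UNIXTIME(intDiv(1706634000000000000, 1000000000)))
  AND type IN (0,0) -- the extra filter that distinguishes logs from metrics
FORMAT JSON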

@bcarlock-mycarrier

The most noticeable performance decrease is actually in populating searchable labels and cross-referencing them in Grafana. Is any caching possible? Here's a gif example. The gif below doesn't illustrate it perfectly due to frame rate, but it should give you some idea. The initial load of the Metrics Browser is much longer than it feels in the gif.
[gif]

@akvlad
Collaborator

akvlad commented Jan 30, 2024

@bcarlock-mycarrier I see.
How many time_series do you have?

SELECT count(distinct fingerprint) from time_series

It would also be great to get the /series latency from you.
Please find the /api/v1/series request in the logs:

{"level":30,"time":1706633214417,"pid":19,"hostname":"hromozeka-PC","name":"qryn","reqId":"req-e","req":{"method":"GET","url":"/api/v1/series?start=1706629615&end=1706633215&match%5B%5D=system_cpu_time_seconds_total%7B%7D","hostname":"localhost:3100","remoteAddress":"127.0.0.1","remotePort":38574},"msg":"incoming request"}
{"level":30,"time":1706633214469,"pid":19,"hostname":"hromozeka-PC","name":"qryn","reqId":"req-e","res":{"statusCode":200},"responseTime":51.44048300012946,"msg":"request completed"}

It would be nice to see the "responseTime" value (e.g. "responseTime":51.44048300012946) for the /api/v1/series requests.

Or if you have nginx or something in front, please share the timings.
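
For example, something along these lines can pull that out (a rough sketch, assuming the JSON logs are written to a file called qryn.log; adjust for your log pipeline):

# find the /api/v1/series request and note its reqId ("req-e" in the sample above)
grep '/api/v1/series' qryn.log
# then read the matching "request completed" line for that reqId to get responseTime
grep '"reqId":"req-e"' qryn.log | grep '"responseTime"'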

@bcarlock-mycarrier

bcarlock-mycarrier commented Jan 30, 2024

We just turned this back on yesterday. The current count is 232,913, so not a ton honestly.

{"level":30,"time":1706633940075,"pid":24,"hostname":"qryn-metrics-reader-9977db669-72jv9","name":"qryn","reqId":"req-1i","req":{"method":"GET","url":"/api/v1/label/__name__/values?start=1706612340&end=1706634000","hostname":"qryn-metrics-reader.monitoring:3100","remoteAddress":"10.2.2.247","remotePort":34502},"msg":"incoming request"}
{"level":20,"time":1706633940075,"pid":24,"hostname":"qryn-metrics-reader-9977db669-72jv9","name":"qryn","reqId":"req-1i","msg":"GET /api/prom/label/__name__/values"}
{"level":30,"time":1706633940779,"pid":24,"hostname":"qryn-metrics-reader-9977db669-72jv9","name":"qryn","reqId":"req-1i","res":{"statusCode":200},"responseTime":703.5259080082178,"msg":"request completed"}
{"level":30,"time":1706633942002,"pid":24,"hostname":"qryn-metrics-reader-9977db669-72jv9","name":"qryn","reqId":"req-1j","req":{"method":"GET","url":"/api/v1/labels?start=1706612340&end=1706634000","hostname":"qryn-metrics-reader.monitoring:3100","remoteAddress":"10.2.2.247","remotePort":34502},"msg":"incoming request"}
{"level":20,"time":1706633942003,"pid":24,"hostname":"qryn-metrics-reader-9977db669-72jv9","name":"qryn","reqId":"req-1j","msg":"GET /loki/api/v1/label"}
SELECT DISTINCT key FROM time_series_gin WHERE date >= toDate(FROM_UNIXTIME(intDiv(1706612340000000000, 1000000000))) AND date <= toDate(FROM_UNIXTIME(intDiv(1706634000000000000, 1000000000))) AND type IN (0,0) FORMAT JSON
{"level":30,"time":1706633942393,"pid":24,"hostname":"qryn-metrics-reader-9977db669-72jv9","name":"qryn","reqId":"req-1j","res":{"statusCode":200},"responseTime":390.80939300358295,"msg":"request completed"}

@bcarlock-mycarrier

I have a quick addition to this issue. Since updating and enabling DISTINGUISH_LOGS_METRICS we've also noticed some strangeness related to the Loki interface label correlation. The first label selected is always a label from logging. The second label is always what appears to be a metric tag. The third label tends to be a log label again.
[gif]

@lmangani
Collaborator

lmangani commented Feb 7, 2024

Thanks for the report @bcarlock-mycarrier 👍
@akvlad and the team will investigate and provide feedback

@lmangani
Collaborator

lmangani commented Feb 8, 2024

@bcarlock-mycarrier we cannot reproduce this condition. Please open a separate issue with all the case details. If you're running multiple instances or multiple readers/writers, make sure they all have the same settings.

@lmangani
Collaborator

This should be resolved in the latest releases. Please reopen if the issue is still relevant.

@mk-raven

The problem remains: metrics labels are still appearing in the logs.
