
Entry out of order for stream #16

Closed
dippynark opened this issue Aug 16, 2019 · 6 comments
Labels: help wanted, kubernetes

Comments

dippynark commented Aug 16, 2019

Problem

Fluent Bit appears to be sending log messages out of order and Loki is rejecting them; I'm finding it hard to narrow down the source of the issue. There are lots of rejections when Fluent Bit starts up, and they continue for a while but settle down quickly. I built the fluent-bit-go-loki library from master, which could have an impact - see the commit hash in the td-agent-bit output below.

Maybe related:

Steps to replicate

Fluent-bit conf

[SERVICE]
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_PORT    2020

[INPUT]
    Name  systemd
    Tag   host.*

[OUTPUT]
    Name   loki
    Match  *
    Url    http://loki.monitoring.svc.cluster.local:3100/api/prom/push

Loki conf

auth_enabled: false
chunk_store_config:
  max_look_back_period: 0
ingester:
  chunk_block_size: 262144
  chunk_idle_period: 15m
  lifecycler:
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h
schema_config:
  configs:
  - from: "2018-04-15"
    index:
      period: 168h
      prefix: index_
    object_store: filesystem
    schema: v9
    store: boltdb
server:
  http_listen_port: 3100
storage_config:
  boltdb:
    directory: /data/loki/index
  filesystem:
    directory: /data/loki/chunks
table_manager:
  retention_deletes_enabled: true
  retention_period: 336h

Expected Behavior or What you need to ask

No out-of-order messages. Is it possible to see which logs are being rejected? I changed the log level to trace and there was no obvious extra relevant information.

Fluent Bit and Loki plugin versions

  • OS version
# uname -a
Linux pirate0 4.19.58-v7+ #1245 SMP Fri Jul 12 17:25:51 BST 2019 armv7l GNU/Linux
# cat /etc/issue
HypriotOS (Debian GNU/Linux 10) \n \l
  • Bare Metal or within Docker or Kubernetes or others? td-agent-bit is running on bare metal (Raspberry Pi 3), and Loki/Grafana are running in Kubernetes, although I am also running fluent-bit-go-loki on Kubernetes for Pod logs and seeing the same output.
  • fluent-bit version
    • paste result of fluent-bit --version or td-agent-bit --version
# td-agent-bit --version
Fluent Bit v1.2.2
  • fluent-bit-go-loki version:
    • paste boot log of fluent-bit or td-agent-bit
Aug 16 09:39:26 pirate0 td-agent-bit[3538]: Fluent Bit v1.2.2
Aug 16 09:39:26 pirate0 td-agent-bit[3538]: Copyright (C) Treasure Data
Aug 16 09:39:26 pirate0 td-agent-bit[3538]: [2019/08/16 09:39:26] [ info] [storage] initializing...
Aug 16 09:39:26 pirate0 td-agent-bit[3538]: [2019/08/16 09:39:26] [ info] [storage] in-memory
Aug 16 09:39:26 pirate0 td-agent-bit[3538]: [2019/08/16 09:39:26] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
Aug 16 09:39:26 pirate0 td-agent-bit[3538]: [2019/08/16 09:39:26] [ info] [engine] started (pid=3538)
Aug 16 09:39:26 pirate0 td-agent-bit[3538]: level=info caller=out_loki.go:84 [flb-go]="Starting fluent-bit-go-loki" version="(version=0.2.0, branch=master, revision=93f214b)"
Aug 16 09:39:26 pirate0 td-agent-bit[3538]: level=info caller=out_loki.go:85 [flb-go]="provided parameter" URL=http://loki.monitoring.svc.cluster.local:3100/api/prom/push
Aug 16 09:39:26 pirate0 td-agent-bit[3538]: level=info caller=out_loki.go:86 [flb-go]="provided parameter" BatchWait=
Aug 16 09:39:26 pirate0 td-agent-bit[3538]: level=info caller=out_loki.go:87 [flb-go]="provided parameter" BatchSize=
Aug 16 09:39:26 pirate0 td-agent-bit[3538]: level=info caller=out_loki.go:88 [flb-go]="provided parameter" Labels=
Aug 16 09:39:26 pirate0 td-agent-bit[3538]: level=info caller=out_loki.go:89 [flb-go]="provided parameter" LogLevel=
Aug 16 09:39:26 pirate0 td-agent-bit[3538]: [2019/08/16 09:39:26] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
Aug 16 09:39:26 pirate0 td-agent-bit[3538]: [2019/08/16 09:39:26] [ info] [sp] stream processor started
Aug 16 09:39:31 pirate0 td-agent-bit[3538]: time="2019-08-16T09:39:31+01:00" level=info component=client error="server returned HTTP status 400 Bad Request (400): entry out of order for stream: {job=\"fluent-bit\"}" fields.level=error fields.msg="final error sending batch" host="loki.monitoring.svc.cluster.local:3100" status=400
Aug 16 09:39:31 pirate0 td-agent-bit[3538]: time="2019-08-16T09:39:31+01:00" level=info component=client error="server returned HTTP status 400 Bad Request (400): entry out of order for stream: {job=\"fluent-bit\"}" fields.level=error fields.msg="final error sending batch" host="loki.monitoring.svc.cluster.local:3100" status=400
Aug 16 09:39:31 pirate0 td-agent-bit[3538]: time="2019-08-16T09:39:31+01:00" level=info component=client error="server returned HTTP status 400 Bad Request (400): entry out of order for stream: {job=\"fluent-bit\"}" fields.level=error fields.msg="final error sending batch" host="loki.monitoring.svc.cluster.local:3100" status=400
Aug 16 09:39:31 pirate0 td-agent-bit[3538]: time="2019-08-16T09:39:31+01:00" level=info component=client error="server returned HTTP status 400 Bad Request (400): entry out of order for stream: {job=\"fluent-bit\"}" fields.level=error fields.msg="final error sending batch" host="loki.monitoring.svc.cluster.local:3100" status=400
Aug 16 09:39:31 pirate0 td-agent-bit[3538]: time="2019-08-16T09:39:31+01:00" level=info component=client error="server returned HTTP status 400 Bad Request (400): entry out of order for stream: {job=\"fluent-bit\"}" fields.level=error fields.msg="final error sending batch" host="loki.monitoring.svc.cluster.local:3100" status=400
Aug 16 09:39:31 pirate0 td-agent-bit[3538]: time="2019-08-16T09:39:31+01:00" level=info component=client error="server returned HTTP status 400 Bad Request (400): entry out of order for stream: {job=\"fluent-bit\"}" fields.level=error fields.msg="final error sending batch" host="loki.monitoring.svc.cluster.local:3100" status=400
Aug 16 09:39:31 pirate0 td-agent-bit[3538]: time="2019-08-16T09:39:31+01:00" level=info component=client error="server returned HTTP status 400 Bad Request (400): entry out of order for stream: {job=\"fluent-bit\"}" fields.level=error fields.msg="final error sending batch" host="loki.monitoring.svc.cluster.local:3100" status=400
Aug 16 09:39:31 pirate0 td-agent-bit[3538]: time="2019-08-16T09:39:31+01:00" level=info component=client error="server returned HTTP status 400 Bad Request (400): entry out of order for stream: {job=\"fluent-bit\"}" fields.level=error fields.msg="final error sending batch" host="loki.monitoring.svc.cluster.local:3100" status=400
Aug 16 09:39:31 pirate0 td-agent-bit[3538]: time="2019-08-16T09:39:31+01:00" level=info component=client error="server returned HTTP status 400 Bad Request (400): entry out of order for stream: {job=\"fluent-bit\"}" fields.level=error fields.msg="final error sending batch" host="loki.monitoring.svc.cluster.local:3100" status=400
Aug 16 09:39:31 pirate0 td-agent-bit[3538]: time="2019-08-16T09:39:31+01:00" level=info component=client error="server returned HTTP status 400 Bad Request (400): entry out of order for stream: {job=\"fluent-bit\"}" fields.level=error fields.msg="final error sending batch" host="loki.monitoring.svc.cluster.local:3100" status=400
Aug 16 09:39:32 pirate0 td-agent-bit[3538]: time="2019-08-16T09:39:32+01:00" level=info component=client error="server returned HTTP status 400 Bad Request (400): entry out of order for stream: {job=\"fluent-bit\"}" fields.level=error fields.msg="final error sending batch" host="loki.monitoring.svc.cluster.local:3100" status=400
Aug 16 09:39:32 pirate0 td-agent-bit[3538]: time="2019-08-16T09:39:32+01:00" level=info component=client error="server returned HTTP status 400 Bad Request (400): entry out of order for stream: {job=\"fluent-bit\"}" fields.level=error fields.msg="final error sending batch" host="loki.monitoring.svc.cluster.local:3100" status=400
Aug 16 09:39:32 pirate0 td-agent-bit[3538]: time="2019-08-16T09:39:32+01:00" level=info component=client error="server returned HTTP status 400 Bad Request (400): entry out of order for stream: {job=\"fluent-bit\"}" fields.level=error fields.msg="final error sending batch" host="loki.monitoring.svc.cluster.local:3100" status=400
Aug 16 09:39:32 pirate0 td-agent-bit[3538]: time="2019-08-16T09:39:32+01:00" level=info component=client error="server returned HTTP status 400 Bad Request (400): entry out of order for stream: {job=\"fluent-bit\"}" fields.level=error fields.msg="final error sending batch" host="loki.monitoring.svc.cluster.local:3100" status=400
Aug 16 09:39:32 pirate0 td-agent-bit[3538]: time="2019-08-16T09:39:32+01:00" level=info component=client error="server returned HTTP status 400 Bad Request (400): entry out of order for stream: {job=\"fluent-bit\"}" fields.level=error fields.msg="final error sending batch" host="loki.monitoring.svc.cluster.local:3100" status=400
Aug 16 09:39:32 pirate0 td-agent-bit[3538]: time="2019-08-16T09:39:32+01:00" level=info component=client error="server returned HTTP status 400 Bad Request (400): entry out of order for stream: {job=\"fluent-bit\"}" fields.level=error fields.msg="final error sending batch" host="loki.monitoring.svc.cluster.local:3100" status=400
Aug 16 09:39:33 pirate0 td-agent-bit[3538]: time="2019-08-16T09:39:33+01:00" level=info component=client error="server returned HTTP status 400 Bad Request (400): entry out of order for stream: {job=\"fluent-bit\"}" fields.level=error fields.msg="final error sending batch" host="loki.monitoring.svc.cluster.local:3100" status=400
Aug 16 09:39:37 pirate0 td-agent-bit[3538]: time="2019-08-16T09:39:37+01:00" level=info component=client error="server returned HTTP status 400 Bad Request (400): entry out of order for stream: {job=\"fluent-bit\"}" fields.level=error fields.msg="final error sending batch" host="loki.monitoring.svc.cluster.local:3100" status=400
Aug 16 09:39:42 pirate0 td-agent-bit[3538]: time="2019-08-16T09:39:42+01:00" level=info component=client error="server returned HTTP status 400 Bad Request (400): entry out of order for stream: {job=\"fluent-bit\"}" fields.level=error fields.msg="final error sending batch" host="loki.monitoring.svc.cluster.local:3100" status=400
Aug 16 09:39:47 pirate0 td-agent-bit[3538]: time="2019-08-16T09:39:47+01:00" level=info component=client error="server returned HTTP status 400 Bad Request (400): entry out of order for stream: {job=\"fluent-bit\"}" fields.level=error fields.msg="final error sending batch" host="loki.monitoring.svc.cluster.local:3100" status=400
  • Grafana and Loki version: Grafana 6.2.5, Loki 0.2.0
cosmo0920 (Owner) commented Aug 16, 2019

is it possible to see which logs are being rejected?

No, it isn't. The Go Loki client library batches events into chunks, and it does not provide an interface or function for confirming which records in a batch were rejected.
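
For illustration only (this is not the plugin's or the client library's actual code), here is a minimal Go sketch of the ordering constraint behind the error: Loki requires entries for a given label set to arrive in non-decreasing timestamp order, so a sender could at least sort each batch before pushing it. The entry type and sortBatch function below are hypothetical. Note that sorting within one sender does not help when several senders push to the same label set, which is the situation described in this issue.

package main

import (
	"fmt"
	"sort"
	"time"
)

// entry is a hypothetical representation of a single log line queued for Loki.
type entry struct {
	ts   time.Time
	line string
}

// sortBatch orders entries by timestamp so that a single batch from one sender
// cannot trip Loki's per-stream "entry out of order" check on its own.
func sortBatch(batch []entry) {
	sort.Slice(batch, func(i, j int) bool {
		return batch[i].ts.Before(batch[j].ts)
	})
}

func main() {
	// Entries collected out of order: the newer entry was buffered first.
	batch := []entry{
		{ts: time.Unix(100, 0), line: "newer entry"},
		{ts: time.Unix(99, 0), line: "older entry"},
	}
	sortBatch(batch)
	for _, e := range batch {
		fmt.Println(e.ts.Unix(), e.line)
	}
}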

@cosmo0920 cosmo0920 added help wanted Extra attention is needed kubernetes labels Aug 16, 2019
dippynark (Author) commented

@cosmo0920 I think the error can be explained by this: grafana/loki#168 (comment)

I am running multiple instances of fluent-bit, but they are all given the same labels, as seen in the logs above. I guess this plugin should add some unique source label as per that comment, although an instance label seems more sensible to me, as it would be closer to what Prometheus does.
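
For reference, a sketch of what that could look like using this plugin's Labels parameter (the parameter name appears in the boot log above; the exact value syntax may differ between plugin versions, and the instance value here is just an example):

[OUTPUT]
    Name    loki
    Match   *
    Url     http://loki.monitoring.svc.cluster.local:3100/api/prom/push
    Labels  {job="fluent-bit", instance="pirate0"}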

dippynark (Author) commented

I've just seen that you already support extra labels. I gave each of my instances a unique one; there are still the same errors on startup (about 10), but after that there are no more (so far at least).

cosmo0920 (Owner) commented

Hmm, it seems that this issue is caused by Loki itself.
I've also found a similar issue: grafana/loki#898

chancez commented Dec 19, 2019

Potentially related: fluent/fluent-bit#1746

cosmo0920 (Owner) commented

This repository is deprecated.
Use grafana/loki's fluent-bit plugin instead.
Closing.
