Out-of-order errors with log streams from multiple services despite unique labels #898

Closed
laitalaj opened this issue Aug 15, 2019 · 31 comments · Fixed by #1247

Comments

@laitalaj
Contributor

Describe the bug
We are getting some "400 entry out of order for stream" errors in our multiple-sending-hosts, multiple-sending-processes setup, despite every host-process combination having unique labels.

According to this comment, we should be able to have multiple parallel log streams getting pushed into Loki as long as the labels for these streams differ.

Our setup currently consists of

  1. Fluent Bit agents on our servers, forwarding logs to a load balancer
  2. the load balancer forwarding the logs to two Fluentd instances
  3. each Fluentd instance using two threads to process the logs and forward them to Loki using fluent-plugin-grafana-loki

In each log, we include both the hostname of the Fluentd instance and the thread ID as labels. Therefore, each parallel stream should have unique labels, and out-of-order errors shouldn't happen. That is not the case, though - we do get out-of-order errors when we use two hosts with two threads each. For some strange reason, however, the errors don't happen when using only one host with two threads (i.e. when skipping the load balancing).

As building a setup as complex as ours for reproducing this isn't feasible, I wrote a script that manages to reproduce a similar scenario locally. It uses just plain protobufs to send logs to /api/prom/push, so this doesn't seem to be a Fluentd/Fluentd-plugin problem. See the section below for reproduction instructions.

To Reproduce

  1. Start the loki/production stack with docker-compose (Loki version: master-78c6aa49; the other parts of the stack are not used and not relevant)
  2. Run the lokitoy script, found at https://github.com/laitalaj/lokitoy, according to the instructions in the README
  3. Notice the errors in the Loki logs and in the script output

Expected behavior
The script makes sure that each parallel logging process sends its log stream in order, and that the log streams from each process have unique label sets. According to this comment, there should not be any out-of-order errors in this case.

Environment:

  • Production: Fluentd with Loki plugin, forwarding to a Loki running on Docker in a cloud hosting service
  • Local reproduction: Python script sending mock logs to a Loki running on Docker in localhost

Screenshots, promtail config, or terminal output
Some error lines from the Loki logs:

loki_1      | level=warn ts=2019-08-15T08:16:09.0429259Z caller=grpc_logging.go:37 method=/logproto.Pusher/Push duration=659.1µs err="rpc error: code = Code(400) desc = entry out of order for stream: {app=\"m\", uniq0=\"0\", uniq1=\"0\"}" msg="gRPC\n"
loki_1      | level=warn ts=2019-08-15T08:16:10.5270068Z caller=grpc_logging.go:37 duration=289.4µs method=/logproto.Pusher/Push err="rpc error: code = Code(400) desc = entry out of order for stream: {app=\"l\", uniq0=\"1\", uniq1=\"1\"}" msg="gRPC\n"

Some findings
I've poked around a bit with the lokitoy script, and catalogued some behavior in https://github.com/laitalaj/lokitoy#some-findings

@cyriltovena
Contributor

Hey @laitalaj ,

Thanks for the script. I ran it on my machine, and after debugging, it simply sends entries out of order.

For a given unique stream, the max timestamp was 1566905035753341913, and the new entry that it was trying to add had timestamp 1566905035280495167.

So you're simply sending out-of-order entries, which is not possible in Loki.

@laitalaj
Contributor Author

@cyriltovena , could you elaborate a bit on how you figured out that the entries are being sent out of order for a given unique stream? If that is indeed the case, there's some bug in my script, and I'd like to reproduce and fix it to get a clean reproduction of the problem in Loki (as the issue still stands in the production setup that I described in the OP).

@laitalaj
Contributor Author

I quickly built a lightweight Flask app called lokimock that just receives pushes from lokitoy and checks that the timestamps for uniquely labeled streams are in order. That, at least, didn't catch anything out of order from the script. I also added an option for making lokitoy send entries deliberately out of order, and lokimock did catch the out-of-order entries in that case.

Therefore, it seems a lot like lokitoy is indeed sending entries in order for all the unique label sets - maybe whatever made it look out of order to you can shed some light on what's actually going on here, @cyriltovena
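
To illustrate the kind of check lokimock does, here's a minimal Go sketch of the idea (the real lokimock is a small Flask app, so all names here are made up; it just tracks the newest timestamp seen per unique label set and flags anything older):

package main

import (
	"fmt"
	"time"
)

// orderChecker remembers the newest timestamp seen per unique label set
// (serialized to a string key) and flags anything that goes backwards.
type orderChecker struct {
	last map[string]time.Time
}

func newOrderChecker() *orderChecker {
	return &orderChecker{last: map[string]time.Time{}}
}

// observe returns an error if ts is older than the newest timestamp already
// seen for the given label set; otherwise it records ts.
func (c *orderChecker) observe(labelSet string, ts time.Time) error {
	if prev, ok := c.last[labelSet]; ok && ts.Before(prev) {
		return fmt.Errorf("out of order for %s: %v is older than %v", labelSet, ts, prev)
	}
	c.last[labelSet] = ts
	return nil
}

func main() {
	c := newOrderChecker()
	base := time.Now()
	fmt.Println(c.observe(`{app="l", uniq0="0", uniq1="1"}`, base))                  // <nil>
	fmt.Println(c.observe(`{app="l", uniq0="0", uniq1="1"}`, base.Add(time.Second))) // <nil>
	fmt.Println(c.observe(`{app="l", uniq0="0", uniq1="1"}`, base))                  // out-of-order error
}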

@cyriltovena
Contributor

@laitalaj I ran your script and debugged Loki; I put a breakpoint where this error happens, and the timestamp received was earlier than the previous one.

@laitalaj
Contributor Author

It makes sense that, when the error happens, the timestamp shown as just received is older than one received earlier - that's what the error means. However, I'm fairly certain that the incoming streams with unique labels are in reality in order and that the mix-ups happen in Loki instead of in the reproduction script:

  • That lokimock thing that I created yesterday would notice if the script was sending stuff out-of-order
  • As detailed in the findings section of lokitoy's readme, changing the value of "shared_labels" in lokitoy to, for example, { "app": ['a', 'b', 'c', 'd', 'e', 'f'] }, prevents the errors from happening in Loki. If the script were actually sending entries out of order, there would be out-of-order errors regardless of the shared labels available.

Therefore, I think what's happening in Loki should be investigated further, @cyriltovena . For example, what's different when the value of the "app" label can be 'a', 'b', 'c', 'd', 'e' or 'f' (which results in no errors) compared to when it can be 'h', 'i', 'j', 'k', 'l' or 'm' (which results in out-of-order errors)?

@cyriltovena
Contributor

I will give it another look for sure.

@cyriltovena
Contributor

Are you waiting for a response before pushing a new batch? If you don't wait for a response, nothing stops one request from being processed before another.

@laitalaj
Contributor Author

@cyriltovena , yes. The log batch is sent at https://github.com/laitalaj/lokitoy/blob/master/lokitoy.py#L117 , using the requests library. Requests is blocking, so the thread waits for a response there before continuing (and lokitoy even checks the response code and content before starting to build the next batch).

@laitalaj
Contributor Author

The Loki plugin for Fluentd, which we use in the production stack where we originally faced this problem, also seems to wait for a response correctly.

@stale

stale bot commented Oct 26, 2019

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale A stale issue or PR that will automatically be closed. label Oct 26, 2019
@stale stale bot closed this as completed Nov 2, 2019
@jgehrcke
Contributor

jgehrcke commented Nov 6, 2019

@cyriltovena can we please re-open this issue? It seems to be quite relevant.

@laitalaj thank you for the debugging work, and all the details you have provided here. Quite insightful, and indeed a bit frightening. How did you work around this problem in your production environment?

@woodsaj since you are the author of #168 (comment) (which is challenged by this issue here) -- can you maybe chime in and assess the situation? Thank you!

@laitalaj
Contributor Author

laitalaj commented Nov 7, 2019

@jgehrcke , we added a third Fluentd instance, and that fixed the problems 😅 That's pretty strange, but consistent with the findings that only very specific label sets cause the problems 🤔

@cyriltovena
Contributor

Sure, but please add your details then; this is a hard problem to solve, so give us as much input as possible on how it is affecting you.

@cyriltovena cyriltovena reopened this Nov 7, 2019
@stale stale bot removed the stale A stale issue or PR that will automatically be closed. label Nov 7, 2019
@pstibrany
Member

pstibrany commented Nov 11, 2019

@laitalaj Thanks a lot for providing the simple Python script. It helped to locate the issue.

The bug is in Loki, which uses the very simple "FastFingerprint" hash function to generate a key into the map of streams. You can probably see where this is going... this hash function is prone to collisions and creates the same fingerprint for different sets of labels.

Some examples:

  • Both (app=l,uniq0=0,uniq1=1) and (app=m,uniq0=1,uniq1=1) have fingerprint e002a3a451262627
  • Both (app=l,uniq0=1,uniq1=0) and (app=m,uniq0=0,uniq1=0) hash to e002a3a451262247
  • Both (app=l,uniq0=0,uniq1=0) and (app=m,uniq0=1,uniq1=0) hash to e002a2a4512624f4

All these label sets are generated by your script, thanks!

Loki effectively treats different label sets with the same fingerprint as the same series, which then leads to out-of-order errors.
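
You can reproduce the collision outside of Loki with the prometheus/common model package; a quick sketch, assuming model.LabelSet.FastFingerprint produces the same values as Loki's client.FastFingerprint:

package main

import (
	"fmt"

	"github.com/prometheus/common/model"
)

func main() {
	// Two of the label sets from the script that are reported to collide
	// under the fast hash.
	a := model.LabelSet{"app": "l", "uniq0": "0", "uniq1": "1"}
	b := model.LabelSet{"app": "m", "uniq0": "1", "uniq1": "1"}

	// FastFingerprint is the collision-prone hash; Fingerprint is the slower,
	// sorted variant that is much harder to collide.
	fmt.Println("fast:", a.FastFingerprint(), b.FastFingerprint(),
		"collide:", a.FastFingerprint() == b.FastFingerprint())
	fmt.Println("full:", a.Fingerprint(), b.Fingerprint(),
		"collide:", a.Fingerprint() == b.Fingerprint())
}

Running something like this should show the two label sets sharing a fast fingerprint while their full fingerprints differ.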

/cc @cyriltovena

@spahl

spahl commented Nov 11, 2019

Nice :-) I did the same investigation yesterday and landed on the same conclusions. Label sets with few or short labels are especially prone to collisions.

Swapping FastFingerprint for Fingerprint solves this issue.

diff --git a/pkg/ingester/instance.go b/pkg/ingester/instance.go
index a10fe6e9..76a8e123 100644
--- a/pkg/ingester/instance.go
+++ b/pkg/ingester/instance.go
@@ -86,7 +86,7 @@ func (i *instance) consumeChunk(ctx context.Context, labels []client.LabelAdapte
     i.streamsMtx.Lock()
     defer i.streamsMtx.Unlock()
 
-    fp := client.FastFingerprint(labels)
+    fp := client.Fingerprint(client.FromLabelAdaptersToLabels(labels))
     stream, ok := i.streams[fp]
     if !ok {
         stream = newStream(fp, labels, i.blockSize)
@@ -136,7 +136,7 @@ func (i *instance) Push(ctx context.Context, req *logproto.PushRequest) error {
 }
 
 func (i *instance) getOrCreateStream(labels []client.LabelAdapter) (*stream, error) {
-    fp := client.FastFingerprint(labels)
+    fp := client.Fingerprint(client.FromLabelAdaptersToLabels(labels))
 
     stream, ok := i.streams[fp]
     if ok {

@pstibrany
Member

Swapping FastFingerprint for Fingerprint solves this issue.

There is still a chance for collisions though :)

@spahl

spahl commented Nov 11, 2019

There is still a chance for collisions though :)

Under what conditions?

@pstibrany
Member

There is still a chance for collisions though :)

Under what conditions?

Collisions are just a fact of life with hash functions (unless you use strong cryptographic hash functions, where collisions are hard to find, but those also use a much larger space of hash values). A proper solution needs to take that into account.
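
To make "taking it into account" concrete: one way is to keep a small mapping layer that notices when a second, different label set arrives with an already-claimed raw fingerprint and assigns it a fresh synthetic one. That is roughly the idea behind Cortex's fpMapper; the sketch below is illustrative only, with made-up names, not the actual Cortex/Loki code:

package main

import (
	"fmt"
	"sync"

	"github.com/prometheus/common/model"
)

// fpMapperSketch resolves raw-fingerprint collisions by handing any
// additional label set that hashes to an already-claimed value a fresh
// synthetic fingerprint.
type fpMapperSketch struct {
	mtx       sync.Mutex
	owners    map[model.Fingerprint]map[string]model.Fingerprint // raw fp -> label-set key -> final fp
	nextSynth model.Fingerprint
}

func newFpMapperSketch() *fpMapperSketch {
	return &fpMapperSketch{
		owners:    map[model.Fingerprint]map[string]model.Fingerprint{},
		nextSynth: model.Fingerprint(1) << 63, // assumption: reserve high values for synthetic fps
	}
}

// mapFP returns a fingerprint that is unique per label set even when the raw
// (collision-prone) fingerprint is shared by several different label sets.
func (m *fpMapperSketch) mapFP(raw model.Fingerprint, ls model.LabelSet) model.Fingerprint {
	key := ls.String()

	m.mtx.Lock()
	defer m.mtx.Unlock()

	byLabels, ok := m.owners[raw]
	if !ok {
		// First label set seen for this raw fingerprint keeps it unchanged.
		m.owners[raw] = map[string]model.Fingerprint{key: raw}
		return raw
	}
	if fp, ok := byLabels[key]; ok {
		return fp // already mapped earlier
	}
	// Collision: a different label set already claimed this raw fingerprint.
	fp := m.nextSynth
	m.nextSynth++
	byLabels[key] = fp
	return fp
}

func main() {
	m := newFpMapperSketch()
	a := model.LabelSet{"app": "l", "uniq0": "0", "uniq1": "1"}
	b := model.LabelSet{"app": "m", "uniq0": "1", "uniq1": "1"}

	// These two label sets are reported above to share a fast fingerprint;
	// the mapper keeps them apart by giving the second one a synthetic value.
	fmt.Println(m.mapFP(a.FastFingerprint(), a))
	fmt.Println(m.mapFP(b.FastFingerprint(), b))
}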

@spahl

spahl commented Nov 11, 2019

I fully agree, but Fingerprint is used a lot in Prometheus and works fine for that case. FastFingerprint, on the other hand, is clearly called out as prone to collisions in the code (https://github.com/prometheus/common/blob/b5fe7d854c42dc7842e48d1ca58f60feae09d77b/model/labelset.go#L147).

Do you have a better solution in mind? One could switch to using a big string of all the key/value pairs in the label set, but that means a lot of string comparisons.

In the meantime, simply switching to Fingerprint would provide an immediate solution until something better comes along (since right now it is clearly broken).

@pstibrany
Member

@spahl

spahl commented Nov 11, 2019

Ah cool I did not know about the mapper. Thanks!

@pstibrany
Member

Ah cool I did not know about the mapper. Thanks!

I just learned about it an hour ago myself :-)

@laitalaj
Contributor Author

Great job finding the culprit @pstibrany @spahl , and good to hear that my little script proved useful!

pstibrany added a commit that referenced this issue Nov 12, 2019
Uses slightly adapted fpMapper code from Cortex.

Fixes issue #898

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
@mr-karan
Contributor

mr-karan commented Dec 2, 2019

+1 I faced the same issue while using Fluent Bit as a DaemonSet with the configuration key AutoKubernetesLabels set to true. Is there a fix planned for this?

@cyriltovena
Contributor

cyriltovena commented Dec 2, 2019

Fluent Bit uses the same client code as Promtail, and the fix is on the Loki side, so it should cover Fluent Bit as well.

cyriltovena pushed a commit that referenced this issue Dec 5, 2019

* pkg/ingester: handle labels mapping to the same fast fingerprint.

Uses slightly adapted fpMapper code from Cortex.

Fixes issue #898

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* empty commit to force new build

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Removed empty lines in imports.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Added test that pushes entries concurrently.

To be run with -race.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Stream now keeps both original and mapped fingerprint

Mapped fingerprint is used in streams map.
Original ("rawFP") is preserved when saving chunks.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Vendoring

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Reverted previous commit (keep the test, with updated checks).

Preserving raw fingerprints could lead to data loss when
doing chunk deduplication while storing chunks. We don't want
that.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* go mod vendor

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
@chancez
Contributor

chancez commented Dec 19, 2019

fluent/fluent-bit#1746 seems related

@chancez
Contributor

chancez commented Dec 19, 2019

FWIW, I've seen a minor improvement in this situation (but still get out-of-order errors after a period of time) by using the timekey buffer option with Fluentd: https://docs.fluentd.org/configuration/buffer-section#time.

I'm currently looking at using namespace, container, and pod name plus time as the buffer chunk keys, so that Fluentd chunks are buffered per source and per time window:

<match kube.**>
  @type loki
  url "http://loki.loki:3100"
  extra_labels {"env":"dev"}

  <label>
    fluentd_worker
    container $.kubernetes.container_name
    namespace $.kubernetes.namespace_name
    pod $.kubernetes.pod_name
  </label>

  <buffer time, tag, $.kubernetes.container_name, $.kubernetes.namespace_name, $.kubernetes.pod_name, fluentd_worker>
    timekey 180
    timekey_wait 300
    chunk_limit_size 60MB
  </buffer>

</match>

This config produces a chunk per pod where each chunk contains 180 seconds of data for a given time frame, ensuring ordering, and it waits 300 seconds before flushing, allowing logs to arrive late before we submit to Loki.

So far with these values it seems to be working.

Also of note: I'm running fluent-bit -> fluentd aggregator -> Loki.

@srstsavage
Contributor

srstsavage commented Jan 9, 2020

@laitalaj Sorry to necropost, but can you elaborate on how you add the thread ID as a label on your Fluentd instances?

In each log, we include both the hostname of the Fluentd instance and the thread ID as labels.

I assume you're running Fluentd in multi-worker mode and setting the worker ID as described in the Loki plugin? Or have you found some way to include a buffer thread ID when setting flush_thread_count > 1 in the buffer section? I'm looking for the latter and haven't found a way.

@laitalaj
Contributor Author

@shane-axiom , yeah, I meant worker ID as a label, exactly as described in the plugin readme. We haven't run into a situation where we'd need multiple flush threads per worker yet.

@maxisam

maxisam commented Jul 28, 2021

I am using loki 2.2.1 and I still see this issue.

@jgehrcke
Contributor

I am using loki 2.2.1 and I still see this issue.

Well, there's a real-world scenario here where the error is perfectly legit: when one tries to add a sample (log entry) older than the newest one in the system for the same log stream (same label set). Did you verify that in your case it isn't simply that this happened? :)
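
If anyone wants to check that quickly against their own instance, here's a rough Go sketch that pushes two entries for the same label set with a decreasing timestamp via the JSON push API (assuming a local Loki at localhost:3100 and a version that still rejects out-of-order writes; the endpoint, label name, and expected responses are assumptions, not taken from this thread):

package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"time"
)

// push sends one log line for a fixed stream at the given timestamp via
// Loki's JSON push API (assumed to be /loki/api/v1/push on a local instance).
func push(ts time.Time, line string) error {
	body := fmt.Sprintf(
		`{"streams":[{"stream":{"app":"ordertest"},"values":[["%d",%q]]}]}`,
		ts.UnixNano(), line)

	resp, err := http.Post("http://localhost:3100/loki/api/v1/push",
		"application/json", bytes.NewBufferString(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	msg, _ := io.ReadAll(resp.Body)
	fmt.Printf("status=%d body=%s\n", resp.StatusCode, msg)
	return nil
}

func main() {
	now := time.Now()
	push(now, "first entry")                 // expected: 204 No Content
	push(now.Add(-time.Minute), "older one") // expected: 400, entry out of order
}

The first push should be accepted and the second rejected with a 400 "entry out of order" style error, which is the legitimate case described above.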
