Out-of-order errors with log streams from multiple services despite unique labels #898
Hey @laitalaj, thanks for the script. I ran it on my machine, and after debugging, it simply sends entries out of order. For a given unique stream, the max timestamp was 1566905035753341913 and the new entry it was trying to add was 1566905035280495167. So you're simply sending out-of-order entries, which is not possible in Loki.
@cyriltovena, could you elaborate a bit on how you figured out that the entries are being sent out of order for a given unique stream? If that is indeed the case, there's a bug in my script, and I'd like to reproduce and fix it to get a clean reproduction of the problem in Loki (the issue still stands in the production setup that I described in the OP).
I quickly built a lightweight Flask app called lokimock that just receives pushes from lokitoy and checks that the timestamps for uniquely labeled streams are in order. It didn't catch any out-of-order entries from the script. I also added an option for making lokitoy deliberately send entries out of order, and lokimock caught those just fine. So it very much looks like lokitoy really is sending everything in order for every unique label set - maybe whatever made it look out of order on your end can shed some light on what's actually going on here, @cyriltovena
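For illustration, here is a minimal sketch of the per-stream ordering check lokimock performs; this is written in Go rather than the actual Flask code, and assumes entries have already been decoded into (label set, timestamp) pairs:

```go
package main

import (
	"fmt"
	"time"
)

// orderChecker tracks the newest timestamp seen per unique label set
// (keyed by its string form) and flags any entry that goes backwards
// within a single stream.
type orderChecker struct {
	last map[string]time.Time
}

func newOrderChecker() *orderChecker {
	return &orderChecker{last: map[string]time.Time{}}
}

func (c *orderChecker) observe(labels string, ts time.Time) error {
	if prev, ok := c.last[labels]; ok && ts.Before(prev) {
		return fmt.Errorf("out of order for %s: %v is before %v", labels, ts, prev)
	}
	c.last[labels] = ts
	return nil
}

func main() {
	c := newOrderChecker()
	base := time.Now()
	fmt.Println(c.observe(`{app="a"}`, base))                   // <nil>
	fmt.Println(c.observe(`{app="b"}`, base.Add(-time.Second))) // <nil>: different stream
	fmt.Println(c.observe(`{app="a"}`, base.Add(-time.Second))) // error: same stream, older entry
}
```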
@laitalaj I ran your script and debugged Loki; I put a breakpoint where this error happens, and the timestamp received was earlier than the previous one.
It makes sense that when the error happens, the timestamp shown as just received is earlier than one received previously. However, I'm fairly certain that the incoming streams with unique labels really are in order, and that the mix-ups happen inside Loki rather than in the reproduction script.
Therefore, I think what's happening in Loki should be investigated further, @cyriltovena. For example, what's different when the value of the "app" label can be 'a', 'b', 'c', 'd', 'e', or 'f' (which results in no errors) versus when it can be 'h', 'i', 'j', 'k', 'l', or 'm' (which results in out-of-order errors)?
I will give it another look for sure.
Are you waiting for a response before pushing a new batch? If you don't wait for a response, nothing stops one request from being processed before another.
@cyriltovena, yes. The log batch is sent at https://github.com/laitalaj/lokitoy/blob/master/lokitoy.py#L117 using the requests library. requests is blocking, so the thread waits for a response before continuing (and lokitoy even checks the response code and content before starting to build the next batch).
The Loki plugin for Fluentd, which we use in the production stack where we originally hit this problem, also seems to wait for a response correctly.
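In other words, the pushing loop is strictly sequential. A rough Go sketch of that pattern (the push URL is the one from the issue; payload encoding is elided, and pushSequentially is just an illustrative name):

```go
package main

import (
	"bytes"
	"io"
	"log"
	"net/http"
)

// pushSequentially sends batches one at a time and refuses to start
// the next batch until the previous push has returned a 2xx response,
// mirroring what a blocking HTTP client gives you for free.
func pushSequentially(pushURL string, batches [][]byte) {
	for _, batch := range batches {
		resp, err := http.Post(pushURL, "application/x-protobuf", bytes.NewReader(batch))
		if err != nil {
			log.Fatalf("push failed: %v", err)
		}
		body, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		if resp.StatusCode/100 != 2 {
			log.Fatalf("push rejected: %d %s", resp.StatusCode, body)
		}
	}
}

func main() {
	// batches would be snappy-compressed protobuf payloads in reality
	pushSequentially("http://localhost:3100/api/prom/push", nil)
}
```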
This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
@cyriltovena can we please re-open this issue? It seems to be quite relevant. @laitalaj thank you for the debugging work and all the details you have provided here. Quite insightful, and indeed a bit frightening. How did you work around this problem in your production environment? @woodsaj since you are the author of #168 (comment) (which is challenged by this issue here) -- can you maybe chime in and assess the situation? Thank you!
@jgehrcke, we added a third Fluentd instance, and that fixed the problems 😅 That's pretty strange, but consistent with the findings that only very specific label sets cause the problems 🤔
Sure, but please add your details then; this is a hard problem to solve, so give us as much input as possible on how it's affecting you.
@laitalaj Thanks a lot for providing the simple Python script. It helped to locate the issue. The bug is in Loki, which uses the very simple "FastFingerprint" hash function to generate keys for its map of streams. You can probably see where this is going... this hash function is prone to collisions and produces the same fingerprint for different sets of labels. Some examples:
All these label sets are generated by your script, thanks! Loki effectively treats different label sets with the same fingerprint as the same series, which then leads to out-of-order errors. /cc @cyriltovena
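To make the failure mode concrete, here is a hedged sketch (not Loki's actual ingester code) of why keying a stream map by FastFingerprint conflates distinct label sets whenever fingerprints collide. The label values are placeholders from the script's alphabet, not known colliding pairs:

```go
package main

import (
	"fmt"

	"github.com/prometheus/common/model"
)

func main() {
	// streams keyed by fingerprint, as the pre-fix ingester did
	streams := map[model.Fingerprint]model.LabelSet{}

	for _, app := range []string{"h", "i", "j", "k", "l", "m"} {
		ls := model.LabelSet{"app": model.LabelValue(app)}
		fp := ls.FastFingerprint() // collision-prone by design
		if prev, ok := streams[fp]; ok && !prev.Equal(ls) {
			// two different label sets now share one stream; entries
			// from one look "out of order" to the other
			fmt.Printf("collision: %v and %v both map to %v\n", prev, ls, fp)
			continue
		}
		streams[fp] = ls
		// ls.Fingerprint() hashes the full sorted label set and is far
		// less collision-prone (at the cost of being slower)
		fmt.Printf("%v -> fast=%v full=%v\n", ls, fp, ls.Fingerprint())
	}
}
```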
Nice :-) I did the same investigation yesterday and landed on the same conclusions. Small label sets are especially prone to collisions. Swapping FastFingerprint for Fingerprint solves this issue.
There is still a chance for collisions though :)
Under what conditions?
Collisions are just a fact of life with hash functions (unless you use strong cryptographic hash functions, where collisions are hard to find, but those also use a much larger space of hash values). A proper solution needs to take that into account.
I fully agree, but Fingerprint is used a lot in Prometheus and works fine for that case. FastFingerprint, on the other hand, is explicitly called out in the code as being prone to collisions (https://github.com/prometheus/common/blob/b5fe7d854c42dc7842e48d1ca58f60feae09d77b/model/labelset.go#L147). Do you have a better solution in mind? One could key on a big string of all the key/value pairs in the label set, but that means a lot of string comparisons. In the meantime, simply switching to Fingerprint would be an immediate fix until something better comes along (since right now it is clearly broken).
Right now I plan to try to use https://github.com/cortexproject/cortex/blob/13a0639a1d5af4601a0ceca0c0fc96c748486f86/pkg/ingester/mapper.go#L25. The same code is used in Prometheus.
Ah cool, I did not know about the mapper. Thanks!
I just learned about it an hour ago myself :-)
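For readers who haven't seen it either: a heavily simplified sketch of the mapper idea. The real Cortex/Prometheus code reserves a dedicated range for synthetic fingerprints and handles eviction and persistence; here we just count upwards from 1 for illustration:

```go
package main

import (
	"fmt"
	"sync"

	"github.com/prometheus/common/model"
)

// fpMapper resolves fingerprint collisions: when a second, different
// label set arrives with an already-seen raw fingerprint, it is handed
// a fresh synthetic fingerprint so that each unique label set keeps
// its own stream.
type fpMapper struct {
	mtx      sync.Mutex
	next     model.Fingerprint                                  // next synthetic fp (simplification)
	mappings map[model.Fingerprint]map[string]model.Fingerprint // raw fp -> label-set string -> mapped fp
}

func newFpMapper() *fpMapper {
	return &fpMapper{next: 1, mappings: map[model.Fingerprint]map[string]model.Fingerprint{}}
}

func (m *fpMapper) mapFP(raw model.Fingerprint, ls model.LabelSet) model.Fingerprint {
	m.mtx.Lock()
	defer m.mtx.Unlock()
	byLabels, ok := m.mappings[raw]
	if !ok {
		// first label set seen for this raw fingerprint: keep it as-is
		m.mappings[raw] = map[string]model.Fingerprint{ls.String(): raw}
		return raw
	}
	if fp, ok := byLabels[ls.String()]; ok {
		return fp // mapped earlier: stable result
	}
	// collision: a different label set already owns this raw
	// fingerprint, so hand out a synthetic one (the real code makes
	// sure synthetic fps cannot clash with raw ones)
	fp := m.next
	m.next++
	byLabels[ls.String()] = fp
	return fp
}

func main() {
	m := newFpMapper()
	a := model.LabelSet{"app": "a"}
	b := model.LabelSet{"app": "b"}
	raw := model.Fingerprint(42) // pretend both label sets collided on this raw fp
	fmt.Println(m.mapFP(raw, a)) // 42: first owner keeps the raw fp
	fmt.Println(m.mapFP(raw, b)) // 1: collision, gets a synthetic fp
	fmt.Println(m.mapFP(raw, a)) // 42 again: mapping is stable
}
```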
Great job finding the culprit @pstibrany @spahl, and good to hear that my little script proved useful!
Uses slightly adapted fpMapper code from Cortex. Fixes issue #898
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
+1 I faced the same issue while using Fluent Bit as a DaemonSet with the configuration key
Fluent Bit uses the same client code as Promtail, and the fix is on the Loki side, so it should cover Fluent Bit as well.
* pkg/ingester: handle labels mapping to the same fast fingerprint. Uses slightly adapted fpMapper code from Cortex. Fixes issue #898
* empty commit to force new build
* Removed empty lines in imports.
* Added test that pushes entries concurrently. To be run with -race.
* Stream now keeps both original and mapped fingerprint. Mapped fingerprint is used in streams map. Original ("rawFP") is preserved when saving chunks.
* Vendoring
* Reverted previous commit (keep the test, with updated checks). Preserving raw fingerprints could lead to data loss when doing chunk deduplication while storing chunks. We don't want that.
* go mod vendor

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
fluent/fluent-bit#1746 seems related
FWIW, I've had minor improvement on this situation (but still out-of-order errors after a period of time) by using the timekey buffer option with Fluentd (https://docs.fluentd.org/configuration/buffer-section#time). I'm currently looking at using namespace, container, and pod name plus time as the buffer chunk keys, so Fluentd chunks are buffered per source and per time window:
This config produces a chunk per pod, where each chunk contains 180 seconds of data for a given time frame, ensuring ordering, and it waits 300 seconds before flushing, allowing late logs to arrive before we submit to Loki. So far these values seem to be working. Also of note: I'm running fluent-bit -> fluentd aggregator -> loki.
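A hypothetical buffer section along those lines (the chunk key names depend on how your Kubernetes metadata is structured, so treat them as placeholders):

```
<buffer $.kubernetes.namespace_name, $.kubernetes.container_name, $.kubernetes.pod_name, time>
  timekey 180       # each chunk covers a 180-second window per source
  timekey_wait 300  # wait 300s for late logs before flushing to Loki
</buffer>
```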
@laitalaj Sorry to necropost, but can you elaborate on how you add the thread ID as a label on your Fluentd instances?
I assume you're running Fluentd in multi-worker mode and setting the worker ID as described in the Loki plugin? Or have you found some way to include a buffer thread ID when setting
@shane-axiom, yeah, I meant worker ID as a label, exactly as described in the plugin readme. We haven't run into a situation where we'd need multiple flush threads per worker yet.
I am using Loki 2.2.1 and I still see this issue.
Well, there's a real-world scenario where the error is perfectly legit: when one tries to add a sample (log entry) that is older than the newest one in the system for the same log stream (same label set). Did you verify that in your case it isn't simply that this happened? :)
Describe the bug
We are getting some "400 entry out of order for stream" errors in our multiple-sending-hosts, multiple-sending-processes setup, despite every host-process combination having unique labels.
According to this comment, we should be able to have multiple parallel log streams getting pushed into Loki as long as the labels for these streams differ.
Our setup currently consists of
In each log, we include both the hostname of the Fluentd instance and the thread ID as labels. Therefore, each parallel stream should have unique labels, and out-of-order errors shouldn't happen. That is not the case, though: we do get out-of-order errors when we use two hosts with two threads each. Strangely, the errors don't happen when using only one host with two threads (e.g. when skipping the load balancing).
As building a setup as complex as ours for reproducing this isn't feasible, I wrote a script that manages to reproduce a similar scenario locally. It uses just pure protobufs to send logs to /api/prom/push, so this doesn't seem to be a Fluentd/Fluentd-plugin problem. See the section below for reproduction instructions.

To Reproduce
Expected behavior
The script makes sure that each parallel logging process sends its log stream in order, and that the log streams from each process have unique label sets. According to this comment, there should not be any out-of-order errors in this case.
Environment:
Screenshots, promtail config, or terminal output
Some error lines from the Loki logs:
Some findings
I've poked around a bit with the lokitoy script and catalogued some behavior in https://github.com/laitalaj/lokitoy#some-findings