Stops sending logs after Loki error 'entry out of order' (HTTP status 400) #49
Comments
Hi, @raphaelquati! As I can see, there are 2 problems here - a problem with sink recovery & the out-of-order entry case. I will investigate both cases. Could you please tell me whether this entry rejection happens for the same label sets or for different ones?
Hi @mishamyte, it happens with different label sets, but one particular label always changes: "pod". https://gist.github.com/raphaelquati/0726f081a49e8d55e51314ef2ec86e3c
@raphaelquati, could you also give me some more info - some logs with labels? Both successfully delivered and failed ones should be in that part.
Thank you, @raphaelquati! I have national holidays in my country, so I will return to the investigation next week, after the 25th of August.
Hi again @raphaelquati, I just did a few checks.
This behavior is by design and is discussed in grafana/loki#3379, as you mentioned before. For this case I suggest using a unique label for each pod instance.
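As an illustration of that unique-label suggestion, here is a minimal sketch of a sink configuration that adds a per-pod label. It assumes the GrafanaLoki extension's labels parameter and the LokiLabel type with Key/Value properties (check the overloads of your installed version); the POD_NAME environment variable is only one example of how the pod name might be injected:

```csharp
using System;
using Serilog;
using Serilog.Sinks.Grafana.Loki;

// Pod name as injected by Kubernetes (e.g. via the Downward API);
// the "POD_NAME" variable name is an assumption for this sketch.
var podName = Environment.GetEnvironmentVariable("POD_NAME") ?? Environment.MachineName;

var logger = new LoggerConfiguration()
    .WriteTo.GrafanaLoki(
        "http://loki:3100",                  // placeholder Loki endpoint
        labels: new[]
        {
            new LokiLabel { Key = "app", Value = "myapp" },
            // A label unique to this pod instance gives each pod its own
            // stream, so pods cannot push each other's entries out of order.
            new LokiLabel { Key = "pod", Value = podName }
        })
    .CreateLogger();
```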
Hi @raphaelquati, Good news. We have found a problem with the logic that creates a bottleneck for log events. It is described in #52. I will provide a fix, and I think it could help in your case.
Hi, @raphaelquati! Just released v7.0.2, where entries rejected by Loki are dropped from the queue. This helps the sink deliver the next correct entries to Loki and fixes a memory bottleneck. Hope you will find this useful in your situation!
Hi @mishamyte,
The label sets are unique because the "pod" label is always unique (created by Kubernetes).
I will update the code to the new version and check the results.
I'm testing the new version now, but I was thinking... I realized that Promtail doesn't have the 'out of order' problem (as discussed in grafana/loki#3379) because it uses the timestamp injected by Kubernetes (when the console log is captured), and the application timestamp travels with the log line as a second one: "time" and "ts" have slightly different values, but this guarantees the stream will never have 'out of order' logs. As the Grafana.Loki sink sends directly to Loki, the 'out of order' error can occur because of the multithreaded nature of our application. In our case, the idea of having two timestamps is acceptable, because we don't want to lose any log line.
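As a Serilog-side illustration of that two-timestamp idea (not something this sink does automatically), a hedged sketch of an enricher that stamps the application's own timestamp onto every event as an ordinary property, so the original time stays visible in the log line regardless of how the entry ends up ordered on the Loki side; the property name app_ts is arbitrary:

```csharp
using Serilog;
using Serilog.Core;
using Serilog.Events;

// Adds the application-side timestamp as a normal event property.
// This does not change the entry timestamp the sink sends to Loki;
// it only keeps the original time inside the log line itself.
public class ApplicationTimestampEnricher : ILogEventEnricher
{
    public void Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory)
    {
        logEvent.AddPropertyIfAbsent(
            propertyFactory.CreateProperty(
                "app_ts", logEvent.Timestamp.UtcDateTime.ToString("O")));
    }
}

// Usage (placeholder endpoint):
// var logger = new LoggerConfiguration()
//     .Enrich.With(new ApplicationTimestampEnricher())
//     .WriteTo.GrafanaLoki("http://loki:3100")
//     .CreateLogger();
```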
I don't think that is possible via Loki for now. Why the Loki team did not implement this behavior is described here, and allowing out-of-order writes is also discussed in grafana/loki#1544.
Also, there is ordering inside a batch, but in the general case I can't give a stable workaround. You could make the batch size smaller, so that the possibility of multi-pod concurrency is decreased, but that is a fragile and unreliable option in my opinion.
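If you want to try the smaller-batch idea, here is a sketch under the assumption that the installed version of the sink exposes the usual periodic-batching knobs; the parameter names batchPostingLimit and period may differ between versions, so check the GrafanaLoki overload you actually have:

```csharp
using System;
using Serilog;
using Serilog.Sinks.Grafana.Loki;

var logger = new LoggerConfiguration()
    .WriteTo.GrafanaLoki(
        "http://loki:3100",                    // placeholder Loki endpoint
        // Smaller, more frequent batches narrow the time window in which
        // concurrently produced events can land in a later batch with an
        // earlier timestamp. The values below are purely illustrative.
        batchPostingLimit: 100,
        period: TimeSpan.FromSeconds(1))
    .CreateLogger();
```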
I understand. But the problem occurs with one pod only, as I wrote before.
Yes, but only the batch is sorted, not the stream. If the next batch contains an "older" timestamp (in a high-load situation - which is our case), the 'out of order' error will show up at Loki.
Yep, correct. Could you tell me how many events are logged per second? Just for interest.
Loki is reporting (one pod): Staging: max 271 log lines per second.
Released in v7.1.0-beta.0
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue.
When our application has some load (a lot of broker messages to consume - using Rebus worker threads), the sink suddenly stops sending logs to Loki.
To investigate, we configured the application with 2 sinks enabled - console and this sink - with Promtail configured to ship the console log to Loki.
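For reference, a rough sketch of that dual-sink setup (the console output is what Promtail scrapes and ships, while the sink also pushes directly to Loki); the URL is a placeholder:

```csharp
using Serilog;
using Serilog.Sinks.Grafana.Loki;

var logger = new LoggerConfiguration()
    // Console output, collected by Promtail and shipped to Loki.
    .WriteTo.Console()
    // The same events pushed directly to Loki by this sink.
    .WriteTo.GrafanaLoki("http://loki:3100")
    .CreateLogger();
```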
We found this line in the Loki log:
level=warn ts=2021-08-17T21:57:55.768341371Z caller=grpc_logging.go:38 method=/logproto.Pusher/Push duration=1.39409ms err="rpc error: code = Code(400) desc = entry with timestamp 2021-08-17 21:56:54.73 +0000 UTC ignored, reason: 'entry out of order' for stream: {SourceContext=\"Rebus.Retry.ErrorTracking.InMemErrorTracker\", app=\"myapp\", container=\"myapp\", errorNumber=\"1\", messageId=\"52xxb042-xxaa-xxxx-813a-31fd6810bb25\", namespace=\"default\", node_name=\"aks-agentpool\", pod=\"myapp-7bd5d4fbcb-gjvxm\"},\ntotal ignored: 1 out of 4" msg="gRPC\n"
After this error, the sink does not recover and stops sending logs to Loki.
A similar problem (entry out of order) is discussed here.
We are using version 6.0.1