Bug with at-least-once delivery #10146
Comments
As far as I know, Vector for now only provides at-least-once delivery. Further down below in that same link:
But let's wait for a Vector developer to chime in, as this information might be outdated.
@hhromic Yeah, thanks for paying attention, but this bug does not even fit the "at-least-once" paradigm either, because in our case Vector is still dropping some messages.
@valerypetrov have you tested this with the `acknowledgements` option enabled?
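For reference, enabling that option on the sink might look roughly like this (the source/sink names and the endpoint are made up for illustration, not taken from this report):

```toml
# Hypothetical component names and endpoint; the acknowledgements setting is the point here.
[sinks.es_out]
type = "elasticsearch"
inputs = ["kafka_in"]
endpoint = "http://localhost:9200"
# With end-to-end acknowledgements enabled, the Kafka source only commits
# offsets once Elasticsearch has accepted the corresponding events.
acknowledgements.enabled = true
```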
This is accurate in that partial successes will result in Vector dropping any of the errored events. There is some discussion around this in #140.
I've tried the acknowledgements option and still no luck. There are too many duplicates:
This does seem related to mishandling of bulk acknowledgement errors. Recently a `request_retry_partial` option was added to the `elasticsearch` sink to retry bulk requests that partially fail rather than dropping the failed events.
I'll close this as the above seems very likely to have been the issue, but feel free to reopen if this behavior is still being observed even when setting a unique `id_key`.
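As a rough illustration of the two settings mentioned above (component names, the endpoint, and the `event_id` field are assumptions, not taken from this report):

```toml
[sinks.es_out]
type = "elasticsearch"
inputs = ["kafka_in"]
endpoint = "http://localhost:9200"
# Retry the failed items of partially successful bulk requests instead of
# dropping them (which can otherwise introduce duplicates on retry)...
request_retry_partial = true
# ...and use a unique per-event field as the Elasticsearch document _id so
# that retried events overwrite rather than duplicate existing documents.
id_key = "event_id"
```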
Intro
Hi there, during our Vector evaluation we faced an issue with exactly-once delivery.
We have the following pipeline (Kafka → Vector → Elasticsearch, running alongside an equivalent Kafka → Logstash → Elasticsearch pipeline):
We've deployed Vector to consume the same data with the same processing rules (we are comparing it with Logstash):
We found that Vector sometimes produces duplicates or drops data during normal processing (in comparison with Logstash).
For testing purposes, a script was created to generate data and send it to the producers (a sketch of such a generator is shown after the index statistics below). Look at the Elasticsearch index stats:
Look at the document counts of index sjc06-c01-logs-ttp-application-2021.11.22 and index sjc06-c01-vector-logs-ttp-application-2021.11.22: the total difference is 8262710 - 8259232 = 3478.
It looks like Vector has dropped 3478 documents.
If we take a look at the 2021.11.23 indices:
Vector generated 9652819 - 9652441 = 378 duplicates.
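A hedged sketch of the kind of generator script mentioned above (topic name, broker address, and field names are assumptions, not the actual test script), using `kafka-python`:

```python
# Hypothetical test-data generator: produce numbered JSON log lines to Kafka
# so the document counts in the Logstash and Vector indices can be compared.
import json
import time

from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for i in range(1_000_000):
    producer.send(
        "ttp-application-logs",  # assumed topic name
        {"event_id": i, "message": f"test event {i}", "ts": time.time()},
    )

producer.flush()
```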
Validation
Furthermore, we've created a Python script to fetch data from Elasticsearch for the Logstash and Vector indices and diff the two indices document by document. The conclusion is that Vector may indeed produce duplicates in some cases and drop messages in others.
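A minimal sketch of such a comparison, assuming each document carries a unique field (here called `event_id`) that can serve as a key; the index names are taken from the statistics above:

```python
# Hypothetical validation script: pull a unique key field from the
# Logstash-written and Vector-written indices and diff the two sets.
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

es = Elasticsearch("http://localhost:9200")

def collect_keys(index, key_field="event_id"):
    """Return the set of unique key values stored in the given index."""
    return {
        hit["_source"][key_field]
        for hit in scan(es, index=index, query={"query": {"match_all": {}}})
    }

logstash_keys = collect_keys("sjc06-c01-logs-ttp-application-2021.11.22")
vector_keys = collect_keys("sjc06-c01-vector-logs-ttp-application-2021.11.22")

print("missing from vector index:", len(logstash_keys - vector_keys))
print("extra in vector index:", len(vector_keys - logstash_keys))
```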
I'm assuming this happens due to some issues on the ELK side, and that Vector is not handling bulk exceptions (or something similar) correctly.
Vector Version
Vector Configuration File
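For illustration only, a minimal sketch of a Kafka-to-Elasticsearch pipeline of the kind described in this report (component names, addresses, topic, consumer group, and index pattern are all assumptions):

```toml
# Illustrative sketch only -- not the reporter's actual configuration.
[sources.kafka_in]
type = "kafka"
bootstrap_servers = "localhost:9092"
group_id = "vector-consumer"
topics = ["ttp-application-logs"]

[sinks.es_out]
type = "elasticsearch"
inputs = ["kafka_in"]
endpoint = "http://localhost:9200"
# Daily index, matching the naming pattern seen in the statistics above.
index = "sjc06-c01-vector-logs-ttp-application-%Y.%m.%d"
```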
Expected Behavior
We are expecting exactly-once delivery with Kafka & Vector.
Actual Behavior
It is not possible to get exactly-once delivery using Vector with Kafka.
Example Data
It does not matter which data is processed; the issue with exactly-once delivery occurs with every component we use for log shipping.
But here is the example data that was generated to get reliable results: