How to avoid duplicate writes across Kafka cluster restart / Producer process restart #4311
aKumara123 asked this question in Q&A
A Linux C++ application is used for writing to Kafka (using librdkafka).
I have to write a set of messages sequentially (in order, without gaps) to a Kafka topic. There is no transactional requirement, hence the transactional producer is NOT used.
enable.idempotence=true is used.
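For reference, a minimal sketch of the idempotent producer setup with the librdkafka C++ API (the broker address is a placeholder):

```cpp
#include <librdkafka/rdkafkacpp.h>

#include <iostream>
#include <memory>
#include <string>

int main() {
  std::string errstr;
  std::unique_ptr<RdKafka::Conf> conf(
      RdKafka::Conf::create(RdKafka::Conf::CONF_GLOBAL));

  // Placeholder broker list; enable.idempotence also enforces acks=all etc.
  if (conf->set("bootstrap.servers", "localhost:9092", errstr) != RdKafka::Conf::CONF_OK ||
      conf->set("enable.idempotence", "true", errstr) != RdKafka::Conf::CONF_OK) {
    std::cerr << "config error: " << errstr << std::endl;
    return 1;
  }

  std::unique_ptr<RdKafka::Producer> producer(
      RdKafka::Producer::create(conf.get(), errstr));
  if (!producer) {
    std::cerr << "failed to create producer: " << errstr << std::endl;
    return 1;
  }

  // ... produce(), poll(), flush() as usual ...
  return 0;
}
```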
When a Kafka fatal error is detected: the (librdkafka) Producer object is deleted -> wait for 15 seconds -> create a new Consumer object and a new Producer object -> read (using the Consumer object) the last message written to the topic -> resume writing from the last written message onwards [if any of these actions fail, retry the whole sequence after 15 seconds].
On a Producer (Linux) process restart: create a new Consumer object and a new Producer object -> read (using the Consumer object) the last message written to the topic -> resume writing from the last written message onwards [if any of these actions fail, retry the whole sequence after 15 seconds].
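For clarity, a simplified sketch of the "read the last written message" step (assuming a single-partition topic; the helper name, partition 0 and timeouts are illustrative, not the actual code):

```cpp
#include <librdkafka/rdkafkacpp.h>

#include <memory>
#include <string>
#include <vector>

// Sketch: fetch the last message currently written to partition 0 of a topic.
// Assumes a single-partition topic; the payload carries the internal sequence.
std::unique_ptr<RdKafka::Message> read_last_message(RdKafka::KafkaConsumer *consumer,
                                                    const std::string &topic) {
  int64_t low = 0, high = 0;
  // Ask the broker for the current low/high watermark offsets of partition 0.
  if (consumer->query_watermark_offsets(topic, 0, &low, &high, 5000) !=
          RdKafka::ERR_NO_ERROR ||
      high <= low)
    return nullptr;  // error or empty partition

  // Assign partition 0 starting at the offset of the last written message.
  std::vector<RdKafka::TopicPartition *> parts;
  parts.push_back(RdKafka::TopicPartition::create(topic, 0, high - 1));
  RdKafka::ErrorCode err = consumer->assign(parts);
  RdKafka::TopicPartition::destroy(parts);
  if (err != RdKafka::ERR_NO_ERROR)
    return nullptr;

  // Consume exactly that message (with a timeout) and return it on success.
  std::unique_ptr<RdKafka::Message> msg(consumer->consume(5000));
  if (msg && msg->err() == RdKafka::ERR_NO_ERROR)
    return msg;
  return nullptr;
}
```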
Assume the following scenario. These messages need to be written, i.e. handed to the Producer via produce():
|msg with internal sequence 100||msg with internal sequence 101||msg with internal sequence 102||msg with internal sequence 103|...
A Kafka fatal error is then detected by the producer application (due to a delivery timeout), and the above-mentioned recovery procedure ("When a Kafka fatal error is detected") is performed, retrying continuously.
How can I avoid duplicate messages in this case?
One solution is to insert a special message (which Kafka readers should ignore) before reading, and to make sure (retrying if not) that it comes back as the last message, so that the internal queue can be assumed to be flushed. But this requires asking Kafka readers to ignore the "special" messages, which is NOT preferred.
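A rough sketch of that workaround, building on the read_last_message() helper sketched above (the marker payload, partition, retry count and delay are placeholders):

```cpp
#include <chrono>
#include <thread>

// Produce a marker, wait for its delivery, then poll until it is seen as the
// last message in the partition, implying earlier messages have landed too.
bool wait_for_marker(RdKafka::Producer *producer, RdKafka::KafkaConsumer *consumer,
                     const std::string &topic) {
  const std::string marker = "__RESYNC_MARKER__";  // readers would have to skip this
  producer->produce(topic, 0, RdKafka::Producer::RK_MSG_COPY,
                    const_cast<char *>(marker.data()), marker.size(),
                    nullptr, 0, 0, nullptr);
  producer->flush(10000);  // wait for the marker's delivery report

  for (int attempt = 0; attempt < 10; ++attempt) {
    std::unique_ptr<RdKafka::Message> last(read_last_message(consumer, topic));
    if (last && std::string(static_cast<const char *>(last->payload()),
                            last->len()) == marker)
      return true;  // safe to resume: everything before the marker is visible
    std::this_thread::sleep_for(std::chrono::milliseconds(500));
  }
  return false;
}
```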
Can a transactional producer be used to avoid this? If it is a solution: we need to support very high message rates, around 50,000 msgs per second. Can a transactional producer sustain such rates? Is commit transaction (in librdkafka) a blocking call, i.e. is the calling thread blocked until the cluster acknowledges?
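For reference, a rough sketch of how the transactional flow might look with the librdkafka C++ transactional API (transactional.id configuration, batching strategy and full error handling such as abort_transaction/retry are omitted):

```cpp
#include <librdkafka/rdkafkacpp.h>

#include <iostream>
#include <memory>
#include <string>
#include <vector>

// Called once, after creating a producer configured with a transactional.id.
bool init_tx(RdKafka::Producer *producer) {
  std::unique_ptr<RdKafka::Error> err(producer->init_transactions(10000));
  if (err) { std::cerr << err->str() << std::endl; return false; }
  return true;
}

// One transaction per batch of messages (batch size is a tuning placeholder).
bool produce_batch(RdKafka::Producer *producer, const std::string &topic,
                   const std::vector<std::string> &batch) {
  std::unique_ptr<RdKafka::Error> err(producer->begin_transaction());
  if (err) { std::cerr << err->str() << std::endl; return false; }

  for (const std::string &payload : batch) {
    producer->produce(topic, RdKafka::Topic::PARTITION_UA,
                      RdKafka::Producer::RK_MSG_COPY,
                      const_cast<char *>(payload.data()), payload.size(),
                      nullptr, 0, 0, nullptr);
    producer->poll(0);  // serve delivery report callbacks
  }

  // commit_transaction takes a timeout; see the blocking question above.
  err.reset(producer->commit_transaction(10000));
  if (err) { std::cerr << err->str() << std::endl; return false; }
  return true;
}
```

(If transactions turn out to be viable, batching many messages per transaction would presumably be needed to reach 50,000 msgs/s, since each commit involves round trips to the cluster.)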
Are there any other solutions to avoid the duplication?