Avoid triggering startReplProducer on newAddProducer as it may flips replicator state wrongly #232
@rdhabalia But we need to resume the replicator if it was stopped. The replicator is paused when no one is connected and there are no local subscriptions. Now, whenever a new producer comes in, we need to restart the replicator. How would that work with this patch?
Aren't we starting the replicator on topic loading? When GC closes the replicator due to inactivity, it also deletes the topic. So the next time a producer/consumer connects, it forces the topic to be loaded, and that also starts the replicator. I think GC is the only use case where the broker pauses the replicator.
The replicator will be paused and the topic might not be deleted (e.g., an incoming replicator is still connected and keeps the topic alive).
Change lgtm. Should we have at least a test that adds and removes a cluster from the list multiple times to stress the race condition?
Actually, this change will still create the same issue. I think the fundamental issue is: removeReplicator() disconnects producers and removes the replicator from the cache (replicators) asynchronously, and this is not atomic. So if someone (addProducer) tries to startReplProducer in between, the behavior becomes inconsistent. I will add a test case if the change looks fine.
Added a test case to verify this race condition.
LGTM 👍 @merlimat Can you take a look?
👍
…replicator state wrongly (#232)
* Avoid triggering startReplProducer on newAddProducer as it may flips replicator state wrongly
* Signal replicator is stopping if producer is not created yet
* read repl-cluster from policies to avoid restart of closing-replicator
…ting a lot of requests. (apache#232) KafkaHeaderAndRequest contains native memory references. When a Kafka client sends a lot of requests, for example 10 million per second, those redundant references prevent garbage collection and memory usage grows. It would be tricky to provide a unit test for this patch, and this PR seems to be a minor change, so I didn't write a test case for it.
Motivation
Race condition: when replicationCluster in the namespace policy is changed several times in quick succession (repl-cluster added, then removed), the replication producer cannot be closed successfully.
1. repl-cluster added: the broker adds the replicator to replicators and tries to create the Producer asynchronously.
2. repl-cluster deleted before the Producer is created: the broker closes the cursor and flips the replicator's state to stopped, since the Producer is not created yet; when the producer is created, it will be closed if the state is stopped.
3. A new producer is added: the broker starts the replicator again, which flips the state to starting, and when the producer pending from step 2 is finally created it gets the following exception: Error reading entries at null. Retrying to read in 67.938s. (Cursor was already closed) (7)
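The interleaving described above can be replayed deterministically with a small model. This is only a sketch: the class, enum, and method names below are hypothetical stand-ins, not the broker's actual classes, and the asynchronous producer creation is reduced to an explicit callback call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a replicator (assumption: not Pulsar's real code).
public class ReplicatorRaceSketch {
    enum State { STOPPED, STARTING, STARTED }

    static class Replicator {
        State state = State.STOPPED;
        boolean cursorClosed = false;
        final List<String> log = new ArrayList<>();

        // Begin creating the producer; creation completes later via onProducerCreated().
        void startProducer() {
            state = State.STARTING;
        }

        // The repl-cluster is removed while the producer is still pending.
        void disconnect() {
            cursorClosed = true;
            state = State.STOPPED;
        }

        // Completion callback of the (modelled) asynchronous producer creation.
        void onProducerCreated() {
            if (state == State.STOPPED) {
                log.add("producer closed");
                return;
            }
            state = State.STARTED;
            log.add(cursorClosed ? "read failed: cursor was already closed" : "reading entries");
        }
    }

    // Replays the sequence; restartOnNewProducer toggles the extra start request.
    static List<String> replay(boolean restartOnNewProducer) {
        Replicator r = new Replicator();
        r.startProducer();              // repl-cluster added
        r.disconnect();                 // repl-cluster removed, producer still pending
        if (restartOnNewProducer) {
            r.startProducer();          // new producer connects and restarts the replicator
        }
        r.onProducerCreated();          // pending producer creation completes
        return r.log;
    }

    public static void main(String[] args) {
        System.out.println(replay(true));   // [read failed: cursor was already closed]
        System.out.println(replay(false));  // [producer closed]
    }
}
```

With the restart, the late producer sees a non-stopped state and tries to read from a cursor that is already closed; without it, the producer is closed as intended.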
Modifications
The broker should not try to start the replicator again when a new producer is created, because that flips the replicator's state.
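As an illustration of why a late start request should not overwrite the state of a replicator that is shutting down, here is a minimal sketch that uses a distinct STOPPING state and compare-and-set transitions. The STOPPING state and all names here are assumptions made for the example, not the code from this patch; only `AtomicReference.compareAndSet` is a real JDK call.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch (assumption: names and states are illustrative only).
public class StoppingStateSketch {
    enum State { STOPPED, STARTING, STARTED, STOPPING }

    static class Replicator {
        final AtomicReference<State> state = new AtomicReference<>(State.STOPPED);

        // A start request succeeds only from STOPPED; a STOPPING replicator is left alone.
        boolean start() {
            return state.compareAndSet(State.STOPPED, State.STARTING);
        }

        // Removal while the producer is pending signals STOPPING instead of STOPPED.
        // (The fallback branch is simplified and not itself race-free.)
        void disconnect() {
            if (!state.compareAndSet(State.STARTING, State.STOPPING)) {
                state.set(State.STOPPED);
            }
        }

        // Producer-creation callback: true if the producer is kept, false if it is closed.
        boolean onProducerCreated() {
            if (state.compareAndSet(State.STARTING, State.STARTED)) {
                return true;
            }
            state.set(State.STOPPED); // the late producer is closed right away
            return false;
        }
    }

    public static void main(String[] args) {
        Replicator r = new Replicator();
        System.out.println(r.start());             // true
        r.disconnect();
        System.out.println(r.start());             // false: replicator is stopping
        System.out.println(r.onProducerCreated()); // false: producer is closed
        System.out.println(r.state.get());         // STOPPED
    }
}
```

Once the state is back to STOPPED, a later `start()` succeeds again, so a replicator that was cleanly stopped can still be resumed.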
Result
This avoids wrongly flipping the replicator's state, so the replicator's producer can be closed when the replicator is removed.