Fix possible deadlock when AckWithResponse is true due to queueCh is full #1310
I reviewed this PR because it was not in a draft state when I started.
I also wanted to comment that perhaps using two channels, `queueInCh` and `queueOutCh`, might help make the code more understandable and easier to maintain.
What is being proposed here is to use a more complex object with an embedded list, 3 channels and its own goroutine running in the background.
Hi @fracasula, thanks for your comments. I've addressed all of them. You can check out the design in the updated Modifications part of the PR description. @shibd @RobertIndie Please also take a look.
Hi @BewareMyPower, I'm posting our discussion here. Before #1283, the issue mentioned in this PR could not occur because we could ensure that the pulled messages would never overflow Furthermore, the
I have reverted #1283 and copied the unit test from this PR in #1311. This is an alternative solution to the issue. The current PR offers a method similar to an The PR #1311 maintains the initial behavior, using We can compare which approach is more user-friendly. From my perspective, #1311 is simpler and the code is more readable.
### Motivation

For the issue, please refer to the PR description: #1310

Here is an analysis: #1310 (comment)

### Modifications

- Revert #1283
- Add test from #1310 to cover `queueCh` never not full.
### Motivation

When `AckWithResponse` is enabled, a deadlock could easily happen if `queueCh` and `messagesCh` of `partitionConsumer` are full. In this case, if the consumer receives new messages from the broker, `MessageReceived` will be blocked at

pulsar-client-go/pulsar/consumer_partition.go
Line 1376 in 9366a0e

The stacks could be:

As shown in the stacks above, the `connection.run()` goroutine is blocked, so it cannot handle any new commands, including the ACK response. The ACK-related methods then fail with "request timed out". The deadlock cannot be resolved unless the consumer peeks new messages again so that `messagesCh` is no longer full; `queueCh` then moves messages to `messagesCh` and is no longer full either.

The root cause is that `queueCh` is a buffered channel with a fixed size of `ReceiverQueueSize`. However, the broker could dispatch more messages than `ReceiverQueueSize` because the permits in Flow requests only limit the number of entries to read, not the number of messages. Hence this issue could be easily reproduced by reducing `ReceiverQueueSize` and sending many messages with a large batch size. See how `TestAckResponseNotBlocked` reproduces this issue.

### Modifications
Add a `list` of messages (`pendingMessages`) to support queueing an unlimited number of messages for `partitionConsumer`. The flow is actually controlled by `availablePermits` and the related Flow requests, so the queue won't grow too large (its size depends on the batch size).

Add two channels (`queueInCh` and `queueOutCh`) for the following loop: `queueInCh -> pendingMessages -> queueOutCh`

- `MessageReceived`: send the received message to `queueInCh` so that the message will be queued to `pendingMessages`. It could never be blocked.
- `dispatcher()`: poll messages from `queueOutCh`, which polls the first message from `pendingMessages`.

The background goroutine will exit by `close(pc.closeCh)` when the consumer is closed. This happens after completing the `CloseConsumer` RPC in `internalClose`, so there should not be a `MessageReceived` call that sends messages to the closed `queueInCh`.

Add `TestAckResponseNotBlocked` to verify it works with a very small receiver queue size config.