I noticed that we are getting a large volume of these messages:
DEBUG org.apache.bookkeeper.client.LedgerHandle - pending add not completed: PendingAddOp(lid:3510117, eid:27, completed:false)
They appeared to be caused by timeouts, so I checked the log to see how many timeouts we were getting:
That's over half a million timeouts over 2 hours on a test cluster, and I don't think I even sent that many messages to Pulsar that would have resulted in these PendingAddOps.
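A count like this can be reproduced along the following lines. This is a minimal sketch rather than the command used for the number above; it assumes timeouts are logged in the same shape as the `Failed to write entry (...): Bookie operation timeout` WARN line quoted in this issue, and the function name `count_timeouts` is hypothetical.

```python
import re
from collections import Counter

# Assumption: timeout failures are logged by PendingAddOp in this shape,
# matching the WARN line quoted in this issue.
TIMEOUT_RE = re.compile(
    r"Failed to write entry \((\d+), (\d+)\): Bookie operation timeout"
)

def count_timeouts(lines):
    """Return (total timeouts, Counter keyed by (ledger_id, entry_id))."""
    per_entry = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            key = (int(match.group(1)), int(match.group(2)))
            per_entry[key] += 1
    return sum(per_entry.values()), per_entry

# Sample input built from log lines quoted in this issue; the WARN line is
# repeated because the same entry timed out repeatedly.
warn = ("WARN org.apache.bookkeeper.client.PendingAddOp - "
        "Failed to write entry (3510077, 12098): Bookie operation timeout")
debug = ("DEBUG org.apache.bookkeeper.client.LedgerHandle - pending add not "
         "completed: PendingAddOp(lid:3510117, eid:27, completed:false)")
sample = [debug, warn, debug, warn]

total, per_entry = count_timeouts(sample)
print(total, per_entry.most_common(3))
```

Grouping by (ledger, entry) makes it easy to see whether the total comes from many distinct entries or from a few entries timing out over and over.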
I wanted to see the full history of actions on a single entry ID to understand the behavior, so I grepped the log for it.
Here's what appeared:
DEBUG org.apache.bookkeeper.client.PendingAddOp - Unsetting success for ledger: 3510077 entry: 12098 bookie index: 1
DEBUG org.apache.bookkeeper.proto.PerChannelBookieClient - Could not write Add request to bookie 10.20.69.29/10.20.69.29:3181 for ledger 3510077, entry 12098
DEBUG org.apache.bookkeeper.client.PendingAddOp - Write did not succeed: 3510077, 12098. But we have already fixed it.
DEBUG org.apache.bookkeeper.proto.PerChannelBookieClient - Could not write Add request to bookie 10.20.69.37/10.20.69.37:3181 for ledger 3510077, entry 12098
WARN org.apache.bookkeeper.client.PendingAddOp - Failed to write entry (3510077, 12098): Bookie operation timeout
. . .
[ Repeats a ton of times ]
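Because the ledger and entry IDs are formatted differently from one log message to the next, a plain substring grep can be noisy. The following is a rough sketch of a filter that tolerates those formats; the helper name `entry_history` is hypothetical, and the matching is a heuristic (it only requires the ledger ID to appear somewhere before the entry ID on the same line).

```python
import re

def entry_history(lines, ledger_id, entry_id):
    """Return the log lines that mention ledger_id followed by entry_id."""
    def num(n):
        # Match n as a standalone number: not part of a longer number and
        # not an octet of a dotted IP address such as 10.20.69.29.
        return rf"(?<![\d.]){n}(?!\d|\.\d)"

    # Heuristic: ledger ID first, then the entry ID later on the same line.
    pattern = re.compile(num(ledger_id) + r"\D.*?" + num(entry_id))
    return [line for line in lines if pattern.search(line)]

# Sample input: log lines quoted in this issue.
sample = [
    "DEBUG org.apache.bookkeeper.client.LedgerHandle - pending add not completed: PendingAddOp(lid:3510117, eid:27, completed:false)",
    "DEBUG org.apache.bookkeeper.client.PendingAddOp - Unsetting success for ledger: 3510077 entry: 12098 bookie index: 1",
    "DEBUG org.apache.bookkeeper.proto.PerChannelBookieClient - Could not write Add request to bookie 10.20.69.29/10.20.69.29:3181 for ledger 3510077, entry 12098",
    "DEBUG org.apache.bookkeeper.client.PendingAddOp - Write did not succeed: 3510077, 12098. But we have already fixed it.",
    "DEBUG org.apache.bookkeeper.proto.PerChannelBookieClient - Could not write Add request to bookie 10.20.69.37/10.20.69.37:3181 for ledger 3510077, entry 12098",
    "WARN org.apache.bookkeeper.client.PendingAddOp - Failed to write entry (3510077, 12098): Bookie operation timeout",
]

history = entry_history(sample, 3510077, 12098)
for line in history:
    print(line)
```

The lookarounds keep an entry ID such as 29 from matching the last octet of a bookie address, which a naive numeric match would do.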
It looks like we're returning a success callback in LedgerHandle, but the future seems to be failing. BookKeeper then appears to retry the write repeatedly, timing out each time without reporting the problem to the client.
This is related to apache/pulsar#6054
(In that case, it appears that these PendingAddOp timeouts are causing the broker to not ack the messages.)
I was using BookKeeper 4.12.0, but this may have been an issue in prior versions in our environment.