-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Release the state lock before calling the publish api #7686
Release the state lock before calling the publish api #7686
Conversation
Prior to this change, any problem in the communication path to pubsub (e.g. bad connection, slow servers, etc) would not only tie up the calling thread itself, but also other threads waiting to get hold of the state lock as they try to publish over the same batch. We only need to hold the state lock for the transition from ACCEPTING_MESSAGES / STARTING to IN_PROGRESS. After that, since only one thread is able to transition to IN_PROGRESS, we can safely release the state lock before calling the publish api and eventually transitioning to SUCCESS / ERROR. Co-authored-by: Rencana Tarigan <rtarigan@bbmtek.com>
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google. ℹ️ Googlers: Go here for more info. |
Note that the commit contains whitespace-only changes: 26ac207?w=1 |
recheck CLA |
So there's good news and bad news. 👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there. 😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request. Note to project maintainer: This is a terminal state, meaning the ℹ️ Googlers: Go here for more info. |
I already sign the CLA |
A Googler has manually verified that the CLAs look good. (Googler, please make sure the reason for overriding the CLA status is clearly documented in these comments.) ℹ️ Googlers: Go here for more info. |
except google.api_core.exceptions.GoogleAPIError as exc: | ||
# We failed to publish, set the exception on all futures and | ||
# exit. | ||
self._status = base.BatchStatus.ERROR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The self._state_lock
is designed to protect setting self._status
: this change undoes that protection in the case that an error occurs, which introduces a race condition AFAICT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Tres, we have covered that in the PR description:
We only need to hold the state lock for the transition from ACCEPTING_MESSAGES / STARTING to IN_PROGRESS. After that, since only one thread is able to transition to IN_PROGRESS, we can safely release the state lock before calling the publish api and eventually transitioning to SUCCESS / ERROR.
Maybe you can elaborate on the race condition? Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TL; DR - Not holding the lock after changing the state to IN_PROGRESS indeed seems safe. However, I would still like to see the effect of this change covered by a test before merging. I can help with this, if needed.
Long reply:
I checked the Batch code, and from what I can tell, that the following threads are relevant:
MonitorBatchPublisher
(if autcommit == True): Triggers_commit()
aftermax_latency
time elapses.CommitBatchPublisher
: If somebody callscommit()
and the batch is in ACCEPTING_MESSAGES state,_commit()
gets called in this thread.- User thread(s): When calling
publisher.publish()
,batch.publish()
gets called. If that causes an overflow,commit()
is called in theCommitBatchPublisher
thread.
The first thing that _commit()
does after acquiring the lock is changing the batch status to IN_PROGRESS.
In that state, any further calls to publish()
become a no-op, because the will_accept()
check returns False
for any state that is not ACCEPTING_MESSAGES.
Additionally, any other _commit()
calls also become a no-op, because the batch status is now IN_PROGRESS.
Only the thread that changed the state to IN_PROGRESS can proceed, thus shielding the rest of the _commit()
method with _state_lock
indeed seems unnecessary, which includes the call to self._client.api.publish()
.
Without the _state_lock
held, further calls to batch.publish()
will not block longer than needed. The batch will simply not accept a new message, return False
, and publisher.publish()
will create an entirely new batch as a result.
@plamut, can you please take a look at this? |
@rtarigan, this PR is a bit stale and needs tests. Please reopen this PR once the tests are in place. |
Once the publish batch transitions to IN_PROGRESS state, any subsequent calls to commit the batch effectively become a no-op. The state lock can thus be released immediately after the state change, unblocking other threads that might be waiting to publish another PubSub message. Co-authored by @sayap (GitHub) and Rencana Tarigan rtarigan@bbmtek.com googleapis#7686
Once the publish batch transitions to IN_PROGRESS state, any subsequent calls to commit the batch effectively become a no-op. The state lock can thus be released immediately after the state change, unblocking other threads that might be waiting to publish another PubSub message. Co-authored by @sayap (GitHub) and Rencana Tarigan rtarigan@bbmtek.com googleapis#7686
* Release publish batch lock much sooner Once the publish batch transitions to IN_PROGRESS state, any subsequent calls to commit the batch effectively become a no-op. The state lock can thus be released immediately after the state change, unblocking other threads that might be waiting to publish another PubSub message. Co-authored by @sayap (GitHub) and Rencana Tarigan rtarigan@bbmtek.com #7686 * Add minor comment improvements to Batch methods
* Release publish batch lock much sooner Once the publish batch transitions to IN_PROGRESS state, any subsequent calls to commit the batch effectively become a no-op. The state lock can thus be released immediately after the state change, unblocking other threads that might be waiting to publish another PubSub message. Co-authored by @sayap (GitHub) and Rencana Tarigan rtarigan@bbmtek.com googleapis/google-cloud-python#7686 * Add minor comment improvements to Batch methods
Prior to this change, any problem in the communication path to pubsub
(e.g. bad connection, slow servers, etc) would not only tie up the
calling thread itself, but also other threads waiting to get hold of
the state lock as they try to publish over the same batch.
We only need to hold the state lock for the transition from
ACCEPTING_MESSAGES / STARTING to IN_PROGRESS. After that, since only
one thread is able to transition to IN_PROGRESS, we can safely release
the state lock before calling the publish api and eventually
transitioning to SUCCESS / ERROR.
Co-authored-by: Rencana Tarigan rtarigan@bbmtek.com