Release the state lock before calling the publish api #7686

@sayap sayap commented Apr 10, 2019

Prior to this change, any problem in the communication path to pubsub
(e.g. bad connection, slow servers, etc) would not only tie up the
calling thread itself, but also other threads waiting to get hold of
the state lock as they try to publish over the same batch.

We only need to hold the state lock for the transition from
ACCEPTING_MESSAGES / STARTING to IN_PROGRESS. After that, since only
one thread is able to transition to IN_PROGRESS, we can safely release
the state lock before calling the publish api and eventually
transitioning to SUCCESS / ERROR.

Co-authored-by: Rencana Tarigan <rtarigan@bbmtek.com>
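
As a rough illustration of the change described above, the following is a minimal sketch of the narrowed locking pattern, not the library's actual code. `SketchBatch`, its `send` callable (standing in for the publish API call), and the simplified status enum are all hypothetical names introduced here, and the broad `except Exception` is a simplification of the real error handling.

```python
import enum
import threading


class BatchStatus(enum.Enum):
    ACCEPTING_MESSAGES = "accepting messages"
    STARTING = "starting"
    IN_PROGRESS = "in progress"
    SUCCESS = "success"
    ERROR = "error"


class SketchBatch:
    """Hypothetical, simplified stand-in for the publisher batch."""

    def __init__(self, send, messages):
        self._send = send  # assumed callable standing in for the publish API
        self._messages = list(messages)
        self._state_lock = threading.Lock()
        self._status = BatchStatus.ACCEPTING_MESSAGES

    def _commit(self):
        # Hold the state lock only for the transition to IN_PROGRESS.
        with self._state_lock:
            if self._status not in (
                BatchStatus.ACCEPTING_MESSAGES,
                BatchStatus.STARTING,
            ):
                return False  # another thread already owns this commit
            self._status = BatchStatus.IN_PROGRESS

        # The lock is released here, so a slow or failing call below no
        # longer blocks threads that are waiting on the state lock.
        try:
            self._send(self._messages)
        except Exception:
            self._status = BatchStatus.ERROR
            return True
        self._status = BatchStatus.SUCCESS
        return True
```

Because only the thread that performed the transition gets past the guarded block, the later writes to `_status` in this sketch have a single writer.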

@sayap sayap requested a review from crwilcox as a code owner April 10, 2019 09:53
@googlebot

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again. If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

@googlebot googlebot added the cla: no This human has *not* signed the Contributor License Agreement. label Apr 10, 2019
@sayap
Author

sayap commented Apr 10, 2019

Note that the commit contains whitespace-only changes: 26ac207?w=1

@sayap
Author

sayap commented Apr 10, 2019

recheck CLA

@googlebot

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of all the commit author(s), set the cla label to yes (if enabled on your project), and then merge this pull request when appropriate.

ℹ️ Googlers: Go here for more info.

@rtarigan

I have already signed the CLA.

@sduskis sduskis added cla: yes This human has signed the Contributor License Agreement. and removed cla: no This human has *not* signed the Contributor License Agreement. labels Apr 10, 2019
@googlebot

A Googler has manually verified that the CLAs look good.

(Googler, please make sure the reason for overriding the CLA status is clearly documented in these comments.)

ℹ️ Googlers: Go here for more info.

@sduskis sduskis added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Apr 10, 2019
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Apr 10, 2019
@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Apr 17, 2019
except google.api_core.exceptions.GoogleAPIError as exc:
    # We failed to publish, set the exception on all futures and
    # exit.
    self._status = base.BatchStatus.ERROR
Contributor

The self._state_lock is designed to protect setting self._status: this change undoes that protection in the case that an error occurs, which introduces a race condition AFAICT.

Author

Hi Tres, we have covered that in the PR description:

We only need to hold the state lock for the transition from ACCEPTING_MESSAGES / STARTING to IN_PROGRESS. After that, since only one thread is able to transition to IN_PROGRESS, we can safely release the state lock before calling the publish api and eventually transitioning to SUCCESS / ERROR.

Maybe you can elaborate on the race condition? Thanks

Contributor

TL;DR: Not holding the lock after changing the state to IN_PROGRESS does indeed seem safe. However, I would still like to see the effect of this change covered by a test before merging. I can help with this, if needed.


Long reply:

I checked the Batch code and, from what I can tell, the following threads are relevant:

  • MonitorBatchPublisher (if autocommit == True): Triggers _commit() after max_latency time elapses.
  • CommitBatchPublisher: If somebody calls commit() and the batch is in ACCEPTING_MESSAGES state, _commit() gets called in this thread.
  • User thread(s): When calling publisher.publish(), batch.publish() gets called. If that causes an overflow, commit() is called in the CommitBatchPublisher thread.

The first thing that _commit() does after acquiring the lock is changing the batch status to IN_PROGRESS.

In that state, any further calls to publish() become a no-op, because the will_accept() check returns False for any state that is not ACCEPTING_MESSAGES.

Additionally, any other _commit() calls also become a no-op, because the batch status is now IN_PROGRESS.

Only the thread that changed the state to IN_PROGRESS can proceed, so shielding the rest of the _commit() method with _state_lock, including the call to self._client.api.publish(), indeed seems unnecessary.

Without the _state_lock held, further calls to batch.publish() will not block longer than needed. The batch will simply not accept a new message, return False, and publisher.publish() will create an entirely new batch as a result.
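
The reasoning above can be checked with a small, deterministic model, which is also roughly the shape a test for this change could take. This is only a sketch under stated assumptions: `SketchBatch`, `slow_send`, `demo`, and the string states are hypothetical names made up for illustration, the real `Batch` has more states and returns futures, and `publish()` returning `False` follows the description in this comment rather than the library's exact return value.

```python
import threading

ACCEPTING, IN_PROGRESS, SUCCESS = "accepting", "in progress", "success"


class SketchBatch:
    """Hypothetical model of the batch states discussed above."""

    def __init__(self, send):
        self._send = send  # assumed stand-in for the publish API call
        self._state_lock = threading.Lock()
        self._status = ACCEPTING
        self._messages = []

    def will_accept(self):
        return self._status == ACCEPTING

    def publish(self, message):
        with self._state_lock:
            if not self.will_accept():
                return False  # the caller would start a new batch
            self._messages.append(message)
            return True

    def commit(self):
        with self._state_lock:
            if self._status != ACCEPTING:
                return  # someone else already committed: no-op
            self._status = IN_PROGRESS
        # Lock released before the (possibly slow) API call.
        self._send(self._messages)
        self._status = SUCCESS


def demo():
    entered = threading.Event()  # set once the fake API call has started
    release = threading.Event()  # set to let the fake API call finish

    def slow_send(messages):
        entered.set()
        release.wait(timeout=5)

    batch = SketchBatch(slow_send)
    batch.publish(b"first")
    committer = threading.Thread(target=batch.commit)
    committer.start()
    entered.wait(timeout=5)

    # The fake API call is still in flight, yet publish() returns at once
    # because the state lock is no longer held by the committing thread.
    accepted = batch.publish(b"second")
    lock_free = not batch._state_lock.locked()
    batch.commit()  # a second commit is a no-op while IN_PROGRESS

    release.set()
    committer.join(timeout=5)
    return accepted, lock_free, batch._status
```

Calling `demo()` exercises all three observations: the second message is rejected without blocking, the state lock is free while the send is in flight, and the repeated commit does nothing. Using events instead of sleeps keeps the check independent of timing.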

@sduskis sduskis added the api: pubsub Issues related to the Pub/Sub API. label May 9, 2019
@sduskis sduskis requested a review from plamut May 15, 2019 00:46
@sduskis
Contributor

sduskis commented May 15, 2019

@plamut, can you please take a look at this?

@sduskis sduskis removed the request for review from crwilcox June 4, 2019 16:18
@sduskis
Contributor

sduskis commented Jun 4, 2019

@rtarigan, this PR is a bit stale and needs tests. Please reopen this PR once the tests are in place.

@sduskis sduskis closed this Jun 4, 2019
plamut added a commit to plamut/google-cloud-python that referenced this pull request Jun 5, 2019
Once the publish batch transitions to IN_PROGRESS state, any subsequent
calls to commit the batch effectively become a no-op. The state lock
can thus be released immediately after the state change, unblocking
other threads that might be waiting to publish another PubSub message.

Co-authored by @sayap (GitHub) and Rencana Tarigan rtarigan@bbmtek.com
googleapis#7686
plamut added a commit to plamut/google-cloud-python that referenced this pull request Jun 6, 2019
Once the publish batch transitions to IN_PROGRESS state, any subsequent
calls to commit the batch effectively become a no-op. The state lock
can thus be released immediately after the state change, unblocking
other threads that might be waiting to publish another PubSub message.

Co-authored by @sayap (GitHub) and Rencana Tarigan rtarigan@bbmtek.com
googleapis#7686
@plamut
Contributor

plamut commented Jun 6, 2019

@rtarigan Thank you for the fix! I opened another PR (#8234) that also includes tests, but mentioned you and Rencana Tarigan as co-authors in the commit message, as the fix is essentially the same as the one here.

plamut added a commit that referenced this pull request Jun 13, 2019
* Release publish batch lock much sooner

Once the publish batch transitions to IN_PROGRESS state, any subsequent
calls to commit the batch effectively become a no-op. The state lock
can thus be released immediately after the state change, unblocking
other threads that might be waiting to publish another PubSub message.

Co-authored by @sayap (GitHub) and Rencana Tarigan rtarigan@bbmtek.com
#7686

* Add minor comment improvements to Batch methods
plamut added a commit to googleapis/python-pubsub that referenced this pull request Jan 31, 2020
* Release publish batch lock much sooner

Once the publish batch transitions to IN_PROGRESS state, any subsequent
calls to commit the batch effectively become a no-op. The state lock
can thus be released immediately after the state change, unblocking
other threads that might be waiting to publish another PubSub message.

Co-authored by @sayap (GitHub) and Rencana Tarigan rtarigan@bbmtek.com
googleapis/google-cloud-python#7686

* Add minor comment improvements to Batch methods
Labels
api: pubsub Issues related to the Pub/Sub API. cla: yes This human has signed the Contributor License Agreement. 🚨 This issue needs some love.
8 participants