This repository has been archived by the owner on Dec 8, 2021. It is now read-only.

fix: avoid deadlocks during shutdown #1397

Merged
coryan merged 4 commits into googleapis:master from simplify-background-threads-lifecycle
Mar 17, 2020

Contributor

@coryan coryan commented Mar 17, 2020

Avoid deadlocks caused by the SessionPool both controlling the
lifetime of the background threads in its destructor, and being also
destructed by those threads. The background threads are now owned by
the ConnectionImpl.
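
The ownership change described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `SketchPool` and `SketchConnection` are hypothetical stand-ins for SessionPool and ConnectionImpl, and a single `std::thread` stands in for the background threads. The point is that the pool's destructor may run on the background thread, which is harmless once the pool has nothing to join.

```cpp
#include <future>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical stand-in for SessionPool. It owns no threads, so its
// destructor is safe on whichever thread drops the last reference.
class SketchPool {
 public:
  explicit SketchPool(std::thread::id* destroyed_on)
      : destroyed_on_(destroyed_on) {}
  ~SketchPool() { *destroyed_on_ = std::this_thread::get_id(); }

 private:
  std::thread::id* destroyed_on_;
};

// Hypothetical stand-in for ConnectionImpl. It owns the background thread
// and is the only object that joins it.
class SketchConnection {
 public:
  explicit SketchConnection(std::shared_ptr<SketchPool> pool)
      : pool_(pool),
        background_([p = std::move(pool), r = release_.get_future()]() mutable {
          r.wait();   // a callback that is still pending at shutdown
          p.reset();  // the callback may hold the last pool reference
        }) {}

  ~SketchConnection() {
    pool_.reset();         // the background thread now holds the last reference
    release_.set_value();  // let the pending callback finish
    background_.join();    // joined by the owner, never by the pool
  }

  std::thread::id background_id() const { return background_.get_id(); }

 private:
  // Declaration order matters: release_ and pool_ are initialized before
  // background_ starts running.
  std::promise<void> release_;
  std::shared_ptr<SketchPool> pool_;
  std::thread background_;
};

// Returns true when the pool was destroyed on the background thread.
bool PoolDestroyedOnBackgroundThread() {
  std::thread::id destroyed_on;
  std::thread::id background;
  {
    SketchConnection conn(std::make_shared<SketchPool>(&destroyed_on));
    background = conn.background_id();
  }  // ~SketchConnection joins; there is nothing for the pool to join
  return destroyed_on == background;
}
```

If `SketchPool` owned `background_` and joined it in its destructor, the same sequence would have the thread trying to join itself, which is the kind of deadlock the description refers to.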


@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Mar 17, 2020
codecov bot commented Mar 17, 2020

Codecov Report

Merging #1397 into master will decrease coverage by 0.02%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1397      +/-   ##
==========================================
- Coverage   94.38%   94.35%   -0.03%     
==========================================
  Files         190      190              
  Lines       15659    15679      +20     
==========================================
+ Hits        14779    14794      +15     
- Misses        880      885       +5
| Impacted Files | Coverage Δ |
|---|---|
| google/cloud/spanner/internal/session_pool.h | 0% <ø> (ø) ⬆️ |
| google/cloud/spanner/internal/session_pool.cc | 82.63% <100%> (-0.19%) ⬇️ |
| google/cloud/spanner/internal/session_pool_test.cc | 100% <100%> (ø) ⬆️ |
| google/cloud/spanner/internal/connection_impl.cc | 96.27% <100%> (-0.01%) ⬇️ |
| ...tegration_tests/instance_admin_integration_test.cc | 86.66% <0%> (-8.89%) ⬇️ |
| google/cloud/spanner/internal/log_wrapper.h | 71.42% <0%> (-3.58%) ⬇️ |
| google/cloud/spanner/samples/samples.cc | 89.48% <0%> (-0.15%) ⬇️ |
| google/cloud/spanner/keys.h | 100% <0%> (ø) ⬆️ |
| google/cloud/spanner/internal/session.h | 100% <0%> (ø) ⬆️ |
| google/cloud/spanner/internal/transaction_impl.h | 100% <0%> (ø) ⬆️ |
| ... and 9 more | |

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 00ca369...4f09d67. Read the comment docs.

@coryan coryan marked this pull request as ready for review March 17, 2020 04:44
@coryan coryan marked this pull request as ready for review March 17, 2020 12:48
Contributor

@devjgm devjgm left a comment

Awesome! Thanks! This LGTM, but please wait for @mr-salty to confirm that this should fix the issues he was seeing.

Contributor

@mr-salty mr-salty left a comment

I'd considered this but I didn't think it was viable since ConnectionImpl doesn't actually own the SessionPool, it just has a shared_ptr to it. So, I don't think the order of operations you gave in chat is guaranteed - the SessionPool may outlive the ConnectionImpl, although it's possible that doesn't happen in any of our tests.

But, the one twist here I hadn't considered was passing CompletionQueue to the SessionPool instead of a pointer to BackgroundThreads. If we did the latter then CompletionQueue could end up dereferencing freed memory (we can't fix that with shared_ptr<BackgroundThreads> without running into the original issue). But, in this case, I believe we might end up with a CompletionQueue with no threads servicing it, is that ok? I recall having issues with that in the past.
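
The "CompletionQueue with no threads servicing it" scenario can be sketched like this. It is a simplified, single-threaded illustration under stated assumptions: `QueueHandle` and `PoolWithQueue` are hypothetical names, and `Drain()` stands in for the work the background threads would do. It is not the google::cloud::CompletionQueue API. Because the handle shares its state, a pool that outlives its owner never touches freed memory, but work it posts afterwards simply stays queued.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical queue handle: copies share one implementation object, so a
// copy held by the pool stays valid after the original goes away.
class QueueHandle {
 public:
  QueueHandle() : impl_(std::make_shared<Impl>()) {}

  void Post(std::function<void()> work) {
    std::lock_guard<std::mutex> lk(impl_->mu);
    impl_->pending.push_back(std::move(work));
  }

  // Runs everything queued so far. Real background threads would do this.
  void Drain() {
    std::deque<std::function<void()>> ready;
    {
      std::lock_guard<std::mutex> lk(impl_->mu);
      ready.swap(impl_->pending);
    }
    for (auto& work : ready) work();
  }

  std::size_t PendingCount() const {
    std::lock_guard<std::mutex> lk(impl_->mu);
    return impl_->pending.size();
  }

 private:
  struct Impl {
    std::mutex mu;
    std::deque<std::function<void()>> pending;
  };
  std::shared_ptr<Impl> impl_;
};

// Hypothetical stand-in for a SessionPool that receives the queue by value.
class PoolWithQueue {
 public:
  explicit PoolWithQueue(QueueHandle cq) : cq_(std::move(cq)) {}
  void ScheduleRefresh() { cq_.Post([] {}); }
  std::size_t Unserviced() const { return cq_.PendingCount(); }

 private:
  QueueHandle cq_;
};

// Returns how many operations remain queued once the servicing side is gone.
std::size_t UnservicedAfterOwnerIsGone() {
  std::shared_ptr<PoolWithQueue> pool;
  {
    QueueHandle cq;  // this scope plays the role of the queue's owner
    pool = std::make_shared<PoolWithQueue>(cq);
    pool->ScheduleRefresh();
    cq.Drain();  // serviced while the owner is alive
  }
  pool->ScheduleRefresh();  // still valid memory, but nobody will run it
  return pool->Unserviced();
}
```

In this sketch the leftover operation is never executed and never completes, which is the open question raised above: whether that is acceptable during shutdown.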

Contributor Author

@coryan coryan left a comment

> I'd considered this but I didn't think it was viable since ConnectionImpl doesn't actually own the SessionPool, it just has a shared_ptr to it. So, I don't think the order of operations you gave in chat is guaranteed - the SessionPool may outlive the ConnectionImpl,

Good point.

> although it's possible that doesn't happen in any of our tests.

I would argue that none of the callbacks inside SessionPool can block trying to join the threads for the simple reason that they never had access to the BackgroundThreads object in the first place.

> But, the one twist here I hadn't considered was passing CompletionQueue to the SessionPool instead of a pointer to BackgroundThreads. If we did the latter then CompletionQueue could end up dereferencing freed memory (we can't fix that with shared_ptr<BackgroundThreads> without running into the original issue).

Sure, but we don't do that...

> But, in this case, I believe we might end up with a CompletionQueue with no threads servicing it, is that ok?

During shutdown? I think it is. I mean, sure, we still may want to add the SessionPool::Shutdown() function to wait for all the timers and RPCs to finish, but this will not deadlock with or without that function.

> I recall having issues with that in the past.

There is the thing in **grpc::**CompletionQueue having to wait for all the active operations, but we fixed that in the shutdown for **google::cloud::**CompletionQueue.
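
For illustration only, the "wait for all the timers and RPCs to finish" idea behind a possible SessionPool::Shutdown() might look like the sketch below. No such function exists in this PR; `PendingOperations` and its methods are hypothetical names, and a plain `std::thread` stands in for an in-flight RPC.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical tracker for in-flight timers and RPCs.
class PendingOperations {
 public:
  void Start() {
    std::lock_guard<std::mutex> lk(mu_);
    ++count_;
  }

  void Finish() {
    std::lock_guard<std::mutex> lk(mu_);
    if (--count_ == 0) cv_.notify_all();
  }

  // What a hypothetical Shutdown() could block on.
  void WaitUntilIdle() {
    std::unique_lock<std::mutex> lk(mu_);
    cv_.wait(lk, [this] { return count_ == 0; });
  }

  int count() const {
    std::lock_guard<std::mutex> lk(mu_);
    return count_;
  }

 private:
  mutable std::mutex mu_;
  std::condition_variable cv_;
  int count_ = 0;
};

// Returns the number of operations still pending after the wait.
int ShutdownWaitsForPendingWork() {
  PendingOperations ops;
  ops.Start();  // registered before the simulated RPC begins
  std::thread rpc([&ops] { ops.Finish(); });
  ops.WaitUntilIdle();  // returns only after the RPC has finished
  int const remaining = ops.count();
  rpc.join();
  return remaining;
}
```

A wait like this would have to be called from outside the background threads; calling it from a callback that is itself counted as pending would block forever.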

Contributor

@mr-salty mr-salty left a comment

:lgtm:

I'm still a bit concerned about the possibility of an unserviced CompletionQueue but I'm not sure how to tease that out... and in any case this is an improvement over the current situation and will unblock me - and I can see if any issues arise with my pending changes.

@coryan coryan merged commit 6c07bb0 into googleapis:master Mar 17, 2020
@coryan coryan deleted the simplify-background-threads-lifecycle branch March 17, 2020 17:02
Contributor Author

coryan commented Mar 17, 2020

I am happy to write the SessionPool::Shutdown() stuff if you want.

devjgm pushed a commit to devjgm/google-cloud-cpp that referenced this pull request May 7, 2020
…nner#1397)

4 participants