This repository has been archived by the owner on Dec 8, 2021. It is now read-only.

feat: support multiple channels in SessionPool #1063

Merged
mr-salty merged 5 commits into master from multi2 on Nov 19, 2019

Conversation

@mr-salty (Contributor) commented Nov 15, 2019

  • SessionPool now utilizes all channels that are given to it, creating new sessions on the least loaded channel (see the sketch after this list).
  • The default number of channels created has been changed to 4.
  • Refactor Session creation and add-to-pool logic.
  • Add some tests that use multiple channels.
  • Rework some tests that depended on the order of returned Sessions.

Part of #307
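
For illustration, a minimal sketch of the least-loaded selection described above, using hypothetical stand-in types (the real ChannelInfo and stub types in this PR differ):

```cpp
#include <algorithm>
#include <memory>
#include <vector>

struct Stub {};  // stand-in for the real Spanner stub type
struct ChannelInfo {
  std::shared_ptr<Stub> stub;
  int session_count = 0;  // sessions currently attributed to this channel
};

// Return the channel with the fewest sessions; new sessions are created
// there. Assumes `channels` is non-empty.
ChannelInfo& LeastLoadedChannel(std::vector<ChannelInfo>& channels) {
  return *std::min_element(
      channels.begin(), channels.end(),
      [](ChannelInfo const& a, ChannelInfo const& b) {
        return a.session_count < b.session_count;
      });
}
```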



@googlebot added the "cla: yes" label (This human has signed the Contributor License Agreement.) on Nov 15, 2019
@mr-salty changed the title from "support multiple channels in SessionPool" to "feat: support multiple channels in SessionPool" on Nov 15, 2019
codecov bot commented Nov 15, 2019

Codecov Report

Merging #1063 into master will increase coverage by 0.5%.
The diff coverage is 99.12%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #1063     +/-   ##
=========================================
+ Coverage   91.59%   92.09%   +0.5%     
=========================================
  Files         162      165      +3     
  Lines       11785    12117    +332     
=========================================
+ Hits        10794    11159    +365     
+ Misses        991      958     -33
Impacted Files Coverage Δ
google/cloud/spanner/connection_options.cc 91.17% <ø> (ø) ⬆️
google/cloud/spanner/connection_options.h 100% <ø> (ø) ⬆️
google/cloud/spanner/internal/session_pool_test.cc 99.4% <100%> (+0.23%) ⬆️
google/cloud/spanner/internal/session_pool.h 100% <100%> (ø)
google/cloud/spanner/internal/session_pool.cc 97.47% <98.46%> (+4.2%) ⬆️
...loud/spanner/internal/partial_result_set_resume.cc 90.9% <0%> (-4.55%) ⬇️
google/cloud/spanner/internal/retry_loop.h 95.45% <0%> (-4.55%) ⬇️
google/cloud/spanner/internal/polling_loop.h 91.89% <0%> (-2.71%) ⬇️
google/cloud/spanner/client_test.cc 92.85% <0%> (-2.4%) ⬇️
google/cloud/spanner/internal/spanner_stub.cc 68.13% <0%> (-2.2%) ⬇️
... and 24 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7f5d1aa...a5fe87f. Read the comment docs.

@coryan (Contributor) left a comment

This is really cool, thanks!

Reviewed 5 of 5 files at r1.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @devbww, @devjgm, @mr-salty, and @scotthart)


google/cloud/spanner/connection_options.h, line 86 at r1 (raw file):

   * parallel.
   *
   * The default value is 4.

FYI, and no change needed in this PR: I am not sure we want to hard-code this value; maybe application developers will want to change it. I also think we want this to scale with the number of cores. We do that in Bigtable, and it makes sense because more cores mean more potential calls in parallel. In any case, please open a bug (probably for post-BETA) to fine-tune this value.
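
As a sketch of the core-based scaling idea (the clamp bounds below are invented for illustration, not what Bigtable or Spanner actually uses):

```cpp
#include <algorithm>
#include <thread>

// Scale the default channel count with the number of cores, since more
// cores can drive more calls in parallel. hardware_concurrency() may
// return 0 when the value is unknown, so fall back to a fixed default.
int DefaultChannelCount() {
  unsigned int cores = std::thread::hardware_concurrency();
  if (cores == 0) cores = 4;
  return static_cast<int>(std::min(64u, std::max(4u, cores)));
}
```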


google/cloud/spanner/internal/session_pool.h, line 157 at r1 (raw file):

  std::vector<ChannelInfo> channels_;
  ChannelInfo* least_loaded_channel_;

It seems that this is used for session creation and when the session does not have an associated stub already. Consider randomly picking a channel / stub in those cases, which is less likely to fall into thundering herd problems.
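
A sketch of the random pick being suggested; the pool would own the generator:

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Choose a channel uniformly at random. When many clients start at once
// they no longer all pile onto the same "least loaded" channel, which is
// the thundering-herd concern above. Assumes `channels` is non-empty.
template <typename Channel>
Channel& RandomChannel(std::vector<Channel>& channels, std::mt19937& gen) {
  std::uniform_int_distribution<std::size_t> dist(0, channels.size() - 1);
  return channels[dist(gen)];
}
```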


google/cloud/spanner/internal/session_pool.cc, line 74 at r1 (raw file):

  int num_channels = static_cast<int>(channels_.size());
  int sessions_per_channel = options_.min_sessions / num_channels;
  // if it doesn't divide evenly, add the remainder to the first channel.

That does not work well when you have a lot of channels (say num_channels=20) and you have K*num_channels - 1 sessions. Maybe decrement extra_sessions until it gets to zero and add one session to each of the channels if extra_sessions != 0?
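
In code, the evened-out split being suggested might look like this hypothetical helper:

```cpp
#include <vector>

// Spread `total_sessions` across `num_channels` as evenly as possible:
// every channel gets the quotient, and the first `remainder` channels
// get one extra session, instead of the whole remainder landing on
// channel 0. With num_channels = 20 and total_sessions = 20 * K - 1,
// no channel differs from any other by more than one session.
// Precondition: num_channels > 0.
std::vector<int> SessionsPerChannel(int total_sessions, int num_channels) {
  std::vector<int> counts(num_channels, total_sessions / num_channels);
  int remainder = total_sessions % num_channels;
  for (int i = 0; i < remainder; ++i) ++counts[i];
  return counts;
}
```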


google/cloud/spanner/internal/session_pool.cc, line 144 at r1 (raw file):

  // Sessions that were created for partitioned Reads/Queries do not have
  // their own stub, so return one to use.
  return least_loaded_channel_->stub;

Doesn't this implicitly change the load on that stub? It is now likely to be used, right?


google/cloud/spanner/internal/session_pool.cc, line 199 at r1 (raw file):

  channel.session_count += sessions_created;
  total_sessions_ += sessions_created;
  // TODO(#307) instead of adding all of these to the end, we could insert

Consider just reshuffling after this insert.
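
That suggestion amounts to something like this sketch (names hypothetical):

```cpp
#include <algorithm>
#include <random>
#include <utility>
#include <vector>

// Append the newly created sessions, then shuffle the whole pool so
// sessions are not handed out in channel-creation order.
template <typename Session>
void AppendAndShuffle(std::vector<Session>& pool,
                      std::vector<Session> created, std::mt19937& gen) {
  for (auto& s : created) pool.push_back(std::move(s));
  std::shuffle(pool.begin(), pool.end(), gen);
}
```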

@mr-salty (Contributor, Author) left a comment

Reviewed 1 of 5 files at r1.
Reviewable status: 3 of 5 files reviewed, 12 unresolved discussions (waiting on @coryan, @devbww, @devjgm, and @scotthart)


google/cloud/spanner/connection_options.h, line 86 at r1 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

FYI, and no change needed in this PR: I am not sure we want to hard-code this value; maybe application developers will want to change it. I also think we want this to scale with the number of cores. We do that in Bigtable, and it makes sense because more cores mean more potential calls in parallel. In any case, please open a bug (probably for post-BETA) to fine-tune this value.

I filed #1077 to cover all the options.


google/cloud/spanner/internal/session_pool.h, line 157 at r1 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

It seems that this is used for session creation and when the session does not have an associated stub already. Consider randomly picking a channel / stub in those cases, which is less likely to fall into thundering herd problems.

I split this into two renamed variables and round-robin in the latter case, as noted elsewhere.


google/cloud/spanner/internal/session_pool.h, line 157 at r2 (raw file):

Previously, devbww (Bradley White) wrote…

Perhaps something like fewest_sessions_channel_ might be clearer?

WDYT now?


google/cloud/spanner/internal/session_pool.cc, line 74 at r1 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

That does not work well when you have a lot of channels (say num_channels=20) and you have K*num_channels - 1 sessions. Maybe decrement extra_sessions until it gets to zero and add one session to each of the channels if extra_sessions != 0?

Done, thanks. I was afraid of making it too complicated but your suggestion is pretty simple.


google/cloud/spanner/internal/session_pool.cc, line 144 at r1 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

Doesn't this implicitly change the load on that stub? It is now likely to be used, right?

Yes, but note that if we end up here we're also using a dissociated session, so the pool gets no indication when the caller is done using it.

A while back @devbww and I did discuss whether it was worthwhile to try to track this, i.e. instead of using the default deleter for these Sessions like we do now, give them a custom deleter that updates the pool statistics but doesn't return the session to the pool. We decided it wasn't worth the complication, but noted that we could do it in the future if needed, since it's not user-visible either way. Since we're only approximately distributing Sessions across channels, it doesn't seem worthwhile, although I'm now convinced it's better to just round-robin through the channels here instead of using least_loaded_, so we're guaranteed to spread out the load.

FYI I realized I was missing a lock here and added one.
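
A sketch of that round-robin hand-out, done with an index and under a lock as described (all names illustrative):

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

struct Stub {};  // stand-in for the real stub type
struct ChannelInfo {
  std::shared_ptr<Stub> stub;
};

// Rotate through the channels when handing out a stub for a dissociated
// session: the pool never learns when such a session is released, so
// "least loaded" is not meaningful here, but rotating still spreads the
// load across channels.
class StubRotation {
 public:
  explicit StubRotation(std::vector<ChannelInfo> channels)
      : channels_(std::move(channels)) {}  // must be non-empty

  std::shared_ptr<Stub> NextStub() {
    std::lock_guard<std::mutex> lk(mu_);
    auto stub = channels_[next_].stub;
    next_ = (next_ + 1) % channels_.size();
    return stub;
  }

 private:
  std::mutex mu_;
  std::vector<ChannelInfo> channels_;
  std::size_t next_ = 0;
};
```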


google/cloud/spanner/internal/session_pool.cc, line 199 at r1 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

Consider just reshuffling after this insert.

Done


google/cloud/spanner/internal/session_pool.cc, line 40 at r2 (raw file):

Previously, devbww (Bradley White) wrote…

If we're going to say ...

options.max_sessions = (std::max)(options.max_sessions, 1);

instead of ...

if (options.max_sessions < 1) {
  options.max_sessions = 1;
}

we might as well do the same thing here:

options.max_sessions = (std::max)(options.max_sessions, options.min_sessions);

Done.


google/cloud/spanner/internal/session_pool.cc, line 74 at r2 (raw file):

Previously, devbww (Bradley White) wrote…

Why choose to do that? Would not things be nicer with 1 extra session on each of the first extra_sessions channels, rather than extra_sessions sessions on the first channel? I guess the way we add new sessions in blobs to a single channel makes this moot?

I changed it to make it more even, but you are correct that it can and will get out of balance as we add sessions - the goal is just to try to approximately balance the Sessions across channels.


google/cloud/spanner/internal/session_pool.cc, line 144 at r2 (raw file):

Previously, devbww (Bradley White) wrote…

Why use the channel with the fewest sessions?

(see above)


google/cloud/spanner/internal/session_pool.cc, line 161 at r2 (raw file):

Previously, devbww (Bradley White) wrote…

Can we refactor somehow so that the caller drop/retakes the lock? That is, so that all the "inside the lock" stuff is in Allocate()?

The AddSessionsToPool stuff needs to happen under the lock. I could move that outside, although I had just moved it inside because after we create the sessions there's really nothing to do with them but add them to the pool. I'm also wondering if I should inline AddSessionsToPool, since it's unlikely to have any caller other than CreateSessions, and in fact we could avoid an extra trip through a vector if I did that - so I decided to do it...


google/cloud/spanner/internal/session_pool.cc, line 167 at r2 (raw file):

Previously, devbww (Bradley White) wrote…

Does any of this need to happen inside the lock?

no


google/cloud/spanner/internal/session_pool.cc, line 193 at r2 (raw file):

Previously, devbww (Bradley White) wrote…

Nit: s/adds/Adds/

Done


google/cloud/spanner/internal/session_pool.cc, line 194 at r2 (raw file):

Previously, devbww (Bradley White) wrote…

AddSessionsToChannel() perhaps?

It's really adding the Sessions to the pool... the channel is just for statistics (as the comment says).


google/cloud/spanner/internal/session_pool.cc, line 204 at r2 (raw file):

Previously, devbww (Bradley White) wrote…

It looks like this is unnecessary unless least_loaded_channel_ == &channel.

Done.

@coryan (Contributor) left a comment

Reviewed 2 of 2 files at r3.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @devbww, @devjgm, and @scotthart)


google/cloud/spanner/internal/session_pool.cc, line 207 at r3 (raw file):

void SessionPool::UpdateNextChannelForCreateSessions() {
  next_channel_for_create_sessions_ = &channels_[0];

Presumably it is impossible for channels_ to be empty? Maybe a comment?

@mr-salty (Contributor, Author) left a comment

Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @devbww, @devjgm, and @scotthart)


google/cloud/spanner/internal/session_pool.h, line 160 at r3 (raw file):

Previously, devbww (Bradley White) wrote…

Opt: Consider that both these fields identify an element of channels_, but we choose different types for each, and neither is std::vector<ChannelInfo>::iterator.

Done.


google/cloud/spanner/internal/session_pool.cc, line 59 at r3 (raw file):

Previously, devbww (Bradley White) wrote…

Q: Do we know !stubs.empty()?

I just legislated it by comment.

I had changed the public methods to not crash, but then I realized I also had to do some special-case stuff in the constructor, so I just gave up and went with the comment... can we assert here?


google/cloud/spanner/internal/session_pool.cc, line 207 at r3 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

Presumably it is impossible for channels_ to be empty? Maybe a comment?

@devbww commented on this above as well (in the constructor), let's comment further there if necessary.

@coryan (Contributor) left a comment

Reviewed 2 of 2 files at r4.
Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @devbww, @devjgm, @mr-salty, and @scotthart)


google/cloud/spanner/internal/session_pool.h, line 157 at r2 (raw file):

Previously, mr-salty (Todd Derr) wrote…

WDYT now?

FYI: iterators can be invalidated by many operations, while indices cannot... I searched for any place that might have invalidated the iterators and it all looks correct now. It will be easier to break in the future, though.


google/cloud/spanner/internal/session_pool.cc, line 58 at r4 (raw file):

      backoff_policy_prototype_(std::move(backoff_policy)),
      options_(SanitizeOptions(options)) {
  channels_.reserve(stubs.size());

I would do something like:

if (stubs.empty()) google::cloud::internal::ThrowInvalidArgument("SessionPool requires a non-empty set of stubs");

That way we get an earlier error if somebody bypasses your comment.


google/cloud/spanner/internal/session_pool.cc, line 146 at r4 (raw file):

    std::unique_lock<std::mutex> lk(mu_);
    stub = next_dissociated_stub_channel_->stub;
    if (++next_dissociated_stub_channel_ == channels_.end()) {

FYI: this code also assumes that channels_ is not empty, just in a more subtle way; calling ++ on an .end() iterator is undefined behavior.
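
To make the subtlety concrete, a hypothetical helper with the precondition spelled out (the merged code instead relies on the constructor rejecting an empty stub list):

```cpp
// Wrap-around increment for a round-robin iterator. Incrementing end()
// is undefined behavior, and in an empty container begin() == end(), so
// this only works under the precondition that the container is non-empty
// and `it` is dereferenceable.
template <typename Container>
typename Container::iterator AdvanceAndWrap(Container& c,
                                            typename Container::iterator it) {
  ++it;  // safe only because `it` was dereferenceable, i.e. not end()
  return it == c.end() ? c.begin() : it;
}
```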

@mr-salty (Contributor, Author) left a comment

Reviewable status: 3 of 5 files reviewed, 11 unresolved discussions (waiting on @coryan, @devbww, @devjgm, and @scotthart)


google/cloud/spanner/internal/session_pool.h, line 157 at r2 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

FYI: iterators can be invalidated by many operations, while indices cannot... I searched for any place that might have invalidated the iterators and it all looks correct now. It will be easier to break in the future, though.

I added a TODO about switching to absl::FixedArray.

We discussed some alternatives that would make this "safe" now (vector<ChannelInfo*> const or a plain new[] allocated array) but decided it was preferable to keep it as-is to make the migration easier. I also moved the actual creation out to


google/cloud/spanner/internal/session_pool.cc, line 58 at r4 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

I would do something like:

if (stubs.empty()) google::cloud::internal::ThrowInvalidArgument("SessionPool requires a non-empty set of stubs");

That way we get an earlier error if somebody bypasses your comment.

Done.


google/cloud/spanner/internal/session_pool.cc, line 146 at r4 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

FYI: this code also assumes that channels_ is not empty, just in a more subtle way; calling ++ on an .end() iterator is undefined behavior.

Now that we throw the exception we can safely assume it's non-empty, unless you think it's worth special-casing everywhere we use or increment one of these iterators. I added a comment in the .h file about this and iterator validity (and also a TODO to use absl::FixedArray, which is not resizable).
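
For context, the property that the absl::FixedArray TODO is after (a sketch, assuming Abseil is available):

```cpp
#include <cstddef>
#include "absl/container/fixed_array.h"

struct ChannelInfo {
  int session_count = 0;
};

// absl::FixedArray is sized once at construction and can never grow or
// shrink, so pointers and iterators into it stay valid for the whole
// lifetime of the container, unlike std::vector, where any reallocation
// invalidates them.
void Example(std::size_t num_channels) {
  absl::FixedArray<ChannelInfo> channels(num_channels);
  ChannelInfo* first = &channels[0];  // remains valid: no resize possible
  ++first->session_count;
}
```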

@coryan (Contributor) left a comment

Reviewed 2 of 2 files at r5.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @devbww, @devjgm, and @scotthart)

@mr-salty merged commit 403286b into googleapis:master on Nov 19, 2019
@mr-salty deleted the multi2 branch on November 19, 2019 at 21:04
devjgm pushed a commit to devjgm/google-cloud-cpp that referenced this pull request May 7, 2020
feat: support multiple channels in SessionPool (googleapis/google-cloud-cpp-spanner#1063)

* feat: support multiple channels in SessionPool

* SessionPool now utilizes all channels that are given to it, creating
  new sessions on the least loaded channel.
* The default number of channels created has been changed to 4.
* Refactor Session creation and add to pool logic.
* Add some tests that use multiple channels.
* Rework some tests that depended on the order of returned Sessions.

Part of googleapis/google-cloud-cpp-spanner#307

* address review comments

* address review comments

* address review comments

* very important backtick change