Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tablet_picker: keep trying to find a tablet until context expires #6546

Merged
merged 10 commits into from
Aug 14, 2020

Conversation

deepthi
Copy link
Member

@deepthi deepthi commented Aug 6, 2020

There was a bug in the tablet_picker implementation which could cause it to loop forever.

  1. We were not checking whether the context is still valid.
  2. If there are non-zero tablets of any kind, but zero tablets of the desired type, we loop forever (see 1.)

Along with fixing the above problem, this change has a few other improvements

  • Always use a short context for topo calls
  • Do retry if a tablet is not found, but only after a delay.

Signed-off-by: deepthi deepthi@planetscale.com

Signed-off-by: deepthi <deepthi@planetscale.com>
@deepthi
Copy link
Member Author

deepthi commented Aug 6, 2020

@sougou can you please review the tests and logging? We don't want to spam the logs but still produce useful information.

deepthi and others added 2 commits August 7, 2020 17:11
Signed-off-by: deepthi <deepthi@planetscale.com>
…teMainflow

Signed-off-by: Rohit Nayak <rohit@planetscale.com>
@@ -80,6 +95,7 @@ func (tp *TabletPicker) PickForStreaming(ctx context.Context) (*topodatapb.Table
candidates := tp.getMatchingTablets(ctx)
if len(candidates) == 0 {
// if no candidates were found, sleep and try again
log.Infof("No tablet found for streaming, sleeping for %d", tabletPickerRetryDelay)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. I was planning to add something similar.

rohit-nayak-ps and others added 7 commits August 11, 2020 10:41
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: deepthi <deepthi@planetscale.com>
@deepthi deepthi modified the milestone: 7.0.1 Aug 11, 2020
@deepthi deepthi merged commit e16a6c5 into vitessio:master Aug 14, 2020
deepthi added a commit to planetscale/vitess that referenced this pull request Aug 14, 2020
…tessio#6546)

* tablet_picker: keep trying to find a tablet until context expires

Signed-off-by: deepthi <deepthi@planetscale.com>

* tablet_picker: keyspace/shard/cells must be provided

Signed-off-by: deepthi <deepthi@planetscale.com>

* tablet picker: return ks/cell/shard in test result for TestShardMigrateMainflow

Signed-off-by: Rohit Nayak <rohit@planetscale.com>

* tablet picker: fix wrangler tests by choosing a valid source per-test

Signed-off-by: Rohit Nayak <rohit@planetscale.com>

* tablet picker: fix worker and testlib tests

Signed-off-by: Rohit Nayak <rohit@planetscale.com>

* tablet_picker: gofmted

Signed-off-by: Rohit Nayak <rohit@planetscale.com>

* tablet picker: fix test races

Signed-off-by: Rohit Nayak <rohit@planetscale.com>

* tablet picker: gofmt

Signed-off-by: Rohit Nayak <rohit@planetscale.com>

* tablet picker: gofmt

Signed-off-by: Rohit Nayak <rohit@planetscale.com>

* healthcheck: cleanup test code where we were setting the same flag twice

Signed-off-by: deepthi <deepthi@planetscale.com>

Co-authored-by: Rohit Nayak <rohit@planetscale.com>
@deepthi deepthi deleted the ds-fix-tp-busyloop branch September 9, 2020 23:54
@askdba askdba added this to the v8.0 milestone Oct 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants