Rebuild SrvVSchema during InitTablet. #6276

Merged
merged 1 commit into from
Jun 6, 2020


enisoc (Member) commented on Jun 5, 2020

We used to rebuild SrvVSchema in all cells as part of running GetOrCreateShard from any cell. That caused VSchema problems to spread instantly to all cells at once, so we changed EnsureVSchema to only rebuild SrvVSchema in the local cell (#5930).
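The blast-radius argument can be made concrete with a toy Go model. It is a hypothetical sketch: `topo`, `rebuildSrvVSchema`, and `cellsServing` are illustrative names, not Vitess APIs. The model assumes each cell's SrvVSchema is a snapshot of the global VSchemas taken at rebuild time, and it compares rebuilding every cell against rebuilding only the local one.

```go
package main

import (
	"fmt"
	"sort"
)

// topo is a hypothetical in-memory stand-in for the topology service, not a
// Vitess API: VSchemas live in the global topo, and each cell holds its own
// SrvVSchema snapshot (keyspace -> VSchema).
type topo struct {
	vschemas    map[string]string
	srvVSchemas map[string]map[string]string
}

// rebuildSrvVSchema copies the current global VSchemas into each listed cell.
func (t *topo) rebuildSrvVSchema(cells []string) {
	for _, cell := range cells {
		snapshot := make(map[string]string, len(t.vschemas))
		for ks, vs := range t.vschemas {
			snapshot[ks] = vs
		}
		t.srvVSchemas[cell] = snapshot
	}
}

// cellsServing returns the sorted cells whose SrvVSchema holds the given VSchema.
func (t *topo) cellsServing(keyspace, vschema string) []string {
	var cells []string
	for cell, snap := range t.srvVSchemas {
		if snap[keyspace] == vschema {
			cells = append(cells, cell)
		}
	}
	sort.Strings(cells)
	return cells
}

func main() {
	allCells := []string{"zone1", "zone2", "zone3"}

	// Old behavior: one tablet rebuilds SrvVSchema in every cell, so a bad
	// VSchema edit reaches all cells at once.
	old := &topo{vschemas: map[string]string{"commerce": "bad"}, srvVSchemas: map[string]map[string]string{}}
	old.rebuildSrvVSchema(allCells)
	fmt.Println(old.cellsServing("commerce", "bad")) // [zone1 zone2 zone3]

	// Behavior after #5930: a tablet rebuilds only its local cell.
	local := &topo{vschemas: map[string]string{"commerce": "bad"}, srvVSchemas: map[string]map[string]string{}}
	local.rebuildSrvVSchema([]string{"zone1"})
	fmt.Println(local.cellsServing("commerce", "bad")) // [zone1]
}
```

With the local-only rebuild, a bad VSchema is served by one cell until tablets in other cells rebuild theirs, which is what gives operators a chance to catch it.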

We believed that at least one tablet in each cell would still end up rebuilding SrvVSchema in that cell, since every tablet calls GetOrCreateShard at startup. However, I didn't realize that GetOrCreateShard short-circuits if the shard already exists, so it was skipping EnsureVSchema entirely. As a result, only one cell (the one containing the tablet that won the race to create the shard) actually had its SrvVSchema rebuilt.
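The race described above can be reproduced in a minimal sketch. The types and functions here are hypothetical and only mimic the control flow, not the real Vitess code: `getOrCreateShard` returns early when the shard record already exists, so only the tablet that creates it reaches the local-cell rebuild in `ensureVSchema`.

```go
package main

import "fmt"

// cluster is a hypothetical toy model, not Vitess code: shard records live
// in the global topo, and SrvVSchema "rebuilt" flags live per cell.
type cluster struct {
	shards  map[string]bool // "keyspace/shard" -> exists in global topo
	rebuilt map[string]bool // cell -> SrvVSchema rebuilt
}

// getOrCreateShard mimics the bug: it returns early when the shard already
// exists, so ensureVSchema (and its local-cell rebuild) is skipped.
func (c *cluster) getOrCreateShard(cell, keyspace, shard string) {
	key := keyspace + "/" + shard
	if c.shards[key] {
		return // short-circuit: shard already exists
	}
	c.shards[key] = true
	c.ensureVSchema(cell)
}

// ensureVSchema rebuilds SrvVSchema only in the caller's cell.
func (c *cluster) ensureVSchema(cell string) {
	c.rebuilt[cell] = true
}

func main() {
	c := &cluster{shards: map[string]bool{}, rebuilt: map[string]bool{}}
	// Tablets in three cells start up; the zone1 tablet wins the race.
	for _, cell := range []string{"zone1", "zone2", "zone3"} {
		c.getOrCreateShard(cell, "commerce", "0")
	}
	fmt.Println(c.rebuilt) // map[zone1:true]
}
```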

This moves the rebuild of SrvVSchema to be an explicit step in InitTablet alongside the rebuild of the keyspace graph. This keeps GetOrCreateShard as a task that only needs to be done by one tablet globally because it affects global topo. The rebuilding of per-cell records should be done by a tablet in that cell.
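The fixed flow can be sketched with the same kind of hypothetical toy model. For simplicity, `initTablet` here always rebuilds both per-cell records for its own cell; the reviewed excerpt in this PR suggests the real change rebuilds SrvVSchema only when the keyspace is missing from the cell's copy, which this sketch omits.

```go
package main

import "fmt"

// cluster is a hypothetical toy model, not Vitess code.
type cluster struct {
	shards      map[string]bool // global topo: "keyspace/shard" -> exists
	srvKeyspace map[string]bool // cell -> keyspace graph rebuilt
	srvVSchema  map[string]bool // cell -> SrvVSchema rebuilt
}

func newCluster() *cluster {
	return &cluster{shards: map[string]bool{}, srvKeyspace: map[string]bool{}, srvVSchema: map[string]bool{}}
}

// getOrCreateShard touches only the global topo, so it may still
// short-circuit: whichever tablet creates the shard first is enough.
func (c *cluster) getOrCreateShard(keyspace, shard string) {
	key := keyspace + "/" + shard
	if c.shards[key] {
		return
	}
	c.shards[key] = true
}

// initTablet sketches the fixed flow: the per-cell rebuilds are explicit
// steps that every tablet performs for its own cell, regardless of whether
// getOrCreateShard short-circuited.
func (c *cluster) initTablet(cell, keyspace, shard string) {
	c.getOrCreateShard(keyspace, shard)
	c.srvKeyspace[cell] = true // rebuild keyspace graph in the local cell
	c.srvVSchema[cell] = true  // rebuild SrvVSchema in the local cell
}

func main() {
	c := newCluster()
	for _, cell := range []string{"zone1", "zone2", "zone3"} {
		c.initTablet(cell, "commerce", "0")
	}
	fmt.Println(len(c.shards), c.srvVSchema) // 1 map[zone1:true zone2:true zone3:true]
}
```

The shard record is still created exactly once, while every cell that runs a tablet gets its own SrvVSchema rebuilt.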

enisoc requested review from deepthi and zmagg on June 5, 2020 21:56
enisoc requested a review from sougou as a code owner on June 5, 2020 21:56
Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
```go
switch {
case err == nil:
	// Check if vschema was rebuilt after the initial creation of the keyspace.
	if _, keyspaceExists := srvVSchema.GetKeyspaces()[*initKeyspace]; !keyspaceExists {
```
Contributor

Doesn't this also need a RebuildKeyspaceGraph?

Member Author

I don't understand. Are you saying if a keyspace is missing from SrvVSchema, it implies that RebuildKeyspace has not run for that keyspace?

Contributor

That's what I thought.

Member Author

I thought that SrvKeyspace and SrvVSchema were completely independent. I'll double check.

Contributor

You're probably right. I think they are independent.
But I also think that a missing SrvKeyspace is going to be a problem. And now that I think about it, there are some workflows where this has led to odd errors.
However, I think that's an orthogonal issue, and it's better to resolve that separately.

sougou merged commit 19bc9d4 into vitessio:master on Jun 6, 2020
enisoc deleted the rebuild-vschema branch on June 8, 2020 23:35
deepthi added this to the v7.0 milestone on Jul 27, 2020