[topology] Global Topology Serving Shards refactor #4496

Closed
rafael opened this issue Jan 2, 2019 · 3 comments
@rafael (Member) commented Jan 2, 2019

Feature Description and motivation

In order to route to a given shard, Vitess needs to know whether or not that shard is Serving. Currently, there is no canonical way to determine this. The following is a proposal to simplify the global topology and serving keyspace graph generation so that we end up with a canonical representation of serving shards.

I see two main reasons to do this refactor now:

  • Not having a canonical way to know if a shard is serving introduces incidental complexity that bleeds into different parts of the code base (more details in the "How it works today" section below).
  • The current design has the side effect that you can't register vtgates in cells that don't have tablets. This creates problems for multi-cell region deployments, where it is possible to have vtgates in cells that don't have tablets. A workaround was proposed in [topology] Deprecate Shard cells #4482, but as part of that exploratory work it became clear that a higher-level refactor like the one proposed here is a better approach.

How it works today

Today there are three ways to know if a shard is serving or not:

  • Shard ServedTypes: This information is set in the shard topology record on creation.
  • TabletControl DisableQueryService: This is set during resharding workflows when migrating serving types.
  • SourceShards: This is set during split clone workflows. When set, tablets mark themselves as not serving.

You can see this in action in tablet state change.

It is important to note that this information is set at the cell level and gets replicated in different parts of the topology (Shard Cells, ServedTypes Cells, TabletControl, etc.).
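To make the three signals concrete, here is a minimal sketch of how the serving decision is assembled today. It assumes simplified stand-in types; Shard, TabletControl, TabletType, and isServing here are illustrative only, not the actual topodatapb messages or vttablet code.

```go
package shardserving

// TabletType is a simplified stand-in for topodata tablet types.
type TabletType int

const (
	MASTER TabletType = iota
	REPLICA
	RDONLY
)

// Shard is a stripped-down stand-in for the global Shard record.
type Shard struct {
	ServedTypes    map[TabletType][]string // tablet type -> cells it serves in
	TabletControls []TabletControl
	SourceShards   []string // non-empty while this shard is a split clone target
}

// TabletControl mimics the per-type, per-cell query service switch.
type TabletControl struct {
	TabletType          TabletType
	Cells               []string
	DisableQueryService bool
}

// inCells treats an empty cell list as "all cells" in this sketch.
func inCells(cell string, cells []string) bool {
	if len(cells) == 0 {
		return true
	}
	for _, c := range cells {
		if c == cell {
			return true
		}
	}
	return false
}

// isServing combines the three signals described above: the shard must
// advertise the tablet type for the cell, query service must not be disabled
// by a resharding workflow, and the shard must not be a split clone target.
func isServing(s *Shard, tabletType TabletType, cell string) bool {
	cells, ok := s.ServedTypes[tabletType]
	if !ok || !inCells(cell, cells) {
		return false
	}
	for _, tc := range s.TabletControls {
		if tc.TabletType == tabletType && tc.DisableQueryService && inCells(cell, tc.Cells) {
			return false
		}
	}
	return len(s.SourceShards) == 0
}
```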

Proposal

We can get to a canonical representation with the following changes:

  • Serving KeyspaceGraph information (which is stored in each cell) becomes the source of truth for serving shards.
  • Deprecate the use of ServedTypes/TabletControls.DisableQueryService/Cells from Shard record.
  • Add a new field to the Shard record to mark serving state (IsMasterServing). The value of this attribute can keep the same semantics we have today for CreateShard (i.e., on shard creation, it will guarantee that there are no overlapping shards in a serving state).
  • When generating the ServingKeyspace graph, if a record does not exist in a shard for a given cell, the IsMasterServing attribute will be used to derive RDONLY/REPLICA/MASTER serving types for the cell (see the sketch after this list).
  • When migrating serving types, instead of using TabletControls, ServingKeyspaceGraph will be updated directly in the local topology.
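As a rough illustration of the proposed shape, the sketch below reuses the simplified TabletType constants from the earlier sketch; ProposedShard, SrvKeyspacePartition, and the function names are hypothetical, not the real Vitess API. The global record carries only IsMasterServing, the per-cell serving graph is derived from it when no cell record exists, and migrations edit the local serving graph directly.

```go
// ProposedShard is a hypothetical stand-in for the global Shard record after
// deprecating ServedTypes, TabletControls.DisableQueryService and Cells.
type ProposedShard struct {
	Name            string
	IsMasterServing bool // keeps the CreateShard non-overlap guarantee
}

// SrvKeyspacePartition maps a tablet type to the shards serving it in one cell.
type SrvKeyspacePartition struct {
	TabletType    TabletType
	ServingShards []string
}

// rebuildServingGraph derives a cell's serving graph when the cell has no
// record of its own: IsMasterServing alone decides whether the shard serves
// MASTER, REPLICA and RDONLY traffic.
func rebuildServingGraph(shards []ProposedShard) []SrvKeyspacePartition {
	var serving []string
	for _, s := range shards {
		if s.IsMasterServing {
			serving = append(serving, s.Name)
		}
	}
	return []SrvKeyspacePartition{
		{TabletType: MASTER, ServingShards: serving},
		{TabletType: REPLICA, ServingShards: serving},
		{TabletType: RDONLY, ServingShards: serving},
	}
}

// migrateServedType sketches the proposed migration path: instead of writing
// TabletControls to the global Shard record, the local serving graph for the
// cell is updated directly.
func migrateServedType(partitions []SrvKeyspacePartition, tabletType TabletType, newShards []string) []SrvKeyspacePartition {
	for i := range partitions {
		if partitions[i].TabletType == tabletType {
			partitions[i].ServingShards = newShards
		}
	}
	return partitions
}
```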

This covers the core of the refactor. There is one caveat: during SplitClone, a change to the tablet API will be needed so it can write to a non-serving shard. Right now that works because non-serving shards are writable due to the use of SourceShards in tablet state change (see the sketch below).
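For context, here is a minimal sketch of the behavior the caveat relies on today, continuing the simplified Shard type and isServing helper from the "How it works today" sketch; allowWrites is illustrative, not the actual tablet code.

```go
// allowWrites illustrates why split clone works today: the target shard is
// not serving, but it still accepts writes because SourceShards is set.
func allowWrites(s *Shard, tabletType TabletType, cell string) bool {
	if isServing(s, tabletType, cell) {
		return true
	}
	// Split clone target: not serving, yet writable so the copy can proceed.
	return len(s.SourceShards) > 0
}
```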

Besides giving us a canonical representation of serving shards, this refactor also reduces the dependency on global topology and removes the complexity around keeping cell information in the Shard record.

rafael added a commit to tinyspeck/vitess that referenced this issue Feb 5, 2019
Initial stab at: vitessio#4496

Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
@mpawliszyn (Collaborator) commented:

Is this upgrading the lockserver to a single point of failure for general operation of the system? Does it mean the lockserver must always be writable in order to do any type of failover or regular maintenance of the system?

The current way means that the overall configuration of the system is in the lockserver, but the authority on who is serving is the job of the tablet healthcheck. This makes the system resilient to problems with the lockserver. We have had situations where the lockserver had an issue but Vitess kept chugging along. This is exactly the kind of resilience we need for high-capacity, highly available systems.

We have noticed a large increase in the load on our lockserver and wonder if changes along these lines could be to blame.

@rafael (Member, Author) commented Feb 19, 2019

Hi @mpawliszyn -

The ultimate goal of this refactor is to reduce dependencies on the global topo for general operation of the system. Once we are done with these changes, only the lockservers in the relevant cells will need to be writable when doing a failover/PlannedReparent or adding tablets to a cell. Having said this, even when we are done with #4496 we will still be a long way from those goals. This work reduces dependencies on the global topo, but it does not remove all of them.

The following assumption is correct and won't change after this refactor:

"The current way means that the overall configuration of the system is in the lockserver, but the authority on who is serving is the job of the tablet healthcheck."

What do you mean by regular maintenance of the system?

Could you elaborate more on the last bit too? I'm not sure I'm following. None of the changes proposed in this work have been merged yet, so they can't be changing current system behavior.

@mpawliszyn (Collaborator) commented:

Yeah, if nothing has been checked in then I guess it cannot be this particular change. Glad that we are still using the tablet's own information for figuring out what is healthy and what is master.

Keeping the existing system going, by doing reparents, handling an outage on a tablet, or even OS upgrades, is the kind of regular maintenance that we should be able to do without a functioning lockserver, and it sounds like that will still be the case. Otherwise, as our systems get bigger, we are at a higher risk of cascading failures.
