Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[topology] Deprecate Shard cells #4482

Closed
5 of 10 tasks
rafael opened this issue Dec 20, 2018 · 2 comments
Closed
5 of 10 tasks

[topology] Deprecate Shard cells #4482

rafael opened this issue Dec 20, 2018 · 2 comments

Comments

@rafael
Copy link
Member

rafael commented Dec 20, 2018

Proposal

The following is a proposal to deprecate the use of cells in the definition of a topodata.Shard. This would make all shards discoverable by any cell.

Note this also requires removing the concept of cell from the following data structures:

Shard_ServedType
Shard_TabletControl
Keyspace_ServedFrom

If there are Vitess users that are relying on this feature, please chime in this issue as we would like to understand that use case.

Even though this is a risky refactor, if our assumptions are correct, it will simplify the topology management. It will also make regions feature fully functional.

Context

As part of ramping up a cluster using the regions feature from #3460, it came to our attention that RebuildKeyspaceGraph, will only perform any work if there are already tablets registered in a given shard. Specifically, it uses this field from topodata.Shard to decide if it needs to rebuild the graph in a given cell. This creates a problem in the context of regions. It possible that you will want a vtgate registered in a given cell when if fact you don't have tablet running in that cell (but you do have tablets in the region).

Stepping back from this particular issue, @demmer, @sougou and myself started to question the need of having cells in a given shard. What does it mean that a shard is only served from a given cell? It seems like this level of granularity is adding extra complexity that does not seem to serve existent Vitess use cases.

Next Steps

I started doing an audit and it seems like this refactor should be doable. The following list are all the places where this attribute is being used. I'm adding a brief description of what's the context and my first assessment on the risk of making this change:

  • CompareShardReplications (RISK: low): Get cells from shard info to compare topos shard replication. We will need to get all cells here and then get shard replication info for each. Straight forward change.
  • CopyShardReplications (RISK: low): Same as CompareShardReplications but for copy.
  • RebuildKeyspace (RISK: low): Rebuild keyspace graph in a cell. Here we would blindly rebuild keyspace graph in the desired target cells. This change seems straight forward. [DONE in Deprecate use shard cells when rebuilding keyspace graph in a cell.  #4485]
  • FindAllTabletAliasesInShardByCell (RISK: low): This method gets all the tablets in a cell. We will need to refactor it to not get cell information from the Shard record. This seems straight forward.
  • WaitForDrain (RISK: medium): This methods drains tablets in a given cell. When no cells are provided, it drains all active cells. It uses ShardInfo to decide which are the active cells. I think we could get the same info from ShardReplication.
  • removeCellsFromTabletControl (RISK: medium/high): This method removes cells from tabletControl. I think we will need to deprecate cells field from tablet control as well.
  • UpdateDisableQueryService (RISK: high): This method disables query service from tabletControl.
  • UpdateServedTypesMap (RISK: high): This method updated served types. The refactor here would involve not using cells when enabling/disabling serving types.
  • migrateServedFromLocked (RISK: high): This is used during resharding. It uses the destination shard cells when migrating serving types. This will require to remove the concept of cells from Keyspace_ServedFrom.
  • There are set of methods in wrangler shards that also modify this attribute, but they all seem low risk to remove.

If this all makes sense, my plan is to start opening a PR to remove the use of cells in each of these methods. We will need to merge all the PR's at the same time, but for reviewing I think it makes to decouple them in separate chunks.

@rafael
Copy link
Member Author

rafael commented Dec 22, 2018

I started prototyping on this. I think I understand the idea behind storing cell information in shards. There is an existent feature that relies on this: MigrateServing types per cell

At the moment it is possible to migrate one shard at a time per cell (only for rdonly and replica). The moment you get to master you can't specify a cell.

I started discussing with Sugu in Slack about this. I think we have an idea on how to support this without storing it in the global topo shard info.

Having said this, I'm pondering if the complexity added by this feature is worth it. I'm thinking that migrating one shard at a time already provides a good enough granularity in case something goes wrong.

I'm going to think about this over the holidays 😀

@rafael
Copy link
Member Author

rafael commented Jan 2, 2019

Closing this in favor of #4496

@rafael rafael closed this as completed Jan 2, 2019
systay pushed a commit that referenced this issue Jul 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant