-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow to demote master vttablet to SPARE instead of REPLICA #5520
Allow to demote master vttablet to SPARE instead of REPLICA #5520
Conversation
78b0b2d
to
12d46db
Compare
@@ -171,7 +171,7 @@ func (agent *ActionAgent) finalizeTabletExternallyReparented(ctx context.Context | |||
// Call UpdateTabletFields instead of ChangeType so that we can check the type | |||
// before changing it and avoid unnecessary topo updates | |||
var err error | |||
oldMasterTablet, err = topotools.ChangeType(ctx, agent.TopoServer, oldMasterAlias, topodatapb.TabletType_REPLICA, nil) | |||
oldMasterTablet, err = topotools.ChangeType(ctx, agent.TopoServer, oldMasterAlias, agent.DemoteMasterType, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once all components are upgraded to 4.0.0+, this RPC doesn't get called any more. It is here for backwards compatibility during rollouts. It's best to not make any changes to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, reverted changes in this RPC path
@@ -589,6 +592,50 @@ func TestRPCTabletExternallyReparentedRerun(t *testing.T) { | |||
|
|||
} | |||
|
|||
func TestRPCTabletExternallyReparentedDemotesMasterToConfiguredTabletType(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test becomes obsolete along with the RPC. The new test should go into https://github.com/vitessio/vitess/blob/master/go/vt/wrangler/testlib/external_reparent_test.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
f4d0535
to
0822e6f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. We'll let @enisoc take a look before approving.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@deepthi Do we not need to update any other demotion-related code paths? Does this really cover all types of internal/external reparent? This almost feels too easy. :)
@@ -252,6 +254,11 @@ func NewActionAgent( | |||
return nil, err | |||
} | |||
|
|||
tabletType, err := validateDemoteMasterType(*demoteMasterType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The var name tabletType
might mislead a reader to think it's the type that will be used at startup (since this is inside NewActionAgent). Let's call it something like demoteMasterTabletType
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense, done!
@@ -849,3 +870,12 @@ func (agent *ActionAgent) withRetry(ctx context.Context, description string, wor | |||
} | |||
} | |||
} | |||
|
|||
func validateDemoteMasterType(tabletTypeParam string) (*topodatapb.TabletType, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We treat topodatapb.TabletType
as a scalar type, so we usually pass it by value:
func validateDemoteMasterType(tabletTypeParam string) (*topodatapb.TabletType, error) { | |
func validateDemoteMasterType(tabletTypeParam string) (topodatapb.TabletType, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
|
||
func validateDemoteMasterType(tabletTypeParam string) (*topodatapb.TabletType, error) { | ||
tabletType, err := topoproto.ParseTabletType(tabletTypeParam) | ||
if err != nil && (tabletType != topodatapb.TabletType_SPARE && tabletType != topodatapb.TabletType_REPLICA) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like you meant for that first &&
to be an ||
. However, even then I don't think it will do the right thing. That might return a nil err
when there's actually a problem.
I think what we want is:
if err != nil {
return topodatapb.TabletType_UNKNOWN, err
}
if tabletType != topodatapb.TabletType_SPARE && tabletType != topodatapb.TabletType_REPLICA {
return topodatapb.TabletType_UNKNOWN, fmt.Errorf("invalid demote_master_type %v; can only be REPLICA or SPARE", tabletType)
}
return tabletType, nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, fixed. Using unknown tablet type as suggested.
} | ||
t.Fatalf("old master (%v) should be replica but is: %v", topoproto.TabletAliasString(oldMaster.Tablet.Alias), tablet.Type) | ||
} | ||
// check the old master was converted to replica |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// check the old master was converted to replica | |
// check the old master was converted to spare |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
0822e6f
to
8d674bc
Compare
@enisoc @deepthi: One other area that will be impacted by this change is when checking whether a tablet is eligible to become master I think under semi sync replication with two set of tablets SPARE and REPLICAS replicating but only allowing SPARE to become master, there could be a situation where a lagging SPARE (with respect to a REPLICA) will be promoted to MASTER causing a replica read to return must up-to-date value than the new master. This is not a problem in our case as we use AWS Aurora and we run vttablets with semi sync disabled (default). I think demoting to a SPARE should be only allowed if semi sync is disabled to avoid the problem above. I might be missing more, please let me know. |
This flag is only safe to use together with |
8d674bc
to
bb93ade
Compare
if tabletType != topodatapb.TabletType_SPARE && tabletType != topodatapb.TabletType_REPLICA { | ||
return topodatapb.TabletType_UNKNOWN, fmt.Errorf("invalid demote_master_type %v; can only be REPLICA or SPARE", tabletType) | ||
} | ||
if tabletType == topodatapb.TabletType_SPARE && !*mysqlctl.DisableActiveReparents { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should check for any value that is not REPLICA. I.e. you should not able to use this flag at all unless you also set --disable_active_reparents
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you check if it has been passed at all in the Golang flag API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Refactored to parse flag inside validation
In our setups we use SPARE tablets to take over MASTER role during reparents or failures. Currently shard sync and external reparent demote MASTER to REPLICA. This change allows to configure the tablet type of a demoted master. SPARE and REPLICA are the only ones allowed. REPLICA is the default one if flag is not set. This can also be achieved with a boolean flag if only SPARE and REPLICA make sense as the only tablet types that can be promoted to master Signed-off-by: Karel Alfonso Sague <kalfonso@squareup.com>
bb93ade
to
c2eec14
Compare
This problem existed even before we refactored the reparenting. Do you want to handle it in this PR, or leave it for another day? If you don't include it here, could you please create an issue? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@morgo how do we get the github action statuses to update? they seem to be stuck. |
Build is borked. Forcing this through. |
In our setups we use SPARE tablets to take over MASTER role during reparents or failures. Currently shard sync and external reparent demote MASTER to REPLICA. This change allows to configure the tablet type of a demoted master. SPARE and REPLICA are the only ones allowed. REPLICA is the default one if flag is not set. This can also be achieved with a boolean flag if only SPARE and REPLICA make sense as the only tablet types that can be promoted to master
Signed-off-by: Karel Alfonso Sague kalfonso@squareup.com