Add basic support for group replication #1180
cc @dveeden, @sjmudd, @luisyonaldo
* Update docs to include information on partial support for GR.
I think group replication is an important feature for Orchestrator, as it looks like it is at the core of the long-term strategy of Oracle MySQL. Group Replication is a core component of InnoDB Cluster. Note that some info about group replication is available in the same way as normal replication. There is nothing in
@ejortegau, could you squash your commits? Or are there multiple on purpose?
There are multiple on purpose, as I have worked on it and added more and more support so far. I figure they would/could be squashed once the merge is accepted.
Would this work ok with a group in multi primary mode? https://dev.mysql.com/doc/refman/8.0/en/group-replication-multi-primary-mode.html
Would this work ok with a group in multi primary mode?
https://dev.mysql.com/doc/refman/8.0/en/group-replication-multi-primary-mode.html
It will not. I have missed pushing an update to the FAQ, indicating this. For now, this only works with single primary mode.
For groups we should draw a line around them and maybe show the name of the group. With D3.js this can be done with The I also notice that D3.js v3 is used, which is two versions behind the latest major release (D3.js v5). The code for the example is in: Some of this could be done outside of this PR, but we should make sure that the nodelist is populated with group info to allow the frontend part to be done later on. |
Only an initial, partial review. I'll need to revisit.
Some further comments. Review is not yet complete.
Thank you!
go/inst/instance_dao.go
Outdated
if instance.ReplicationGroupName != "" && instance.ReplicationGroupMemberState != "ONLINE" {
	instance.Problems = append(instance.Problems, "group_replication_member_not_online")
}
Can we "upgrade" this to be analyzed in instance.go?
I can move it there, though I thought this was a better place for it. From what I see, instance.go does not populate problems anywhere; instead, they seem to be populated here.
* Add group replication fields to database_instance table.
* Add group replication fields to Instance, and a function to evaluate whether a host is member of a replication group or not.
* Modified `Instance.IsMaster()` to not consider a host as a Master if it is a secondary member of a replication group.
* Defined constants for group replication member roles and states.
* `ReadTopologyInstanceBufferable()` now attempts to populate group replication fields in Oracle MySQL >= 5.7
* `DiscoverInstance()` now attempts to discover not only an instance's slave hosts but also the members of its replication group
* Added methods to evaluate whether an instance is either the primary or secondary of a replication group.
* Replication group members with secondary role now get their cluster attributes populated from the group's primary. The group's primary gets them, in turn, from its async/semi-sync master (if any).
* Group replication attributes from instances are now read from the backend DB, and therefore, are now correctly returned by the API.
* Replication group secondary members now are shown in the UI as replicating from the group primary.
* Stop considering GTIDs coming from the group as errant GTIDs. While at it, fix tiny typo.
* Disallow setting a group primary to replicate from a secondary.
* Split DB migration adding multiple columns in a single ALTER statement to make CI happy.
* Add instance icon showing whether the host is a replication group member as well as its group role and state.
* Replication group members that have been expelled from the group are still identified as part of the same cluster and shown in the topology, instead of appearing as separate chains.
* Group members that are not ONLINE are now exposed as having problems through the API.
* The web UI now shows replication group members that have been expelled by the group majority.
* Group members that are not ONLINE are now shown in the web UI in either red or orange depending on their state in the group (orange when RECOVERING, red when OFFLINE or in ERROR).
* Update docs to include information on partial support for GR.
* Fix unit tests that were broken due to addition of new columns to `database_instance` table.
* Address some MR comments and failing test.
* Revert bad change to DB migration.
* Fix SIGSEGV introduced by bad placement of `defer rows.Close()`.
* Fix integration testing that was failing due to missing value on `INSERT` statement for non nullable column `replication_group_members` of table `database_instance` which has no default.
* Instance attribute renamed from `ReplicationGroupPrimaryKey` to `ReplicationGroupPrimaryInstanceKey`.
* Use `Instance.Equals()`.
* Easier to read `Instance.IsMaster()`.
* Don't blindly assume that any error coming from executing a query to find out group replication attributes for an instance means that the instance does not support group replication. Instead, check error codes to determine whether the error comes from that or from something else, and have different behavior depending on the answer.
* Fix issue leading to random detection of group secondaries as not group members, coming from the fact that, while attempting to find the member in the records of `performance_schema.replication_group_members`, the required attribute for server UUID had not yet been populated. This should also reduce the loss of parallelism previously present, as new WaitGroup only waits for determining the server UUID instead of for all standing routines to finish.
* Prevent replication refactoring operations from taking place for group secondaries since they make no sense for them.
* Group secondaries with replicas under them are no longer shown as having a `NoWriteableMasterStructureWarning` problem.
* Remove no longer needed `WaitGroup.Wait()`.
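The state handling described in the commits above (a `group_replication_member_not_online` problem for any member that is not ONLINE, orange for RECOVERING, red for OFFLINE or ERROR) can be sketched roughly as follows. This is only an illustration under assumptions: `memberProblems` and `memberStateColor` are hypothetical helper names, not functions from this PR, and the real colouring lives in the web UI rather than in Go.

```go
package main

import "fmt"

// memberProblems mirrors the check quoted in the review: a host that belongs
// to a replication group and is not ONLINE gets a problem attached.
// Hypothetical helper; the PR appends to instance.Problems directly.
func memberProblems(groupName, state string) []string {
	if groupName != "" && state != "ONLINE" {
		return []string{"group_replication_member_not_online"}
	}
	return nil
}

// memberStateColor maps a member state to the colour described in the commit
// message. States the commit message does not mention return "" here; that
// fallback is an assumption of this sketch.
func memberStateColor(groupName, state string) string {
	if groupName == "" {
		return "" // not a group member, nothing to highlight
	}
	switch state {
	case "RECOVERING":
		return "orange"
	case "OFFLINE", "ERROR":
		return "red"
	default:
		return ""
	}
}

func main() {
	for _, state := range []string{"ONLINE", "RECOVERING", "OFFLINE", "ERROR"} {
		fmt.Println(state, memberProblems("my-group", state), memberStateColor("my-group", state))
	}
}
```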
86197c5 to 105cdc6
* Fix formatting issues.
* Re-order of GR constants to have most frequently used ones above.
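To make the role/state constants and the `IsMaster()` change from the commits above more concrete, here is a minimal sketch. It assumes a trimmed-down stand-in type (`groupAwareInstance`, hypothetical) and a hypothetical `HasAsyncMaster` flag for "traditional replication is configured"; only `GroupReplicationMemberStateOnline` and the `ReplicationGroup*` field names appear in this PR's quoted code, so the other constant names are assumptions.

```go
package main

import "fmt"

// Role and state values as reported by MySQL for group members.
// Constant names other than GroupReplicationMemberStateOnline are assumed.
const (
	GroupReplicationMemberRolePrimary   = "PRIMARY"
	GroupReplicationMemberRoleSecondary = "SECONDARY"

	GroupReplicationMemberStateOnline     = "ONLINE"
	GroupReplicationMemberStateRecovering = "RECOVERING"
	GroupReplicationMemberStateOffline    = "OFFLINE"
	GroupReplicationMemberStateError      = "ERROR"
)

// groupAwareInstance is a hypothetical stand-in for orchestrator's Instance.
type groupAwareInstance struct {
	HasAsyncMaster              bool // assumption: async/semi-sync replication is configured
	ReplicationGroupName        string
	ReplicationGroupMemberRole  string
	ReplicationGroupMemberState string
}

// IsReplicationGroupMember reports whether the host belongs to any group.
func (i *groupAwareInstance) IsReplicationGroupMember() bool {
	return i.ReplicationGroupName != ""
}

// IsReplicationGroupSecondary reports whether the host is a group secondary.
func (i *groupAwareInstance) IsReplicationGroupSecondary() bool {
	return i.IsReplicationGroupMember() &&
		i.ReplicationGroupMemberRole == GroupReplicationMemberRoleSecondary
}

// IsMaster treats a host as a master only if it has no async master and is
// either outside any group or the primary of its group.
func (i *groupAwareInstance) IsMaster() bool {
	if i.HasAsyncMaster {
		return false
	}
	if !i.IsReplicationGroupMember() {
		return true
	}
	return i.ReplicationGroupMemberRole == GroupReplicationMemberRolePrimary
}

func main() {
	secondary := groupAwareInstance{
		ReplicationGroupName:        "example-group",
		ReplicationGroupMemberRole:  GroupReplicationMemberRoleSecondary,
		ReplicationGroupMemberState: GroupReplicationMemberStateOnline,
	}
	fmt.Println("secondary is master:", secondary.IsMaster())
}
```

The point of the design is that a group secondary has no traditional replication configured, so without the role check it would look like a standalone master and form its own cluster.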
Sorry for the delays. I hope to be able to re-review and merge early next week (Sunday/Monday).
Approved. I've made a few formatting changes.
AND (
	master_instance.replication_group_name = ''
	OR master_instance.replication_group_member_role = 'PRIMARY'
)
👍
No support has been added (yet) to handling group member failure. If all you have is a single replication group, this is fine, because you don't need it; the group will handle all failures as long as it can secure a majority.
If, however, you have the primary of a group as a replica to another instance; or you have replicas under your group
👍
`
	ALTER TABLE
		database_instance
		ADD COLUMN replication_group_primary_port smallint(5) unsigned NOT NULL DEFAULT 0 AFTER replication_group_primary_host
👍
ReplicationGroupMembers InstanceKeyMap
// Primary of the replication group
ReplicationGroupPrimaryInstanceKey InstanceKey
👍
Problems: []string{},
Replicas: make(map[InstanceKey]bool),
ReplicationGroupMembers: make(map[InstanceKey]bool),
Problems: []string{},
👍
instance.ReplicationDepth = replicationDepth
instance.IsCoMaster = isCoMaster
instance.AncestryUUID = ancestryUUID
instance.masterExecutedGtidSet = masterExecutedGtidSet
instance.masterExecutedGtidSet = masterOrGroupPrimaryExecutedGtidSet
return nil
👍
// Group replication problems
if instance.ReplicationGroupName != "" && instance.ReplicationGroupMemberState != GroupReplicationMemberStateOnline {
	instance.Problems = append(instance.Problems, "group_replication_member_not_online")
}
👍
"replication_group_member_role",
"replication_group_members",
"replication_group_primary_host",
"replication_group_primary_port",
👍
args = append(args, instance.ReplicationGroupMemberRole)
args = append(args, instance.ReplicationGroupMembers.ToJSONString())
args = append(args, instance.ReplicationGroupPrimaryInstanceKey.Hostname)
args = append(args, instance.ReplicationGroupPrimaryInstanceKey.Port)
👍
"part of a replication group", instance.Key)
}
}
return nil
👍
I find that when the MGR primary changes, orchestrator does not call the PostFailoverProcesses hooks.
Orchestrator monitors MySQL Group Replication (MGR). When the MGR PRIMARY changes, can orchestrator update the Consul key-value info for this MGR cluster?
Related issue: #1179
Description
This PR adds some initial support for group replication in orchestrator.
ToDo:
go test ./go/...
master branch) is successful
Assume a 3 member group, plus an async slave replicating from one of them. Orchestrator shows all three members as separate clusters, as it does not understand group replication and there is no traditional replication configured in any group member. The async slave is shown as a slave in the cluster whose master is the group member the slave is set up to replicate from.
Orchestrator also shows the async slave with a problem of errant GTIDs.
With this PR, some basic support for single primary replication groups is added. So, a replication group shown like this in mysqlshell:
Is now understood by orchestrator as two secondary group members that replicate from the primary. In addition, the async replica continues to be shown normally:
Notice also that the errant GTID problem is not shown.
The UI also shows group membership information, including:
the primary and secondary instances so that they can be easily identified.
On top of this, certain replication operations are prevented from taking place: relocating group secondaries to replicate from outside the group (since they actually replicate from the primary), and setting up a group primary to replicate from a secondary.
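The two guards described above could look roughly like the sketch below. It is an assumption-laden illustration, not the code from this PR: `member` and `checkRelocate` are hypothetical names, and the second guard is assumed to apply to a secondary of the primary's own group.

```go
package main

import (
	"errors"
	"fmt"
)

// member is a hypothetical, minimal view of an instance for this sketch.
type member struct {
	Key       string // e.g. "host:port"
	GroupName string // empty when the host is not in a replication group
	Role      string // "PRIMARY" or "SECONDARY" for group members
}

// checkRelocate returns an error when moving instance below target is one of
// the operations the PR description says should be refused.
func checkRelocate(instance, target member) error {
	inGroup := instance.GroupName != ""
	// Group secondaries replicate from the group primary; relocating them
	// makes no sense.
	if inGroup && instance.Role == "SECONDARY" {
		return errors.New(instance.Key + " is a group secondary and cannot be relocated")
	}
	// A group primary must not be set up to replicate from a secondary of
	// its own group (assumed scope of the second guard).
	if inGroup && instance.Role == "PRIMARY" &&
		target.GroupName == instance.GroupName && target.Role == "SECONDARY" {
		return errors.New(instance.Key + " is a group primary and cannot replicate from secondary " + target.Key)
	}
	return nil
}

func main() {
	primary := member{Key: "db1:3306", GroupName: "g", Role: "PRIMARY"}
	secondary := member{Key: "db2:3306", GroupName: "g", Role: "SECONDARY"}
	external := member{Key: "db9:3306"}

	fmt.Println(checkRelocate(secondary, external))
	fmt.Println(checkRelocate(primary, secondary))
	fmt.Println(checkRelocate(primary, external))
}
```

Note that a group primary replicating from an instance outside the group stays allowed, which matches the topology in the description where the primary may have an async/semi-sync master.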