Let the orchestrator handle the read-only/writable of nodes #627

AMecea · 2020-11-03T18:26:51Z

We would prefer to manage the readable/writable state of mysql instances outside of the operator. On the rare occasion (for us) that the entire cluster must be set to RO, we will do this ourselves. In all other scenarios, we are used to trusting Orchestrator to do the right thing.

When we introduced the readOnly in #100 we wanted to have a quarantee that when readOnly is True the cluster will not get writable again but in case of a failover the orchestrator will set new master writable without knowing about readOnly: True. So in #222 we forced the orchestrator to do the failover but not apply promotion (which will not set node writable).

There are a few issues with this approach:

The nodes are left in a bad configuration (detacted replication) resulting in the following error messages:

2020-10-15T21:40:51.873465Z 576 [ERROR] Slave I/O for channel '': error connecting to master 'sys_replication@//$release-mysqlcluster-db-mysql-0.mysql.$namespace:3306' - retry-time: 1  retries: 1755, Error_code: 2005

See the other replication issues related to this: #613, #608, #588, #565

This process of setting cluster writable brokes Orchestrator failover process, see markReadOnlyNodesInOrc broke gracefulfailover process #566
Poor documentation Q: orchestrator failover defaults #482

The fix that I propose here is to weaken the guarantees of readOnly attribute (and document it) and let the Orchestrator handle read-only/ writable nodes by configuring it as follows:

ApplyMySQLPromotionAfterMasterFailover: true
MasterFailoverDetachReplicaMasterHost: false

For this we need to "revert" #222 and refactor #100 in a less invasive way. An implementation is already started in #522.

The text was updated successfully, but these errors were encountered:

dougfales · 2020-11-11T17:02:50Z

Thanks for writing up this issue, @AMecea! Unfortunately, I have not had a chance to finish making a global ignore-read-only option as we discussed in #522. I am still willing to do that, but I am not sure how much time I will have before the end of the year.

I had a few comments on what you mentioned above...

ApplyMySQLPromotionAfterMasterFailover: true

I think this is a great idea. For us, letting Orchestrator manage failover is one of the biggest benefits of using it. I'm guessing many other users of the operator would feel the same way. When we enabled this, @stankevich dealt with making sure automatic promotion worked well with the operator-managed cluster. I'll let him comment on any operational considerations or best practices that users would need to think about when running in this mode.

MasterFailoverDetachReplicaMasterHost: false

I don't think this is strictly necessary because this option is meaningless when ApplyMySQLPromotionAfterMasterFailover, at least according to this comment in the code and elsewhere in the documentation.

I think building this ignore-read-only setting could end up being a good default for many users. Maybe we could also consider highlighting it in the documentation, or making it the actual default mode, at some point in the future. Thanks for writing up this issue @AMecea!

jianhaiqing · 2020-12-03T11:57:53Z

@AMecea I agree with you, as i describe in #566, Read_only mode might be broken by orchestrator unexpected. We should trust orchestrator can handle read_only state rightly. Users can set the cluster read_only mode by manager process if they want.

AMecea mentioned this issue Nov 3, 2020

Reset replication on master after failover #617

Closed

1 task

AMecea mentioned this issue Mar 4, 2021

Make the readOnly guarantee weeker and let the Orchestrator handle it #659

Merged

1 task

AMecea closed this as completed in #659 Mar 5, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Let the orchestrator handle the read-only/writable of nodes #627

Let the orchestrator handle the read-only/writable of nodes #627

AMecea commented Nov 3, 2020

dougfales commented Nov 11, 2020

jianhaiqing commented Dec 3, 2020

Let the orchestrator handle the read-only/writable of nodes #627

Let the orchestrator handle the read-only/writable of nodes #627

Comments

AMecea commented Nov 3, 2020

dougfales commented Nov 11, 2020

jianhaiqing commented Dec 3, 2020