Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let the orchestrator handle the read-only/writable of nodes #627

Closed
AMecea opened this issue Nov 3, 2020 · 2 comments · Fixed by #659
Closed

Let the orchestrator handle the read-only/writable of nodes #627

AMecea opened this issue Nov 3, 2020 · 2 comments · Fixed by #659

Comments

@AMecea
Copy link
Contributor

AMecea commented Nov 3, 2020

As @dougfales put it:

We would prefer to manage the readable/writable state of mysql instances outside of the operator. On the rare occasion (for us) that the entire cluster must be set to RO, we will do this ourselves. In all other scenarios, we are used to trusting Orchestrator to do the right thing.

When we introduced the readOnly in #100 we wanted to have a quarantee that when readOnly is True the cluster will not get writable again but in case of a failover the orchestrator will set new master writable without knowing about readOnly: True. So in #222 we forced the orchestrator to do the failover but not apply promotion (which will not set node writable).

There are a few issues with this approach:

  • The nodes are left in a bad configuration (detacted replication) resulting in the following error messages:
2020-10-15T21:40:51.873465Z 576 [ERROR] Slave I/O for channel '': error connecting to master 'sys_replication@//$release-mysqlcluster-db-mysql-0.mysql.$namespace:3306' - retry-time: 1  retries: 1755, Error_code: 2005

See the other replication issues related to this: #613, #608, #588, #565

The fix that I propose here is to weaken the guarantees of readOnly attribute (and document it) and let the Orchestrator handle read-only/ writable nodes by configuring it as follows:

ApplyMySQLPromotionAfterMasterFailover: true
MasterFailoverDetachReplicaMasterHost: false

For this we need to "revert" #222 and refactor #100 in a less invasive way. An implementation is already started in #522.

@dougfales
Copy link
Contributor

Thanks for writing up this issue, @AMecea! Unfortunately, I have not had a chance to finish making a global ignore-read-only option as we discussed in #522. I am still willing to do that, but I am not sure how much time I will have before the end of the year.

I had a few comments on what you mentioned above...

ApplyMySQLPromotionAfterMasterFailover: true

I think this is a great idea. For us, letting Orchestrator manage failover is one of the biggest benefits of using it. I'm guessing many other users of the operator would feel the same way. When we enabled this, @stankevich dealt with making sure automatic promotion worked well with the operator-managed cluster. I'll let him comment on any operational considerations or best practices that users would need to think about when running in this mode.

MasterFailoverDetachReplicaMasterHost: false

I don't think this is strictly necessary because this option is meaningless when ApplyMySQLPromotionAfterMasterFailover, at least according to this comment in the code and elsewhere in the documentation.

I think building this ignore-read-only setting could end up being a good default for many users. Maybe we could also consider highlighting it in the documentation, or making it the actual default mode, at some point in the future. Thanks for writing up this issue @AMecea!

@jianhaiqing
Copy link
Contributor

@AMecea I agree with you, as i describe in #566, Read_only mode might be broken by orchestrator unexpected. We should trust orchestrator can handle read_only state rightly. Users can set the cluster read_only mode by manager process if they want.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants