Add replicated MySQL tutorial #1722

enisoc · 2016-11-18T02:06:55Z

cc @erictune @foxish @kow3ns @janetkuo @kubernetes/sig-apps

This is a replicated MySQL StatefulSet tutorial. It was inspired by #1599, but uses StatefulSet to achieve the following properties:

One StatefulSet can run a master and any number of slaves.
The StatefulSet can be scaled up and down.
PersistentVolumes are auto-provisioned.
New slaves perform a clone of existing data, and then start replicating from the master.
Slave pods that get rescheduled (e.g. due to Node failure) get re-linked to the stable PersistentVolumeClaim, start back up, and reconnect to replication.
If the master is restarted or rescheduled, slaves will keep retrying to connect back to it via the stable DNS name.

This example has the following (known) caveats:

There's no authentication anywhere.
Ordinal index 0 is always assumed to be the master. You must recover the master rather than fail over to a slave.
It requires an image (Dockerfile included) with some extra MySQL tools installed.
If using only transactional tables (InnoDB), instance N will slow down while instance N+1 is taking a clone from it upon scaling the StatefulSet up. If using non-transactional tables (MyISAM), the whole table will be locked while being cloned.


bparees · 2016-11-18T02:34:09Z

you might also be interested in the mongodb stateful set example we did for openshift (when they were still called petsets):
https://github.com/sclorg/mongodb-container/tree/master/examples/petset

foxish · 2016-11-18T05:35:24Z

docs/tutorials/replicated-stateful-application/mysql-statefulset.yaml

@@ -0,0 +1,144 @@
+apiVersion: apps/v1beta1


Perhaps by convention, we should have the headless service be in the same file as the statefulset itself? I think most other examples do that.

In my Vitess chart, I found it useful to have multiple StatefulSets share a single headless service. Is that a reasonable pattern? If so, the convention of putting the headless service together with the StatefulSet wouldn't make sense. For the tutorial, I also like the idea of keeping the service separate to reduce noise, and so I can separately explain the parts. If we still want to stick to the convention though, I'll do that. Let me know what you think.

+1 for having both in the same file, I like the convention and it's a bit easier to create from a single file rather than creating from two separate files.

Since this initial comment thread, I've also added a client service in addition to the headless one. So now I have two services in mysql-services.yaml. I also have mysql-configmap.yaml, so 3 files total. I think if it was just a StatefulSet and one headless service, I would agree with putting them together to get to the one-file ideal. However, with 2 services and a ConfigMap, I feel like the separation adds enough clarity to be worth the extra files.

foxish · 2016-11-18T05:44:47Z

docs/tutorials/replicated-stateful-application/mysql-statefulset.yaml

+              # Generate mysql server-id from pod ordinal index.\n
+              [[ `hostname` =~ -([0-9]+)$ ]] || exit 1\n
+              echo [mysqld] > /mnt/conf.d/server-id.cnf\n
+              echo server-id=$((100 + ${BASH_REMATCH[1]})) >> /mnt/conf.d/server-id.cnf\n


Why do we do (100 + ordinal_idx)?

MySQL server id 0 is reserved, so we need some offset on the ordinal index. I didn't want to use something like ordinal+1 because it would be easy to see server id 1 and mistakenly assume that corresponds to ordinal 1. I'll add a comment somewhere nearby to make this less mysterious.
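As a rough illustration of the offset, here is a standalone sketch of the same derivation. It is not the code from the diff: the function name `server_id_for` is made up for this example, and it uses POSIX parameter expansion instead of the bash regex so it runs under plain `sh`.

```shell
# Hypothetical helper: derive a MySQL server-id from a StatefulSet pod hostname.
# Assumes hostnames look like "<name>-<ordinal>", e.g. "mysql-2".
server_id_for() {
  ordinal="${1##*-}"                    # text after the last "-"
  [ "$ordinal" != "$1" ] || return 1    # no "-" in the hostname at all
  case "$ordinal" in
    ''|*[!0-9]*) return 1 ;;            # missing or non-numeric ordinal
  esac
  # Offset by 100 so server-id 0 (reserved) is never produced and the id
  # is not easily mistaken for the ordinal itself.
  echo $((100 + $ordinal))
}

server_id_for mysql-0
server_id_for mysql-2
```

With the offset, ordinal 0 maps to server-id 100 and ordinal 2 to 102, so the ordinal can still be read off the last digits.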

foxish · 2016-11-18T16:43:34Z

docs/tutorials/replicated-stateful-application/mysql-statefulset.yaml

+
+            echo "Initializing replication from clone position"
+            [[ `cat xtrabackup_binlog_info` =~ ^(.*?)[[:space:]]+(.*?)$ ]] || exit 1
+            mv xtrabackup_binlog_info xtrabackup_binlog_info.orig


What happens if the container fails after this step?

I wanted the CHANGE MASTER TO to be executed at-most-once, because it's dangerous to re-point the replication position if replication has already started making progress. So, if the container fails after moving xtrabackup_binlog_info but before finishing CHANGE MASTER TO, it will leave the slave without any replication and the operator will need to resolve it.
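The at-most-once idea can be sketched roughly as below. This is a simplified illustration, not the script in the diff: `prepare_change_master` is a hypothetical name, the input is assumed to hold `<binlog file> <position>` on one line, and the statement is only printed, so no MySQL client is involved.

```shell
# Sketch only: consume the binlog info file at most once.
prepare_change_master() {
  info_file="$1"
  # If the file is gone, a previous attempt already consumed it: do nothing.
  [ -f "$info_file" ] || return 0
  read -r binlog_file binlog_pos _rest < "$info_file" || true
  [ -n "$binlog_file" ] && [ -n "$binlog_pos" ] || return 1
  # Rename before acting, so a crash after this point can never cause the
  # replication position to be re-pointed on restart.
  mv "$info_file" "$info_file.orig"
  echo "CHANGE MASTER TO MASTER_LOG_FILE='$binlog_file', MASTER_LOG_POS=$binlog_pos;"
}
```

The trade-off is the one described above: a failure between the rename and the statement leaves the slave unconfigured, which is preferred here over risking a second `CHANGE MASTER TO`.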

foxish · 2016-11-18T16:43:45Z

1. Are MyISAM tables supported here? Are they just going to be locked completely while the backup is being created?
2. Would the constructed backups need encryption?
3. When this adds authentication, it should have a separate backup user-role in addition to the mysql user, I assume. Is that right?
4. What are the limitations of this binlog-based replication compared with replication using transactions/GTID?

enisoc · 2016-11-18T19:04:46Z

Regarding the above questions:

1. They will probably be locked completely. To be honest I hadn't thought about MyISAM since it's essentially deprecated now.
2. The backups are streamed directly to the destination data-dir, so they don't need at-rest encryption. But yes, you would want to use encryption and authentication on the "stream backup" port.
3. Yes, there should be non-root users for replication and app access with all the grants and such. I was thinking these features would make sense in a Chart, but just add noise in a tutorial.
4. GTID mode would make it safer to re-point replication positions and easier to handle failover. For this tutorial, I just wanted to use default MySQL settings as much as possible to focus on explaining patterns for using StatefulSet in general. I can make sure to mention this as a caveat.

kow3ns · 2016-11-21T21:15:39Z

Reviewed 1 of 4 files at r1, 3 of 5 files at r2.
Review status: all files reviewed at latest revision, 4 unresolved discussions.

docs/tutorials/replicated-stateful-application/mysql-statefulset.yaml, line 1 at r1 (raw file):

Previously, enisoc (Anthony Yeh) wrote…

> In my Vitess chart, I found it useful to have multiple StatefulSets share a single headless service. Is that a reasonable pattern? If so, the convention of putting the headless service together with the StatefulSet wouldn't make sense. For the tutorial, I also like the idea of keeping the service separate to reduce noise, and so I can separately explain the parts. If we still want to stick to the convention though, I'll do that. Let me know what you think.

As long as both files are referenced in the tutorial it probably doesn't hurt too much to keep the service in a separate file. There are other examples in contrib that do this.

docs/tutorials/replicated-stateful-application/mysql-statefulset.yaml, line 2 at r2 (raw file):

apiVersion: apps/v1beta1
kind: StatefulSet

We discussed this offline already, but given the lack of security, when the tutorial is presented, we want to stress that this is not a production-ready example.


enisoc · 2016-11-23T22:21:45Z

The tutorial itself is now ready for review.

enisoc · 2016-11-23T22:25:49Z

Deploy preview is here:

https://deploy-preview-1722--kubernetes-io-vnext-staging.netlify.com/docs/tutorials/replicated-stateful-application/run-replicated-stateful-application/

devin-donnelly · 2016-11-23T23:04:07Z

@steveperry-53, can you handle the doc review on this one, as it's a Tutorial?

kow3ns · 2016-11-28T23:21:16Z

Reviewed 6 of 6 files at r4, 1 of 1 files at r5.
Review status: all files reviewed at latest revision, 11 unresolved discussions.

docs/tutorials/replicated-stateful-application/run-replicated-stateful-application.md, line 23 at r5 (raw file):

In particular, MySQL settings remain on insecure defaults to keep the focus
on general patterns for running stateful applications in Kubernetes.

I like the phrasing of the above, and this addresses my earlier concern.

docs/tutorials/replicated-stateful-application/run-replicated-stateful-application.md, line 33 at r5 (raw file):

[Persistent Volumes](/docs/user-guide/persistent-volumes/)
and [Stateful Sets](/docs/concepts/controllers/statefulsets/),
as well as other core concepts like Pods, Services and Config Maps.

Might want to just add the hyperlinks for Pods, Services, and Config Maps.

docs/tutorials/replicated-stateful-application/run-replicated-stateful-application.md, line 99 at r5 (raw file):

Note that only read queries can use the load-balanced Client Service.
Since there is only one master, clients should connect directly to the master
Pod (through its DNS entry within the Headless Service) to execute writes.

Do you mean the SRV record associated with the master? Might want to be explicit.

docs/tutorials/replicated-stateful-application/run-replicated-stateful-application.md, line 137 at r5 (raw file):

### Understanding stateful Pod initialization

The Stateful Set controller starts Pods one at a time, in order by their

We might want to link this to the Stateful Set concept, or the Stateful Set Basics tutorial when all three PRs get merged.

docs/tutorials/replicated-stateful-application/run-replicated-stateful-application.md, line 152 at r5 (raw file):

Before starting any of the containers in the Pod spec, the Pod first runs any
[Init Containers](/docs/user-guide/production-pods/#handling-initialization)
in the order defined.

Maybe "in the order in which they are defined"

docs/tutorials/replicated-stateful-application/run-replicated-stateful-application.md, line 171 at r5 (raw file):

Since the example topology consists of a single master and any number of slaves,
the script simply assigns ordinal `0` to be the master, and everyone else to be
slaves.

You might want to expand on why it's important that 0 is the canonical element of the set that is assigned to be the master. If the user fully understands how ordered Pod creation works and how MySQL master-slave replication works, they should be able to derive why this is so, but it might not hurt to be explicit.
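One way to make the reasoning concrete is a sketch like the following. It assumes Pods are named `<set>-<ordinal>` and that peers are reachable as `<pod>.mysql` (as in `mysql-0.mysql`); the function names are hypothetical and this is not the tutorial's script. Because Pods are created in order, ordinal 0 always exists before any other Pod, so it can serve as master, and each later Pod has a predecessor to clone from.

```shell
# Hypothetical helpers illustrating role assignment by ordinal.
role_for() {
  case "${1##*-}" in
    0) echo master ;;      # first Pod created, nothing to clone from
    *) echo slave ;;
  esac
}

# Print the peer that Pod N would clone from (Pod N-1), or nothing for Pod 0.
clone_source_for() {
  set_name="${1%-*}"
  ordinal="${1##*-}"
  [ "$ordinal" -gt 0 ] 2>/dev/null || return 0
  echo "$set_name-$(($ordinal - 1)).mysql"
}
```

For example, `clone_source_for mysql-2` prints `mysql-1.mysql`, while `clone_source_for mysql-0` prints nothing.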

docs/tutorials/replicated-stateful-application/run-replicated-stateful-application.md, line 211 at r5 (raw file):

Also, since slaves look for the master at its stable DNS name (`mysql-0.mysql`),
they will automatically find the master even if it gets a new Pod IP due to
being rescheduled.

Do you know what the behavior is when the A record pointed to by the SRV record is removed and re-added, resulting in an IP address change (e.g. the node hosting the master fails, and it is rescheduled to a new node)? Are we sure that, after the connection between the master and slaves fails, the slaves will call gethostbyname to re-resolve the IP address associated with the SRV record, and that they do not cache the previously resolved address?

docs/tutorials/replicated-stateful-application/run-replicated-stateful-application.md, line 480 at r5 (raw file):

{% capture cleanup %}

You should indicate to the user that she needs to manually delete any provisioned storage (i.e. Persistent Volumes).
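For instance, the cleanup step could list the claims to delete. StatefulSet claims are named `<volumeClaimTemplate>-<set>-<ordinal>`; the template name `data` and the helper `pvc_names` below are assumptions for illustration, not taken from this PR. Whether deleting a claim also removes the underlying Persistent Volume depends on its reclaim policy.

```shell
# Hypothetical helper: print the PVC names a StatefulSet would have created.
pvc_names() {
  template="$1"; set_name="$2"; replicas="$3"
  i=0
  while [ "$i" -lt "$replicas" ]; do
    echo "$template-$set_name-$i"
    i=$(($i + 1))
  done
}

# Example (needs cluster access, so it is left commented out):
#   pvc_names data mysql 3 | xargs kubectl delete pvc
```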


enisoc · 2016-11-29T18:40:43Z

Review status: 7 of 10 files reviewed at latest revision, 10 unresolved discussions.

docs/tutorials/replicated-stateful-application/run-replicated-stateful-application.md, line 33 at r5 (raw file):