-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Async replication setup fails when there's write load to primary #399
Comments
Upstream bug report: http://bugs.mysql.com/114624 Oracles internal bug report reference: |
* poc k8s-k8s async repl * isort had mismatched line_length config * (WIP) improved support * (WIP) support more cases * scale up/down for repl clusters * idle state for 2nds * instance label when creating repl cluster * more typing * (wip) replica secondaries support * allow replica secondaries to rejoin * (WIP) sync cluster-set-name and better deal with repl 2ndaries * (WIP) fix secondaries join and messaging * fix: removed unneeded flag * support for relation broken * dissolved replica cluster stays blocked * added the promote standby action * support for cluster set name config * minor refactors and typing * async common methods, single module/file * removed dup action and lint fixes * check for user data on replica side * moved async_replication to a library (owned by vm charm) * sync with vm code * lint fixes * scale-in locks write to global primary * address pr comments * second batch of PR comments adressing * fence/unfence actions * fix for single unit replica cluster * workaround: secrets in relation data * partial unit test fixes * normalized cluster name for dict reference * remove test due to be refactored * allows rejoin after unrelate * fix lock instance reference * allow unrelated cluster rejoin * reset cluster-set name * dealing with secret not found * ensure secrets are shared with full uri * fix recreation and unrelation after promotion * ensure all online unis * unset read only after unfence * refactor remove instance to acommodate changing * fix race condition * automatic deal with clusters with the same name * bump * test for mysql version and cluster-set-name on rejoin * avoid handling on unit removal * sync local root * use method from lib * lint fixes * addressing pr feedback * using node mode instead of role * covering edge cases for recovery/failover * add rejoin invalidated cluster action * the integration test * workaround for issue #399 * bump libpatch * add group marks * missing default * set flag to avoid lock release * chore: fixes old issue on test * fix markers * fix marker import * fix retry process on removal * fix user creation after refactor `root@%` created only when required. * lint/grammar fixes * pr comment
@paulomach do we have an update from MySQL team here? I see no activity in http://bugs.mysql.com/114624 |
@taurus-forever they've pinged directly on slack. tl;dr; they cannot reproduce on sandboxed environment. There's something special about the way we are setting things. I've allocated some time to compare their instructions to the way we do it. For the record, the instructions: dba.deploy_sandbox_instance(3310, {'password': "", 'sandboxDir': "/tmp/sandbox"});
dba.deploy_sandbox_instance(3320, {'password': "", 'sandboxDir': "/tmp/sandbox"});
dba.deploy_sandbox_instance(3330, {'password': "", 'sandboxDir': "/tmp/sandbox"});
# 1. Deploy Group Replication cluster (A)
shell.connect("root@127.0.0.1:3310", "")
cluster = dba.create_cluster("my_cluster")
cluster.add_instance("127.0.0.1:3320", {'recoveryMethod':'incremental'});
cluster.add_instance("127.0.0.1:3330", {'recoveryMethod':'incremental'});
# 2. Create ClusterSet on A
clusterset = cluster.create_cluster_set("foobar")
# 3. Deploy standalone instance B1
dba.deploy_sandbox_instance(3340, {'password': "", 'sandboxDir': "/tmp/sandbox"});
# 4. Create ReplicaCluster on B1 from A (cluster_set.create_replica_cluster)
clusterset.create_replica_cluster("127.0.0.1:3340", "replica", {'recoveryMethod':'incremental'});
# 5. Add some write load to cluster A
# \sql
my_session = mysql.get_classic_session('root@localhost:3310', '');
my_session.run_sql("CREATE DATABASE test");
my_session.run_sql("USE test;");
my_session.run_sql("CREATE TABLE t1 (c1 INT NOT NULL AUTO_INCREMENT PRIMARY KEY, c2 INT) ENGINE=InnoDB;");
my_session.run_sql("INSERT INTO t1 (c2) values (2);");
my_session.run_sql("INSERT INTO t1 (c2) values (2);");
my_session.run_sql("INSERT INTO t1 (c2) values (2);");
my_session.run_sql("INSERT INTO t1 (c2) values (2);");
my_session.run_sql("INSERT INTO t1 (c2) values (2);");
my_session.run_sql("INSERT INTO t1 (c2) values (2);");
my_session.run_sql("INSERT INTO t1 (c2) values (2);");
my_session.run_sql("INSERT INTO t1 (c2) values (2);");
my_session.run_sql("INSERT INTO t1 (c2) values (2);");
my_session.run_sql("INSERT INTO t1 (c2) values (2);");
my_session.run_sql("INSERT INTO t1 (c2) values (2);");
my_session.run_sql("INSERT INTO t1 (c2) SELECT c2 FROM t1;");
my_session.run_sql("INSERT INTO t1 (c2) SELECT c2 FROM t1;");
my_session.run_sql("INSERT INTO t1 (c2) SELECT c2 FROM t1;");
my_session.run_sql("INSERT INTO t1 (c2) SELECT c2 FROM t1;");
my_session.run_sql("INSERT INTO t1 (c2) SELECT c2 FROM t1;"); On other terminal generate load with mysqlslap: mysqlslap -S /tmp/sandbox/3310/sandboxdata/mysqld.sock --create-schema=test --delimiter=";" --iterations=300000 --query="INSERT INTO t1 (c2) SELECT c2 FROM t1 LIMIT 100" --concurrency=100 & And on mysqlshell execute final steps:
|
Reproducible using: dba.deploy_sandbox_instance(3310, {'password': "", 'sandboxDir': "/tmp/sandbox"});
dba.deploy_sandbox_instance(3320, {'password': "", 'sandboxDir': "/tmp/sandbox"});
dba.deploy_sandbox_instance(3330, {'password': "", 'sandboxDir': "/tmp/sandbox"});
# 1. Deploy Group Replication cluster (A)
shell.connect("root@127.0.0.1:3310", "")
cluster = dba.create_cluster("my_cluster")
cluster.add_instance("127.0.0.1:3320", {'recoveryMethod':'incremental'});
cluster.add_instance("127.0.0.1:3330", {'recoveryMethod':'incremental'});
# 2. Create ClusterSet on A
clusterset = cluster.create_cluster_set("foobar")
# 3. Deploy standalone instance B1
dba.deploy_sandbox_instance(3340, {'password': "", 'sandboxDir': "/tmp/sandbox"});
# 4. Create ReplicaCluster on B1 from A (cluster_set.create_replica_cluster)
clusterset.create_replica_cluster("127.0.0.1:3340", "replica", {'recoveryMethod':'incremental'});
# 5. Add some write load to cluster A
# \sql
my_session = mysql.get_classic_session('root@localhost:3310', '');
my_session.run_sql("CREATE DATABASE test");
my_session.run_sql("USE test;");
my_session.run_sql("CREATE TABLE t1 (c1 INT NOT NULL AUTO_INCREMENT PRIMARY KEY, c2 INT) ENGINE=InnoDB;");
my_session.run_sql("INSERT INTO t1 (c2) values (2);");
my_session.run_sql("INSERT INTO t1 (c2) values (2);");
my_session.run_sql("INSERT INTO t1 (c2) values (2);");
my_session.run_sql("INSERT INTO t1 (c2) values (2);");
my_session.run_sql("INSERT INTO t1 (c2) values (2);");
my_session.run_sql("INSERT INTO t1 (c2) values (2);");
my_session.run_sql("INSERT INTO t1 (c2) values (2);");
my_session.run_sql("INSERT INTO t1 (c2) values (2);");
my_session.run_sql("INSERT INTO t1 (c2) values (2);");
my_session.run_sql("INSERT INTO t1 (c2) values (2);");
my_session.run_sql("INSERT INTO t1 (c2) values (2);");
my_session.run_sql("INSERT INTO t1 (c2) SELECT c2 FROM t1;");
my_session.run_sql("INSERT INTO t1 (c2) SELECT c2 FROM t1;");
my_session.run_sql("INSERT INTO t1 (c2) SELECT c2 FROM t1;");
my_session.run_sql("INSERT INTO t1 (c2) SELECT c2 FROM t1;");
my_session.run_sql("INSERT INTO t1 (c2) SELECT c2 FROM t1;"); On other terminal generate load with mysqlslap: mysqlslap -S /tmp/sandbox/3310/sandboxdata/mysqld.sock --create-schema=test --delimiter=";" --iterations=300000 --query="INSERT INTO t1 (c2) SELECT c2 FROM t1 LIMIT 100" --concurrency=100 & And on mysqlshell execute final steps: shell.connect("root@127.0.0.1:3340", "")
dba.deploy_sandbox_instance(3350, {'password': "", 'sandboxDir': "/tmp/sandbox"});
# First difference - I'm adding instances to the replica cluster
cluster = dba.get_cluster("replica")
cluster.add_instance("127.0.0.1:3350", {'recoveryMethod':'incremental'})
# Second difference - sometimes It was needed to add a third instance to the replica cluster
dba.deploy_sandbox_instance(3360, {'password': "", 'sandboxDir': "/tmp/sandbox"});
cluster.add_instance("127.0.0.1:3360", {'recoveryMethod':'incremental'})
session.run_sql("SELECT max(c1) FROM test.t1")
# PK stop incrementing while mysqlslap is running
# connect to `my_cluster` primary
shell.connect("root@127.0.0.1:3310", "")
# PK still incrementing while mysqlslap is running
session.run_sql("SELECT max(c1) FROM test.t1") |
Steps to reproduce
Expected behavior
Cluster set is formed
Actual behavior
Joining instances on secondary cluster fail to join the cluster
The text was updated successfully, but these errors were encountered: