PS-3345: LP #1527463: Waiting for binlog lock (5.7) #3426
# When it finds the deadlock, it throws assert.
################################################################################

--source include/have_debug.inc
Also
--source include/have_debug_sync.inc
--source include/have_innodb.inc
done
--source include/rpl_connection_slave.inc
--source include/only_mts_slave_parallel_workers.inc
--source include/only_mts_slave_parallel_type_logical_clock.inc
--source include/stop_slave_sql.inc
Consider --let $rpl_skip_start_slave= 1 before including master-slave.inc instead.
This test needs stop_slave_sql.inc as a sync point.
--echo #
--source include/rpl_connection_master.inc
CREATE TABLE t1(c1 INT PRIMARY KEY, c2 INT, INDEX(c2)) ENGINE = InnoDB;
SET debug = "+d,set_commit_parent_100";
Why do we need this?
Without this, the 2 INSERTs are processed by a single worker thread.
sql/binlog.cc
Outdated
{
Slave_worker *worker= dynamic_cast<Slave_worker *>(thd->rli_slave);

static bool skip_first_query= true;
I am not sure this really works when the test is run a second time without a server restart.
I improved this a little bit and verified that it works fine with --repeat, both with and without the deadlock.
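For context on the concern above: a function-local static is initialized once per process, so a "skip the first call" flag stays consumed when the test runs again without a server restart. The sketch below only illustrates that C++ behavior and one possible way to make such a flag re-armable. The names should_skip_static, SkipOnce and rearm are hypothetical, and this is not the change actually made in sql/binlog.cc.

```cpp
// Hypothetical stand-in for a debug hook; not the code from sql/binlog.cc.
// The static is initialized once per process, so the "first call" happens
// only once, no matter how many times a test is repeated.
bool should_skip_static() {
  static bool skip_first_query = true;
  if (skip_first_query) {
    skip_first_query = false;
    return true;
  }
  return false;
}

// Assumed alternative: the state is owned by the caller, so every test run
// can re-arm it explicitly before it starts.
struct SkipOnce {
  bool armed = true;

  // Returns true exactly once after construction or after rearm().
  bool should_skip() {
    if (!armed) return false;
    armed = false;
    return true;
  }

  void rearm() { armed = true; }
};
```

With the static version, only the very first call in the process returns true. With SkipOnce, calling rearm() at the start of each run restores the "skip first" behavior.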
LGTM
Fix a 3-way deadlock that can occur with 2 slave threads working in parallel and 1 slave client that executes LOCK BINLOG FOR BACKUP.
The deadlock sequence is:
worker0: applying INSERT INTO t1 VALUES(11, NULL);
worker1: applying INSERT INTO t1 VALUES(12, NULL);
worker1: calls backup_binlog_lock.acquire_protection()
worker1: waits for worker0 in wait_for_its_turn()
client: executes LOCK BINLOG FOR BACKUP
client: waits in backup_binlog_lock.acquire(), but protection is acquired by worker1
worker0: calls backup_binlog_lock.acquire_protection(), but it's blocked by client
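
The sequence above forms a cycle of waits: worker1 waits for worker0, the client waits for worker1, and worker0 waits for the client. A minimal sketch of that wait-for graph, assuming each participant is blocked on at most one other, is shown below. The names WaitForGraph, has_deadlock and deadlock_scenario are hypothetical and serve only as illustration; they are not part of the server code.

```cpp
#include <map>
#include <set>
#include <string>

// Wait-for graph: an entry A -> B means "A is blocked until B makes progress".
using WaitForGraph = std::map<std::string, std::string>;

// Each participant waits on at most one other, so following the edges from
// `start` either reaches an unblocked participant or revisits a node (a cycle).
bool has_deadlock(const WaitForGraph &graph, const std::string &start) {
  std::set<std::string> seen;
  std::string current = start;
  while (true) {
    if (!seen.insert(current).second) return true;  // revisited: cycle found
    auto it = graph.find(current);
    if (it == graph.end()) return false;  // current is not blocked
    current = it->second;
  }
}

// The three waits from the description, modeled as edges.
WaitForGraph deadlock_scenario() {
  return {
      {"worker1", "worker0"},  // waits in wait_for_its_turn()
      {"client", "worker1"},   // acquire() waits for worker1's protection
      {"worker0", "client"},   // acquire_protection() blocked by the client
  };
}
```

Calling has_deadlock(deadlock_scenario(), "worker0") reports the cycle. Removing any one edge, for example the worker0 -> client wait, breaks it.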