PS-3345: LP #1527463: Waiting for binlog lock (5.7) #3426

Merged
merged 1 commit into from
Oct 9, 2019


@inikep inikep commented Sep 12, 2019

Fix a 3-way deadlock that can occur with 2 slave threads working in parallel and 1 slave client that executes LOCK BINLOG FOR BACKUP.

The deadlock arises as follows:
worker0: applying INSERT INTO t1 VALUES(11, NULL);
worker1: applying INSERT INTO t1 VALUES(12, NULL);
worker1: calls backup_binlog_lock.acquire_protection()
worker1: waits for worker0 in wait_for_its_turn()
client: executes LOCK BINLOG FOR BACKUP
client: waits in backup_binlog_lock.acquire(), but protection is acquired by worker1
worker0: calls backup_binlog_lock.acquire_protection(), but it's blocked by client
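
The three waits above form a cycle, which can be checked with a small wait-for graph. This is only an illustrative sketch, not code from the patch: `WaitForGraph`, `is_deadlocked` and `deadlock_from_description` are hypothetical names, and the graph models just the "who waits for whom" relation taken from the description.

```cpp
#include <map>
#include <set>
#include <string>

// Wait-for graph: an entry A -> B means "thread A is blocked waiting for B".
// Each thread in this scenario waits for at most one other thread.
using WaitForGraph = std::map<std::string, std::string>;

// Follows the wait-for edges from `start` and reports whether the chain
// leads back to `start`, i.e. whether `start` is part of a deadlock cycle.
bool is_deadlocked(const WaitForGraph &graph, const std::string &start) {
  std::set<std::string> visited;
  std::string current = start;
  while (true) {
    auto it = graph.find(current);
    if (it == graph.end()) return false;  // current is not waiting for anyone
    current = it->second;
    if (current == start) return true;    // back at the start: cycle found
    // Revisiting another node means a cycle that does not include `start`.
    if (!visited.insert(current).second) return false;
  }
}

// The three waits from the description, expressed as edges (illustrative).
WaitForGraph deadlock_from_description() {
  return {
      {"worker1", "worker0"},  // wait_for_its_turn(): waits for worker0
      {"client", "worker1"},   // acquire(): protection is held by worker1
      {"worker0", "client"},   // acquire_protection(): blocked by client
  };
}
```

Removing any one edge breaks the cycle: for example, if worker0 is not blocked by the client, `is_deadlocked` returns false for every thread.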

@inikep inikep requested a review from gl-sergei September 12, 2019 12:19
inikep commented Sep 13, 2019

# When it finds the deadlock, it throws assert.
################################################################################

--source include/have_debug.inc
Collaborator

Also add:

--source include/have_debug_sync.inc
--source include/have_innodb.inc

Collaborator Author

done

--source include/rpl_connection_slave.inc
--source include/only_mts_slave_parallel_workers.inc
--source include/only_mts_slave_parallel_type_logical_clock.inc
--source include/stop_slave_sql.inc
Collaborator

Consider --let $rpl_skip_start_slave= 1 before including master-slave.inc instead.

Collaborator Author

This test needs stop_slave_sql.inc as a sync point.

--echo #
--source include/rpl_connection_master.inc
CREATE TABLE t1(c1 INT PRIMARY KEY, c2 INT, INDEX(c2)) ENGINE = InnoDB;
SET debug = "+d,set_commit_parent_100";
Collaborator

Why do we need this?

Collaborator Author

Without this, the 2 INSERTs are processed by a single worker thread.

sql/binlog.cc Outdated
{
Slave_worker *worker= dynamic_cast<Slave_worker *>(thd->rli_slave);

static bool skip_first_query= true;
Contributor

I am not sure this really works when the test is run a second time without a server restart.

Collaborator Author

I improved this a little and verified that it works fine with --repeat, both with and without the deadlock.
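
For context on the concern raised in this thread: a function-local `static` is initialised once per process, so a flag like `skip_first_query` stays consumed when the test is repeated without a server restart. The sketch below only illustrates that behaviour and one possible way around it. It is not the code that was merged; `should_skip_query` and `FirstQuerySkipper` are hypothetical names, and thread safety is deliberately ignored.

```cpp
// Hypothetical stand-in for a debug hook that skips only the first query.
bool should_skip_query() {
  static bool skip_first_query = true;  // initialised once per process
  if (skip_first_query) {
    skip_first_query = false;
    return true;   // first call in the process lifetime
  }
  return false;    // every later call, including later test runs
}

// Hypothetical alternative: the state can be re-armed explicitly, so each
// test run can start from a known state without restarting the server.
class FirstQuerySkipper {
 public:
  void rearm() { skip_next_ = true; }
  bool should_skip() {
    const bool skip = skip_next_;
    skip_next_ = false;
    return skip;
  }

 private:
  bool skip_next_ = true;
};
```

With the first variant, a repeated run in the same process never sees the "first query" branch again. With the second, calling `rearm()` at the start of each run restores it.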

inikep commented Sep 18, 2019


inikep commented Oct 9, 2019


@percona-ysorokin percona-ysorokin left a comment

LGTM

@inikep inikep merged commit 5952017 into percona:5.7 Oct 9, 2019
@inikep inikep deleted the PS-3345-5.7 branch October 9, 2019 09:45