Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix START/STOP SLAVE deadlock caused by slave stats daemon #1377

Open
wants to merge 1 commit into
base: fb-mysql-8.0.32
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 12 additions & 13 deletions sql/rpl_replica.cc
Original file line number Diff line number Diff line change
Expand Up @@ -2261,6 +2261,10 @@ int terminate_slave_threads(Master_info *mi, int thread_mask,

if (thread_mask & (SLAVE_IO | SLAVE_FORCE_ALL)) {
DBUG_PRINT("info", ("Terminating IO thread"));

// stop sending secondary lag stats to primary
stop_handle_slave_stats_daemon();

mi->abort_slave = true;
DBUG_EXECUTE_IF("pause_after_queue_event",
{ rpl_replica_debug_point(DBUG_RPL_S_PAUSE_QUEUE_EV); });
Expand Down Expand Up @@ -2601,10 +2605,17 @@ bool start_slave_threads(bool need_lock_slave, bool wait_for_start,
lock_cond_sql = &mi->rli->run_lock;
}

if ((thread_mask & SLAVE_IO) && !enable_raft_plugin)
if ((thread_mask & SLAVE_IO) && !enable_raft_plugin) {
is_error = start_slave_thread(key_thread_replica_io, handle_slave_io,
lock_io, lock_cond_io, cond_io,
&mi->slave_running, &mi->slave_run_id, mi);
if (!is_error) {
// clean up - stop previous run of slave_stats_daemon, if any
stop_handle_slave_stats_daemon();
// start sending secondary lag stats to primary
start_handle_slave_stats_daemon();
}
}

if (!is_error && (thread_mask & (SLAVE_IO | SLAVE_MONITOR)) &&
mi->is_source_connection_auto_failover() &&
Expand Down Expand Up @@ -5952,7 +5963,6 @@ extern "C" void *handle_slave_io(void *arg) {
uint retry_count;
bool suppress_warnings;
int ret;
bool slave_stats_daemon_created = false;
Global_THD_manager *thd_manager = Global_THD_manager::get_instance();
// needs to call my_thread_init(), otherwise we get a coredump in DBUG_ stuff
my_thread_init();
Expand Down Expand Up @@ -6118,12 +6128,6 @@ extern "C" void *handle_slave_io(void *arg) {
goto connected;
}

if (!slave_stats_daemon_created) {
// clean up - stop previous run of slave_stats_daemon, if any
stop_handle_slave_stats_daemon();
// start sending secondary lag stats to primary
slave_stats_daemon_created = start_handle_slave_stats_daemon();
}
DBUG_PRINT("info", ("Starting reading binary log from master"));
while (!io_slave_killed(thd, mi)) {
MYSQL_RPL rpl;
Expand Down Expand Up @@ -6345,11 +6349,6 @@ extern "C" void *handle_slave_io(void *arg) {

// error = 0;
err:
if (slave_stats_daemon_created) {
// stop sending secondary lag stats to primary
stop_handle_slave_stats_daemon();
}

/*
If source_connection_auto_failover (async connection failover) is
enabled, this server is not a Group Replication SECONDARY and
Expand Down
4 changes: 1 addition & 3 deletions sql/slave_stats_daemon.cc
Original file line number Diff line number Diff line change
Expand Up @@ -230,18 +230,16 @@ static void *handle_slave_stats_daemon(void *arg MY_ATTRIBUTE((unused))) {
bool start_handle_slave_stats_daemon() {
DBUG_ENTER("start_handle_slave_stats_daemon");

channel_map.rdlock();
channel_map.assert_some_lock();
Copy link
Contributor

@george-reynya george-reynya Nov 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this working when start_handle_slave_stats_daemon() is called from register_raft_followers()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this working when start_handle_slave_stats_daemon() is called from register_raft_followers()?

@george-reynya , is there any way I can check this with the open source build?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, is the problem that you don't have the raft plugin? Let me check what the options are.

if (channel_map.get_num_instances() != 1) {
// more than one channels exists for this slave. We only support
// single source slave topologies for now. Skip creating the thread.
sql_print_information(
"Number of channels = %lu. There should be only one channel"
" with slave_stats_daemon. Not creating the thread.",
channel_map.get_num_instances());
channel_map.unlock();
DBUG_RETURN(false);
}
channel_map.unlock();

my_thread_handle thread_handle;
slave_stats_daemon_thread_counter++;
Expand Down