You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
A deadlock can take place if a concurrent Admin connection query the stats database meanwhile other Admin session is serving a query that requires calling GenericRefreshStatistics.
ProxySQL version
Affected versions v2.5.2 to v2.5.5.
OS version
ArchLinux
Detailed explanation
The deadlock takes place when the following flow is exercised:
A first session is being processed by the Admin interface, the query in this session is forcing a stats refresh via GenericRefreshStatistics. This calls stats___proxysql_servers_checksums, starting a transaction against SQLitehere.
A second session is waiting in Admin global mutex sql_query_global_mutex, for now, let's assume this session query is SELECT * FROM stats_mysql_gtid_executed, more details about why later.
Several lines later, the first session unlocks global Admin mutex, preventing a potential deadlock with Cluster.
Since the mutex is unlocked, the second session is able to take it, and continue its processing. The previously started transaction on SQLite connection for statsdb object, is still open.
Since the query being executed in the second session is stats_mysql_gtid_executed, no COMMITs operations using the same connection (statsdb), are found in the execution flow, before reaching ProxySQL_Admin::vacuum_stats. This is crucial for the deadlock to be triggered, because otherwise the created transaction in the first session would have been closed by this one.
Since both are Admin sessions (no sessions using stats user), when the first query tries to be executed in ProxySQL_Admin::vacuum_stats, it will try to do so using the admindb connection to SQLite, since there is a transaction already started by the first session in statsdb connection, SQLite will respond to this execution with error SQLITE_LOCKED.
ProxySQL will keep retrying the operation in hope for a temporary lock, waiting for the other transaction to finish, but since the first session is locked in Admin global mutex, that will never happen.
In another terminal use the supplied script that opens to Admin connections:
./admin_conns_br.sh
The script just performs the following operations:
mysql --prompt="admin> " -uadmin -padmin -h127.0.0.1 -P6032 -e "SELECT * FROM stats_proxysql_servers_checksums" &
sleep 1
mysql --prompt="admin> " -uadmin -padmin -h127.0.0.1 -P6032 -e "SELECT * FROM stats_mysql_gtid_executed" &
wait
This will trigger the breakpoints defined in init.gdb in the intended sequence. Once the script is waiting, we must go back to gdb and make the stopped threads continue in the following order:
Thread in Breakpoint 1, at ProxySQL_Admin::stats___proxysql_servers_checksums
Thread at Breakpoint 2, at ProxySQL_Admin::vacuum_stats
Example:
Thread 17 "proxysql" hit Breakpoint 1, ProxySQL_Admin::stats___proxysql_servers_checksums (this=0x7ffff6c59c00) at ProxySQL_Admin.cpp:9730
9730 resultset=GloProxyCluster->get_stats_proxysql_servers_checksums();
(gdb)
Thread 18 "proxysql" hit Breakpoint 2, ProxySQL_Admin::vacuum_stats (this=0x7ffff6c59c00, is_admin=true) at ProxySQL_Admin.cpp:5424
5424 for (auto it = tablenames.begin(); it != tablenames.end(); it++) {
Quit
(gdb) thread 17
[Switching to thread 17 (Thread 0x7fffee7ff6c0 (LWP 2732346))]
#0 ProxySQL_Admin::stats___proxysql_servers_checksums (this=0x7ffff6c59c00) at ProxySQL_Admin.cpp:9730
9730 resultset=GloProxyCluster->get_stats_proxysql_servers_checksums();
(gdb) c
Continuing.
^C
Thread 1 "proxysql" received signal SIGINT, Interrupt.
0x00007ffff76f37f5 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7fffffffc840, rem=rem@entry=0x0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
48 r = INTERNAL_SYSCALL_CANCEL (clock_nanosleep_time64, clock_id, flags, req,
(gdb) c -a
Continuing.
At this point the Admin interface will be deadlock, meaning the previous connections will be stuck, and no other Admin connections will be accepted. Example of thread status in this scenario:
Thread 19 (Thread 0x7fffef5ff6c0 (LWP 2732596) "proxysql"):
#0 futex_wait (private=0, expected=2, futex_word=0x7ffff6c59f00) at ../sysdeps/nptl/futex-internal.h:146
#1 __GI___lll_lock_wait (futex=futex@entry=0x7ffff6c59f00, private=0) at lowlevellock.c:49
#2 0x00007ffff76adec2 in lll_mutex_lock_optimized (mutex=0x7ffff6c59f00) at pthread_mutex_lock.c:48
#3 ___pthread_mutex_lock (mutex=0x7ffff6c59f00) at pthread_mutex_lock.c:93
#4 0x0000555555c4e8fe in admin_session_handler (sess=0x7fffee80c000, _pa=0x7ffff6c59c00, pkt=0x7fffee80c1a0) at ProxySQL_Admin.cpp:3745
#5 0x0000555555beb554 in MySQL_Session::handler___status_WAITING_CLIENT_DATA___STATE_SLEEP___MYSQL_COM_QUERY___not_mysql (this=0x7fffee80c000, pkt=...) at MySQL_Session.cpp:3532
#6 0x0000555555bed51a in MySQL_Session::get_pkts_from_client (this=0x7fffee80c000, wrong_pass=@0x7fffef5fcc1f: false, pkt=...) at MySQL_Session.cpp:4058
#7 0x0000555555bf06f7 in MySQL_Session::handler (this=0x7fffee80c000) at MySQL_Session.cpp:4797
#8 0x0000555555c5801d in child_mysql (arg=0x7fffefa00010) at ProxySQL_Admin.cpp:5545
#9 0x00007ffff76aa9eb in start_thread (arg=<optimized out>) at pthread_create.c:444
#10 0x00007ffff772e7cc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
Thread 18 (Thread 0x7fffed9ff6c0 (LWP 2732351) "proxysql"):
#0 0x00007ffff76f37f5 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7fffed9fbb20, rem=rem@entry=0x0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
#1 0x00007ffff77058c7 in __GI___nanosleep (req=req@entry=0x7fffed9fbb20, rem=rem@entry=0x0) at ../sysdeps/unix/sysv/linux/nanosleep.c:25
#2 0x00007ffff772e349 in usleep (useconds=<optimized out>) at ../sysdeps/posix/usleep.c:31
#3 0x0000555555b302a7 in SQLite3DB::execute (this=0x7ffff6da2230, str=0x7fffecc0c7c0 "DELETE FROM stats.stats_mysql_commands_counters") at sqlite3db.cpp:175
#4 0x0000555555c578fa in ProxySQL_Admin::vacuum_stats (this=0x7ffff6c59c00, is_admin=true) at ProxySQL_Admin.cpp:5428
#5 0x0000555555c56bbf in admin_session_handler (sess=0x7fffecc13000, _pa=0x7ffff6c59c00, pkt=0x7fffecc131a0) at ProxySQL_Admin.cpp:5365
#6 0x0000555555beb554 in MySQL_Session::handler___status_WAITING_CLIENT_DATA___STATE_SLEEP___MYSQL_COM_QUERY___not_mysql (this=0x7fffecc13000, pkt=...) at MySQL_Session.cpp:3532
#7 0x0000555555bed51a in MySQL_Session::get_pkts_from_client (this=0x7fffecc13000, wrong_pass=@0x7fffed9fcc1f: false, pkt=...) at MySQL_Session.cpp:4058
#8 0x0000555555bf06f7 in MySQL_Session::handler (this=0x7fffecc13000) at MySQL_Session.cpp:4797
#9 0x0000555555c5801d in child_mysql (arg=0x7fffefa00460) at ProxySQL_Admin.cpp:5545
#10 0x00007ffff76aa9eb in start_thread (arg=<optimized out>) at pthread_create.c:444
#11 0x00007ffff772e7cc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
Thread 17 (Thread 0x7fffee7ff6c0 (LWP 2732346) "proxysql"):
#0 futex_wait (private=0, expected=2, futex_word=0x7ffff6c59f00) at ../sysdeps/nptl/futex-internal.h:146
#1 __GI___lll_lock_wait (futex=futex@entry=0x7ffff6c59f00, private=0) at lowlevellock.c:49
#2 0x00007ffff76adec2 in lll_mutex_lock_optimized (mutex=0x7ffff6c59f00) at pthread_mutex_lock.c:48
#3 ___pthread_mutex_lock (mutex=0x7ffff6c59f00) at pthread_mutex_lock.c:93
#4 0x0000555555c77384 in ProxySQL_Admin::stats___proxysql_servers_checksums (this=0x7ffff6c59c00) at ProxySQL_Admin.cpp:9731
#5 0x0000555555c4d14d in ProxySQL_Admin::GenericRefreshStatistics (this=0x7ffff6c59c00, query_no_space=0x7fffeda00150 "SELECT * FROM stats_proxysql_servers_checksums", query_no_space_length=46, admin=true) at ProxySQL_Admin.cpp:3385
#6 0x0000555555c50de8 in admin_session_handler (sess=0x7fffeda29000, _pa=0x7ffff6c59c00, pkt=0x7fffeda291a0) at ProxySQL_Admin.cpp:4294
#7 0x0000555555beb554 in MySQL_Session::handler___status_WAITING_CLIENT_DATA___STATE_SLEEP___MYSQL_COM_QUERY___not_mysql (this=0x7fffeda29000, pkt=...) at MySQL_Session.cpp:3532
#8 0x0000555555bed51a in MySQL_Session::get_pkts_from_client (this=0x7fffeda29000, wrong_pass=@0x7fffee7fcc1f: false, pkt=...) at MySQL_Session.cpp:4058
#9 0x0000555555bf06f7 in MySQL_Session::handler (this=0x7fffeda29000) at MySQL_Session.cpp:4797
#10 0x0000555555c5801d in child_mysql (arg=0x7fffefa004a0) at ProxySQL_Admin.cpp:5545
#11 0x00007ffff76aa9eb in start_thread (arg=<optimized out>) at pthread_create.c:444
#12 0x00007ffff772e7cc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
In the previous output we can see Threads 18 and Thread 17 in the deadlock, while Thread 19 is a new connection waiting for the lock that won't be released.
The text was updated successfully, but these errors were encountered:
A deadlock can take place if a concurrent
Admin
connection query thestats
database meanwhile otherAdmin
session is serving a query that requires callingGenericRefreshStatistics
.Affected versions
v2.5.2
tov2.5.5
.ArchLinux
The deadlock takes place when the following flow is exercised:
Admin
interface, the query in this session is forcing a stats refresh viaGenericRefreshStatistics
. This callsstats___proxysql_servers_checksums
, starting a transaction againstSQLite
here.Admin
global mutexsql_query_global_mutex
, for now, let's assume this session query isSELECT * FROM stats_mysql_gtid_executed
, more details about why later.Admin
mutex, preventing a potential deadlock withCluster
.SQLite
connection forstatsdb
object, is still open.stats_mysql_gtid_executed
, noCOMMITs
operations using the same connection (statsdb
), are found in the execution flow, before reachingProxySQL_Admin::vacuum_stats
. This is crucial for the deadlock to be triggered, because otherwise the created transaction in the first session would have been closed by this one.Admin
sessions (no sessions usingstats
user), when the first query tries to be executed in ProxySQL_Admin::vacuum_stats, it will try to do so using theadmindb
connection toSQLite
, since there is a transaction already started by the first session instatsdb
connection,SQLite
will respond to this execution with errorSQLITE_LOCKED
.ProxySQL
will keep retrying the operation in hope for a temporary lock, waiting for the other transaction to finish, but since the first session is locked inAdmin
global mutex, that will never happen.Files: repro.tar.gz
Using the provided
init.gdb
startProxySQL
withgdb
via the following command:In another terminal use the supplied script that opens to
Admin
connections:The script just performs the following operations:
This will trigger the breakpoints defined in
init.gdb
in the intended sequence. Once the script is waiting, we must go back togdb
and make the stopped threads continue in the following order:Breakpoint 1
, atProxySQL_Admin::stats___proxysql_servers_checksums
Breakpoint 2
, atProxySQL_Admin::vacuum_stats
Example:
At this point the
Admin
interface will be deadlock, meaning the previous connections will be stuck, and no otherAdmin
connections will be accepted. Example of thread status in this scenario:In the previous output we can see
Threads 18
andThread 17
in the deadlock, whileThread 19
is a new connection waiting for the lock that won't be released.The text was updated successfully, but these errors were encountered: