forked from percona/percona-server
PS-7865 Dropping a table with discarded tablespace crashes the server #2
Closed
…h_drop_page_hash_index
inikep pushed a commit that referenced this pull request Jun 1, 2022
…ILER WARNINGS

Remove some stringop-truncation warnings using cstrbuf.

Change-Id: I3ab43f6dd8c8b0b784d919211b041ac3ad4fad40
inikep pushed a commit that referenced this pull request Jun 1, 2022
Patch #1 caused several problems in mysql-trunk related to ndbinfo initialization and upgrade, including the failure of the test ndb_76_inplace_upgrade and the failure of all NDB MTR tests in Pushbuild on Windows. This patch fixes these issues, including fixes for bug#33726826 and bug#33730799.

In ndbinfo, revert the removal of ndb$blocks and ndb$index_stats and the change of blocks and index_stats from views to tables.

Improve the ndbinfo schema upgrade & initialization logic to better handle such a change in the future. This logic now runs in two passes: first it drops the known tables and views from current and previous versions, then it creates the tables and views for the current version.

Add a new class method NdbDictionary::printColumnTypeDescription(). This is needed for the ndbinfo.columns table in patch #2 but was missing from patch #1. Add boilerplate index lookup initialization code that was also missing.

Fix ndbinfo prefix determination on Windows.

Change-Id: I422856bcad4baf5ae9b14c1e3a1f2871bd6c5f59
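The two-pass upgrade described in this commit message can be sketched roughly as follows. This is only an illustration: the `Schema` type and the `upgrade_schema` function are hypothetical, and the real ndbinfo code operates on the server's data dictionary rather than on a set of names.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: a schema is just a set of object names.
using Schema = std::set<std::string>;

// known_objects: every table or view name known from the current and
// previous versions (assumed to include everything in `current`).
// current: the objects the current version defines.
void upgrade_schema(Schema &schema,
                    const std::vector<std::string> &known_objects,
                    const std::vector<std::string> &current) {
  // Pass 1: drop everything this or an earlier version may have created.
  for (const auto &name : known_objects) schema.erase(name);

  // Pass 2: create the objects of the current version from scratch.
  for (const auto &name : current) {
    const bool inserted = schema.insert(name).second;
    assert(inserted && "pass 1 should have removed any older definition");
    (void)inserted;  // silence unused-variable warnings in release builds
  }
}
```

Because pass 1 removes objects regardless of whether they were views or tables in an older version, pass 2 never has to reason about what kind of object it is replacing. Objects that are not in the known list are left alone.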
inikep pushed a commit that referenced this pull request Jun 7, 2022
A subset of binlog encryption tests was crashing with:

* thread #39, stop reason = signal SIGSTOP
    frame #0: 0x00007fff56063b66 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff5622e080 libsystem_pthread.dylib`pthread_kill + 333
    frame #2: 0x000000010657442b mysqld-debug`my_write_core(sig=11) at stacktrace.cc:278
    frame #3: 0x0000000104d84334 mysqld-debug`::handle_fatal_signal(sig=11) at signal_handler.cc:254
    frame #4: 0x00007fff56221f5a libsystem_platform.dylib`_sigtramp + 26
    frame #5: 0x00007fff5622934d libsystem_pthread.dylib`pthread_mutex_lock + 1
    frame #6: 0x0000000106578d05 mysqld-debug`native_mutex_lock(mutex=0x0000000000000000) at thr_mutex.h:93
    frame #7: 0x0000000106578a57 mysqld-debug`safe_mutex_lock(mp=0x0000000000000000, try_lock=false, file="/Users/laurynas/percona/mysql-server/mysys/mf_iocache2.cc", line=113) at thr_mutex.cc:70
    frame #8: 0x000000010653cd3a mysqld-debug`my_mutex_lock(mp=0x00007ffb6b215038, file="/Users/laurynas/percona/mysql-server/mysys/mf_iocache2.cc", line=113) at thr_mutex.h:180
    frame #9: 0x000000010653b2cc mysqld-debug`inline_mysql_mutex_lock(that=0x00007ffb6b215038, src_file="/Users/laurynas/percona/mysql-server/mysys/mf_iocache2.cc", src_line=113) at mysql_mutex.h:267
  * frame #10: 0x000000010653b0d8 mysqld-debug`my_b_append_tell(info=0x00007ffb6b214fd8) at mf_iocache2.cc:113
    frame #11: 0x0000000105ed6a96 mysqld-debug`MYSQL_BIN_LOG::write_buffer(this=0x00007ffb6b214cb8, buf="", len=47, mi=0x00007ffb6b1f6a00) at binlog.cc:7128
    frame #12: 0x0000000105f4d54b mysqld-debug`queue_event(mi=0x00007ffb6b1f6a00, buf="", event_len=47, do_flush_mi=true) at rpl_slave.cc:7756
    frame #13: 0x0000000105f3a243 mysqld-debug`::handle_slave_io(arg=0x00007ffb6b1f6a00) at rpl_slave.cc:5382
    frame #14: 0x00000001065b87a5 mysqld-debug`pfs_spawn_thread(arg=0x00007ffb6a543af0) at pfs.cc:2836
    frame #15: 0x00007fff5622b661 libsystem_pthread.dylib`_pthread_body + 340
    frame #16: 0x00007fff5622b50d libsystem_pthread.dylib`_pthread_start + 377
    frame #17: 0x00007fff5622abf9 libsystem_pthread.dylib`thread_start + 13

This was caused by my_b_append_tell trying to lock a nullptr IO_CACHE::append_buffer_lock. The lock was nullptr, because it's only initialized for SEQ_READ_APPEND IO_CACHEs, whereas we have WRITE_CACHE. This mismatch was introduced by WL#8599 [1] changing the IO_CACHE type from the former to the latter.

Fix by using the correct API for the new IO_CACHE type: my_b_tell instead of my_b_append_tell.

[1]:
commit dbd2ca2
Author: Joao Gramacho <joao.gramacho@oracle.com>
Date: Tue Nov 1 06:45:39 2016 +0000

    WL#8599: Reduce contention in IO and SQL threads
    (...)
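The mismatch described in this commit message can be illustrated with a small model. Everything below (`ToyCache`, `toy_tell`, `toy_append_tell`, `toy_position`) is a hypothetical stand-in and not the real mysys `IO_CACHE` API; it only captures the point that the append lock exists for one cache type and not for the other.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>

// Hypothetical, simplified stand-in for IO_CACHE.
enum class CacheType { WriteCache, SeqReadAppend };

struct ToyCache {
  CacheType type;
  std::size_t pos_in_file = 0;  // bytes already written to the file
  std::size_t buffered = 0;     // bytes still sitting in the buffer
  // Allocated only for SeqReadAppend caches, as in the commit message.
  std::unique_ptr<std::mutex> append_buffer_lock;

  explicit ToyCache(CacheType t) : type(t) {
    if (t == CacheType::SeqReadAppend)
      append_buffer_lock = std::make_unique<std::mutex>();
  }
};

// Position query for append caches: takes the lock, so calling it on a
// cache without a lock is the bug the stack trace shows.
std::size_t toy_append_tell(ToyCache &c) {
  assert(c.append_buffer_lock && "append lock missing: wrong cache type");
  std::lock_guard<std::mutex> guard(*c.append_buffer_lock);
  return c.pos_in_file + c.buffered;
}

// Position query for plain write caches: no lock involved.
std::size_t toy_tell(const ToyCache &c) { return c.pos_in_file + c.buffered; }

// Pick the query that matches the cache type.
std::size_t toy_position(ToyCache &c) {
  return c.type == CacheType::SeqReadAppend ? toy_append_tell(c) : toy_tell(c);
}
```

In the actual fix the caller simply switches to my_b_tell, since the cache type at that call site is known; the dispatch function here is only a way to show both cases side by side.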
inikep pushed a commit that referenced this pull request Jun 7, 2022
create_table_info_t::create_table_def leaked memory in the case enable_encryption(table) call failed:

worker[5] Sanitizer report from /tmp/results/PS/mysql-test/var/5/log/mysqld.2.err after tests:
 binlog_encryption.binlog_encryption_without_keyring
 group_replication.gr_change_master_hidden
 group_replication.gr_server_uuid_matches_group_name
 group_replication.gr_perfschema_connect_status
 group_replication.gr_single_primary_and_leader_election_on_error
 group_replication.gr_without_perfschema
 rpl.rpl_key_rotation
--------------------------------------------------------------------------
==14131==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1136 byte(s) in 1 object(s) allocated from:
    #0 0x7fe9233f1602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0xc692483 in ut_allocator<unsigned char>::allocate(unsigned long, unsigned char const*, unsigned int, bool, bool) storage/innobase/include/ut0new.h:608
    #2 0xc692483 in mem_heap_create_block_func(mem_block_info_t*, unsigned long, unsigned long) storage/innobase/mem/memory.cc:281
    #3 0xb99ff96 in mem_heap_create_func storage/innobase/include/mem0mem.ic:464
    #4 0xbae8604 in create_table_info_t::create_table_def(dd::Table const*) storage/innobase/handler/ha_innodb.cc:10349
    #5 0xbaee018 in create_table_info_t::create_table(dd::Table const*) storage/innobase/handler/ha_innodb.cc:12420
    #6 0xbaf1aba in int innobase_basic_ddl::create_impl<dd::Table>(THD*, char const*, TABLE*, HA_CREATE_INFO*, dd::Table*, bool, bool, bool, unsigned long, unsigned long) storage/innobase/handler/ha_innodb.cc:12805
    #7 0xbaf7e6a in ha_innobase::create(char const*, TABLE*, HA_CREATE_INFO*, dd::Table*) storage/innobase/handler/ha_innodb.cc:13756
    #8 0x2857f7a in ha_create_table(THD*, char const*, char const*, char const*, HA_CREATE_INFO*, List<Create_field> const*, bool, bool, dd::Table*) sql/handler.cc:5156
    #9 0x19d0d9f in rea_create_base_table sql/sql_table.cc:991
    #10 0x19d0d9f in create_table_impl sql/sql_table.cc:7118
    #11 0x19d37cf in mysql_create_table_no_lock(THD*, char const*, char const*, HA_CREATE_INFO*, Alter_info*, unsigned int, bool, bool*, handlerton**) sql/sql_table.cc:7200
    #12 0x19dffb2 in mysql_create_table(THD*, TABLE_LIST*, HA_CREATE_INFO*, Alter_info*) sql/sql_table.cc:7950
    #13 0x3b58b9b in Sql_cmd_create_table::execute(THD*) sql/sql_cmd_ddl_table.cc:319
    #14 0x15917c1 in mysql_execute_command(THD*, bool) sql/sql_parse.cc:4417
    #15 0x15b086e in mysql_parse(THD*, Parser_state*, bool) sql/sql_parse.cc:5139
    #16 0x8efc7fd in Query_log_event::do_apply_event(Relay_log_info const*, char const*, unsigned long) sql/log_event.cc:5295
    #17 0x8f7ea48 in Log_event::apply_event(Relay_log_info*) sql/log_event.cc:3882
    #18 0x91cb682 in apply_event_and_update_pos sql/rpl_slave.cc:4352
    #19 0x9215e69 in exec_relay_log_event sql/rpl_slave.cc:4812
    #20 0x9254685 in handle_slave_sql sql/rpl_slave.cc:6912
    #21 0xb1913a3 in pfs_spawn_thread storage/perfschema/pfs.cc:2836
    #22 0x7fe9231436b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9)

Fix by adding the missing mem_heap_free(heap) call.
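The shape of this leak, an early return that skips the cleanup, can be sketched as below. All names (`ToyHeap`, `toy_heap_create`, `toy_heap_free`, `toy_enable_encryption`, `toy_create_table_def`, `live_heaps`) are hypothetical; the real code uses InnoDB's mem_heap_create and mem_heap_free, whose signatures are not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Counts heaps that were created but not yet freed (illustration only).
static int live_heaps = 0;

struct ToyHeap {
  void *block;
};

ToyHeap *toy_heap_create(std::size_t n) {
  ToyHeap *h = static_cast<ToyHeap *>(std::malloc(sizeof(ToyHeap)));
  assert(h != nullptr);
  h->block = std::malloc(n);
  assert(h->block != nullptr);
  ++live_heaps;
  return h;
}

void toy_heap_free(ToyHeap *h) {
  std::free(h->block);
  std::free(h);
  --live_heaps;
}

// Stand-in for a step that can fail, such as enabling encryption.
bool toy_enable_encryption(bool should_fail) { return !should_fail; }

// Returns 0 on success and 1 on failure.
int toy_create_table_def(bool encryption_fails) {
  ToyHeap *heap = toy_heap_create(1136);
  if (!toy_enable_encryption(encryption_fails)) {
    toy_heap_free(heap);  // the kind of call that was missing on this path
    return 1;
  }
  // ... the table definition would be built from the heap here ...
  toy_heap_free(heap);
  return 0;
}
```

After either outcome, `live_heaps` is back to zero. Without the free on the error path it would stay at one, which corresponds to the single leaked object that LeakSanitizer reports.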
inikep pushed a commit that referenced this pull request Jun 7, 2022
Avoid undefined behavior in audit_log_update_thd_local by avoiding passing NULL as source pointer to memcpy, even with zero length.

The UBSan report fixed is

/usr/include/x86_64-linux-gnu/bits/string3.h:53:71: runtime error: null pointer passed as argument 2, which is declared to never be null
    #0 0x7fe5aad56fb1 in memcpy /usr/include/x86_64-linux-gnu/bits/string3.h:53
    #1 0x7fe5aad56fb1 in audit_log_update_thd_local plugin/audit_log/audit_log.cc:987
    #2 0x7fe5aad56fb1 in audit_log_notify plugin/audit_log/audit_log.cc:1105
    #3 0x1ecac37 in plugins_dispatch sql/sql_audit.cc:1284
    #4 0x1ecac37 in event_class_dispatch sql/sql_audit.cc:1322
    #5 0x1ecb311 in event_class_dispatch_error sql/sql_audit.cc:1340
    #6 0x1ed21b1 in mysql_audit_notify(THD*, mysql_event_connection_subclass_t, char const*, int) sql/sql_audit.cc:438
    #7 0x1350071 in check_connection sql/sql_connect.cc:868
    #8 0x1350071 in login_connection sql/sql_connect.cc:929
    #9 0x1357881 in thd_prepare_connection(THD*, bool) sql/sql_connect.cc:1084
    #10 0x1e66347 in handle_connection sql/conn_handler/connection_handler_per_thread.cc:313
    #11 0xb1913a3 in pfs_spawn_thread storage/perfschema/pfs.cc:2836
    #12 0x7fe5d352f6b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9)
    #13 0x7fe5d0bd741c in clone (/lib/x86_64-linux-gnu/libc.so.6+0x10741c)
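One common way to avoid this class of report is to skip the copy when there is nothing to copy. The helper below is a hypothetical sketch of that guard, not the audit_log plugin's actual code; the name `copy_if_present` and its parameters are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies len bytes from src to dst, but never hands a null source pointer
// to memcpy, not even together with a zero length.
void copy_if_present(char *dst, const char *src, std::size_t len) {
  assert(dst != nullptr);
  if (src != nullptr && len > 0) {
    std::memcpy(dst, src, len);
  }
}
```

The guard costs one branch and leaves the destination untouched in the empty case, which is what a zero-length memcpy would have done anyway.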
inikep
pushed a commit
that referenced
this pull request
Jun 13, 2022
A subset of binlog encryption tests was crashing with:

* thread #39, stop reason = signal SIGSTOP
    frame #0: 0x00007fff56063b66 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff5622e080 libsystem_pthread.dylib`pthread_kill + 333
    frame #2: 0x000000010657442b mysqld-debug`my_write_core(sig=11) at stacktrace.cc:278
    frame #3: 0x0000000104d84334 mysqld-debug`::handle_fatal_signal(sig=11) at signal_handler.cc:254
    frame #4: 0x00007fff56221f5a libsystem_platform.dylib`_sigtramp + 26
    frame #5: 0x00007fff5622934d libsystem_pthread.dylib`pthread_mutex_lock + 1
    frame #6: 0x0000000106578d05 mysqld-debug`native_mutex_lock(mutex=0x0000000000000000) at thr_mutex.h:93
    frame #7: 0x0000000106578a57 mysqld-debug`safe_mutex_lock(mp=0x0000000000000000, try_lock=false, file="/Users/laurynas/percona/mysql-server/mysys/mf_iocache2.cc", line=113) at thr_mutex.cc:70
    frame #8: 0x000000010653cd3a mysqld-debug`my_mutex_lock(mp=0x00007ffb6b215038, file="/Users/laurynas/percona/mysql-server/mysys/mf_iocache2.cc", line=113) at thr_mutex.h:180
    frame #9: 0x000000010653b2cc mysqld-debug`inline_mysql_mutex_lock(that=0x00007ffb6b215038, src_file="/Users/laurynas/percona/mysql-server/mysys/mf_iocache2.cc", src_line=113) at mysql_mutex.h:267
  * frame #10: 0x000000010653b0d8 mysqld-debug`my_b_append_tell(info=0x00007ffb6b214fd8) at mf_iocache2.cc:113
    frame #11: 0x0000000105ed6a96 mysqld-debug`MYSQL_BIN_LOG::write_buffer(this=0x00007ffb6b214cb8, buf="", len=47, mi=0x00007ffb6b1f6a00) at binlog.cc:7128
    frame #12: 0x0000000105f4d54b mysqld-debug`queue_event(mi=0x00007ffb6b1f6a00, buf="", event_len=47, do_flush_mi=true) at rpl_slave.cc:7756
    frame #13: 0x0000000105f3a243 mysqld-debug`::handle_slave_io(arg=0x00007ffb6b1f6a00) at rpl_slave.cc:5382
    frame #14: 0x00000001065b87a5 mysqld-debug`pfs_spawn_thread(arg=0x00007ffb6a543af0) at pfs.cc:2836
    frame #15: 0x00007fff5622b661 libsystem_pthread.dylib`_pthread_body + 340
    frame #16: 0x00007fff5622b50d libsystem_pthread.dylib`_pthread_start + 377
    frame #17: 0x00007fff5622abf9 libsystem_pthread.dylib`thread_start + 13

This was caused by my_b_append_tell trying to lock a nullptr IO_CACHE::append_buffer_lock. The lock was nullptr, because it's only initialized for SEQ_READ_APPEND IO_CACHEs, whereas we have WRITE_CACHE. This mismatch was introduced by WL#8599 [1] changing the IO_CACHE type from the former to the latter.

Fix by using the correct API for the new IO_CACHE type: my_b_tell instead of my_b_append_tell.

[1]:
commit dbd2ca2
Author: Joao Gramacho <joao.gramacho@oracle.com>
Date: Tue Nov 1 06:45:39 2016 +0000

    WL#8599: Reduce contention in IO and SQL threads (...)
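The mismatch described in this commit message can be illustrated with a small model. This is only a sketch: `ToyCache`, `toy_tell` and `toy_append_tell` are hypothetical stand-ins, not the real mysys `IO_CACHE` API, and the position arithmetic is simplified. It shows why a position query that takes an append lock is only safe on a cache type that actually creates that lock.

```cpp
#include <cstdint>
#include <memory>
#include <mutex>
#include <stdexcept>

// Hypothetical, simplified stand-in for IO_CACHE (not the real struct).
enum class CacheType { WriteCache, SeqReadAppend };

struct ToyCache {
  CacheType type;
  std::uint64_t pos_in_file = 0;  // bytes already written to the file
  std::uint64_t buffered = 0;     // bytes still sitting in the buffer
  // Only created for SeqReadAppend, mirroring the situation in the message.
  std::unique_ptr<std::mutex> append_lock;

  explicit ToyCache(CacheType t) : type(t) {
    if (t == CacheType::SeqReadAppend)
      append_lock = std::make_unique<std::mutex>();
  }
};

// Analogue of my_b_append_tell: needs the append lock. The real code
// crashed on a null mutex; this sketch throws instead.
std::uint64_t toy_append_tell(ToyCache &c) {
  if (!c.append_lock)
    throw std::logic_error("append_tell used on a cache without append lock");
  std::lock_guard<std::mutex> guard(*c.append_lock);
  return c.pos_in_file + c.buffered;
}

// Analogue of my_b_tell: no append lock involved, valid for a write cache.
std::uint64_t toy_tell(const ToyCache &c) {
  return c.pos_in_file + c.buffered;
}
```

With a `ToyCache` constructed as `CacheType::WriteCache`, `toy_tell` returns the logical position, while `toy_append_tell` fails because the lock was never created.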
inikep
pushed a commit
that referenced
this pull request
Jun 13, 2022
create_table_info_t::create_table_def leaked memory in the case enable_encryption(table) call failed:

worker[5] Sanitizer report from /tmp/results/PS/mysql-test/var/5/log/mysqld.2.err after tests:
 binlog_encryption.binlog_encryption_without_keyring
 group_replication.gr_change_master_hidden
 group_replication.gr_server_uuid_matches_group_name
 group_replication.gr_perfschema_connect_status
 group_replication.gr_single_primary_and_leader_election_on_error
 group_replication.gr_without_perfschema
 rpl.rpl_key_rotation
--------------------------------------------------------------------------
==14131==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1136 byte(s) in 1 object(s) allocated from:
    #0 0x7fe9233f1602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0xc692483 in ut_allocator<unsigned char>::allocate(unsigned long, unsigned char const*, unsigned int, bool, bool) storage/innobase/include/ut0new.h:608
    #2 0xc692483 in mem_heap_create_block_func(mem_block_info_t*, unsigned long, unsigned long) storage/innobase/mem/memory.cc:281
    #3 0xb99ff96 in mem_heap_create_func storage/innobase/include/mem0mem.ic:464
    #4 0xbae8604 in create_table_info_t::create_table_def(dd::Table const*) storage/innobase/handler/ha_innodb.cc:10349
    #5 0xbaee018 in create_table_info_t::create_table(dd::Table const*) storage/innobase/handler/ha_innodb.cc:12420
    #6 0xbaf1aba in int innobase_basic_ddl::create_impl<dd::Table>(THD*, char const*, TABLE*, HA_CREATE_INFO*, dd::Table*, bool, bool, bool, unsigned long, unsigned long) storage/innobase/handler/ha_innodb.cc:12805
    #7 0xbaf7e6a in ha_innobase::create(char const*, TABLE*, HA_CREATE_INFO*, dd::Table*) storage/innobase/handler/ha_innodb.cc:13756
    #8 0x2857f7a in ha_create_table(THD*, char const*, char const*, char const*, HA_CREATE_INFO*, List<Create_field> const*, bool, bool, dd::Table*) sql/handler.cc:5156
    #9 0x19d0d9f in rea_create_base_table sql/sql_table.cc:991
    #10 0x19d0d9f in create_table_impl sql/sql_table.cc:7118
    #11 0x19d37cf in mysql_create_table_no_lock(THD*, char const*, char const*, HA_CREATE_INFO*, Alter_info*, unsigned int, bool, bool*, handlerton**) sql/sql_table.cc:7200
    #12 0x19dffb2 in mysql_create_table(THD*, TABLE_LIST*, HA_CREATE_INFO*, Alter_info*) sql/sql_table.cc:7950
    #13 0x3b58b9b in Sql_cmd_create_table::execute(THD*) sql/sql_cmd_ddl_table.cc:319
    #14 0x15917c1 in mysql_execute_command(THD*, bool) sql/sql_parse.cc:4417
    #15 0x15b086e in mysql_parse(THD*, Parser_state*, bool) sql/sql_parse.cc:5139
    #16 0x8efc7fd in Query_log_event::do_apply_event(Relay_log_info const*, char const*, unsigned long) sql/log_event.cc:5295
    #17 0x8f7ea48 in Log_event::apply_event(Relay_log_info*) sql/log_event.cc:3882
    #18 0x91cb682 in apply_event_and_update_pos sql/rpl_slave.cc:4352
    #19 0x9215e69 in exec_relay_log_event sql/rpl_slave.cc:4812
    #20 0x9254685 in handle_slave_sql sql/rpl_slave.cc:6912
    #21 0xb1913a3 in pfs_spawn_thread storage/perfschema/pfs.cc:2836
    #22 0x7fe9231436b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9)

Fix by adding the missing mem_heap_free(heap) call.
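The shape of this leak, and of the fix, can be sketched in isolation. Everything below is hypothetical: `ToyHeap`, `toy_heap_create`, `toy_heap_free` and `toy_enable_encryption` only imitate the roles of the InnoDB calls named in the message, and the counter exists purely so the sketch can be checked.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-ins for an InnoDB memory heap and its create/free calls.
struct ToyHeap {
  void *block;
};

static int g_live_heaps = 0;  // counts heaps that have not been freed yet

ToyHeap *toy_heap_create(std::size_t n) {
  auto *h = new ToyHeap{std::malloc(n)};
  ++g_live_heaps;
  return h;
}

void toy_heap_free(ToyHeap *h) {
  std::free(h->block);
  delete h;
  --g_live_heaps;
}

// Hypothetical stand-in for enable_encryption(table); fails on demand.
bool toy_enable_encryption(bool succeed) { return succeed; }

bool create_table_def_sketch(bool encryption_succeeds) {
  ToyHeap *heap = toy_heap_create(1136);
  if (!toy_enable_encryption(encryption_succeeds)) {
    toy_heap_free(heap);  // the kind of call that was missing on this path
    return false;
  }
  // ... the heap would be used to build the table definition here ...
  toy_heap_free(heap);
  return true;
}
```

After either outcome `g_live_heaps` is back to zero. An alternative design would wrap the heap in a `std::unique_ptr` with a custom deleter so that every early return frees it automatically; the explicit call is shown here because it mirrors the fix described in the message.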
inikep
pushed a commit
that referenced
this pull request
Jun 13, 2022
Avoid undefined behavior in audit_log_update_thd_local by avoiding passing NULL as source pointer to memcpy, even with zero length.

The UBSan report fixed is

/usr/include/x86_64-linux-gnu/bits/string3.h:53:71: runtime error: null pointer passed as argument 2, which is declared to never be null
    #0 0x7fe5aad56fb1 in memcpy /usr/include/x86_64-linux-gnu/bits/string3.h:53
    #1 0x7fe5aad56fb1 in audit_log_update_thd_local plugin/audit_log/audit_log.cc:987
    #2 0x7fe5aad56fb1 in audit_log_notify plugin/audit_log/audit_log.cc:1105
    #3 0x1ecac37 in plugins_dispatch sql/sql_audit.cc:1284
    #4 0x1ecac37 in event_class_dispatch sql/sql_audit.cc:1322
    #5 0x1ecb311 in event_class_dispatch_error sql/sql_audit.cc:1340
    #6 0x1ed21b1 in mysql_audit_notify(THD*, mysql_event_connection_subclass_t, char const*, int) sql/sql_audit.cc:438
    #7 0x1350071 in check_connection sql/sql_connect.cc:868
    #8 0x1350071 in login_connection sql/sql_connect.cc:929
    #9 0x1357881 in thd_prepare_connection(THD*, bool) sql/sql_connect.cc:1084
    #10 0x1e66347 in handle_connection sql/conn_handler/connection_handler_per_thread.cc:313
    #11 0xb1913a3 in pfs_spawn_thread storage/perfschema/pfs.cc:2836
    #12 0x7fe5d352f6b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9)
    #13 0x7fe5d0bd741c in clone (/lib/x86_64-linux-gnu/libc.so.6+0x10741c)
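A minimal sketch of the guard this message describes: skip the memcpy call entirely when there is nothing to copy, so a NULL source is never passed. The helper name `copy_if_nonempty` is hypothetical and is not taken from the audit log plugin.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: memcpy requires valid pointers even when len == 0,
// so return early instead of calling it with a possibly-null source.
inline void *copy_if_nonempty(void *dst, const void *src, std::size_t len) {
  if (len == 0) return dst;  // nothing to copy; src may legitimately be null
  return std::memcpy(dst, src, len);
}
```

A caller that may hold an empty (NULL, 0) buffer can call `copy_if_nonempty(dst, src, len)` unconditionally; the destination is left untouched in the empty case.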
inikep
pushed a commit
that referenced
this pull request
Jun 14, 2022
1. (Mostly) fix the audit log plugin:
   - set correct flags for PFS memory instrumentation;
   - fix undefined behavior in make_argv;
   - fix plugin declaration having swapped deinit and check-deinit function pointers;
   - fix an include in filter.cc;
   - re-record some testcases.

2. Fix audit log plugin command filtering. The initial 8.0 port dropped the lowercasing of passed filter strings by mistake. Fix by restoring it. At the same time, note that MySQL collations and charsets are not relevant for command names, which are 7-bit ASCII, and so replace collation_unordered_set uses with simpler malloc_unordered_set.

3. audit_log testcases under 8.0 tend to produce control characters more often, due to CREATE USER ... IDENTIFIED AS hashes containing them. There is no good handling option for them in XML 1.0, thus 1) as a lesser evil, print them as numeric sequences anyway, and replace \0 with ?; 2) patch testcases not to produce control characters if the output is to be consumed by an XML parser.

4. Avoid undefined behavior in audit_log_update_thd_local by avoiding passing NULL as source pointer to memcpy, even with zero length.

The UBSan report fixed is

/usr/include/x86_64-linux-gnu/bits/string3.h:53:71: runtime error: null pointer passed as argument 2, which is declared to never be null
    #0 0x7fe5aad56fb1 in memcpy /usr/include/x86_64-linux-gnu/bits/string3.h:53
    #1 0x7fe5aad56fb1 in audit_log_update_thd_local plugin/audit_log/audit_log.cc:987
    #2 0x7fe5aad56fb1 in audit_log_notify plugin/audit_log/audit_log.cc:1105
    #3 0x1ecac37 in plugins_dispatch sql/sql_audit.cc:1284
    #4 0x1ecac37 in event_class_dispatch sql/sql_audit.cc:1322
    #5 0x1ecb311 in event_class_dispatch_error sql/sql_audit.cc:1340
    #6 0x1ed21b1 in mysql_audit_notify(THD*, mysql_event_connection_subclass_t, char const*, int) sql/sql_audit.cc:438
    #7 0x1350071 in check_connection sql/sql_connect.cc:868
    #8 0x1350071 in login_connection sql/sql_connect.cc:929
    #9 0x1357881 in thd_prepare_connection(THD*, bool) sql/sql_connect.cc:1084
    #10 0x1e66347 in handle_connection sql/conn_handler/connection_handler_per_thread.cc:313
    #11 0xb1913a3 in pfs_spawn_thread storage/perfschema/pfs.cc:2836
    #12 0x7fe5d352f6b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9)
    #13 0x7fe5d0bd741c in clone (/lib/x86_64-linux-gnu/libc.so.6+0x10741c)
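The control-character handling in item 3 (numeric sequences for control characters, `?` for `\0`) could look roughly like the sketch below. The function name `escape_control_chars` is hypothetical, and two details are assumptions rather than facts from the commit: that tab, newline and carriage return pass through unchanged, and that markup characters such as `&` and `<` are escaped elsewhere.

```cpp
#include <string>
#include <string_view>

// Hypothetical sketch: replace NUL with '?' and write other control
// characters as numeric character references such as "&#1;".
std::string escape_control_chars(std::string_view in) {
  std::string out;
  out.reserve(in.size());
  for (unsigned char c : in) {
    if (c == '\0') {
      out += '?';
    } else if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
      out += "&#";
      out += std::to_string(static_cast<unsigned>(c));
      out += ';';
    } else {
      out += static_cast<char>(c);
    }
  }
  return out;
}
```

As the message notes, XML 1.0 has no good representation for these characters, so output produced this way is for human inspection rather than for a strict XML parser.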
inikep
added a commit
that referenced
this pull request
Jun 14, 2022
1. Fix merge error that broke row_log_online_op by making it write unencrypted blocks even with encryption enabled.

2. Plug a memory leak in log_online_setup_bitmap_file_range introduced by a rewrite in 8.0 to use my_dir: use my_dirend.

3. Plug a memory leak in log_online_read_init introduced by a rewrite in 8.0 to use my_dir: use my_dirend.

4. create_table_info_t::create_table_def leaked memory in the case enable_encryption(table) call failed:

worker[5] Sanitizer report from /tmp/results/PS/mysql-test/var/5/log/mysqld.2.err after tests:
 binlog_encryption.binlog_encryption_without_keyring
 group_replication.gr_change_master_hidden
 group_replication.gr_server_uuid_matches_group_name
 group_replication.gr_perfschema_connect_status
 group_replication.gr_single_primary_and_leader_election_on_error
 group_replication.gr_without_perfschema
 rpl.rpl_key_rotation
--------------------------------------------------------------------------
==14131==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1136 byte(s) in 1 object(s) allocated from:
    #0 0x7fe9233f1602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0xc692483 in ut_allocator<unsigned char>::allocate(unsigned long, unsigned char const*, unsigned int, bool, bool) storage/innobase/include/ut0new.h:608
    #2 0xc692483 in mem_heap_create_block_func(mem_block_info_t*, unsigned long, unsigned long) storage/innobase/mem/memory.cc:281
    #3 0xb99ff96 in mem_heap_create_func storage/innobase/include/mem0mem.ic:464
    #4 0xbae8604 in create_table_info_t::create_table_def(dd::Table const*) storage/innobase/handler/ha_innodb.cc:10349
    #5 0xbaee018 in create_table_info_t::create_table(dd::Table const*) storage/innobase/handler/ha_innodb.cc:12420
    #6 0xbaf1aba in int innobase_basic_ddl::create_impl<dd::Table>(THD*, char const*, TABLE*, HA_CREATE_INFO*, dd::Table*, bool, bool, bool, unsigned long, unsigned long) storage/innobase/handler/ha_innodb.cc:12805
    #7 0xbaf7e6a in ha_innobase::create(char const*, TABLE*, HA_CREATE_INFO*, dd::Table*) storage/innobase/handler/ha_innodb.cc:13756
    #8 0x2857f7a in ha_create_table(THD*, char const*, char const*, char const*, HA_CREATE_INFO*, List<Create_field> const*, bool, bool, dd::Table*) sql/handler.cc:5156
    #9 0x19d0d9f in rea_create_base_table sql/sql_table.cc:991
    #10 0x19d0d9f in create_table_impl sql/sql_table.cc:7118
    #11 0x19d37cf in mysql_create_table_no_lock(THD*, char const*, char const*, HA_CREATE_INFO*, Alter_info*, unsigned int, bool, bool*, handlerton**) sql/sql_table.cc:7200
    #12 0x19dffb2 in mysql_create_table(THD*, TABLE_LIST*, HA_CREATE_INFO*, Alter_info*) sql/sql_table.cc:7950
    #13 0x3b58b9b in Sql_cmd_create_table::execute(THD*) sql/sql_cmd_ddl_table.cc:319
    #14 0x15917c1 in mysql_execute_command(THD*, bool) sql/sql_parse.cc:4417
    #15 0x15b086e in mysql_parse(THD*, Parser_state*, bool) sql/sql_parse.cc:5139
    #16 0x8efc7fd in Query_log_event::do_apply_event(Relay_log_info const*, char const*, unsigned long) sql/log_event.cc:5295
    #17 0x8f7ea48 in Log_event::apply_event(Relay_log_info*) sql/log_event.cc:3882
    #18 0x91cb682 in apply_event_and_update_pos sql/rpl_slave.cc:4352
    #19 0x9215e69 in exec_relay_log_event sql/rpl_slave.cc:4812
    #20 0x9254685 in handle_slave_sql sql/rpl_slave.cc:6912
    #21 0xb1913a3 in pfs_spawn_thread storage/perfschema/pfs.cc:2836
    #22 0x7fe9231436b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9)

Fix by adding the missing mem_heap_free(heap) call.

5. Fix merge error which resulted in a single LRU manager thread being created as opposed to one per buffer pool instance. At the same time trivially fix one of the multiple-buffer-pool-instance testcases.

6. Partially fix changed page tracking:
   - avoid deadlock on Fil_shard mutex between server shutdown thread and changed page tracking by moving the Fil_system::wait_for_changed_page_tracker call outside this mutex critical section;
   - extend recv_read_log_seg with a new argument bool online, which is false during recovery and true for changed page tracking reads. In function body, use it to guard log_background_threads_inactive_validate call (as the background threads are active during changed page tracking), and to acquire the log_writer mutex, which is not held for the changed page tracking;
   - fix MIN_TRACKED_LSN, which was too large by OS_FILE_LOG_BLOCK_SIZE;
   - take last checkpoint LSN to start tracking from, instead of the larger of that LSN and MIN_TRACKED_LSN, as the checkpoint LSN is always valid at that point;
   - strengthen asserts in log_online_add_to_parse_buf.
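The `bool online` argument from item 6 selects between two calling contexts. A minimal sketch of that pattern follows, assuming hypothetical names throughout (`toy_writer_mutex`, `ReadOutcome`, `read_log_segment_sketch`); it does not reproduce the real recv_read_log_seg signature and only records which branch was taken.

```cpp
#include <mutex>

// Hypothetical stand-in for the log writer mutex.
static std::mutex toy_writer_mutex;

// Records what the sketch did, so the two modes can be compared.
struct ReadOutcome {
  bool validated_inactive;  // recovery-only sanity check was performed
  bool took_writer_lock;    // writer mutex was acquired by this call
};

ReadOutcome read_log_segment_sketch(bool online) {
  ReadOutcome outcome{false, false};
  // Deferred lock: constructed unlocked, locked only in online mode.
  std::unique_lock<std::mutex> guard(toy_writer_mutex, std::defer_lock);
  if (online) {
    // Changed page tracking: background threads are running and the
    // caller does not hold the writer mutex, so take it here.
    guard.lock();
    outcome.took_writer_lock = true;
  } else {
    // Recovery: background threads must be inactive, so the validation
    // call (represented by this flag) is only made in this mode.
    outcome.validated_inactive = true;
  }
  // ... the log segment would be read here ...
  return outcome;  // guard releases the mutex if it was taken
}
```

Using `std::unique_lock` with `std::defer_lock` keeps a single exit path while making the lock conditional, which is one way to express "acquire only when online" without duplicating the function body.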
inikep
pushed a commit
that referenced
this pull request
Jun 14, 2022
1. (Mostly) fix the audit log plugin:
   - set correct flags for PFS memory instrumentation;
   - fix undefined behavior in make_argv;
   - fix plugin declaration having swapped deinit and check-deinit function pointers;
   - fix an include in filter.cc;
   - re-record some testcases.

2. Fix audit log plugin command filtering. The initial 8.0 port dropped the lowercasing of passed filter strings by mistake. Fix by restoring it. At the same time, note that MySQL collations and charsets are not relevant for command names, which are 7-bit ASCII, and so replace collation_unordered_set uses with simpler malloc_unordered_set.

3. audit_log testcases under 8.0 tend to produce control characters more often, due to CREATE USER ... IDENTIFIED AS hashes containing them. There is no good handling option for them in XML 1.0, thus 1) as a lesser evil, print them as numeric sequences anyway, and replace \0 with ?; 2) patch testcases not to produce control characters if the output is to be consumed by an XML parser.

4. audit_log_rotations and audit_log_rotate_on_size were messed up when set at runtime.

5. Avoid undefined behavior in audit_log_update_thd_local by avoiding passing NULL as source pointer to memcpy, even with zero length.

The UBSan report fixed is

/usr/include/x86_64-linux-gnu/bits/string3.h:53:71: runtime error: null pointer passed as argument 2, which is declared to never be null
    #0 0x7fe5aad56fb1 in memcpy /usr/include/x86_64-linux-gnu/bits/string3.h:53
    #1 0x7fe5aad56fb1 in audit_log_update_thd_local plugin/audit_log/audit_log.cc:987
    #2 0x7fe5aad56fb1 in audit_log_notify plugin/audit_log/audit_log.cc:1105
    #3 0x1ecac37 in plugins_dispatch sql/sql_audit.cc:1284
    #4 0x1ecac37 in event_class_dispatch sql/sql_audit.cc:1322
    #5 0x1ecb311 in event_class_dispatch_error sql/sql_audit.cc:1340
    #6 0x1ed21b1 in mysql_audit_notify(THD*, mysql_event_connection_subclass_t, char const*, int) sql/sql_audit.cc:438
    #7 0x1350071 in check_connection sql/sql_connect.cc:868
    #8 0x1350071 in login_connection sql/sql_connect.cc:929
    #9 0x1357881 in thd_prepare_connection(THD*, bool) sql/sql_connect.cc:1084
    #10 0x1e66347 in handle_connection sql/conn_handler/connection_handler_per_thread.cc:313
    #11 0xb1913a3 in pfs_spawn_thread storage/perfschema/pfs.cc:2836
    #12 0x7fe5d352f6b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9)
    #13 0x7fe5d0bd741c in clone (/lib/x86_64-linux-gnu/libc.so.6+0x10741c)
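The command filtering fix in item 2 rests on normalising names to lowercase before they go into a hash set. A rough sketch, using the standard `std::unordered_set` in place of the server's malloc_unordered_set; the class `CommandFilter` and the helper `ascii_lower` are hypothetical names, and lowercasing the lookup key as well as the stored key is an assumption made here for illustration.

```cpp
#include <string>
#include <unordered_set>

// Lowercase 7-bit ASCII letters only; no locale or collation involved,
// which is sufficient because command names are plain ASCII.
std::string ascii_lower(std::string s) {
  for (char &c : s) {
    if (c >= 'A' && c <= 'Z') c = static_cast<char>(c - 'A' + 'a');
  }
  return s;
}

// Hypothetical filter: stores normalised names in an ordinary hash set.
class CommandFilter {
 public:
  void add(const std::string &name) { names_.insert(ascii_lower(name)); }

  bool contains(const std::string &name) const {
    return names_.count(ascii_lower(name)) != 0;
  }

 private:
  std::unordered_set<std::string> names_;
};
```

Because every key is normalised on the way in, a plain byte-wise hash and comparison are enough, which is the reasoning the message gives for dropping the collation-aware container.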
inikep
pushed a commit
that referenced
this pull request
Jun 15, 2022
1. (Mostly) fix the audit log plugin: - set correct flags for PFS memory instrumentation; - fix undefined behavior in make_argv; - fix plugin declaration having swapped deinit and check-deinit function pointers; - fix an include in filter.cc; - re-record some testcases. 2. Fix audit log plugin command filtering. The initial 8.0 port dropped the lowercasing of passed filter strings by mistake. Fix by restoring it. At the same time notice that MySQL collations charsets are not interesting for command names, which are 7-bit ASCII, and so replace collation_unordered_set uses with simpler malloc_unordered_set. 3. audit_log testcases under 8.0 tend to produce control characters more often, due to CREATE USER ... IDENTIFIED AS hashes containing them. There is no good handling option for them in XML 1.0, thus 1) as a lesser evil, print them as numeric sequences anyway, and replace \0 with ?; 2) patch testcases not to produce control characters if the output is to be consumed by an XML parser. 4. audit_log_rotations and audit_log_rotate_on_size were messed up when set at runtime. 5. Avoid undefined behavior in audit_log_update_thd_local by avoiding passing NULL as source pointer to memcpy, even with zero length. 
The UBSan report fixed is /usr/include/x86_64-linux-gnu/bits/string3.h:53:71: runtime error: null pointer passed as argument 2, which is declared to never be null #0 0x7fe5aad56fb1 in memcpy /usr/include/x86_64-linux-gnu/bits/string3.h:53 #1 0x7fe5aad56fb1 in audit_log_update_thd_local plugin/audit_log/audit_log.cc:987 #2 0x7fe5aad56fb1 in audit_log_notify plugin/audit_log/audit_log.cc:1105 #3 0x1ecac37 in plugins_dispatch sql/sql_audit.cc:1284 #4 0x1ecac37 in event_class_dispatch sql/sql_audit.cc:1322 #5 0x1ecb311 in event_class_dispatch_error sql/sql_audit.cc:1340 #6 0x1ed21b1 in mysql_audit_notify(THD*, mysql_event_connection_subclass_t, char const*, int) sql/sql_audit.cc:438 #7 0x1350071 in check_connection sql/sql_connect.cc:868 #8 0x1350071 in login_connection sql/sql_connect.cc:929 #9 0x1357881 in thd_prepare_connection(THD*, bool) sql/sql_connect.cc:1084 #10 0x1e66347 in handle_connection sql/conn_handler/connection_handler_per_thread.cc:313 #11 0xb1913a3 in pfs_spawn_thread storage/perfschema/pfs.cc:2836 #12 0x7fe5d352f6b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9) percona#13 0x7fe5d0bd741c in clone (/lib/x86_64-linux-gnu/libc.so.6+0x10741c)
inikep
added a commit
that referenced
this pull request
Jun 15, 2022
A subset of binlog encryption tests was crashing with: * thread percona#39, stop reason = signal SIGSTOP frame #0: 0x00007fff56063b66 libsystem_kernel.dylib`__pthread_kill + 10 frame #1: 0x00007fff5622e080 libsystem_pthread.dylib`pthread_kill + 333 frame #2: 0x000000010657442b mysqld-debug`my_write_core(sig=11) at stacktrace.cc:278 frame #3: 0x0000000104d84334 mysqld-debug`::handle_fatal_signal(sig=11) at signal_handler.cc:254 frame #4: 0x00007fff56221f5a libsystem_platform.dylib`_sigtramp + 26 frame #5: 0x00007fff5622934d libsystem_pthread.dylib`pthread_mutex_lock + 1 frame #6: 0x0000000106578d05 mysqld-debug`native_mutex_lock(mutex=0x0000000000000000) at thr_mutex.h:93 frame #7: 0x0000000106578a57 mysqld-debug`safe_mutex_lock(mp=0x0000000000000000, try_lock=false, file="/Users/laurynas/percona/mysql-server/mysys/mf_iocache2.cc", line=113) at thr_mutex.cc:70 frame #8: 0x000000010653cd3a mysqld-debug`my_mutex_lock(mp=0x00007ffb6b215038, file="/Users/laurynas/percona/mysql-server/mysys/mf_iocache2.cc", line=113) at thr_mutex.h:180 frame #9: 0x000000010653b2cc mysqld-debug`inline_mysql_mutex_lock(that=0x00007ffb6b215038, src_file="/Users/laurynas/percona/mysql-server/mysys/mf_iocache2.cc", src_line=113) at mysql_mutex.h:267 * frame #10: 0x000000010653b0d8 mysqld-debug`my_b_append_tell(info=0x00007ffb6b214fd8) at mf_iocache2.cc:113 frame #11: 0x0000000105ed6a96 mysqld-debug`MYSQL_BIN_LOG::write_buffer(this=0x00007ffb6b214cb8, buf="", len=47, mi=0x00007ffb6b1f6a00) at binlog.cc:7128 frame #12: 0x0000000105f4d54b mysqld-debug`queue_event(mi=0x00007ffb6b1f6a00, buf="", event_len=47, do_flush_mi=true) at rpl_slave.cc:7756 frame percona#13: 0x0000000105f3a243 mysqld-debug`::handle_slave_io(arg=0x00007ffb6b1f6a00) at rpl_slave.cc:5382 frame percona#14: 0x00000001065b87a5 mysqld-debug`pfs_spawn_thread(arg=0x00007ffb6a543af0) at pfs.cc:2836 frame percona#15: 0x00007fff5622b661 libsystem_pthread.dylib`_pthread_body + 340 frame percona#16: 0x00007fff5622b50d 
libsystem_pthread.dylib`_pthread_start + 377 frame percona#17: 0x00007fff5622abf9 libsystem_pthread.dylib`thread_start + 13 This was caused by my_b_append_tell trying to lock a nullptr IO_CACHE::append_buffer_lock. The lock was nullptr, because it's only initialized for SEQ_READ_APPEND IO_CACHEs, whereas we have WRITE_CACHE. This mismatch was introduced by WL#8599 [1] changing the IO_CACHE type from the former to the latter. Fix by using the correct API for the new IO_CACHE type: my_b_tell instead of my_b_append_tell. [1]: commit dbd2ca2 Author: Joao Gramacho <joao.gramacho@oracle.com> Date: Tue Nov 1 06:45:39 2016 +0000 WL#8599: Reduce contention in IO and SQL threads (...)
inikep
added a commit
that referenced
this pull request
Jun 15, 2022
1. Fix merge error that broke row_log_online_op by making it write unencrypted blocks even with encryption enabled.
2. Plug a memory leak in log_online_setup_bitmap_file_range introduced by a rewrite in 8.0 to use my_dir: use my_dirend.
3. Plug a memory leak in log_online_read_init introduced by a rewrite in 8.0 to use my_dir: use my_dirend.
4. create_table_info_t::create_table_def leaked memory in the case enable_encryption(table) call failed:

worker[5] Sanitizer report from /tmp/results/PS/mysql-test/var/5/log/mysqld.2.err after tests:
 binlog_encryption.binlog_encryption_without_keyring group_replication.gr_change_master_hidden group_replication.gr_server_uuid_matches_group_name group_replication.gr_perfschema_connect_status group_replication.gr_single_primary_and_leader_election_on_error group_replication.gr_without_perfschema rpl.rpl_key_rotation
--------------------------------------------------------------------------
==14131==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1136 byte(s) in 1 object(s) allocated from:
    #0 0x7fe9233f1602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0xc692483 in ut_allocator<unsigned char>::allocate(unsigned long, unsigned char const*, unsigned int, bool, bool) storage/innobase/include/ut0new.h:608
    #2 0xc692483 in mem_heap_create_block_func(mem_block_info_t*, unsigned long, unsigned long) storage/innobase/mem/memory.cc:281
    #3 0xb99ff96 in mem_heap_create_func storage/innobase/include/mem0mem.ic:464
    #4 0xbae8604 in create_table_info_t::create_table_def(dd::Table const*) storage/innobase/handler/ha_innodb.cc:10349
    #5 0xbaee018 in create_table_info_t::create_table(dd::Table const*) storage/innobase/handler/ha_innodb.cc:12420
    #6 0xbaf1aba in int innobase_basic_ddl::create_impl<dd::Table>(THD*, char const*, TABLE*, HA_CREATE_INFO*, dd::Table*, bool, bool, bool, unsigned long, unsigned long) storage/innobase/handler/ha_innodb.cc:12805
    #7 0xbaf7e6a in ha_innobase::create(char const*, TABLE*, HA_CREATE_INFO*, dd::Table*) storage/innobase/handler/ha_innodb.cc:13756
    #8 0x2857f7a in ha_create_table(THD*, char const*, char const*, char const*, HA_CREATE_INFO*, List<Create_field> const*, bool, bool, dd::Table*) sql/handler.cc:5156
    #9 0x19d0d9f in rea_create_base_table sql/sql_table.cc:991
    #10 0x19d0d9f in create_table_impl sql/sql_table.cc:7118
    #11 0x19d37cf in mysql_create_table_no_lock(THD*, char const*, char const*, HA_CREATE_INFO*, Alter_info*, unsigned int, bool, bool*, handlerton**) sql/sql_table.cc:7200
    #12 0x19dffb2 in mysql_create_table(THD*, TABLE_LIST*, HA_CREATE_INFO*, Alter_info*) sql/sql_table.cc:7950
    #13 0x3b58b9b in Sql_cmd_create_table::execute(THD*) sql/sql_cmd_ddl_table.cc:319
    #14 0x15917c1 in mysql_execute_command(THD*, bool) sql/sql_parse.cc:4417
    #15 0x15b086e in mysql_parse(THD*, Parser_state*, bool) sql/sql_parse.cc:5139
    #16 0x8efc7fd in Query_log_event::do_apply_event(Relay_log_info const*, char const*, unsigned long) sql/log_event.cc:5295
    #17 0x8f7ea48 in Log_event::apply_event(Relay_log_info*) sql/log_event.cc:3882
    #18 0x91cb682 in apply_event_and_update_pos sql/rpl_slave.cc:4352
    #19 0x9215e69 in exec_relay_log_event sql/rpl_slave.cc:4812
    #20 0x9254685 in handle_slave_sql sql/rpl_slave.cc:6912
    #21 0xb1913a3 in pfs_spawn_thread storage/perfschema/pfs.cc:2836
    #22 0x7fe9231436b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9)

Fix by adding the missing mem_heap_free(heap) call.
5. Fix merge error which resulted in a single LRU manager thread being created as opposed to one per buffer pool instance. At the same time trivially fix one of the multiple-buffer-pool-instance testcases.
6. Partially fix changed page tracking:
- avoid deadlock on Fil_shard mutex between server shutdown thread and changed page tracking by moving the Fil_system::wait_for_changed_page_tracker call outside this mutex critical section;
- extend recv_read_log_seg with a new argument bool online, which is false during recovery and true for changed page tracking reads. In function body, use it to guard log_background_threads_inactive_validate call (as the background threads are active during changed page tracking), and to acquire the log_writer mutex, which is not held for the changed page tracking;
- fix MIN_TRACKED_LSN to stop being OS_FILE_LOG_BLOCK_SIZE-too large;
- take last checkpoint LSN to start tracking from, instead of the larger of that LSN and MIN_TRACKED_LSN, as the checkpoint LSN is always valid at that point;
- strengthen asserts in log_online_add_to_parse_buf.
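
Item 4 above is the classic "early error return skips the cleanup" leak. The commit fixed it by adding the missing mem_heap_free(heap) call. As a rough illustration of the same bug class, the sketch below uses a different technique (an owning smart pointer) so that every return path releases the heap. `Heap`, `enable_encryption_stub` and `create_table_def_sketch` are hypothetical stand-ins, not InnoDB code.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for an arena such as InnoDB's mem_heap_t.
struct Heap {
  static int live;  // number of heaps currently allocated
  Heap() { ++live; }
  ~Heap() { --live; }
};
int Heap::live = 0;

// Hypothetical step that may fail, standing in for enable_encryption(table).
bool enable_encryption_stub(bool should_fail) { return !should_fail; }

// Returns 0 on success, 1 on error. The unique_ptr frees the heap on every
// return path, including the early error return that leaked in the report.
int create_table_def_sketch(bool encryption_fails) {
  std::unique_ptr<Heap> heap(new Heap());
  if (!enable_encryption_stub(encryption_fails)) {
    return 1;  // heap is released here by the unique_ptr destructor
  }
  // ... the table definition would be built from *heap here ...
  assert(Heap::live == 1);
  return 0;  // heap is released here as well
}
```

Calling `create_table_def_sketch(true)` takes the failing path; `Heap::live` drops back to zero afterwards, which is the property LeakSanitizer was checking.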
inikep
pushed a commit
that referenced
this pull request
Jun 15, 2022
1. (Mostly) fix the audit log plugin:
- set correct flags for PFS memory instrumentation;
- fix undefined behavior in make_argv;
- fix plugin declaration having swapped deinit and check-deinit function pointers;
- fix an include in filter.cc;
- re-record some testcases.
2. Fix audit log plugin command filtering. The initial 8.0 port dropped the lowercasing of passed filter strings by mistake. Fix by restoring it. At the same time notice that MySQL collations and charsets are not relevant for command names, which are 7-bit ASCII, and so replace collation_unordered_set uses with the simpler malloc_unordered_set.
3. audit_log testcases under 8.0 tend to produce control characters more often, due to CREATE USER ... IDENTIFIED AS hashes containing them. There is no good handling option for them in XML 1.0, thus 1) as a lesser evil, print them as numeric sequences anyway, and replace \0 with ?; 2) patch testcases not to produce control characters if the output is to be consumed by an XML parser.
4. audit_log_rotations and audit_log_rotate_on_size were messed up when set at runtime.
5. Avoid undefined behavior in audit_log_update_thd_local by avoiding passing NULL as source pointer to memcpy, even with zero length. The UBSan report fixed is

/usr/include/x86_64-linux-gnu/bits/string3.h:53:71: runtime error: null pointer passed as argument 2, which is declared to never be null
    #0 0x7fe5aad56fb1 in memcpy /usr/include/x86_64-linux-gnu/bits/string3.h:53
    #1 0x7fe5aad56fb1 in audit_log_update_thd_local plugin/audit_log/audit_log.cc:987
    #2 0x7fe5aad56fb1 in audit_log_notify plugin/audit_log/audit_log.cc:1105
    #3 0x1ecac37 in plugins_dispatch sql/sql_audit.cc:1284
    #4 0x1ecac37 in event_class_dispatch sql/sql_audit.cc:1322
    #5 0x1ecb311 in event_class_dispatch_error sql/sql_audit.cc:1340
    #6 0x1ed21b1 in mysql_audit_notify(THD*, mysql_event_connection_subclass_t, char const*, int) sql/sql_audit.cc:438
    #7 0x1350071 in check_connection sql/sql_connect.cc:868
    #8 0x1350071 in login_connection sql/sql_connect.cc:929
    #9 0x1357881 in thd_prepare_connection(THD*, bool) sql/sql_connect.cc:1084
    #10 0x1e66347 in handle_connection sql/conn_handler/connection_handler_per_thread.cc:313
    #11 0xb1913a3 in pfs_spawn_thread storage/perfschema/pfs.cc:2836
    #12 0x7fe5d352f6b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9)
    #13 0x7fe5d0bd741c in clone (/lib/x86_64-linux-gnu/libc.so.6+0x10741c)
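
Item 5 above relies on a rule that is easy to miss: memcpy requires valid pointers even when the length is zero, so a NULL source is undefined behavior regardless of the byte count. A minimal sketch of the guard follows; `safe_copy` is a hypothetical helper, not a function from audit_log.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies len bytes from src to dst and returns the number of bytes copied.
// memcpy requires valid pointers even when len is zero, so the call is
// skipped entirely for an empty or null source.
std::size_t safe_copy(char *dst, const char *src, std::size_t len) {
  assert(dst != nullptr);
  if (src == nullptr || len == 0) {
    return 0;  // nothing to copy; never hand a null pointer to memcpy
  }
  std::memcpy(dst, src, len);
  return len;
}
```

With this guard, `safe_copy(buf, nullptr, 0)` leaves `buf` untouched instead of reaching memcpy with a null second argument.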
inikep
added a commit
that referenced
this pull request
Nov 14, 2024
PS-5741: Incorrect use of memset_s in keyring_vault.

Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is the size of the buffer and the 3rd argument is the character to fill.

---------------------------------------------------------------------------

PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate

---

*Problem:*

`st_mysql_value::val_str` might return a pointer to `buf`, which is destroyed once the function returns. Therefore the value in `save` is invalid after returning from the function.

In this particular case the error does not manifest, as `val_str` returns memory allocated with `thd_strmake` and does not use `buf`.

*Solution:*

Allocate memory with `thd_strmake` so the memory in `save` is not local.

---------------------------------------------------------------------------

Fix test main.bug12969156 when WITH_ASAN=ON

*Problem:*

ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:

```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215

Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
    #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62

  This frame has 4 object(s):
    [48, 56) 'result' (line 66)
    [80, 112) '_db_stack_frame_' (line 63)
    [144, 200) 'tm_tmp' (line 67)
    [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
    #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
    #2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
    #3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
    #4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
    #5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
    #6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
    #7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
    #8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
    #9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
    #10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
    #11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
    #12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```

The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not an orderly way of finishing the thread. ASAN does not register that the stack variables are no longer in use, which generates the error above. This is a benign error as all the variables are on the stack.

*Solution*:

Finish the thread in an orderly way by using a signalling variable.

---------------------------------------------------------------------------

PS-8204: Fix XML escape rules for audit plugin

https://jira.percona.com/browse/PS-8204

There was a wrong length specified for some XML escape rules. As a result, the terminating null symbol from the replacement rule was copied into the resulting string. This led to query text truncation in the audit log file.
In addition, added empty replacement rules for the '\b' and '\f' symbols, which just remove them from the resulting string. These symbols are not supported in XML 1.0.

---------------------------------------------------------------------------

PS-8854: Add main.percona_udf MTR test

Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.

---------------------------------------------------------------------------

PS-9218: Merge MySQL 8.4.0 (fix gcc-14 build)

https://perconadev.atlassian.net/browse/PS-9218
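
The main.bug12969156 fix above replaces thread cancellation with a signalling variable. A minimal sketch of that pattern in standard C++ is shown here, assuming a periodic worker; the `Heartbeat` class and its members are hypothetical and are not the daemon_example plugin code, which uses the server's own thread primitives.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical worker, loosely modelled on a heartbeat daemon thread.
class Heartbeat {
 public:
  void start() {
    worker_ = std::thread([this] { run(); });
  }

  // Ask the thread to finish and wait until it has left run().
  void stop() {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      stop_requested_ = true;
    }
    wakeup_.notify_one();
    if (worker_.joinable()) worker_.join();
  }

  int beats() const { return beats_.load(); }

 private:
  void run() {
    std::unique_lock<std::mutex> lock(mutex_);
    while (!stop_requested_) {
      ++beats_;
      // Sleep until the next beat, or wake early when stop() signals.
      wakeup_.wait_for(lock, std::chrono::milliseconds(5),
                       [this] { return stop_requested_; });
    }
  }  // the thread returns normally, so its stack frame is unwound

  std::thread worker_;
  std::mutex mutex_;
  std::condition_variable wakeup_;
  bool stop_requested_ = false;  // the signalling variable, guarded by mutex_
  std::atomic<int> beats_{0};
};
```

Because the worker leaves `run()` through a normal return, its stack frame is torn down in the usual way, which is the property the cancelled thread lacked.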
inikep
pushed a commit
that referenced
this pull request
Nov 14, 2024
…n read() syscall over network

https://jira.percona.com/browse/PS-8592

Description
-----------
GR suffered from problems caused by security probes and network scanner processes connecting to the group replication communication port. This is usually not a problem, but it poses a serious threat when another member tries to join the cluster by initiating a connection to a member whose group communication port is being held by external processes for longer durations.

During such activity by external processes, the SSL-enabled server stalled forever on the SSL_accept() call, waiting for handshake data. Below is the stack trace:

    Thread 55 (Thread 0x7f7bb77ff700 (LWP 2198598)):
    #0 in read ()
    #1 in sock_read ()
    #2 in BIO_read ()
    #3 in ssl23_read_bytes ()
    #4 in ssl23_get_client_hello ()
    #5 in ssl23_accept ()
    #6 in xcom_tcp_server_startup(Xcom_network_provider*) ()

When the server stalled in the above path forever, it prevented other members from joining the cluster, resulting in the following messages in the joiner server's logs:

    [ERROR] [MY-011640] [Repl] Plugin group_replication reported: 'Timeout on wait for view after joining group'
    [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member is already leaving or joining a group.'

Solution
--------
This patch adds two new variables:

1. group_replication_xcom_ssl_socket_timeout

   It is a file-descriptor level timeout in seconds for both accept() and SSL_accept() calls when group replication is listening on the xcom port. When set to a valid value, for example 5 seconds, both accept() and SSL_accept() return after 5 seconds. The default value has been set to 0 (waits infinitely) for backward compatibility. This variable is effective only when GR is configured with SSL.

2. group_replication_xcom_ssl_accept_retries

   It defines the number of retries to be performed before closing the socket. For each retry the server thread calls SSL_accept() with the timeout defined by group_replication_xcom_ssl_socket_timeout for the SSL handshake process once the connection has been accepted by the first accept() call. The default value has been set to 10. This variable is effective only when GR is configured with SSL.

Note:
- Both of the above variables are dynamically configurable, but will become effective only on START GROUP_REPLICATION.

-------------------------------------------------------------------------

PS-8844: Fix the failing main.mysqldump_gtid_purged

https://jira.percona.com/browse/PS-8844

This patch fixes the test failure of main.mysqldump_gtid_purged that failed due to the uninitialized variable $redirect_stderr in the start_proc_in_background.inc.

----------------------------------------------------------------------

PS-9218: Merge MySQL 8.4.0 (fix terminology in replication tests)

https://perconadev.atlassian.net/browse/PS-9218

mysql/mysql-server@44a77b5
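
The interaction between the two PS-8592 variables can be sketched as a bounded retry loop around a handshake attempt that may time out. This is only an illustration under stated assumptions: `HandshakeResult`, `handshake_with_retries` and the "one initial try plus N retries" counting are hypothetical, and the real patch's accounting inside the XCom network provider may differ.

```cpp
#include <cassert>
#include <functional>

enum class HandshakeResult { Done, TimedOut, Failed };

// Hypothetical retry loop. `attempt` is expected to return TimedOut when a
// single handshake try exceeded the socket timeout. Returns true if the
// handshake completed; false means the caller should close the socket.
bool handshake_with_retries(const std::function<HandshakeResult()> &attempt,
                            unsigned max_retries) {
  // Assumption: one initial try plus up to max_retries further tries.
  for (unsigned tries = 0; tries <= max_retries; ++tries) {
    switch (attempt()) {
      case HandshakeResult::Done:
        return true;
      case HandshakeResult::Failed:
        return false;  // hard error, retrying would not help
      case HandshakeResult::TimedOut:
        break;  // peer sent nothing in time, try again
    }
  }
  return false;  // retries exhausted, e.g. a port scanner that never speaks
}
```

With a timeout of 0 each attempt can block forever, so the loop never gets a chance to give up; that matches the backward-compatible default described above.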
inikep
added a commit
that referenced
this pull request
Nov 18, 2024
Import all functions and changes required to compile MyRocks (-DWITH_ROCKSDB=1). ---------------------------------------------------------------------- PS-9218 merge: Merge MySQL 8.4.0 (GCC 12.3 RelWithDebInfo fixes) (percona#5356) https://perconadev.atlassian.net/browse/PS-9218 Added more warning suppressions for RocksDB submodule files that appeared in GCC 12.3 in RelWithDebInfo mode. ---------------------------------------------------------------------- PS-9218: Merge MySQL 8.4.0 (fix gcc-14 build) #2 (rocksdb part)
inikep
pushed a commit
that referenced
this pull request
Nov 18, 2024
…s=0 and a local DDL executed

https://perconadev.atlassian.net/browse/PS-9018

Problem
-------
In high concurrency scenarios, a MySQL replica can enter a deadlock due to a race condition between the replica applier thread and the client thread performing a binlog group commit.

Analysis
--------
It needs at least 3 threads for this deadlock to happen:

1. One client thread
2. Two replica applier threads

How does this deadlock happen?
------------------------------
0. Binlog is enabled on replica, but log_replica_updates is disabled.

1. Initially, both "Commit Order" and "Binlog Flush" queues are empty.

2. Replica applier thread 1 enters the group commit pipeline to register in the "Commit Order" queue since `log-replica-updates` is disabled on the replica node.

3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier thread 1

   3.1. Becomes leader (In Commit_stage_manager::enroll_for()).

   3.2. Registers in the commit order queue.

   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.

   3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is not yet released.

   NOTE: SE commit for applier thread is already done by the time it reaches here.

4. Replica applier thread 2 enters the group commit pipeline to register in the "Commit Order" queue since `log-replica-updates` is disabled on the replica node.

5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the applier thread 2

   5.1. Becomes leader (In Commit_stage_manager::enroll_for())

   5.2. Registers in the commit order queue.

   5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier thread 1 it will wait until the lock is released.

6. Client thread enters the group commit pipeline to register in the "Binlog Flush" queue.

7. Since "Commit Order" queue is not empty (there is applier thread 2 in the queue), it enters the conditional wait `m_stage_cond_leader` with an intention to become the leader for both the "Binlog Flush" and "Commit Order" queues.

8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update the GTID by calling gtid_state->update_commit_group() from Commit_order_manager::flush_engine_and_signal_threads().

9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.

   9.1. It checks if there is any thread waiting in the "Binlog Flush" queue to become the leader. Here it finds the client thread waiting to be the leader.

   9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the cond_var `m_stage_cond_leader` and enters a conditional wait until the thread's `tx_commit_pending` is set to false by the client thread (will be done in the Commit_stage_manager::process_final_stage_for_ordered_commit_group() called by client thread from fetch_and_process_flush_stage_queue()).

10. The client thread wakes up from the cond_var `m_stage_cond_leader`. The thread has now become a leader and it is its responsibility to update GTID of applier thread 2.

    10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.

    10.2. Returns from `enroll_for()` and proceeds to process the "Commit Order" and "Binlog Flush" queues.

    10.3. Fetches the "Commit Order" and "Binlog Flush" queues.

    10.4. Performs the storage engine flush by calling ha_flush_logs() from fetch_and_process_flush_stage_queue().

    10.5. Proceeds to update the GTID of threads in "Commit Order" queue by calling gtid_state->update_commit_group() from Commit_stage_manager::process_final_stage_for_ordered_commit_group().

11. At this point, we will have

    - Client thread performing GTID update on behalf of applier thread 2 (from step 10.5), and
    - Applier thread 1 performing GTID update for itself (from step 8).

    Due to the lack of proper synchronization between the above two threads, there exists a time window where both threads can call gtid_state->update_commit_group() concurrently.

    In subsequent steps, both threads simultaneously try to modify the contents of the array `commit_group_sidnos` which is used to track the lock status of sidnos. This concurrent access to `update_commit_group()` can cause a lock-leak, resulting in one thread acquiring the sidno lock and never releasing it.

-----------------------------------------------------------------------------------------------------------
Client thread                                           Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group() => global_sid_lock->rdlock();     update_commit_group() => global_sid_lock->rdlock();

calls update_gtids_impl_lock_sidnos()                   calls update_gtids_impl_lock_sidnos()

set commit_group_sidno[2] = true                        set commit_group_sidno[2] = true

                                                        lock_sidno(2) -> successful

lock_sidno(2) -> waits

                                                        update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

                                                        if (commit_group_sidnos[2]) {
                                                          unlock_sidno(2);
                                                          commit_group_sidnos[2] = false;
                                                        }

                                                        Applier thread continues..

lock_sidno(2) -> successful

update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

if (commit_group_sidnos[2]) {   <=== this check fails and lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}

Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------

12. As the above lock-leak can also happen the other way, i.e. the applier thread fails to unlock, there can be different consequences hereafter.

13. If the client thread continues without releasing the lock, then at a later stage, it can enter into a deadlock with the applier thread performing a GTID update, with the following stack traces.

    Client_thread
    -------------
    #1 __GI___lll_lock_wait
    #2 ___pthread_mutex_lock
    #3 native_mutex_lock <= waits for commit lock while holding sidno lock
    #4 Commit_stage_manager::enroll_for
    #5 MYSQL_BIN_LOG::change_stage
    #6 MYSQL_BIN_LOG::ordered_commit
    #7 MYSQL_BIN_LOG::commit
    #8 ha_commit_trans
    #9 trans_commit_implicit
    #10 mysql_create_like_table
    #11 Sql_cmd_create_table::execute
    #12 mysql_execute_command
    #13 dispatch_sql_command

    Applier thread
    --------------
    #1 ___pthread_mutex_lock
    #2 native_mutex_lock
    #3 safe_mutex_lock
    #4 Gtid_state::update_gtids_impl_lock_sidnos <= waits for sidno lock
    #5 Gtid_state::update_commit_group
    #6 Commit_order_manager::flush_engine_and_signal_threads <= acquires commit lock here
    #7 Commit_order_manager::finish
    #8 Commit_order_manager::wait_and_finish
    #9 ha_commit_low
    #10 trx_coordinator::commit_in_engines
    #11 MYSQL_BIN_LOG::commit
    #12 ha_commit_trans
    #13 trans_commit
    #14 Xid_log_event::do_commit
    #15 Xid_apply_log_event::do_apply_event_worker
    #16 Slave_worker::slave_worker_exec_event
    #17 slave_worker_exec_job_group
    #18 handle_slave_worker

14. If the applier thread continues without releasing the lock, then at a later stage, it can perform recursive locking while setting the GTID for the next transaction (in set_gtid_next()).

    In debug builds the above case hits the assertion `safe_mutex_assert_not_owner()`, meaning the lock is already acquired by the replica applier thread when it tries to re-acquire the lock.

Solution
--------
In the above problematic example, when seen from each thread individually, we can conclude that there is no problem in the order of lock acquisition, thus there is no need to change the lock order.

However, the root cause of this problem is that multiple threads can concurrently access the array `Gtid_state::commit_group_sidnos`.

In its initial implementation, it was expected that threads should hold the `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. But this was not considered when upstream implemented WL#7846 (MTS: slave-preserve-commit-order when log-slave-updates/binlog is disabled).

With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired by the client thread (binlog flush leader) when it tries to perform the GTID update on behalf of threads waiting in the "Commit Order" queue, thus providing a guarantee that the `Gtid_state::commit_group_sidnos` array is never accessed without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
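
The lock leak in step 11 can be replayed deterministically in a small single-threaded model. The sketch below is only an illustration: `GtidStateModel`, `SidnoLock` and the replay functions are hypothetical names that mirror the interleaving table for a single sidno, and the "serialized" variant stands in for running each update under `MYSQL_BIN_LOG::LOCK_commit`.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical model of one sidno lock: 0 = free, otherwise the owner id.
struct SidnoLock {
  int holder = 0;
};

struct GtidStateModel {
  bool commit_group_sidno = false;  // shared flag, one sidno for brevity
  SidnoLock lock;

  void mark() { commit_group_sidno = true; }

  bool try_lock(int thread) {
    if (lock.holder != 0) return false;  // would block in the real server
    lock.holder = thread;
    return true;
  }

  // Mirrors: if (commit_group_sidnos[i]) { unlock; flag = false; }
  void unlock_if_marked(int thread) {
    if (commit_group_sidno) {
      assert(lock.holder == thread);
      lock.holder = 0;
      commit_group_sidno = false;
    }
  }
};

constexpr int kClient = 1;
constexpr int kApplier = 2;

// Replays the unsynchronized interleaving from the table. Returns the id of
// the thread still holding the sidno lock at the end (0 if none).
int replay_racy() {
  GtidStateModel s;
  s.mark();  // client sets the flag
  s.mark();  // applier sets the flag
  bool applier_locked = s.try_lock(kApplier);
  bool client_locked = s.try_lock(kClient);  // fails: the client waits
  assert(applier_locked && !client_locked);
  s.unlock_if_marked(kApplier);         // applier unlocks, clears the flag
  client_locked = s.try_lock(kClient);  // client now gets the lock
  assert(client_locked);
  s.unlock_if_marked(kClient);  // flag is already false: lock is leaked
  return s.lock.holder;
}

// Replays the same work with each update run as one critical section.
int replay_serialized() {
  GtidStateModel s;
  for (int thread : {kApplier, kClient}) {
    s.mark();
    bool locked = s.try_lock(thread);
    assert(locked);
    s.unlock_if_marked(thread);
  }
  return s.lock.holder;
}
```

In this model `replay_racy()` ends with the client still owning the lock, while `replay_serialized()` ends with the lock free, which is the guarantee the patch aims for by serializing access to the flag array.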
inikep
added a commit
that referenced
this pull request
Nov 18, 2024
PS-5741: Incorrect use of memset_s in keyring_vault.

Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is the size of the buffer and the 3rd argument is the character to fill.

---------------------------------------------------------------------------

PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate

---

*Problem:*

`st_mysql_value::val_str` might return a pointer to `buf`, which is destroyed once the function returns. Therefore the value in `save` is invalid after returning from the function.

In this particular case the error does not manifest, as `val_str` returns memory allocated with `thd_strmake` and does not use `buf`.

*Solution:*

Allocate memory with `thd_strmake` so the memory in `save` is not local.

---------------------------------------------------------------------------

Fix test main.bug12969156 when WITH_ASAN=ON

*Problem:*

ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:

```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215

Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
    #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62

  This frame has 4 object(s):
    [48, 56) 'result' (line 66)
    [80, 112) '_db_stack_frame_' (line 63)
    [144, 200) 'tm_tmp' (line 67)
    [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
    #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
    #2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
    #3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
    #4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
    #5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
    #6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
    #7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
    #8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
    #9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
    #10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
    #11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
    #12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```

The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not an orderly way of finishing the thread. ASAN does not register that the stack variables are no longer in use, which generates the error above. This is a benign error as all the variables are on the stack.

*Solution*:

Finish the thread in an orderly way by using a signalling variable.

---------------------------------------------------------------------------

PS-8204: Fix XML escape rules for audit plugin

https://jira.percona.com/browse/PS-8204

There was a wrong length specified for some XML escape rules. As a result, the terminating null symbol from the replacement rule was copied into the resulting string. This led to query text truncation in the audit log file.
In addition, added empty replacement rules for the '\b' and '\f' symbols, which just remove them from the resulting string. These symbols are not supported in XML 1.0.

---------------------------------------------------------------------------

PS-8854: Add main.percona_udf MTR test

Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.

---------------------------------------------------------------------------

PS-9218: Merge MySQL 8.4.0 (fix gcc-14 build)

https://perconadev.atlassian.net/browse/PS-9218
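
The PS-8204 bug (a replacement length that includes the terminating null) is easy to reproduce with `sizeof` on a string literal. The sketch below shows one way to derive the length so the null is never copied. `EscapeRule`, `rule`, `kRules` and `xml_escape` are hypothetical names and the rule set is illustrative, not the audit_log plugin's actual table.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical escape rule: replace `from` with the first to_len bytes of
// `to`. to_len must NOT count the terminating '\0'.
struct EscapeRule {
  char from;
  const char *to;
  std::size_t to_len;
};

// For a literal such as "&lt;", N is 5 because the array includes the '\0'.
// Using N - 1 keeps the terminator out of the output.
template <std::size_t N>
constexpr EscapeRule rule(char from, const char (&to)[N]) {
  return EscapeRule{from, to, N - 1};
}

static const EscapeRule kRules[] = {
    rule('<', "&lt;"),   rule('>', "&gt;"), rule('&', "&amp;"),
    rule('"', "&quot;"),
    rule('\b', ""),  // not representable in XML 1.0: dropped
    rule('\f', ""),  // not representable in XML 1.0: dropped
};

std::string xml_escape(const std::string &in) {
  std::string out;
  for (char c : in) {
    const EscapeRule *match = nullptr;
    for (const EscapeRule &r : kRules) {
      if (r.from == c) {
        match = &r;
        break;
      }
    }
    if (match != nullptr) {
      out.append(match->to, match->to_len);
    } else {
      out.push_back(c);
    }
  }
  return out;
}
```

Had `to_len` been N instead of N - 1, every replacement would embed a '\0' in the output, and any consumer treating the result as a C string would see the text cut off at the first escaped character.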
inikep
pushed a commit
that referenced
this pull request
Nov 18, 2024
…n read() syscall over network

https://jira.percona.com/browse/PS-8592

Description
-----------
GR suffered from problems caused by security probes and network scanner processes connecting to the group replication communication port. This is usually not a problem, but it poses a serious threat when another member tries to join the cluster by initiating a connection to a member whose group communication port is being held by external processes for longer durations.

During such activity by external processes, the SSL-enabled server stalled forever on the SSL_accept() call, waiting for handshake data. Below is the stack trace:

    Thread 55 (Thread 0x7f7bb77ff700 (LWP 2198598)):
    #0 in read ()
    #1 in sock_read ()
    #2 in BIO_read ()
    #3 in ssl23_read_bytes ()
    #4 in ssl23_get_client_hello ()
    #5 in ssl23_accept ()
    #6 in xcom_tcp_server_startup(Xcom_network_provider*) ()

When the server stalled in the above path forever, it prevented other members from joining the cluster, resulting in the following messages in the joiner server's logs:

    [ERROR] [MY-011640] [Repl] Plugin group_replication reported: 'Timeout on wait for view after joining group'
    [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member is already leaving or joining a group.'

Solution
--------
This patch adds two new variables:

1. group_replication_xcom_ssl_socket_timeout

   It is a file-descriptor level timeout in seconds for both accept() and SSL_accept() calls when group replication is listening on the xcom port. When set to a valid value, for example 5 seconds, both accept() and SSL_accept() return after 5 seconds. The default value has been set to 0 (waits infinitely) for backward compatibility. This variable is effective only when GR is configured with SSL.

2. group_replication_xcom_ssl_accept_retries

   It defines the number of retries to be performed before closing the socket. For each retry the server thread calls SSL_accept() with the timeout defined by group_replication_xcom_ssl_socket_timeout for the SSL handshake process once the connection has been accepted by the first accept() call. The default value has been set to 10. This variable is effective only when GR is configured with SSL.

Note:
- Both of the above variables are dynamically configurable, but will become effective only on START GROUP_REPLICATION.

-------------------------------------------------------------------------

PS-8844: Fix the failing main.mysqldump_gtid_purged

https://jira.percona.com/browse/PS-8844

This patch fixes the test failure of main.mysqldump_gtid_purged that failed due to the uninitialized variable $redirect_stderr in the start_proc_in_background.inc.

----------------------------------------------------------------------

PS-9218: Merge MySQL 8.4.0 (fix terminology in replication tests)

https://perconadev.atlassian.net/browse/PS-9218

mysql/mysql-server@44a77b5
inikep
pushed a commit
that referenced
this pull request
Jan 23, 2025
Upstream commit ID : fb-mysql-5.6.35/8cb1dc836b68f1f13e8b2655b2b8cb2d57f400b3
PS-5217 : Merge fb-prod201803

Summary:
Original report: https://jira.mariadb.org/browse/MDEV-15816

To reproduce this bug, follow the steps below.

client 1:
USE test;
CREATE TABLE t1 (i INT) ENGINE=MyISAM;
HANDLER t1 OPEN h;
CREATE TABLE t2 (i INT) ENGINE=RocksDB;
LOCK TABLES t2 WRITE;

client 2:
FLUSH TABLES WITH READ LOCK;

client 1:
INSERT INTO t2 VALUES (1);

So client 1 acquired the lock and set m_lock_rows = RDB_LOCK_WRITE. Then client 2 calls store_lock(TL_IGNORE) and m_lock_rows was wrongly set to RDB_LOCK_NONE, as in this stack trace:
```
#0 myrocks::ha_rocksdb::store_lock (this=0x7fffbc03c7c8, thd=0x7fffc0000ba0, to=0x7fffc0011220, lock_type=TL_IGNORE)
#1 get_lock_data (thd=0x7fffc0000ba0, table_ptr=0x7fffe84b7d20, count=1, flags=2)
#2 mysql_lock_abort_for_thread (thd=0x7fffc0000ba0, table=0x7fffbc03bbc0)
#3 THD::notify_shared_lock (this=0x7fffc0000ba0, ctx_in_use=0x7fffbc000bd8, needs_thr_lock_abort=true)
#4 MDL_lock::notify_conflicting_locks (this=0x555557a82380, ctx=0x7fffc0000cc8)
#5 MDL_context::acquire_lock (this=0x7fffc0000cc8, mdl_request=0x7fffe84b8350, lock_wait_timeout=2)
#6 Global_read_lock::lock_global_read_lock (this=0x7fffc0003fe0, thd=0x7fffc0000ba0)
```

Finally, client 1 "INSERT INTO..." hits the Assertion 'm_lock_rows == RDB_LOCK_WRITE' failed in myrocks::ha_rocksdb::write_row()

Fix this bug by not setting m_lock_rows if lock_type == TL_IGNORE.

Closes facebook/mysql-5.6#838
Pull Request resolved: facebook/mysql-5.6#871

Differential Revision: D9417382

Pulled By: lth

fbshipit-source-id: c36c164e06c
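
The shape of the TL_IGNORE fix can be shown with a stripped-down model. This is a sketch under stated assumptions: `ThrLock`, `RowLock` and `HandlerSketch` are hypothetical simplifications, and the mapping from lock type to row-lock mode is far simpler than what ha_rocksdb::store_lock actually does.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the server's lock enums.
enum class ThrLock { Ignore, Read, Write };
enum class RowLock { None, Read, Write };

struct HandlerSketch {
  RowLock m_lock_rows = RowLock::None;

  void store_lock(ThrLock lock_type) {
    if (lock_type == ThrLock::Ignore) {
      // A probe issued on behalf of another thread (as in the stack trace
      // above) must not change the row-lock mode of the owning statement.
      return;
    }
    // Simplified mapping, chosen only for illustration.
    m_lock_rows =
        (lock_type == ThrLock::Write) ? RowLock::Write : RowLock::None;
  }

  void write_row() const {
    // Mirrors the assertion that fired in the report.
    assert(m_lock_rows == RowLock::Write);
  }
};
```

With the early return in place, a `store_lock(ThrLock::Ignore)` call arriving between `LOCK TABLES ... WRITE` and the `INSERT` leaves `m_lock_rows` at `RowLock::Write`, so `write_row()` no longer trips its assertion.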
inikep
pushed a commit
that referenced
this pull request
Jan 23, 2025
Upstream commit ID : fb-mysql-5.6.35/911d1a387a0d80f3ba52b7432c1abdbd7e8cb220
PS-6867 : Merge fb-prod201905

Summary: Missed a few in earlier fixes for the AutoInitCopy rule. Also added a few fixes for the anonymous class rule and the local shadowing rule.

Reviewed By: luqun

Differential Revision: D15467213

fbshipit-source-id: 9325852dbdd
inikep
added a commit
that referenced
this pull request
Jan 23, 2025
Import all functions and changes required to compile MyRocks (-DWITH_ROCKSDB=1). ---------------------------------------------------------------------- PS-9218 merge: Merge MySQL 8.4.0 (GCC 12.3 RelWithDebInfo fixes) (percona#5356) https://perconadev.atlassian.net/browse/PS-9218 Added more warning suppressions for RocksDB submodule files that appeared in GCC 12.3 in RelWithDebInfo mode. ---------------------------------------------------------------------- PS-9218: Merge MySQL 8.4.0 (fix gcc-14 build) #2 (rocksdb part)
inikep
pushed a commit
that referenced
this pull request
Jan 23, 2025
…s=0 and a local DDL executed

https://perconadev.atlassian.net/browse/PS-9018

Problem
-------
In high-concurrency scenarios, a MySQL replica can deadlock due to a race condition between a replica applier thread and a client thread performing a binlog group commit.

Analysis
--------
At least 3 threads are needed for this deadlock to happen:
1. One client thread
2. Two replica applier threads

How does this deadlock happen?
------------------------------
0. Binlog is enabled on the replica, but log_replica_updates is disabled.
1. Initially, both the "Commit Order" and "Binlog Flush" queues are empty.
2. Replica applier thread 1 enters the group commit pipeline to register in the "Commit Order" queue, since `log-replica-updates` is disabled on the replica node.
3. Since both the "Commit Order" and "Binlog Flush" queues are empty, applier thread 1
   3.1. Becomes leader (in Commit_stage_manager::enroll_for()).
   3.2. Registers in the commit order queue.
   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.
   3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is not yet released.
   NOTE: SE commit for the applier thread is already done by the time it reaches here.
4. Replica applier thread 2 enters the group commit pipeline to register in the "Commit Order" queue, since `log-replica-updates` is disabled on the replica node.
5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), applier thread 2
   5.1. Becomes leader (in Commit_stage_manager::enroll_for()).
   5.2. Registers in the commit order queue.
   5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier thread 1, it waits until the lock is released.
6. The client thread enters the group commit pipeline to register in the "Binlog Flush" queue.
7. Since the "Commit Order" queue is not empty (applier thread 2 is in the queue), the client thread enters the conditional wait `m_stage_cond_leader` with the intention of becoming the leader for both the "Binlog Flush" and "Commit Order" queues.
8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update the GTID by calling gtid_state->update_commit_group() from Commit_order_manager::flush_engine_and_signal_threads().
9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.
   9.1. It checks whether any thread is waiting in the "Binlog Flush" queue to become the leader. Here it finds the client thread waiting to be the leader.
   9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log, signals on the cond_var `m_stage_cond_leader`, and enters a conditional wait until the thread's `tx_commit_pending` is set to false by the client thread (this is done in Commit_stage_manager::process_final_stage_for_ordered_commit_group(), called by the client thread from fetch_and_process_flush_stage_queue()).
10. The client thread wakes up from the cond_var `m_stage_cond_leader`. The thread has now become a leader, and it is its responsibility to update the GTID of applier thread 2.
   10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.
   10.2. Returns from `enroll_for()` and proceeds to process the "Commit Order" and "Binlog Flush" queues.
   10.3. Fetches the "Commit Order" and "Binlog Flush" queues.
   10.4. Performs the storage engine flush by calling ha_flush_logs() from fetch_and_process_flush_stage_queue().
   10.5. Proceeds to update the GTID of threads in the "Commit Order" queue by calling gtid_state->update_commit_group() from Commit_stage_manager::process_final_stage_for_ordered_commit_group().
11. At this point, we have
   - the client thread performing the GTID update on behalf of applier thread 2 (from step 10.5), and
   - applier thread 1 performing the GTID update for itself (from step 8).
   Due to the lack of proper synchronization between these two threads, there is a time window in which both threads can call gtid_state->update_commit_group() concurrently. In subsequent steps, both threads simultaneously try to modify the contents of the array `commit_group_sidnos`, which is used to track the lock status of sidnos.
This concurrent access to `update_commit_group()` can cause a lock leak in which one thread acquires the sidno lock and never releases it.

-----------------------------------------------------------------------------------------------------------
Client thread                                     Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group() =>                          update_commit_group() =>
  global_sid_lock->rdlock();                        global_sid_lock->rdlock();
calls update_gtids_impl_lock_sidnos()             calls update_gtids_impl_lock_sidnos()
set commit_group_sidnos[2] = true                 set commit_group_sidnos[2] = true
                                                  lock_sidno(2) -> successful
lock_sidno(2) -> waits
                                                  update_gtids_impl_own_gtid() ->
                                                    Add the thd->owned_gtid in `executed_gtids()`
                                                  if (commit_group_sidnos[2]) {
                                                    unlock_sidno(2);
                                                    commit_group_sidnos[2] = false;
                                                  }
                                                  Applier thread continues..
lock_sidno(2) -> successful
update_gtids_impl_own_gtid() ->
  Add the thd->owned_gtid in `executed_gtids()`
if (commit_group_sidnos[2]) {   <=== this check fails and lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}
Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------

12. The above lock leak can also happen the other way round, i.e. the applier thread fails to unlock, so the consequences from here on differ.
13. If the client thread continues without releasing the lock, then at a later stage it can deadlock with the applier thread performing a GTID update, with the stack traces below.
Client_thread
-------------
#1 __GI___lll_lock_wait
#2 ___pthread_mutex_lock
#3 native_mutex_lock <= waits for commit lock while holding sidno lock
#4 Commit_stage_manager::enroll_for
#5 MYSQL_BIN_LOG::change_stage
#6 MYSQL_BIN_LOG::ordered_commit
#7 MYSQL_BIN_LOG::commit
#8 ha_commit_trans
#9 trans_commit_implicit
#10 mysql_create_like_table
#11 Sql_cmd_create_table::execute
#12 mysql_execute_command
#13 dispatch_sql_command

Applier thread
--------------
#1 ___pthread_mutex_lock
#2 native_mutex_lock
#3 safe_mutex_lock
#4 Gtid_state::update_gtids_impl_lock_sidnos <= waits for sidno lock
#5 Gtid_state::update_commit_group
#6 Commit_order_manager::flush_engine_and_signal_threads <= acquires commit lock here
#7 Commit_order_manager::finish
#8 Commit_order_manager::wait_and_finish
#9 ha_commit_low
#10 trx_coordinator::commit_in_engines
#11 MYSQL_BIN_LOG::commit
#12 ha_commit_trans
#13 trans_commit
#14 Xid_log_event::do_commit
#15 Xid_apply_log_event::do_apply_event_worker
#16 Slave_worker::slave_worker_exec_event
#17 slave_worker_exec_job_group
#18 handle_slave_worker

14. If the applier thread continues without releasing the lock, then at a later stage it can lock recursively while setting the GTID for the next transaction (in set_gtid_next()). In debug builds this case hits the assertion `safe_mutex_assert_not_owner()`, meaning the lock is already held by the replica applier thread when it tries to re-acquire it.

Solution
--------
In the problematic example above, each thread viewed individually acquires its locks in a valid order, so there is no need to change the lock order. The root cause is that multiple threads can concurrently access the array `Gtid_state::commit_group_sidnos`. In its initial implementation, threads were expected to hold `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. This was not taken into account when upstream implemented WL#7846 (MTS: slave-preserve-commit-order when log-slave-updates/binlog is disabled).

With this patch, `MYSQL_BIN_LOG::LOCK_commit` is acquired by the client thread (binlog flush leader) when it performs the GTID update on behalf of threads waiting in the "Commit Order" queue. This guarantees that the `Gtid_state::commit_group_sidnos` array is never accessed without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
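The effect of the fix can be sketched in isolation. The following is a minimal model, not the server code: `GtidStateSketch`, `lock_commit`, `sidno_locks`, and `run_two_committers` are hypothetical names, and the per-sidno bookkeeping is reduced to one flag and one mutex per slot. It only shows that once every caller serializes on one outer lock, each thread's check of the shared flag sees its own write and the per-sidno lock is always released.

```cpp
#include <array>
#include <cstddef>
#include <mutex>
#include <thread>

// Hypothetical stand-in for Gtid_state: one flag and one mutex per sidno.
struct GtidStateSketch {
  static constexpr std::size_t kSidnos = 4;
  std::mutex lock_commit;                           // plays the role of LOCK_commit
  std::array<std::mutex, kSidnos> sidno_locks;      // per-sidno locks
  std::array<bool, kSidnos> commit_group_sidnos{};  // "this group locked sidno N"
  long locks_taken = 0;                             // bookkeeping for the check below
  long locks_released = 0;

  void update_commit_group(std::size_t sidno) {
    // Assumed fix: every caller holds the outer commit lock while it reads
    // or writes commit_group_sidnos, so the flag cannot be cleared by a
    // concurrent caller between "set" and "check".
    std::lock_guard<std::mutex> commit_guard(lock_commit);
    commit_group_sidnos[sidno] = true;
    sidno_locks[sidno].lock();
    ++locks_taken;
    if (commit_group_sidnos[sidno]) {
      sidno_locks[sidno].unlock();
      ++locks_released;
      commit_group_sidnos[sidno] = false;
    }
  }
};

// Runs a "client" and an "applier" thread against the same sidno and
// reports whether every acquired sidno lock was released again.
bool run_two_committers(int iterations) {
  GtidStateSketch state;
  auto worker = [&state, iterations] {
    for (int i = 0; i < iterations; ++i) state.update_commit_group(2);
  };
  std::thread client(worker);
  std::thread applier(worker);
  client.join();
  applier.join();
  return state.locks_taken == 2L * iterations &&
         state.locks_released == state.locks_taken;
}
```

Without the outer `lock_commit` guard, the second thread's flag check could observe `false` written by the first thread and skip the unlock, which is the leak described in step 11.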
inikep
pushed a commit
that referenced
this pull request
Jan 23, 2025
Upstream commit ID : fb-mysql-5.6.35/8cb1dc836b68f1f13e8b2655b2b8cb2d57f400b3
PS-5217 : Merge fb-prod201803

Summary:
Original report: https://jira.mariadb.org/browse/MDEV-15816

To reproduce this bug, follow the steps below:

client 1:
USE test;
CREATE TABLE t1 (i INT) ENGINE=MyISAM;
HANDLER t1 OPEN h;
CREATE TABLE t2 (i INT) ENGINE=RocksDB;
LOCK TABLES t2 WRITE;

client 2:
FLUSH TABLES WITH READ LOCK;

client 1:
INSERT INTO t2 VALUES (1);

Client 1 acquires the lock and sets m_lock_rows = RDB_LOCK_WRITE. Then client 2 calls store_lock(TL_IGNORE), and m_lock_rows is wrongly set to RDB_LOCK_NONE, as shown below:

```
#0 myrocks::ha_rocksdb::store_lock (this=0x7fffbc03c7c8, thd=0x7fffc0000ba0, to=0x7fffc0011220, lock_type=TL_IGNORE)
#1 get_lock_data (thd=0x7fffc0000ba0, table_ptr=0x7fffe84b7d20, count=1, flags=2)
#2 mysql_lock_abort_for_thread (thd=0x7fffc0000ba0, table=0x7fffbc03bbc0)
#3 THD::notify_shared_lock (this=0x7fffc0000ba0, ctx_in_use=0x7fffbc000bd8, needs_thr_lock_abort=true)
#4 MDL_lock::notify_conflicting_locks (this=0x555557a82380, ctx=0x7fffc0000cc8)
#5 MDL_context::acquire_lock (this=0x7fffc0000cc8, mdl_request=0x7fffe84b8350, lock_wait_timeout=2)
#6 Global_read_lock::lock_global_read_lock (this=0x7fffc0003fe0, thd=0x7fffc0000ba0)
```

Finally, client 1's "INSERT INTO..." hits the assertion 'm_lock_rows == RDB_LOCK_WRITE' failed in myrocks::ha_rocksdb::write_row().

Fix this bug by not setting m_lock_rows if lock_type == TL_IGNORE.

Closes facebook/mysql-5.6#838
Pull Request resolved: facebook/mysql-5.6#871
Differential Revision: D9417382
Pulled By: lth
fbshipit-source-id: c36c164e06c
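The shape of that fix can be sketched with a stripped-down handler. Everything here is hypothetical (`HandlerSketch`, `ThrLockType`, `RowLockMode`) and the mapping from lock type to row-lock mode is deliberately simplified; it is not the MyRocks implementation. The point is only the early return: a call with the "ignore" lock type must leave the previously stored row-lock mode alone.

```cpp
// Hypothetical, simplified stand-ins for the server's lock enums.
enum class ThrLockType { Ignore, Read, Write };
enum class RowLockMode { None, Read, Write };

struct HandlerSketch {
  RowLockMode m_lock_rows = RowLockMode::None;

  // Simplified store_lock(): records which row-lock mode later writes expect.
  void store_lock(ThrLockType lock_type) {
    // Sketch of the fix: an "ignore" request only inspects locks (for
    // example on behalf of another connection), so keep the current state.
    if (lock_type == ThrLockType::Ignore) return;
    m_lock_rows = (lock_type == ThrLockType::Write) ? RowLockMode::Write
                                                    : RowLockMode::None;
  }

  // Mirrors the precondition that write_row() asserts in the report above.
  bool can_write_row() const { return m_lock_rows == RowLockMode::Write; }
};
```

In the reproduction above, client 1's `LOCK TABLES t2 WRITE` corresponds to `store_lock(ThrLockType::Write)` and client 2's `FLUSH TABLES WITH READ LOCK` to a later `store_lock(ThrLockType::Ignore)` on the same handler; with the early return, `can_write_row()` stays true for the following INSERT.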
inikep
added a commit
that referenced
this pull request
Jan 23, 2025
PS-5741: Incorrect use of memset_s in keyring_vault.

Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is the size of the buffer and the 3rd argument is the character to fill.

---------------------------------------------------------------------------

PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate

---

*Problem:*
`st_mysql_value::val_str` might return a pointer to `buf`, which is destroyed after the function returns. Therefore the value in `save` is invalid after returning from the function.

In this particular case the error does not manifest, because `val_str` returns memory allocated with `thd_strmake` and does not use `buf`.

*Solution:*
Allocate memory with `thd_strmake` so the memory in `save` is not local.

---------------------------------------------------------------------------

Fix test main.bug12969156 when WITH_ASAN=ON

*Problem:*
ASAN complains about a stack-buffer-overflow in function `mysql_heartbeat`:

```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215

Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
    #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62

  This frame has 4 object(s):
    [48, 56) 'result' (line 66)
    [80, 112) '_db_stack_frame_' (line 63)
    [144, 200) 'tm_tmp' (line 67)
    [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
    #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
    #2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
    #3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
    #4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
    #5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
    #6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
    #7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
    #8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
    #9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
    #10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
    #11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
    #12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```

The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not an orderly way of finishing the thread. ASAN does not register that the stack variables are no longer in use, which generates the error above. This is a benign error, as all the variables are on the stack.

*Solution*:
Finish the thread in an orderly way by using a signalling variable.

---------------------------------------------------------------------------

PS-8204: Fix XML escape rules for audit plugin

https://jira.percona.com/browse/PS-8204

A wrong length was specified for some XML escape rules. As a result, the terminating null symbol from the replacement rule was copied into the resulting string. This led to query text truncation in the audit log file.
In addition, added empty replacement rules for the '\b' and '\f' symbols, which simply remove them from the resulting string. These symbols are not supported in XML 1.0.

---------------------------------------------------------------------------

PS-8854: Add main.percona_udf MTR test

Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.

---------------------------------------------------------------------------

PS-9218: Merge MySQL 8.4.0 (fix gcc-14 build)

https://perconadev.atlassian.net/browse/PS-9218
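For the PS-8204 item above, the failure mode can be illustrated with a small sketch. The rule table, `EscapeRule`, and `escape_xml` below are hypothetical and not taken from the audit plugin; the sketch only shows why a replacement length that counts the terminating null truncates the text for any consumer that treats the result as a C string, and how zero-length rules drop characters such as '\b' and '\f'.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical escape rule: a character and its replacement with an
// explicit length, similar in spirit to a static rule table.
struct EscapeRule {
  char from;
  const char* to;
  std::size_t to_len;
};

// Applies the first matching rule to each input character.
std::string escape_xml(const std::string& in, const EscapeRule* rules,
                       std::size_t n_rules) {
  std::string out;
  for (char c : in) {
    const EscapeRule* hit = nullptr;
    for (std::size_t i = 0; i < n_rules; ++i) {
      if (rules[i].from == c) {
        hit = &rules[i];
        break;
      }
    }
    if (hit != nullptr) {
      out.append(hit->to, hit->to_len);  // copies exactly to_len bytes
    } else {
      out.push_back(c);
    }
  }
  return out;
}

// Buggy variant: sizeof() of a string literal includes the trailing '\0',
// so a null byte ends up in the middle of the output.
static const EscapeRule kBuggyRules[] = {{'<', "&lt;", sizeof("&lt;")}};

// Corrected variant: lengths exclude the '\0'; '\b' and '\f' map to nothing.
static const EscapeRule kFixedRules[] = {
    {'<', "&lt;", sizeof("&lt;") - 1},
    {'\b', "", 0},
    {'\f', "", 0},
};
```

With `kBuggyRules`, escaping `a<b` yields a 7-byte string whose C-string view ends right after `&lt;`, so the trailing `b` is lost to any reader that stops at the first null byte.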
inikep
pushed a commit
that referenced
this pull request
Jan 23, 2025
…n read() syscall over network

https://jira.percona.com/browse/PS-8592

Description
-----------
GR suffered from problems caused by security probes and network scanner processes connecting to the group replication communication port. This is usually not a problem, but it poses a serious threat when another member tries to join the cluster by initiating a connection to a member whose group communication port is being held by external processes for long durations.

During such activity by external processes, the SSL-enabled server stalled forever in the SSL_accept() call, waiting for handshake data. Below is the stack trace:

Thread 55 (Thread 0x7f7bb77ff700 (LWP 2198598)):
#0 in read ()
#1 in sock_read ()
#2 in BIO_read ()
#3 in ssl23_read_bytes ()
#4 in ssl23_get_client_hello ()
#5 in ssl23_accept ()
#6 in xcom_tcp_server_startup(Xcom_network_provider*) ()

When the server stalled in the above path forever, it prevented other members from joining the cluster, resulting in the following messages in the joiner server's logs:

[ERROR] [MY-011640] [Repl] Plugin group_replication reported: 'Timeout on wait for view after joining group'
[ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member is already leaving or joining a group.'

Solution
--------
This patch adds two new variables:

1. group_replication_xcom_ssl_socket_timeout

   A file-descriptor-level timeout, in seconds, for both the accept() and SSL_accept() calls when group replication is listening on the xcom port. When set to a valid value, for example 5 seconds, both accept() and SSL_accept() return after 5 seconds. The default value is 0 (wait indefinitely) for backward compatibility. This variable is effective only when GR is configured with SSL.

2. group_replication_xcom_ssl_accept_retries

   The number of retries to perform before closing the socket. For each retry, the server thread calls SSL_accept() with the timeout defined by group_replication_xcom_ssl_socket_timeout for the SSL handshake, once the connection has been accepted by the first accept() call. The default value is 10. This variable is effective only when GR is configured with SSL.

Note:
- Both of the above variables are dynamically configurable, but become effective only on START GROUP_REPLICATION.

-------------------------------------------------------------------------

PS-8844: Fix the failing main.mysqldump_gtid_purged

https://jira.percona.com/browse/PS-8844

This patch fixes the failure of the test main.mysqldump_gtid_purged, which was caused by the uninitialized variable $redirect_stderr in start_proc_in_background.inc.

----------------------------------------------------------------------

PS-9218: Merge MySQL 8.4.0 (fix terminology in replication tests)

https://perconadev.atlassian.net/browse/PS-9218

mysql/mysql-server@44a77b5
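The timeout-plus-retries idea behind the two PS-8592 variables above can be sketched with plain POSIX sockets. This is an illustration under assumptions, not the patch: it uses `SO_RCVTIMEO` as the file-descriptor-level timeout and a one-byte `recv()` in place of `SSL_accept()`, and the helper names (`set_recv_timeout`, `wait_for_peer_data`, and the two demo functions) are hypothetical.

```cpp
#include <cerrno>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: apply a receive timeout to a socket descriptor.
bool set_recv_timeout(int fd, int timeout_ms) {
  timeval tv{};
  tv.tv_sec = timeout_ms / 1000;
  tv.tv_usec = (timeout_ms % 1000) * 1000;
  return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0;
}

// Hypothetical stand-in for the handshake step: wait for the peer's first
// byte, but give up after `retries` timed-out attempts instead of blocking
// forever on a silent peer.
bool wait_for_peer_data(int fd, int retries) {
  char byte = 0;
  for (int attempt = 0; attempt < retries; ++attempt) {
    ssize_t n = recv(fd, &byte, 1, 0);
    if (n > 0) return true;    // peer started talking
    if (n == 0) return false;  // peer closed the connection
    if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR) {
      return false;            // hard error, stop retrying
    }
  }
  return false;                // retries exhausted, caller closes the socket
}

// A peer that connects and never sends anything is dropped after
// retries * timeout rather than stalling the listener.
bool silent_peer_is_dropped() {
  int fds[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return false;
  bool configured = set_recv_timeout(fds[0], 20);
  bool got_data = wait_for_peer_data(fds[0], 3);
  close(fds[0]);
  close(fds[1]);
  return configured && !got_data;
}

// A peer that sends data promptly is served on the first attempt.
bool talking_peer_is_served() {
  int fds[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return false;
  bool configured = set_recv_timeout(fds[0], 20);
  const char hello = 'h';
  bool sent = write(fds[1], &hello, 1) == 1;
  bool got_data = wait_for_peer_data(fds[0], 3);
  close(fds[0]);
  close(fds[1]);
  return configured && sent && got_data;
}
```

The worst-case stall per connection in this sketch is roughly the timeout multiplied by the retry count, which matches how the two variables are described as working together.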
inikep
pushed a commit
that referenced
this pull request
Jan 23, 2025
Upstream commit ID : fb-mysql-5.6.35/911d1a387a0d80f3ba52b7432c1abdbd7e8cb220 PS-6867 : Merge fb-prod201905 Summary: Missed a few in earlier fixes for the AutoInitCopy rule. Also added a few fixes for the anonymous class rule and the local shadowing rule. Reviewed By: luqun Differential Revision: D15467213 fbshipit-source-id: 9325852dbdd
inikep
added a commit
that referenced
this pull request
Jan 23, 2025
PS-9369: Fix currently processed query comparison in audit_log

https://perconadev.atlassian.net/browse/PS-9369

The audit_log plugin uses a stack to keep track of table access operations performed within the scope of one query. It compares the last known table access query string, stored on top of this stack, with the query in the audit event currently being processed, to decide whether a new record should be pushed onto the stack or whether it is time to clean records from the stack.

Currently audit_log simply compares char* variables to decide whether this is the same query string. This approach doesn't work. As a result, the plugin loses control of the stack size, and the stack keeps growing over time, consuming memory. The issue is not noticeable on short-lived server connections, as memory is freed once the connection is closed. For long-running server connections, however, it leads to extra memory consumption.

The following is done to fix the issue:
- The query is sent along with the audit event as a MYSQL_LEX_CSTRING structure. It is not correct to ignore the MYSQL_LEX_CSTRING.length comparison, as the MYSQL_LEX_CSTRING.str pointer may sometimes not be initialised properly. Added a string length check to make sure the structure contains a valid string.
- Used strncmp to compare the actual strings instead of comparing char* variables.
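The PS-9369 comparison described above can be sketched as follows. `QueryRef` is a hypothetical mirror of a pointer-plus-length string such as MYSQL_LEX_CSTRING, and `same_query` is an illustrative helper, not the plugin's function. Treating an empty or null string as "not the same query" is an assumption made for this sketch.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical mirror of a length-carrying string (pointer + length).
struct QueryRef {
  const char* str;
  std::size_t length;
};

// Compares query text by content rather than by pointer identity.
bool same_query(const QueryRef& a, const QueryRef& b) {
  // Length check first: a zero length means str may not point at valid text.
  if (a.length == 0 || b.length == 0) return false;
  if (a.str == nullptr || b.str == nullptr) return false;
  if (a.length != b.length) return false;
  return std::strncmp(a.str, b.str, a.length) == 0;
}
```

Two events carrying the same SQL text in different buffers compare equal here, whereas a plain `a.str == b.str` test would report them as different queries and keep pushing records onto the stack.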
inikep
pushed a commit
that referenced
this pull request
Jan 27, 2025
Description:
============
Dropping a primary key and adding a new auto-increment column as a primary key in descending order using the "inplace" algorithm fails.

Analysis:
=========
Dropping an existing primary key and adding a new auto-increment key in descending order requires arranging the records in reverse order, which necessitates a file sort. However, this scenario was not detected in the method innobase_pk_order_preserved(), causing it to return false. As a result, the ALTER INPLACE operation, which calls this method, skips the file sort. Instead, it processes the primary key as usual in batches, a method known as bulk mode. In bulk mode, records are inserted into a sort buffer (in descending order in this case). When the sort buffer becomes full, records are directly inserted into the B-tree.

Consider a case where we have 2000 records, and the sort buffer can hold 1000 records in a batch:
Batch #1 inserted: Records 1000 to 1 (in descending order)
Batch #2 inserted: Records 2000 to 1001 (in descending order)

If the records from both batches happen to be in the same page, the record order is violated. It's important to note that this record order violation would still exist even if the sort buffer were skipped when file sort was skipped. Therefore, enabling file sort is essential to ensure correct record order across batches.

Fix:
====
Enable file sort when adding an auto-increment column in descending order.

This patch is based on the contribution from Shaohua Wang at Alibaba Group. We thank you for contributing to MySQL.

Change-Id: I398173bbd27db7f5e29218d217bf11c30297c242
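The batch example in the analysis above can be replayed with a small model. This is only a sketch, not InnoDB code: `insert_in_batches`, `insert_with_full_sort`, `is_descending` and `make_records` are hypothetical helpers, and a plain vector stands in for a B-tree page.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical model of bulk mode without a file sort: each batch is sorted
// in descending order on its own and then appended to the "page".
std::vector<int> insert_in_batches(const std::vector<int>& records,
                                   std::size_t batch_size) {
  if (batch_size == 0) batch_size = 1;  // avoid an endless loop in the sketch
  std::vector<int> page;
  for (std::size_t start = 0; start < records.size(); start += batch_size) {
    const std::size_t end = std::min(start + batch_size, records.size());
    std::vector<int> batch(records.begin() + start, records.begin() + end);
    std::sort(batch.begin(), batch.end(), std::greater<int>());
    page.insert(page.end(), batch.begin(), batch.end());
  }
  return page;
}

// Hypothetical model of the fixed path: all records are sorted first
// (the role played by the file sort), then inserted.
std::vector<int> insert_with_full_sort(std::vector<int> records) {
  std::sort(records.begin(), records.end(), std::greater<int>());
  return records;
}

// True when the sequence never increases from one element to the next.
bool is_descending(const std::vector<int>& v) {
  return std::is_sorted(v.begin(), v.end(), std::greater<int>());
}

// Builds the records 1..n in ascending order, as in the example above.
std::vector<int> make_records(int n) {
  std::vector<int> v;
  for (int i = 1; i <= n; ++i) v.push_back(i);
  return v;
}
```

With 2000 records and a batch size of 1000, the batched result holds 1000..1 followed by 2000..1001, so record 2000 lands after record 1 and the descending order is broken at the batch boundary; the fully sorted result stays in order.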
inikep
pushed a commit
that referenced
this pull request
Jan 27, 2025
Upstream commit ID : fb-mysql-5.6.35/8cb1dc836b68f1f13e8b2655b2b8cb2d57f400b3
PS-5217 : Merge fb-prod201803

Summary:
Original report: https://jira.mariadb.org/browse/MDEV-15816

To reproduce this bug, follow the steps below:

client 1:
USE test;
CREATE TABLE t1 (i INT) ENGINE=MyISAM;
HANDLER t1 OPEN h;
CREATE TABLE t2 (i INT) ENGINE=RocksDB;
LOCK TABLES t2 WRITE;

client 2:
FLUSH TABLES WITH READ LOCK;

client 1:
INSERT INTO t2 VALUES (1);

So client 1 acquired the lock and set m_lock_rows = RDB_LOCK_WRITE. Then client 2 calls store_lock(TL_IGNORE) and m_lock_rows was wrongly set to RDB_LOCK_NONE, as shown in this stack trace:
```
#0 myrocks::ha_rocksdb::store_lock (this=0x7fffbc03c7c8, thd=0x7fffc0000ba0, to=0x7fffc0011220, lock_type=TL_IGNORE)
#1 get_lock_data (thd=0x7fffc0000ba0, table_ptr=0x7fffe84b7d20, count=1, flags=2)
#2 mysql_lock_abort_for_thread (thd=0x7fffc0000ba0, table=0x7fffbc03bbc0)
#3 THD::notify_shared_lock (this=0x7fffc0000ba0, ctx_in_use=0x7fffbc000bd8, needs_thr_lock_abort=true)
#4 MDL_lock::notify_conflicting_locks (this=0x555557a82380, ctx=0x7fffc0000cc8)
#5 MDL_context::acquire_lock (this=0x7fffc0000cc8, mdl_request=0x7fffe84b8350, lock_wait_timeout=2)
#6 Global_read_lock::lock_global_read_lock (this=0x7fffc0003fe0, thd=0x7fffc0000ba0)
```
Finally, the "INSERT INTO..." from client 1 hits the assertion 'm_lock_rows == RDB_LOCK_WRITE' failed in myrocks::ha_rocksdb::write_row().

Fix this bug by not setting m_lock_rows if lock_type == TL_IGNORE.

Closes facebook/mysql-5.6#838
Pull Request resolved: facebook/mysql-5.6#871
Differential Revision: D9417382
Pulled By: lth
fbshipit-source-id: c36c164e06c
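The idea of the fix can be sketched in isolation. The types below (`ThrLockType`, `RowLock`, `HandlerSketch`) are hypothetical stand-ins and not the MyRocks classes; the mapping from lock type to row lock is deliberately simplified, and the only point shown is that an "ignore" request leaves the stored row-lock mode untouched.

```cpp
// Hypothetical stand-ins for the server lock type and the handler's row-lock mode.
enum class ThrLockType { Ignore, Read, Write };
enum class RowLock { None, Read, Write };

struct HandlerSketch {
  RowLock lock_rows = RowLock::None;

  // Simplified store_lock(): an Ignore request must not overwrite the mode
  // chosen earlier by the thread that owns the table lock.
  void store_lock(ThrLockType lock_type) {
    if (lock_type == ThrLockType::Ignore) {
      return;  // keep the current lock_rows value
    }
    lock_rows = (lock_type == ThrLockType::Write) ? RowLock::Write
                                                  : RowLock::Read;
  }
};
```

In this model, a write lock followed by an ignore request still reports `RowLock::Write`, which corresponds to the state the assertion in write_row() expects.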
inikep
pushed a commit
that referenced
this pull request
Jan 27, 2025
Upstream commit ID : fb-mysql-5.6.35/911d1a387a0d80f3ba52b7432c1abdbd7e8cb220
PS-6867 : Merge fb-prod201905

Summary: Missed a few in earlier fixes for the AutoInitCopy rule. Also added a few fixes for the anonymous class rule and the local shadowing rule.

Reviewed By: luqun
Differential Revision: D15467213
fbshipit-source-id: 9325852dbdd
inikep
added a commit
that referenced
this pull request
Jan 27, 2025
PS-5741: Incorrect use of memset_s in keyring_vault.

Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is the size of the buffer and the 3rd argument is the character to fill.

---------------------------------------------------------------------------

PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate

---

*Problem:*
`st_mysql_value::val_str` might return a pointer to `buf`, which is destroyed once the function returns. Therefore the value in `save` is invalid after returning from the function. In this particular case the error does not manifest, because `val_str` returns memory allocated with `thd_strmake` and does not use `buf`.

*Solution:*
Allocate memory with `thd_strmake` so the memory in `save` is not local.

---------------------------------------------------------------------------

Fix test main.bug12969156 when WITH_ASAN=ON

*Problem:*
ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:
```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215
Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
    #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62

  This frame has 4 object(s):
    [48, 56) 'result' (line 66)
    [80, 112) '_db_stack_frame_' (line 63)
    [144, 200) 'tm_tmp' (line 67)
    [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
    #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
    #2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
    #3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
    #4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
    #5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
    #6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
    #7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
    #8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
    #9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
    #10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
    #11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
    #12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```
The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not an orderly way of finishing the thread. ASAN does not register that the stack variables are no longer in use, which generates the error above. This is a benign error as all the variables are on the stack.

*Solution*:
Finish the thread in an orderly way by using a signalling variable.

---------------------------------------------------------------------------

PS-8204: Fix XML escape rules for audit plugin

https://jira.percona.com/browse/PS-8204

A wrong length was specified for some XML escape rules. As a result, the terminating null symbol from the replacement rule was copied into the resulting string. This led to query text truncation in the audit log file. In addition, added empty replacement rules for the '\b' and '\f' symbols, which just remove them from the resulting string. These symbols are not supported in XML 1.0.

---------------------------------------------------------------------------

PS-8854: Add main.percona_udf MTR test

Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.

---------------------------------------------------------------------------

PS-9369: Fix currently processed query comparison in audit_log

https://perconadev.atlassian.net/browse/PS-9369

The audit_log uses a stack to keep track of table access operations being performed in the scope of one query. It compares the last known table access query string stored on top of this stack with the actual query in the audit event being processed at the moment, to decide whether a new record should be pushed to the stack or whether it is time to clean records from the stack.

Currently audit_log simply compares char* variables to decide if this is the same query string. This approach doesn't work. As a result, the plugin loses control of the stack size, and the stack keeps growing over time, consuming memory. This issue is not noticeable on short-lived server connections, as memory is freed once the connection is closed. For long-running server connections, however, it leads to extra memory consumption.

The following is done to fix the issue:
- The query is sent along with the audit event as a MYSQL_LEX_CSTRING structure. It is not correct to ignore the MYSQL_LEX_CSTRING.length comparison, as sometimes the MYSQL_LEX_CSTRING.str pointer may not be initialised properly. Added a string length check to make sure the structure contains a valid string.
- Used strncmp to compare the actual strings instead of comparing char* variables.
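A minimal sketch of the comparison described for PS-9369, assuming a simplified stand-in for MYSQL_LEX_CSTRING. `LexCString` and `same_query` are hypothetical names used for illustration only, not the plugin's code; the sketch checks the lengths first and then compares contents with `strncmp` rather than comparing pointers.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a (pointer, length) string such as MYSQL_LEX_CSTRING.
struct LexCString {
  const char* str;
  std::size_t length;
};

// Returns true only when both values hold a non-empty string with the same
// length and the same bytes. Two equal queries stored at different addresses
// compare equal, which a plain char* comparison would miss.
bool same_query(const LexCString& a, const LexCString& b) {
  if (a.str == nullptr || b.str == nullptr) return false;
  if (a.length == 0 || a.length != b.length) return false;
  return std::strncmp(a.str, b.str, a.length) == 0;
}
```

Treating an empty or null value as "not the same query" is an assumption of this sketch, chosen to mirror the length check mentioned above.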
inikep
pushed a commit
that referenced
this pull request
Jan 27, 2025
…ocal DDL executed

https://perconadev.atlassian.net/browse/PS-9018

Problem
-------
In high concurrency scenarios, a MySQL replica can enter a deadlock due to a race condition between the replica applier thread and the client thread performing a binlog group commit.

Analysis
--------
This deadlock needs at least 3 threads to happen:
1. One client thread
2. Two replica applier threads

How does this deadlock happen?
------------------------------
0. Binlog is enabled on the replica, but log_replica_updates is disabled.
1. Initially, both the "Commit Order" and "Binlog Flush" queues are empty.
2. Replica applier thread 1 enters the group commit pipeline to register in the "Commit Order" queue since `log-replica-updates` is disabled on the replica node.
3. Since both the "Commit Order" and "Binlog Flush" queues are empty, applier thread 1
   3.1. Becomes leader (in Commit_stage_manager::enroll_for()).
   3.2. Registers in the commit order queue.
   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.
   3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is not yet released.
   NOTE: SE commit for the applier thread is already done by the time it reaches here.
4. Replica applier thread 2 enters the group commit pipeline to register in the "Commit Order" queue since `log-replica-updates` is disabled on the replica node.
5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), applier thread 2
   5.1. Becomes leader (in Commit_stage_manager::enroll_for()).
   5.2. Registers in the commit order queue.
   5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier thread 1, it will wait until the lock is released.
6. The client thread enters the group commit pipeline to register in the "Binlog Flush" queue.
7. Since the "Commit Order" queue is not empty (applier thread 2 is in the queue), the client thread enters the conditional wait `m_stage_cond_leader` with the intention of becoming the leader for both the "Binlog Flush" and "Commit Order" queues.
8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update the GTID by calling gtid_state->update_commit_group() from Commit_order_manager::flush_engine_and_signal_threads().
9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.
   9.1. It checks if there is any thread waiting in the "Binlog Flush" queue to become the leader. Here it finds the client thread waiting to be the leader.
   9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log, signals on the cond_var `m_stage_cond_leader`, and enters a conditional wait until the thread's `tx_commit_pending` is set to false by the client thread (this is done in Commit_stage_manager::process_final_stage_for_ordered_commit_group(), called by the client thread from fetch_and_process_flush_stage_queue()).
10. The client thread wakes up from the cond_var `m_stage_cond_leader`. The thread has now become a leader and it is its responsibility to update the GTID of applier thread 2.
   10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.
   10.2. Returns from `enroll_for()` and proceeds to process the "Commit Order" and "Binlog Flush" queues.
   10.3. Fetches the "Commit Order" and "Binlog Flush" queues.
   10.4. Performs the storage engine flush by calling ha_flush_logs() from fetch_and_process_flush_stage_queue().
   10.5. Proceeds to update the GTID of threads in the "Commit Order" queue by calling gtid_state->update_commit_group() from Commit_stage_manager::process_final_stage_for_ordered_commit_group().
11. At this point, we will have
   - Client thread performing GTID update on behalf of applier thread 2 (from step 10.5), and
   - Applier thread 1 performing GTID update for itself (from step 8).

   Due to the lack of proper synchronization between the above two threads, there exists a time window where both threads can call gtid_state->update_commit_group() concurrently.

   In subsequent steps, both threads simultaneously try to modify the contents of the array `commit_group_sidnos`, which is used to track the lock status of sidnos. This concurrent access to `update_commit_group()` can cause a lock-leak, resulting in one thread acquiring the sidno lock and never releasing it.

-----------------------------------------------------------------------------------------------------------
Client thread                                          Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group() =>                               update_commit_group() =>
  global_sid_lock->rdlock();                             global_sid_lock->rdlock();
calls update_gtids_impl_lock_sidnos()                  calls update_gtids_impl_lock_sidnos()
set commit_group_sidnos[2] = true                      set commit_group_sidnos[2] = true
                                                       lock_sidno(2) -> successful
lock_sidno(2) -> waits
                                                       update_gtids_impl_own_gtid() ->
                                                         Add the thd->owned_gtid in `executed_gtids()`
                                                       if (commit_group_sidnos[2]) {
                                                         unlock_sidno(2);
                                                         commit_group_sidnos[2] = false;
                                                       }
                                                       Applier thread continues..
lock_sidno(2) -> successful
update_gtids_impl_own_gtid() ->
  Add the thd->owned_gtid in `executed_gtids()`
if (commit_group_sidnos[2]) {   <=== this check fails and lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}
Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------

12. As the above lock-leak can also happen the other way, i.e. the applier thread fails to unlock, there can be different consequences hereafter.
13. If the client thread continues without releasing the lock, then at a later stage it can enter a deadlock with the applier thread performing a GTID update, with the following stack traces.

Client_thread
-------------
#1 __GI___lll_lock_wait
#2 ___pthread_mutex_lock
#3 native_mutex_lock <= waits for commit lock while holding sidno lock
#4 Commit_stage_manager::enroll_for
#5 MYSQL_BIN_LOG::change_stage
#6 MYSQL_BIN_LOG::ordered_commit
#7 MYSQL_BIN_LOG::commit
#8 ha_commit_trans
#9 trans_commit_implicit
#10 mysql_create_like_table
#11 Sql_cmd_create_table::execute
#12 mysql_execute_command
#13 dispatch_sql_command

Applier thread
--------------
#1 ___pthread_mutex_lock
#2 native_mutex_lock
#3 safe_mutex_lock
#4 Gtid_state::update_gtids_impl_lock_sidnos <= waits for sidno lock
#5 Gtid_state::update_commit_group
#6 Commit_order_manager::flush_engine_and_signal_threads <= acquires commit lock here
#7 Commit_order_manager::finish
#8 Commit_order_manager::wait_and_finish
#9 ha_commit_low
#10 trx_coordinator::commit_in_engines
#11 MYSQL_BIN_LOG::commit
#12 ha_commit_trans
#13 trans_commit
#14 Xid_log_event::do_commit
#15 Xid_apply_log_event::do_apply_event_worker
#16 Slave_worker::slave_worker_exec_event
#17 slave_worker_exec_job_group
#18 handle_slave_worker

14. If the applier thread continues without releasing the lock, then at a later stage it can perform recursive locking while setting the GTID for the next transaction (in set_gtid_next()). In debug builds this case hits the assertion `safe_mutex_assert_not_owner()`, meaning the lock is already held by the replica applier thread when it tries to re-acquire it.

Solution
--------
In the above problematic example, when seen from each thread individually, there is no problem in the order of lock acquisition, so there is no need to change the lock order. The root cause of this problem is that multiple threads can concurrently access the array `Gtid_state::commit_group_sidnos`. In its initial implementation, threads were expected to hold `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents, but this was not taken into account when upstream implemented WL#7846 (MTS: slave-preserve-commit-order when log-slave-updates/binlog is disabled).

With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired by the client thread (binlog flush leader) when it performs the GTID update on behalf of threads waiting in the "Commit Order" queue, which guarantees that the `Gtid_state::commit_group_sidnos` array is never accessed without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
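The lock-leak interleaving described in this commit message can be replayed deterministically in a single thread. The sketch below is a simplified model and not server code: `SidnoState`, `mark`, `lock`, `unlock_if_marked` and the two replay functions are hypothetical names, the sidno mutex is modelled as a plain flag, and "thread A" and "thread B" stand for the two threads calling update_commit_group().

```cpp
#include <array>

// Hypothetical model of the shared state touched by update_commit_group().
struct SidnoState {
  std::array<bool, 4> commit_group_sidnos{};  // shared "this sidno was locked" flags
  std::array<bool, 4> sidno_locked{};         // models whether each sidno mutex is held
};

// The three steps each thread performs, split so they can be interleaved.
void mark(SidnoState& s, int sidno) { s.commit_group_sidnos[sidno] = true; }
void lock(SidnoState& s, int sidno) { s.sidno_locked[sidno] = true; }
void unlock_if_marked(SidnoState& s, int sidno) {
  if (s.commit_group_sidnos[sidno]) {
    s.sidno_locked[sidno] = false;
    s.commit_group_sidnos[sidno] = false;
  }
}

// Replays the unsynchronized interleaving; returns true if the lock is leaked.
bool leaks_when_interleaved() {
  SidnoState s;
  const int sidno = 2;
  mark(s, sidno);              // thread A sets the flag
  mark(s, sidno);              // thread B sets the same flag
  lock(s, sidno);              // A acquires the sidno lock, B waits
  unlock_if_marked(s, sidno);  // A releases the lock and clears the flag
  lock(s, sidno);              // B acquires the sidno lock
  unlock_if_marked(s, sidno);  // B finds the flag cleared and skips the unlock
  return s.sidno_locked[sidno];
}

// Replays the same work with each thread running start to finish, which is
// what holding one outer lock around the whole sequence enforces.
bool leaks_when_serialized() {
  SidnoState s;
  const int sidno = 2;
  mark(s, sidno); lock(s, sidno); unlock_if_marked(s, sidno);  // thread A
  mark(s, sidno); lock(s, sidno); unlock_if_marked(s, sidno);  // thread B
  return s.sidno_locked[sidno];
}
```

In this model the interleaved replay ends with the sidno lock still held, while the serialized replay releases it, which is the guarantee the patch obtains by always taking `MYSQL_BIN_LOG::LOCK_commit` around the update.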
inikep
added a commit
that referenced
this pull request
Jan 28, 2025
inikep
pushed a commit
that referenced
this pull request
Jan 28, 2025
…n read() syscall over network

https://jira.percona.com/browse/PS-8592

Description
-----------
GR suffered from problems caused by security probes and network scanner processes connecting to the group replication communication port. This is usually not a problem, but it poses a serious threat when another member tries to join the cluster by initiating a connection to a member that is affected by external processes using the port dedicated to group communication for longer durations.

During such activity by external processes, the SSL-enabled server stalled forever on the SSL_accept() call, waiting for handshake data. Below is the stack trace:

Thread 55 (Thread 0x7f7bb77ff700 (LWP 2198598)):
#0 in read ()
#1 in sock_read ()
#2 in BIO_read ()
#3 in ssl23_read_bytes ()
#4 in ssl23_get_client_hello ()
#5 in ssl23_accept ()
#6 in xcom_tcp_server_startup(Xcom_network_provider*) ()

When the server stalled in the above path forever, it prevented other members from joining the cluster, resulting in the following messages in the joiner server's logs.

[ERROR] [MY-011640] [Repl] Plugin group_replication reported: 'Timeout on wait for view after joining group'
[ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member is already leaving or joining a group.'

Solution
--------
This patch adds two new variables:

1. group_replication_xcom_ssl_socket_timeout

   It is a file-descriptor level timeout in seconds for both accept() and SSL_accept() calls when group replication is listening on the xcom port. When set to a valid value, for example 5 seconds, both accept() and SSL_accept() return after 5 seconds. The default value has been set to 0 (waits infinitely) for backward compatibility. This variable is effective only when GR is configured with SSL.

2. group_replication_xcom_ssl_accept_retries

   It defines the number of retries to be performed before closing the socket. For each retry the server thread calls SSL_accept() with the timeout defined by group_replication_xcom_ssl_socket_timeout for the SSL handshake process, once the connection has been accepted by the first accept() call. The default value has been set to 10. This variable is effective only when GR is configured with SSL.

Note:
- Both of the above variables are dynamically configurable, but will become effective only on START GROUP_REPLICATION.

-------------------------------------------------------------------------------

PS-8844: Fix the failing main.mysqldump_gtid_purged

https://jira.percona.com/browse/PS-8844

This patch fixes the failure of the test main.mysqldump_gtid_purged, which was caused by the uninitialized variable $redirect_stderr in start_proc_in_background.inc.
inikep
pushed a commit
that referenced
this pull request
Jan 28, 2025
…ocal DDL executed https://perconadev.atlassian.net/browse/PS-9018 Problem ------- In high concurrency scenarios, MySQL replica can enter into a deadlock due to a race condition between the replica applier thread and the client thread performing a binlog group commit. Analysis -------- It needs at least 3 threads for this deadlock to happen 1. One client thread 2. Two replica applier threads How this deadlock happens? -------------------------- 0. Binlog is enabled on replica, but log_replica_updates is disabled. 1. Initially, both "Commit Order" and "Binlog Flush" queues are empty. 2. Replica applier thread 1 enters the group commit pipeline to register in the "Commit Order" queue since `log-replica-updates` is disabled on the replica node. 3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier thread 1 3.1. Becomes leader (In Commit_stage_manager::enroll_for()). 3.2. Registers in the commit order queue. 3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log. 3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is not yet released. NOTE: SE commit for applier thread is already done by the time it reaches here. 4. Replica applier thread 2 enters the group commit pipeline to register in the "Commit Order" queue since `log-replica-updates` is disabled on the replica node. 5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the applier thread 2 5.1. Becomes leader (In Commit_stage_manager::enroll_for()) 5.2. Registers in the commit order queue. 5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier thread 1 it will wait until the lock is released. 6. Client thread enters the group commit pipeline to register in the "Binlog Flush" queue. 7. Since "Commit Order" queue is not empty (there is applier thread 2 in the queue), it enters the conditional wait `m_stage_cond_leader` with an intention to become the leader for both the "Binlog Flush" and "Commit Order" queues. 8. 
Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update the GTID by calling gtid_state->update_commit_group() from Commit_order_manager::flush_engine_and_signal_threads(). 9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log. 9.1. It checks if there is any thread waiting in the "Binlog Flush" queue to become the leader. Here it finds the client thread waiting to be the leader. 9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the cond_var `m_stage_cond_leader` and enters a conditional wait until the thread's `tx_commit_pending` is set to false by the client thread (will be done in the Commit_stage_manager::process_final_stage_for_ordered_commit_group() called by client thread from fetch_and_process_flush_stage_queue()). 10. The client thread wakes up from the cond_var `m_stage_cond_leader`. The thread has now become a leader and it is its responsibility to update GTID of applier thread 2. 10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log. 10.2. Returns from `enroll_for()` and proceeds to process the "Commit Order" and "Binlog Flush" queues. 10.3. Fetches the "Commit Order" and "Binlog Flush" queues. 10.4. Performs the storage engine flush by calling ha_flush_logs() from fetch_and_process_flush_stage_queue(). 10.5. Proceeds to update the GTID of threads in "Commit Order" queue by calling gtid_state->update_commit_group() from Commit_stage_manager::process_final_stage_for_ordered_commit_group(). 11. At this point, we will have - Client thread performing GTID update on behalf if applier thread 2 (from step 10.5), and - Applier thread 1 performing GTID update for itself (from step 8). Due to the lack of proper synchronization between the above two threads, there exists a time window where both threads can call gtid_state->update_commit_group() concurrently. In subsequent steps, both threads simultaneously try to modify the contents of the array `commit_group_sidnos` which is used to track the lock status of sidnos. 
This concurrent access to `update_commit_group()` can cause a lock leak, where one thread acquires the sidno lock and never releases it.

-----------------------------------------------------------------------------------------------------------
Client thread                                          Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group() => global_sid_lock->rdlock();    update_commit_group() => global_sid_lock->rdlock();

calls update_gtids_impl_lock_sidnos()                  calls update_gtids_impl_lock_sidnos()

set commit_group_sidnos[2] = true                      set commit_group_sidnos[2] = true

                                                       lock_sidno(2) -> successful

lock_sidno(2) -> waits

                                                       update_gtids_impl_own_gtid()
                                                         -> Add the thd->owned_gtid in `executed_gtids()`

                                                       if (commit_group_sidnos[2]) {
                                                         unlock_sidno(2);
                                                         commit_group_sidnos[2] = false;
                                                       }

                                                       Applier thread continues..

lock_sidno(2) -> successful

update_gtids_impl_own_gtid()
  -> Add the thd->owned_gtid in `executed_gtids()`

if (commit_group_sidnos[2]) {    <=== this check fails and lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}

Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------

12. The above lock leak can also happen the other way around, i.e. the applier thread fails to unlock, so the consequences from here on depend on which thread leaked the lock.

13. If the client thread continues without releasing the lock, then at a later stage it can enter into a deadlock with the applier thread performing a GTID update, with the following stack traces.

Client_thread
-------------
  #1 __GI___lll_lock_wait
  #2 ___pthread_mutex_lock
  #3 native_mutex_lock <= waits for commit lock while holding sidno lock
  #4 Commit_stage_manager::enroll_for
  #5 MYSQL_BIN_LOG::change_stage
  #6 MYSQL_BIN_LOG::ordered_commit
  #7 MYSQL_BIN_LOG::commit
  #8 ha_commit_trans
  #9 trans_commit_implicit
  #10 mysql_create_like_table
  #11 Sql_cmd_create_table::execute
  #12 mysql_execute_command
  #13 dispatch_sql_command

Applier thread
--------------
  #1 ___pthread_mutex_lock
  #2 native_mutex_lock
  #3 safe_mutex_lock
  #4 Gtid_state::update_gtids_impl_lock_sidnos <= waits for sidno lock
  #5 Gtid_state::update_commit_group
  #6 Commit_order_manager::flush_engine_and_signal_threads <= acquires commit lock here
  #7 Commit_order_manager::finish
  #8 Commit_order_manager::wait_and_finish
  #9 ha_commit_low
  #10 trx_coordinator::commit_in_engines
  #11 MYSQL_BIN_LOG::commit
  #12 ha_commit_trans
  #13 trans_commit
  #14 Xid_log_event::do_commit
  #15 Xid_apply_log_event::do_apply_event_worker
  #16 Slave_worker::slave_worker_exec_event
  #17 slave_worker_exec_job_group
  #18 handle_slave_worker

14. If the applier thread continues without releasing the lock, then at a later stage it can perform recursive locking while setting the GTID for the next transaction (in set_gtid_next()). In debug builds this case hits the assertion `safe_mutex_assert_not_owner()`, meaning the lock is already held by the replica applier thread when it tries to re-acquire it.

Solution
--------
In the problematic example above, the order of lock acquisition is correct when each thread is considered individually, so there is no need to change the lock order. The root cause of the problem is that multiple threads can concurrently access the array `Gtid_state::commit_group_sidnos`.

In its initial implementation, threads were expected to hold `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. This was not taken into account when upstream implemented WL#7846 (MTS: slave-preserve-commit-order when log-slave-updates/binlog is disabled).

With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired by the client thread (binlog flush leader) when it performs the GTID update on behalf of threads waiting in the "Commit Order" queue. This guarantees that the `Gtid_state::commit_group_sidnos` array is never accessed without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
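The lock leak described in this commit message can be replayed in a small single-threaded model. The sketch below is only an illustration under stated assumptions: `ToyLock`, `ToyGtidState`, `replay_unsynchronised()` and `replay_serialised()` are hypothetical names, the sidno lock is modelled as an owner id rather than a real mutex, and the steps are executed in the exact order of the interleaving shown in the table instead of on real threads.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for one sidno mutex: it only records who holds it.
// (Hypothetical model, not the server's lock implementation.)
struct ToyLock {
  int owner = 0;  // 0 = free, otherwise the id of the holding thread
  void lock(int who) { assert(owner == 0); owner = who; }
  void unlock(int who) { assert(owner == who); owner = 0; }
};

constexpr int kClient = 1;   // client thread (binlog flush leader)
constexpr int kApplier = 2;  // applier thread 1

// Toy model of the shared flag array and the lock for sidno 2.
struct ToyGtidState {
  std::vector<bool> commit_group_sidnos = std::vector<bool>(3, false);
  ToyLock sidno_lock;

  void mark(int sidno) { commit_group_sidnos[sidno] = true; }
  void lock_sidno(int who) { sidno_lock.lock(who); }
  // Mirrors the "if (commit_group_sidnos[n]) { unlock; flag = false; }" step.
  void unlock_if_marked(int who, int sidno) {
    if (commit_group_sidnos[sidno]) {
      sidno_lock.unlock(who);
      commit_group_sidnos[sidno] = false;
    }
  }
};

// Replays the unsynchronised interleaving from the table.
// Returns the id of the thread still holding the sidno lock (0 if none).
int replay_unsynchronised() {
  ToyGtidState s;
  s.mark(2);                        // client sets the flag
  s.mark(2);                        // applier sets the same flag
  s.lock_sidno(kApplier);           // applier wins the lock, client waits
  s.unlock_if_marked(kApplier, 2);  // applier unlocks and clears the flag
  s.lock_sidno(kClient);            // client's wait ends
  s.unlock_if_marked(kClient, 2);   // flag is already false: unlock skipped
  return s.sidno_lock.owner;
}

// Same work, but each thread runs its whole sequence before the other
// starts, as it would if an outer lock serialised access to the array.
int replay_serialised() {
  ToyGtidState s;
  s.mark(2);
  s.lock_sidno(kApplier);
  s.unlock_if_marked(kApplier, 2);
  s.mark(2);
  s.lock_sidno(kClient);
  s.unlock_if_marked(kClient, 2);
  return s.sidno_lock.owner;
}
```

In this model `replay_unsynchronised()` ends with the client still owning the lock, while `replay_serialised()` ends with the lock free, which is the effect the patch aims for by requiring `MYSQL_BIN_LOG::LOCK_commit` around the update.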
inikep pushed a commit that referenced this pull request Jan 30, 2025
Upstream commit ID : fb-mysql-5.6.35/8cb1dc836b68f1f13e8b2655b2b8cb2d57f400b3
PS-5217 : Merge fb-prod201803

Summary:
Original report: https://jira.mariadb.org/browse/MDEV-15816

To reproduce this bug, follow the steps below:

client 1:
USE test;
CREATE TABLE t1 (i INT) ENGINE=MyISAM;
HANDLER t1 OPEN h;
CREATE TABLE t2 (i INT) ENGINE=RocksDB;
LOCK TABLES t2 WRITE;

client 2:
FLUSH TABLES WITH READ LOCK;

client 1:
INSERT INTO t2 VALUES (1);

So client 1 acquired the lock and set m_lock_rows = RDB_LOCK_WRITE. Then client 2 calls store_lock(TL_IGNORE) and m_lock_rows is wrongly set to RDB_LOCK_NONE, as shown in this stack trace:

```
#0 myrocks::ha_rocksdb::store_lock (this=0x7fffbc03c7c8, thd=0x7fffc0000ba0, to=0x7fffc0011220, lock_type=TL_IGNORE)
#1 get_lock_data (thd=0x7fffc0000ba0, table_ptr=0x7fffe84b7d20, count=1, flags=2)
#2 mysql_lock_abort_for_thread (thd=0x7fffc0000ba0, table=0x7fffbc03bbc0)
#3 THD::notify_shared_lock (this=0x7fffc0000ba0, ctx_in_use=0x7fffbc000bd8, needs_thr_lock_abort=true)
#4 MDL_lock::notify_conflicting_locks (this=0x555557a82380, ctx=0x7fffc0000cc8)
#5 MDL_context::acquire_lock (this=0x7fffc0000cc8, mdl_request=0x7fffe84b8350, lock_wait_timeout=2)
#6 Global_read_lock::lock_global_read_lock (this=0x7fffc0003fe0, thd=0x7fffc0000ba0)
```

Finally, client 1's "INSERT INTO..." hits the assertion 'm_lock_rows == RDB_LOCK_WRITE' in myrocks::ha_rocksdb::write_row().

Fix this bug by not setting m_lock_rows if lock_type == TL_IGNORE.

Closes facebook/mysql-5.6#838
Pull Request resolved: facebook/mysql-5.6#871

Differential Revision: D9417382

Pulled By: lth

fbshipit-source-id: c36c164e06c
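The fix described in this commit message (leave `m_lock_rows` untouched when `lock_type == TL_IGNORE`) can be sketched with a toy handler. Everything here is a simplified assumption for illustration: `ToyHandler` and the `TOY_` enums are hypothetical stand-ins, and the mapping from lock type to row-lock mode is much cruder than the real `ha_rocksdb::store_lock()`.

```cpp
#include <cassert>

// Hypothetical stand-ins for the server's thr_lock_type values and
// MyRocks' row-lock modes; only the names relevant to the bug are modelled.
enum ToyThrLockType { TOY_TL_IGNORE, TOY_TL_READ, TOY_TL_WRITE };
enum ToyRowLock { TOY_RDB_LOCK_NONE, TOY_RDB_LOCK_READ, TOY_RDB_LOCK_WRITE };

struct ToyHandler {
  ToyRowLock m_lock_rows = TOY_RDB_LOCK_NONE;

  // Sketch of the fixed behaviour: a TL_IGNORE call only inspects the
  // handler's locks and must not change the row-lock mode.
  void store_lock(ToyThrLockType lock_type) {
    if (lock_type == TOY_TL_IGNORE) return;
    m_lock_rows = (lock_type == TOY_TL_WRITE) ? TOY_RDB_LOCK_WRITE
                                              : TOY_RDB_LOCK_READ;
  }
};
```

With this guard, a `store_lock(TOY_TL_IGNORE)` call issued on behalf of another connection no longer resets the mode that the owning connection set earlier, so the later write still sees the write-lock mode.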
inikep pushed a commit that referenced this pull request Jan 30, 2025
Upstream commit ID : fb-mysql-5.6.35/911d1a387a0d80f3ba52b7432c1abdbd7e8cb220
PS-6867 : Merge fb-prod201905

Summary:
Missed a few in earlier fixes for the AutoInitCopy rule. Also added a few fixes for the anonymous class rule and the local shadowing rule.

Reviewed By: luqun

Differential Revision: D15467213

fbshipit-source-id: 9325852dbdd
inikep added a commit that referenced this pull request Jan 30, 2025
Import all functions and changes required to compile MyRocks (-DWITH_ROCKSDB=1).

----------------------------------------------------------------------

PS-9218 merge: Merge MySQL 8.4.0 (GCC 12.3 RelWithDebInfo fixes) (percona#5356)

https://perconadev.atlassian.net/browse/PS-9218

Added more warning suppressions for RocksDB submodule files that appeared in GCC 12.3 in RelWithDebInfo mode.

----------------------------------------------------------------------

PS-9218: Merge MySQL 8.4.0 (fix gcc-14 build) #2 (rocksdb part)
inikep added a commit that referenced this pull request Jan 30, 2025
PS-5741: Incorrect use of memset_s in keyring_vault.

Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is the size of the buffer and the 3rd argument is the character to fill.

---------------------------------------------------------------------------

PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate

---

*Problem:*

`st_mysql_value::val_str` might return a pointer to `buf`, which is destroyed when the function returns. Therefore the value in `save` is invalid after returning from the function.

In this particular case the error does not manifest, because `val_str` returns memory allocated with `thd_strmake` and does not use `buf`.

*Solution:*

Allocate memory with `thd_strmake` so the memory in `save` is not local.

---------------------------------------------------------------------------

Fix test main.bug12969156 when WITH_ASAN=ON

*Problem:*

ASAN complains about a stack-buffer-overflow in function `mysql_heartbeat`:

```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215
Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
    #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62

  This frame has 4 object(s):
    [48, 56) 'result' (line 66)
    [80, 112) '_db_stack_frame_' (line 63)
    [144, 200) 'tm_tmp' (line 67)
    [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
    #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
    #2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
    #3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
    #4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
    #5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
    #6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
    #7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
    #8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
    #9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
    #10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
    #11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
    #12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```

The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not an orderly way of finishing the thread. ASAN does not register that the stack variables are no longer in use, which generates the error above. This is a benign error, as all the variables are on the stack.

*Solution*:

Finish the thread in an orderly way by using a signalling variable.

---------------------------------------------------------------------------

PS-8204: Fix XML escape rules for audit plugin

https://jira.percona.com/browse/PS-8204

A wrong length was specified for some XML escape rules. As a result, the terminating null symbol from the replacement rule was copied into the resulting string. This led to query text truncation in the audit log file.

In addition, added empty replacement rules for the '\b' and '\f' symbols, which simply remove them from the resulting string. These symbols are not supported in XML 1.0.

---------------------------------------------------------------------------

PS-8854: Add main.percona_udf MTR test

Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.

---------------------------------------------------------------------------

PS-9218: Merge MySQL 8.4.0 (fix gcc-14 build)

https://perconadev.atlassian.net/browse/PS-9218
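The PS-8204 message above attributes the truncation to replacement lengths that included the terminating null. A minimal sketch of how such a rule table can avoid that is shown here. The names `EscapeRule`, `ESCAPE_RULE`, `kRules` and `escape_xml` are hypothetical, and the rule set is an assumption chosen for illustration; the audit plugin's actual table is not shown on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical rule layout, not the audit plugin's real structure.
struct EscapeRule {
  char from;           // character to replace
  const char *to;      // replacement text
  std::size_t to_len;  // replacement length WITHOUT the terminating '\0'
};

// sizeof on a string literal counts the terminating '\0', so subtract one.
// Using sizeof(text) directly would copy the '\0' into the output.
#define ESCAPE_RULE(ch, text) {ch, text, sizeof(text) - 1}

static const EscapeRule kRules[] = {
    ESCAPE_RULE('<', "&lt;"),
    ESCAPE_RULE('>', "&gt;"),
    ESCAPE_RULE('&', "&amp;"),
    ESCAPE_RULE('"', "&quot;"),
    ESCAPE_RULE('\b', ""),  // empty replacement: the character is dropped
    ESCAPE_RULE('\f', ""),
};

// Applies the first matching rule to every input character.
std::string escape_xml(const std::string &in) {
  std::string out;
  for (char c : in) {
    const EscapeRule *hit = nullptr;
    for (const EscapeRule &r : kRules) {
      if (r.from == c) {
        hit = &r;
        break;
      }
    }
    if (hit != nullptr) {
      out.append(hit->to, hit->to_len);
    } else {
      out.push_back(c);
    }
  }
  return out;
}
```

Deriving the length from the literal in one place keeps the replacement text and its length from drifting apart, which is the kind of mismatch the commit message describes.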
This is how the bug occurs:

`btr_search_update_block_hash_info` returns true (https://github.com/percona/percona-server/blob/8.0/storage/innobase/btr/btr0sea.cc#L588) during the execution of one of the `LOAD DATA` commands. This causes the creation of a hash index.

`index->space == FIL_NULL`, but `block->page.id.space()` stays the same.

I suppose the solution could be to widen the assert condition to include the `index->space == FIL_NULL` case. But I'm not sure whether it is okay to drop the reset index in `btr_search_drop_page_hash_index`. Returning early from the function is probably not an option, because in that case `assert_block_ahi_empty(block)` fails.

The assert was originally added in commit percona@195760a57028. It is probably not a good idea to widen the other similar asserts, because those cases are not related to dropping an index but to creating, updating, etc. But I'm not sure about that either.

I added a slightly faster version of the testcase. It differs from the original one, but works in the same way.
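The widened assertion discussed in this description can be sketched as a pair of predicates. This is only an illustration under assumptions: the helper names are hypothetical, `TOY_FIL_NULL` is assumed to be the maximum 32-bit value, and the strict form is assumed to be a plain equality check between the index's space id and the block's space id.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

using space_id_t = std::uint32_t;

// Assumption: the "no tablespace" marker is the maximum 32-bit value.
constexpr space_id_t TOY_FIL_NULL = std::numeric_limits<space_id_t>::max();

// Assumed shape of the existing check: the index and the block must
// belong to the same tablespace.
bool ahi_space_check_strict(space_id_t index_space, space_id_t block_space) {
  return index_space == block_space;
}

// Widened check: additionally tolerate an index whose space id was reset
// to FIL_NULL while the block still carries the old space id.
bool ahi_space_check_widened(space_id_t index_space, space_id_t block_space) {
  return index_space == block_space || index_space == TOY_FIL_NULL;
}
```

The widened predicate still rejects a genuine mismatch between two valid space ids, so it only relaxes the check for the reset-index case.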