Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FB8-263: Add JUnit reporting support to MTR #1

Closed

Conversation

oleksandr-kachan
Copy link

https://jira.percona.com/browse/FB8-263

This change is a combination of a few commits:

----- percona/percona-server#3901 -----
Add JUnit reporting support to MTR

  • Add JUnit reporting support to the MySQL test suite runner
  • Add "junit" test suite with deliberately failing tests
  • Add a deliberately failing unittests suite
  • Attach mysqld shutdown and valgrind reports to JUnit report
  • Run unit tests under Valgrind too if it was enabled
  • Mark self-skipping tests as skipped in JUnit reports
  • Convey tests output to JUnit reports regardless of test failure
  • Exclude tests skipped by framework from results xml

----- percona/percona-server#4246 -----
Improve communication between MTR test worker and main process

When MTR worker finishes its job it sends shutdown and valgrind
reports to main process. It exits right after that. Two issues
are possible at this point:

  • main process may lose last sent data from a worker;
  • from time to time main MTR ptocess may silently exit right
    after closing a worker.

To solve both these issues communication protocol between
main MTR process and workers is modified in this change.
The following changes are done:

  • Worker shutdown process is split into two steps - collecting
    and sending shutdown/valgrind reports, actual worker process shutdown.
  • The GETREPORTS message gets sent to a worker to make it collect and
    send shutdown reports. Worker process stays active at this point.
  • Once main process receives all reports it sends BYE to a worker.
    Worker process exits after receiving this message.
    Above changes make sure main MTR process has time to collect
    all data from a worker before it exits.

----- percona/percona-server#4361 -----
Select HEX encoded string when checking data in encrypted column

In case of innodb.import_compress_encrypt test failure it adds
substrings selected from encrypted VARCHAR column directly into xml
report. This results in unreadable characters being added into report
and corrupted xml file.
Use hex() encoded substrings for results comparison instead.

https://jira.percona.com/browse/FB8-263

This change is a combination of a few commits:

----- percona/percona-server#3901 -----
Add JUnit reporting support to MTR

- Add JUnit reporting support to the MySQL test suite runner
- Add "junit" test suite with deliberately failing tests
- Add a deliberately failing unittests suite
- Attach mysqld shutdown and valgrind reports to JUnit report
- Run unit tests under Valgrind too if it was enabled
- Mark self-skipping tests as skipped in JUnit reports
- Convey tests output to JUnit reports regardless of test failure
- Exclude tests skipped by framework from results xml

----- percona/percona-server#4246 -----
Improve communication between MTR test worker and main process

When MTR worker finishes its job it sends shutdown and valgrind
reports to main process. It exits right after that. Two issues
are possible at this point:
- main process may lose last sent data from a worker;
- from time to time main MTR ptocess may silently exit right
  after closing a worker.

To solve both these issues communication protocol between
main MTR process and workers is modified in this change.
The following changes are done:
- Worker shutdown process is split into two steps - collecting
  and sending shutdown/valgrind reports, actual worker process shutdown.
- The GETREPORTS message gets sent to a worker to make it collect and
  send shutdown reports. Worker process stays active at this point.
- Once main process receives all reports it sends BYE to a worker.
  Worker process exits after receiving this message.
Above changes make sure main MTR process has time to collect
all data from a worker before it exits.

----- percona/percona-server#4361 -----
Select HEX encoded string when checking data in encrypted column

In case of innodb.import_compress_encrypt test failure it adds
substrings selected from encrypted VARCHAR column directly into xml
report. This results in unreadable characters being added into report
and corrupted xml file.
Use hex() encoded substrings for results comparison instead.
inikep pushed a commit that referenced this pull request Oct 15, 2021
… enabled

Summary:
For secondaries, when enable_super_log_bin_read_only is on and read_only is on, currently it will forbid to install/uninstall plugin during run time.

install/uninstall plugin doesn't generate event in binlog, although the thread thd contains OPTION_BIN_LOG flag due to log_slave_updates is on by default in secondaries. It should be safe to execute install/uninstall plugin.

the change is to call set_skip_readonly_check() before install/uninstall plugin and call reset_skip_readonly_check()(for completeness) after install/uninstall plugin.

BTW, mysql will always call reset_skip_readonly_check() for at the beginning of each statement. thus set_skip_readonly_check() won't affect other statement.
```
#0  THD::reset_skip_readonly_check (this=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_class.h:1754
#1  0x0000000005a500e1 in THD::reset_for_next_command (this=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:5892
#2  0x0000000005a517a5 in mysql_reset_thd_for_next_command (thd=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:5817
#3  0x0000000005a50466 in mysql_parse (thd=0x7fb5c1a0b000, parser_state=0x7fb5f6bb4560, last_timer=0x7fb5f6bb39b0) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:6056
#4  0x0000000005a4c7c9 in dispatch_command (thd=0x7fb5c1a0b000, com_data=0x7fb5f6bb4d98, command=COM_QUERY) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:2222
#5  0x0000000005a4f991 in do_command (thd=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:1556
#6  0x0000000005ccd4f1 in handle_connection (arg=0x7fb5cc85b740) at /home/luqun/mysql/mysql-8.0.20/sql/conn_handler/connection_handler_per_thread.cc:330
#7  0x00000000078eb95b in pfs_spawn_thread (arg=0x7fb5f8c89720) at /home/luqun/mysql/mysql-8.0.20/storage/perfschema/pfs.cc:2884
#8  0x00007fb5f957020c in start_thread (arg=0x7fb5f6bb6700) at pthread_create.c:479
#9  0x00007fb5f971881f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
```

Reviewed By: george-reynya

Differential Revision: D27213990

fbshipit-source-id: 68a234fb694
inikep pushed a commit that referenced this pull request Oct 15, 2021
… enabled

Summary:
For secondaries, when enable_super_log_bin_read_only is on and read_only is on, currently it will forbid to install/uninstall plugin during run time.

install/uninstall plugin doesn't generate event in binlog, although the thread thd contains OPTION_BIN_LOG flag due to log_slave_updates is on by default in secondaries. It should be safe to execute install/uninstall plugin.

the change is to call set_skip_readonly_check() before install/uninstall plugin and call reset_skip_readonly_check()(for completeness) after install/uninstall plugin.

BTW, mysql will always call reset_skip_readonly_check() for at the beginning of each statement. thus set_skip_readonly_check() won't affect other statement.
```
#0  THD::reset_skip_readonly_check (this=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_class.h:1754
#1  0x0000000005a500e1 in THD::reset_for_next_command (this=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:5892
#2  0x0000000005a517a5 in mysql_reset_thd_for_next_command (thd=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:5817
#3  0x0000000005a50466 in mysql_parse (thd=0x7fb5c1a0b000, parser_state=0x7fb5f6bb4560, last_timer=0x7fb5f6bb39b0) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:6056
#4  0x0000000005a4c7c9 in dispatch_command (thd=0x7fb5c1a0b000, com_data=0x7fb5f6bb4d98, command=COM_QUERY) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:2222
#5  0x0000000005a4f991 in do_command (thd=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:1556
#6  0x0000000005ccd4f1 in handle_connection (arg=0x7fb5cc85b740) at /home/luqun/mysql/mysql-8.0.20/sql/conn_handler/connection_handler_per_thread.cc:330
#7  0x00000000078eb95b in pfs_spawn_thread (arg=0x7fb5f8c89720) at /home/luqun/mysql/mysql-8.0.20/storage/perfschema/pfs.cc:2884
#8  0x00007fb5f957020c in start_thread (arg=0x7fb5f6bb6700) at pthread_create.c:479
#9  0x00007fb5f971881f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
```

Reviewed By: george-reynya

Differential Revision: D27213990

fbshipit-source-id: 68a234fb694
inikep pushed a commit that referenced this pull request Oct 20, 2021
Summary:
MySQL bypass infrastructure change

1. Adds a new hton override `handlerton::handle_single_table_select` to allow hijacking of SELECT statement execution from any storage engine.
2. Add lex parsing for bypass hints in optimizer hint style /*+ bypass */ and /*+ no_bypass */
3. MyRocks overrides handle_single_table_select but does nothing and simply fallback.

This is a port from from bypass feature branch - change #1 of 3. I'm planning to port the changes in the following order:

1. MySQL lexer + SELECT hijacking (this one).
2. SELECT Statement parsing and checking
3. Execution of SELECT statement in bypass

Reference Patch: facebook@2d19dc0

-----
Porting Notes:

* The entry points have changed due to MySQL 8.0 refactoring to SELECT path, and now it is in `Sql_cmd_dml::execute_inner`. I'll also see if overriding the entire execute would be better but this should be good enough for now.
* The check for whether we should go to bypass handler is mostly the same, except st_select_lex_t is now SELECT_LEX.
* The lex parsing for hints was mostly ported as-is. I was also debating whether I should just use optimizer hints instead of hacking the lexer, but decided against using optimizer hint for now so that migration from 5.6 will be easier. We can migrate to optimizer hint once 8.0 migration is complete.

Differential Revision: D22807040

fbshipit-source-id: 6e6aec4c099
inikep pushed a commit that referenced this pull request Oct 20, 2021
… enabled

Summary:
For secondaries, when enable_super_log_bin_read_only is on and read_only is on, currently it will forbid to install/uninstall plugin during run time.

install/uninstall plugin doesn't generate event in binlog, although the thread thd contains OPTION_BIN_LOG flag due to log_slave_updates is on by default in secondaries. It should be safe to execute install/uninstall plugin.

the change is to call set_skip_readonly_check() before install/uninstall plugin and call reset_skip_readonly_check()(for completeness) after install/uninstall plugin.

BTW, mysql will always call reset_skip_readonly_check() for at the beginning of each statement. thus set_skip_readonly_check() won't affect other statement.
```
#0  THD::reset_skip_readonly_check (this=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_class.h:1754
#1  0x0000000005a500e1 in THD::reset_for_next_command (this=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:5892
#2  0x0000000005a517a5 in mysql_reset_thd_for_next_command (thd=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:5817
#3  0x0000000005a50466 in mysql_parse (thd=0x7fb5c1a0b000, parser_state=0x7fb5f6bb4560, last_timer=0x7fb5f6bb39b0) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:6056
#4  0x0000000005a4c7c9 in dispatch_command (thd=0x7fb5c1a0b000, com_data=0x7fb5f6bb4d98, command=COM_QUERY) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:2222
#5  0x0000000005a4f991 in do_command (thd=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:1556
#6  0x0000000005ccd4f1 in handle_connection (arg=0x7fb5cc85b740) at /home/luqun/mysql/mysql-8.0.20/sql/conn_handler/connection_handler_per_thread.cc:330
#7  0x00000000078eb95b in pfs_spawn_thread (arg=0x7fb5f8c89720) at /home/luqun/mysql/mysql-8.0.20/storage/perfschema/pfs.cc:2884
#8  0x00007fb5f957020c in start_thread (arg=0x7fb5f6bb6700) at pthread_create.c:479
#9  0x00007fb5f971881f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
```

Reviewed By: george-reynya

Differential Revision: D27213990

fbshipit-source-id: 68a234fb694
inikep pushed a commit that referenced this pull request Oct 21, 2021
… enabled

Summary:
For secondaries, when enable_super_log_bin_read_only is on and read_only is on, currently it will forbid to install/uninstall plugin during run time.

install/uninstall plugin doesn't generate event in binlog, although the thread thd contains OPTION_BIN_LOG flag due to log_slave_updates is on by default in secondaries. It should be safe to execute install/uninstall plugin.

the change is to call set_skip_readonly_check() before install/uninstall plugin and call reset_skip_readonly_check()(for completeness) after install/uninstall plugin.

BTW, mysql will always call reset_skip_readonly_check() for at the beginning of each statement. thus set_skip_readonly_check() won't affect other statement.
```
#0  THD::reset_skip_readonly_check (this=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_class.h:1754
#1  0x0000000005a500e1 in THD::reset_for_next_command (this=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:5892
#2  0x0000000005a517a5 in mysql_reset_thd_for_next_command (thd=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:5817
#3  0x0000000005a50466 in mysql_parse (thd=0x7fb5c1a0b000, parser_state=0x7fb5f6bb4560, last_timer=0x7fb5f6bb39b0) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:6056
#4  0x0000000005a4c7c9 in dispatch_command (thd=0x7fb5c1a0b000, com_data=0x7fb5f6bb4d98, command=COM_QUERY) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:2222
#5  0x0000000005a4f991 in do_command (thd=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:1556
#6  0x0000000005ccd4f1 in handle_connection (arg=0x7fb5cc85b740) at /home/luqun/mysql/mysql-8.0.20/sql/conn_handler/connection_handler_per_thread.cc:330
#7  0x00000000078eb95b in pfs_spawn_thread (arg=0x7fb5f8c89720) at /home/luqun/mysql/mysql-8.0.20/storage/perfschema/pfs.cc:2884
#8  0x00007fb5f957020c in start_thread (arg=0x7fb5f6bb6700) at pthread_create.c:479
#9  0x00007fb5f971881f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
```

Reviewed By: george-reynya

Differential Revision: D27213990

fbshipit-source-id: 68a234fb694
inikep pushed a commit that referenced this pull request Oct 22, 2021
Summary:
several testcase failed in asan debug,

==106552==ERROR: AddressSanitizer: stack-buffer-underflow on address 0x7f38f3590078 at pc 0x0000093aa495 bp 0x7f38f358ffc0 sp 0x7f38f358ffb8
READ of size 4 at 0x7f38f3590078 thread T203
    #0 0x93aa494 in _db_enter_(char const*, int, char const*, unsigned int, _db_stack_frame_*)
    #1 0x8d0b9bb in Relay_log_info::remove_logged_gtids
    #2 0x8b1ec3f in trim_logged_gtid
    #3 0x8c767cf in process_raft_queue

If Use DBUG_ENTER, then you need to use DBUG_RETURN to pop current frame in CODE_STATE.
If use DBUG_TRACE, it will pop current frame during .dtor

ps. small refactor changes for sql/rpl_handler.cc

Reviewed By: bhatvinay

Differential Revision: D28809418

fbshipit-source-id: db2ac9c92f8
inikep pushed a commit that referenced this pull request Oct 22, 2021
Summary:
asandebug complain there are memory leaks during MYSQL_BIN_LOG open

Direct leak of 50 byte(s) in 1 object(s) allocated from:
    #0 0x67460ef in malloc
    #1 0x93f0777 in my_raw_malloc(unsigned long, int)
    #2 0x93f064a in my_malloc(unsigned int, unsigned long, int)
    #3 0x93f0eb0 in my_strdup(unsigned int, char const*, int)
    #4 0x8af01a6 in MYSQL_BIN_LOG::open(unsigned int, char const*, char const*, unsigned int)
    #5 0x8af8064 in MYSQL_BIN_LOG::open_binlog(char const*, char const*, unsigned long, bool, bool, bool, Format_description_log_event*, unsigned int, RaftRotateInfo*, bool)
    #6 0x8b00c00 in MYSQL_BIN_LOG::new_file_impl(bool, Format_description_log_event*, RaftRotateInfo*)
    #7 0x8d65e47 in rotate_relay_log(Master_info*, bool, bool, bool, RaftRotateInfo*)
    #8 0x8d661c0 in rotate_relay_log_for_raft(RaftRotateInfo*)
    #9 0x8c7696a in process_raft_queue
    #10 0xa0fa1fd in pfs_spawn_thread(void*)
    #11 0x7f8c9a12b20b in start_thread

release these memory before assign them

Reviewed By: Pushapgl

Differential Revision: D28819752

fbshipit-source-id: ed57cea875c
inikep pushed a commit that referenced this pull request Oct 22, 2021
Summary:
MySQL bypass infrastructure change

1. Adds a new hton override `handlerton::handle_single_table_select` to allow hijacking of SELECT statement execution from any storage engine.
2. Add lex parsing for bypass hints in optimizer hint style /*+ bypass */ and /*+ no_bypass */
3. MyRocks overrides handle_single_table_select but does nothing and simply fallback.

This is a port from from bypass feature branch - change #1 of 3. I'm planning to port the changes in the following order:

1. MySQL lexer + SELECT hijacking (this one).
2. SELECT Statement parsing and checking
3. Execution of SELECT statement in bypass

Reference Patch: facebook@2d19dc0

-----
Porting Notes:

* The entry points have changed due to MySQL 8.0 refactoring to SELECT path, and now it is in `Sql_cmd_dml::execute_inner`. I'll also see if overriding the entire execute would be better but this should be good enough for now.
* The check for whether we should go to bypass handler is mostly the same, except st_select_lex_t is now SELECT_LEX.
* The lex parsing for hints was mostly ported as-is. I was also debating whether I should just use optimizer hints instead of hacking the lexer, but decided against using optimizer hint for now so that migration from 5.6 will be easier. We can migrate to optimizer hint once 8.0 migration is complete.

Differential Revision: D22807040

fbshipit-source-id: 6e6aec4c099
inikep pushed a commit that referenced this pull request Oct 22, 2021
… enabled

Summary:
For secondaries, when enable_super_log_bin_read_only is on and read_only is on, currently it will forbid to install/uninstall plugin during run time.

install/uninstall plugin doesn't generate event in binlog, although the thread thd contains OPTION_BIN_LOG flag due to log_slave_updates is on by default in secondaries. It should be safe to execute install/uninstall plugin.

the change is to call set_skip_readonly_check() before install/uninstall plugin and call reset_skip_readonly_check()(for completeness) after install/uninstall plugin.

BTW, mysql will always call reset_skip_readonly_check() for at the beginning of each statement. thus set_skip_readonly_check() won't affect other statement.
```
#0  THD::reset_skip_readonly_check (this=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_class.h:1754
#1  0x0000000005a500e1 in THD::reset_for_next_command (this=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:5892
#2  0x0000000005a517a5 in mysql_reset_thd_for_next_command (thd=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:5817
#3  0x0000000005a50466 in mysql_parse (thd=0x7fb5c1a0b000, parser_state=0x7fb5f6bb4560, last_timer=0x7fb5f6bb39b0) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:6056
#4  0x0000000005a4c7c9 in dispatch_command (thd=0x7fb5c1a0b000, com_data=0x7fb5f6bb4d98, command=COM_QUERY) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:2222
#5  0x0000000005a4f991 in do_command (thd=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:1556
#6  0x0000000005ccd4f1 in handle_connection (arg=0x7fb5cc85b740) at /home/luqun/mysql/mysql-8.0.20/sql/conn_handler/connection_handler_per_thread.cc:330
#7  0x00000000078eb95b in pfs_spawn_thread (arg=0x7fb5f8c89720) at /home/luqun/mysql/mysql-8.0.20/storage/perfschema/pfs.cc:2884
#8  0x00007fb5f957020c in start_thread (arg=0x7fb5f6bb6700) at pthread_create.c:479
#9  0x00007fb5f971881f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
```

Reviewed By: george-reynya

Differential Revision: D27213990

fbshipit-source-id: 68a234fb694
inikep pushed a commit that referenced this pull request Oct 22, 2021
Summary:
several testcase failed in asan debug,

==106552==ERROR: AddressSanitizer: stack-buffer-underflow on address 0x7f38f3590078 at pc 0x0000093aa495 bp 0x7f38f358ffc0 sp 0x7f38f358ffb8
READ of size 4 at 0x7f38f3590078 thread T203
    #0 0x93aa494 in _db_enter_(char const*, int, char const*, unsigned int, _db_stack_frame_*)
    #1 0x8d0b9bb in Relay_log_info::remove_logged_gtids
    #2 0x8b1ec3f in trim_logged_gtid
    #3 0x8c767cf in process_raft_queue

If Use DBUG_ENTER, then you need to use DBUG_RETURN to pop current frame in CODE_STATE.
If use DBUG_TRACE, it will pop current frame during .dtor

ps. small refactor changes for sql/rpl_handler.cc

Reviewed By: bhatvinay

Differential Revision: D28809418

fbshipit-source-id: db2ac9c92f8
inikep pushed a commit that referenced this pull request Oct 22, 2021
Summary:
asandebug complain there are memory leaks during MYSQL_BIN_LOG open

Direct leak of 50 byte(s) in 1 object(s) allocated from:
    #0 0x67460ef in malloc
    #1 0x93f0777 in my_raw_malloc(unsigned long, int)
    #2 0x93f064a in my_malloc(unsigned int, unsigned long, int)
    #3 0x93f0eb0 in my_strdup(unsigned int, char const*, int)
    #4 0x8af01a6 in MYSQL_BIN_LOG::open(unsigned int, char const*, char const*, unsigned int)
    #5 0x8af8064 in MYSQL_BIN_LOG::open_binlog(char const*, char const*, unsigned long, bool, bool, bool, Format_description_log_event*, unsigned int, RaftRotateInfo*, bool)
    #6 0x8b00c00 in MYSQL_BIN_LOG::new_file_impl(bool, Format_description_log_event*, RaftRotateInfo*)
    #7 0x8d65e47 in rotate_relay_log(Master_info*, bool, bool, bool, RaftRotateInfo*)
    #8 0x8d661c0 in rotate_relay_log_for_raft(RaftRotateInfo*)
    #9 0x8c7696a in process_raft_queue
    #10 0xa0fa1fd in pfs_spawn_thread(void*)
    #11 0x7f8c9a12b20b in start_thread

release these memory before assign them

Reviewed By: Pushapgl

Differential Revision: D28819752

fbshipit-source-id: ed57cea875c
inikep pushed a commit that referenced this pull request Oct 25, 2021
Summary:
several testcase failed in asan debug,

==106552==ERROR: AddressSanitizer: stack-buffer-underflow on address 0x7f38f3590078 at pc 0x0000093aa495 bp 0x7f38f358ffc0 sp 0x7f38f358ffb8
READ of size 4 at 0x7f38f3590078 thread T203
    #0 0x93aa494 in _db_enter_(char const*, int, char const*, unsigned int, _db_stack_frame_*)
    #1 0x8d0b9bb in Relay_log_info::remove_logged_gtids
    #2 0x8b1ec3f in trim_logged_gtid
    #3 0x8c767cf in process_raft_queue

If Use DBUG_ENTER, then you need to use DBUG_RETURN to pop current frame in CODE_STATE.
If use DBUG_TRACE, it will pop current frame during .dtor

ps. small refactor changes for sql/rpl_handler.cc

Reviewed By: bhatvinay

Differential Revision: D28809418

fbshipit-source-id: db2ac9c92f8
inikep pushed a commit that referenced this pull request Oct 25, 2021
Summary:
asandebug complain there are memory leaks during MYSQL_BIN_LOG open

Direct leak of 50 byte(s) in 1 object(s) allocated from:
    #0 0x67460ef in malloc
    #1 0x93f0777 in my_raw_malloc(unsigned long, int)
    #2 0x93f064a in my_malloc(unsigned int, unsigned long, int)
    #3 0x93f0eb0 in my_strdup(unsigned int, char const*, int)
    #4 0x8af01a6 in MYSQL_BIN_LOG::open(unsigned int, char const*, char const*, unsigned int)
    #5 0x8af8064 in MYSQL_BIN_LOG::open_binlog(char const*, char const*, unsigned long, bool, bool, bool, Format_description_log_event*, unsigned int, RaftRotateInfo*, bool)
    #6 0x8b00c00 in MYSQL_BIN_LOG::new_file_impl(bool, Format_description_log_event*, RaftRotateInfo*)
    #7 0x8d65e47 in rotate_relay_log(Master_info*, bool, bool, bool, RaftRotateInfo*)
    #8 0x8d661c0 in rotate_relay_log_for_raft(RaftRotateInfo*)
    #9 0x8c7696a in process_raft_queue
    #10 0xa0fa1fd in pfs_spawn_thread(void*)
    #11 0x7f8c9a12b20b in start_thread

release these memory before assign them

Reviewed By: Pushapgl

Differential Revision: D28819752

fbshipit-source-id: ed57cea875c
inikep pushed a commit that referenced this pull request Oct 27, 2021
Summary:
several testcase failed in asan debug,

==106552==ERROR: AddressSanitizer: stack-buffer-underflow on address 0x7f38f3590078 at pc 0x0000093aa495 bp 0x7f38f358ffc0 sp 0x7f38f358ffb8
READ of size 4 at 0x7f38f3590078 thread T203
    #0 0x93aa494 in _db_enter_(char const*, int, char const*, unsigned int, _db_stack_frame_*)
    #1 0x8d0b9bb in Relay_log_info::remove_logged_gtids
    #2 0x8b1ec3f in trim_logged_gtid
    #3 0x8c767cf in process_raft_queue

If Use DBUG_ENTER, then you need to use DBUG_RETURN to pop current frame in CODE_STATE.
If use DBUG_TRACE, it will pop current frame during .dtor

ps. small refactor changes for sql/rpl_handler.cc

Reviewed By: bhatvinay

Differential Revision: D28809418

fbshipit-source-id: db2ac9c92f8
inikep pushed a commit that referenced this pull request Oct 27, 2021
Summary:
asandebug complain there are memory leaks during MYSQL_BIN_LOG open

Direct leak of 50 byte(s) in 1 object(s) allocated from:
    #0 0x67460ef in malloc
    #1 0x93f0777 in my_raw_malloc(unsigned long, int)
    #2 0x93f064a in my_malloc(unsigned int, unsigned long, int)
    #3 0x93f0eb0 in my_strdup(unsigned int, char const*, int)
    #4 0x8af01a6 in MYSQL_BIN_LOG::open(unsigned int, char const*, char const*, unsigned int)
    #5 0x8af8064 in MYSQL_BIN_LOG::open_binlog(char const*, char const*, unsigned long, bool, bool, bool, Format_description_log_event*, unsigned int, RaftRotateInfo*, bool)
    #6 0x8b00c00 in MYSQL_BIN_LOG::new_file_impl(bool, Format_description_log_event*, RaftRotateInfo*)
    #7 0x8d65e47 in rotate_relay_log(Master_info*, bool, bool, bool, RaftRotateInfo*)
    #8 0x8d661c0 in rotate_relay_log_for_raft(RaftRotateInfo*)
    #9 0x8c7696a in process_raft_queue
    #10 0xa0fa1fd in pfs_spawn_thread(void*)
    #11 0x7f8c9a12b20b in start_thread

release these memory before assign them

Reviewed By: Pushapgl

Differential Revision: D28819752

fbshipit-source-id: ed57cea875c
avbelov23 pushed a commit to avbelov23/mysql-5.6 that referenced this pull request Oct 30, 2021
Summary:
This is an upstream bug: https://bugs.mysql.com/bug.php?id=92964

The issue is that if slave instance use sysvar_session_track_gtids == OWN_GTID and also enable MTS, the slave will lag behind master and its performance degrades over time.

The reason for this lag/perf degrade is caused by each slave worker keep adding its own gtid into its own Session_consistency_gtids_ctx.m_gtid_set during each transaction commit. Overtime, some of Session_consistency_gtids_ctx.m_gtid_set may have millions gno_interval and it will take long time to insert 1 gtid into Session_consistency_gtids_ctx.m_gtid_set(time complexity O(n)), see [Gtid_set::add_gno_interval](https://github.com/facebook/mysql-5.6/blob/fb-mysql-8.0.13/sql/rpl_gtid_set.cc#L323).

There are couple related code to this issue:
- During slave worker thread start, there is a [check](https://github.com/facebook/mysql-5.6/blob/fb-mysql-8.0.13/sql/session_tracker.cc#L1560) to not enable tracker for system threads but that check doesn't work---The THD system_thread field is initialized with default value(NON_SYSTEM_THREAD) during THD::THD.
```
  m_enabled = thd->variables.session_track_gtids != OFF &&
              /* No need to track GTIDs for system threads. */
              thd->system_thread == NON_SYSTEM_THREAD;
```
```
Call stack:
```
- in [Session_gtids_track::Update](https://github.com/facebook/mysql-5.6/blob/fb-mysql-8.0.13/sql/session_tracker.cc#L1581), the else code doesn't unregister listener due to mysql/mysql-server@1e0844c try to fix another issue.

Solutions:
1.  [Current]add system threads check during collecting data, since THD will be fully initialized at that time.
 -Pro: simple change(easy to port) and perf is ok
2. add `thd->session_track_gtids=OFF; thd->session_tracker.get_tracker(SESSION_GTIDS_TRACKER)->update(thd);` after [init_slave_thread()](https://github.com/facebook/mysql-5.6/blob/fb-mysql-8.0.13/sql/rpl_slave.cc#L6788)
-Pro: During init_slave_thread(), thd->system_thread will be set correctly value.
-Con: it need to add couple places, such as handle_slave_io, handle_slave_sql, handle_slave_worker
3. add `thd->session_track_gtids=OFF;thd->session_tracker.get_tracker(SESSION_GTIDS_TRACKER)->update(thd);` inside [init_slave_thread()](https://github.com/facebook/mysql-5.6/blob/fb-mysql-8.0.13/sql/rpl_slave.cc#L6788)
   -Pro: only add once for slave/slave worker/IO threads
   -Con: [should_collect](https://github.com/facebook/mysql-5.6/blob/fb-mysql-8.0.13/sql/rpl_context.cc#L53) check still return true, thus it is less perf than inikep#1
4. add `thd->session_track_gtids=OFF;thd->rpl_thd_ctx.session_gtids_ctx().unregister_ctx_change_listener(thd->session_tracker.get_tracker(SESSION_GTIDS_TRACKER));` inside init_slave_thread()
   -Pro: perf is better than inikep#3, on par as inikep#1
   -Con: if there are other non-system threads(except IO/sql threads) do transactions, these threads will still collect its own gtids.
5. add another THD overload constructor which takes 'enum_thread_type' as a parameter whose default value is NON_SYSTEM_THREAD
 -Pro: it will initialize correct value with THD and won't enable those session_tracker
 -Con: similar to inikep#2/inikep#4, need to replace THD() with THD(enum_thread_type) couples places.

Differential Revision: D18072672 (facebook@4b1d7d5)

fbshipit-source-id: 4b902020e76
avbelov23 pushed a commit to avbelov23/mysql-5.6 that referenced this pull request Oct 30, 2021
avbelov23 pushed a commit to avbelov23/mysql-5.6 that referenced this pull request Oct 30, 2021
avbelov23 pushed a commit to avbelov23/mysql-5.6 that referenced this pull request Oct 30, 2021
avbelov23 pushed a commit to avbelov23/mysql-5.6 that referenced this pull request Oct 30, 2021
Summary:
This adds two features in RocksDB.
1. Supporting START TRANSACTION WITH CONSISTENT SNAPSHOT
2. Getting current binlog position in addition to inikep#1.
With these features, mysqldump can take consistent logical backup.

The second feature is done by START TRANSACTION WITH
CONSISTENT ROCKSDB SNAPSHOT. This is Facebook's extension, and
it works like existing START TRANSACTION WITH CONSISTENT INNODB SNAPSHOT.

This diff changed some existing codebase/behaviors.

- Original Facebook-MySQL always started InnoDB transaction
regardless of engine clause. For example, START TRANSACTION WITH
CONSISTENT MYISAM SNAPSHOT was accepted but it actually started
InnoDB transaction, not MyISAM. This patch does not allow
setting engine that does not support consistent snapshot.

mysql> start transaction with consistent myisam snapshot;
ERROR 1105 (HY000): Consistent Snapshot is not supported for this engine

Currently only InnoDB and RocksDB support consistent snapshot.
To check engines, I modified sql/sql_yacc.yy, trans_begin()
and ha_start_consistent_snapshot() to pass handlerton.

- Changed constant name from
  MYSQL_START_TRANS_OPT_WITH_CONS_INNODB_SNAPSHOT to
MYSQL_START_TRANS_OPT_WITH_CONS_ENGINE_SNAPSHOT, because it's no longer
InnoDB dependent.

- When not setting engine, START TRANSACTION WITH CONSISTENT SNAPSHOT
takes both InnoDB and RocksDB snapshots, and both InnoDB and RocksDB
participate in transaction. When executing COMMIT, both InnoDB and
RocksDB modifications are committed. Remember that XA is not supported yet,
so mixing engines is not recommended anyway.

- When setting engine, START TRANSACTION WITH CONSISTENT.. takes
snapshot for the specified engine only. But it starts both
InnoDB and RocksDB transactions.

Differential Revision: https://reviews.facebook.net/D32355

fbshipit-source-id: b246adccb0c
avbelov23 pushed a commit to avbelov23/mysql-5.6 that referenced this pull request Oct 30, 2021
Summary:
Don't update secondary indexes that do not have columns that
are included in table->write_map at query start.

Differential Revision: https://reviews.facebook.net/D33471

fbshipit-source-id: f2116a657ad
avbelov23 pushed a commit to avbelov23/mysql-5.6 that referenced this pull request Oct 30, 2021
Summary:
Inside index_next_same() call, we should
1. first check whether the record matches the index
   lookup prefix,
2. then check pushed index condition.

If we try to check inikep#2 without checking inikep#1 first, we may walk
off the index lookup prefix and scan till the end of the index.

Differential Revision: https://reviews.facebook.net/D38769

fbshipit-source-id: 6c34e3ca0d2
avbelov23 pushed a commit to avbelov23/mysql-5.6 that referenced this pull request Oct 30, 2021
Summary:
MyRocks had two bugs when calculating index scan cost.
1. block_size was not considered. This made covering index scan cost
(both full index scan and range scan) much higher
2. ha_rocksdb::records_in_range() may have estimated more rows
than the estimated number of rows in the table. This was wrong,
and MySQL optimizer decided to use full index scan even though
range scan was more efficient.

This diff fixes inikep#1 by setting stats.block_size at ha_rocksdb::open(),
and fixes inikep#2 by reducing the number of estimated rows if it was
larger than stats.records.

Differential Revision: https://reviews.facebook.net/D55869

fbshipit-source-id: 15ca2c3fa8e
avbelov23 pushed a commit to avbelov23/mysql-5.6 that referenced this pull request Oct 30, 2021
…?=20doesnt=20work=20correctly=20for=20nes=E2=80=A6=20(facebook#804)?= (facebook#804)

Summary:
…ted statements

Variant inikep#1: When the statement fails, we should roll back to the latest
savepoint taken at the top level.
Closes facebook#804

Differential Revision: D7509380 (facebook@6ddedd8)

Pulled By: hermanlee

fbshipit-source-id: be411bfae1b
avbelov23 pushed a commit to avbelov23/mysql-5.6 that referenced this pull request Oct 30, 2021
…acebook#871)

Summary:
Original report: https://jira.mariadb.org/browse/MDEV-15816

To reproduce this bug just following below steps,

client 1:
USE test;
CREATE TABLE t1 (i INT) ENGINE=MyISAM;
HANDLER t1 OPEN h;
CREATE TABLE t2 (i INT) ENGINE=RocksDB;
LOCK TABLES t2 WRITE;

client 2:
FLUSH TABLES WITH READ LOCK;

client 1:
INSERT INTO t2 VALUES (1);

So client 1 acquired the lock and set m_lock_rows = RDB_LOCK_WRITE.
Then client 2 calls store_lock(TL_IGNORE) and m_lock_rows was wrongly
set to RDB_LOCK_NONE, as below

```
 #0  myrocks::ha_rocksdb::store_lock (this=0x7fffbc03c7c8, thd=0x7fffc0000ba0, to=0x7fffc0011220, lock_type=TL_IGNORE)
 inikep#1  get_lock_data (thd=0x7fffc0000ba0, table_ptr=0x7fffe84b7d20, count=1, flags=2)
 inikep#2  mysql_lock_abort_for_thread (thd=0x7fffc0000ba0, table=0x7fffbc03bbc0)
 inikep#3  THD::notify_shared_lock (this=0x7fffc0000ba0, ctx_in_use=0x7fffbc000bd8, needs_thr_lock_abort=true)
 inikep#4  MDL_lock::notify_conflicting_locks (this=0x555557a82380, ctx=0x7fffc0000cc8)
 inikep#5  MDL_context::acquire_lock (this=0x7fffc0000cc8, mdl_request=0x7fffe84b8350, lock_wait_timeout=2)
 inikep#6  Global_read_lock::lock_global_read_lock (this=0x7fffc0003fe0, thd=0x7fffc0000ba0)
```

Finally, client 1 "INSERT INTO..." hits the Assertion 'm_lock_rows == RDB_LOCK_WRITE'
failed in myrocks::ha_rocksdb::write_row()

Fix this bug by not setting m_locks_rows if lock_type == TL_IGNORE.

Closes facebook#838
Pull Request resolved: facebook#871

Differential Revision: D9417382 (facebook@8cb1dc8)

Pulled By: lth

fbshipit-source-id: 5aa7ed56c0a
avbelov23 pushed a commit to avbelov23/mysql-5.6 that referenced this pull request Oct 30, 2021
Summary:
Currently during primary key's value encode, its ttl value can be from either
one of these 3 cases
1. ttl column in primary key
2. non-ttl column
   a. old record(update case)
   b. current timestamp
3. ttl column in non-key field

Workflow inikep#1: first in Rdb_key_def::pack_record() find and
store pk_offset, then in value encode try to parse key slice to fetch ttl
value by using pk_offset.

Workflow inikep#3: fetch ttl value from ttl column

The change is to merge inikep#1 and inikep#3 by always fetching TTL value from ttl column,
not matter whether the ttl column is in primary key or not. Of course, remove
pk_offset, since it isn't used.

BTW, for secondary keys, its ttl value is always from m_ttl_bytes, which is
stored by primary value encoding.

Differential Revision: D14662716 (facebook@7703200)

fbshipit-source-id: a46d980aeb3
avbelov23 pushed a commit to avbelov23/mysql-5.6 that referenced this pull request Oct 30, 2021
Summary:
Disabling tests:
1. Transaction related - those are already disabled in 5.6 as they are not supported
2. hotbackup tests - we need Xtrabackup for MyRocks
3. partition tests - we need Percona's partition patch. Note these are partition specific tests - tests that only use partition are properly fixed up to work without partitions to ensure we have good coverage.
4. tests that are requires authentication plugin
5. Missing patches

In 8.0 the disabled.def aren't being used anymore and instead they are located in collections and needs to be specifically added.

Reviewed By: lloyd

Differential Revision: D17555081 (facebook@7bd25c7)

fbshipit-source-id: 45e576d791b
avbelov23 pushed a commit to avbelov23/mysql-5.6 that referenced this pull request Oct 30, 2021
Summary:
1. gap_lock_issue254: fails because gap_lock_raise_error is failing mtr warning check. I'm moving gap_lock_raise_error from test option into the test and disable it. TODO: rebase it in earlier diff that updates the error number.
3. bulk_load_errors: the error number is updated to be based start from 500. See inikep#1
4. allow_start_after_corruption: this one fails because MySQL no longer flushes log messages and instead of buffers them . The fix for now is to flush before exit but we should investigate how to turn off this behavior so that log messages are always flushed to ensure those message always show up.

Reviewed By: luqun

Differential Revision: D18671578 (facebook@653cf61)

fbshipit-source-id: e71f705490a
inikep pushed a commit that referenced this pull request Nov 1, 2021
Summary:
MySQL bypass infrastructure change

1. Adds a new hton override `handlerton::handle_single_table_select` to allow hijacking of SELECT statement execution from any storage engine.
2. Add lex parsing for bypass hints in optimizer hint style /*+ bypass */ and /*+ no_bypass */
3. MyRocks overrides handle_single_table_select but does nothing and simply fallback.

This is a port from from bypass feature branch - change #1 of 3. I'm planning to port the changes in the following order:

1. MySQL lexer + SELECT hijacking (this one).
2. SELECT Statement parsing and checking
3. Execution of SELECT statement in bypass

Reference Patch: facebook@2d19dc0

-----
Porting Notes:

* The entry points have changed due to MySQL 8.0 refactoring to SELECT path, and now it is in `Sql_cmd_dml::execute_inner`. I'll also see if overriding the entire execute would be better but this should be good enough for now.
* The check for whether we should go to bypass handler is mostly the same, except st_select_lex_t is now SELECT_LEX.
* The lex parsing for hints was mostly ported as-is. I was also debating whether I should just use optimizer hints instead of hacking the lexer, but decided against using optimizer hint for now so that migration from 5.6 will be easier. We can migrate to optimizer hint once 8.0 migration is complete.

Differential Revision: D22807040

fbshipit-source-id: 6e6aec4c099
inikep pushed a commit that referenced this pull request Jul 19, 2024
Summary:
Almost a direct port of the feature in 5.6 except for minor visual changes and
some minor changes related to 8.0 thread handling

Also brings in the basic raft_listener_queue without the implemented functions

Differential Revision: D21670710

Use DBUG_TRACE instead of DBUG_ENTER

Summary:
several testcase failed in asan debug,

==106552==ERROR: AddressSanitizer: stack-buffer-underflow on address 0x7f38f3590078 at pc 0x0000093aa495 bp 0x7f38f358ffc0 sp 0x7f38f358ffb8
READ of size 4 at 0x7f38f3590078 thread T203
    #0 0x93aa494 in _db_enter_(char const*, int, char const*, unsigned int, _db_stack_frame_*)
    #1 0x8d0b9bb in Relay_log_info::remove_logged_gtids
    #2 0x8b1ec3f in trim_logged_gtid
    #3 0x8c767cf in process_raft_queue

If Use DBUG_ENTER, then you need to use DBUG_RETURN to pop current frame in CODE_STATE.
If use DBUG_TRACE, it will pop current frame during .dtor

ps. small refactor changes for sql/rpl_handler.cc

Reviewed By: bhatvinay

Differential Revision: D28809418
inikep pushed a commit that referenced this pull request Jul 19, 2024
Summary:
port D21375044 (facebook@7fc9eaa) (facebook@7fc9eaa) and D17291883 (facebook@6a6a35f) (facebook@6a6a35f)
**Summary**
* As part of leadership changes, we will trim binlogs/relay logs on a
running mysql instance. This needs the ability to remove gtids from executed_gtid set.
This diff adds the ability to do the same. One can enqueue an event in the raft
listener queue and the raft thread in the server will take care of removing and
updating executed_gtid on a running instance.
* When raft logs are trimmed as part of TruncateOpsAfter(), the gtid of the trimmed trxs are also removed from executed/logged gtids. However, when binlog files are rotated, the previous gtid written to the new file depends on what files are rotated. During  binlog rotation the prev gtid are the logged/executed gtids of the instance. During relay log rotation prev gtids are the retrieved gtids of the instance. Hence, in addition to trimming logged gtids/executed gtids, we should also trim retrieved gtids. This prevents creation of holes in the instances executed_gtid set as the replicaset goes through multiple promotions over its lifetime.

Reviewed By: luqun

Differential Revision: D24690007

---------------------------------------------------------------------------------------

Use DBUG_TRACE instead of DBUG_ENTER

Summary:
several testcase failed in asan debug,

==106552==ERROR: AddressSanitizer: stack-buffer-underflow on address 0x7f38f3590078 at pc 0x0000093aa495 bp 0x7f38f358ffc0 sp 0x7f38f358ffb8
READ of size 4 at 0x7f38f3590078 thread T203
    #0 0x93aa494 in _db_enter_(char const*, int, char const*, unsigned int, _db_stack_frame_*)
    #1 0x8d0b9bb in Relay_log_info::remove_logged_gtids
    #2 0x8b1ec3f in trim_logged_gtid
    #3 0x8c767cf in process_raft_queue

If Use DBUG_ENTER, then you need to use DBUG_RETURN to pop current frame in CODE_STATE.
If use DBUG_TRACE, it will pop current frame during .dtor

ps. small refactor changes for sql/rpl_handler.cc

Reviewed By: bhatvinay

Differential Revision: D28809418
inikep pushed a commit that referenced this pull request Jul 19, 2024
Summary:
[Porting Notes]
We want to dump raft logs to vanilla async replicas regardless
of whether it's the relay log or binlog. Effectively after this change
we'll dump relay logs on the followers and binlogs on the leader. When
the raft role changes, the logs to the dumped are also changed.
Dump_log class is introduced as a thin wrapper/continer around
mysql_bin_log or rli->relay_log and is inited with mysql_bin_log
to emulate vanilla mysql behavior. Dump threads use the global
dump_log object instead of mysql_bin_log directly. We switch the log
in dump log only when raft role changes (in binlog_change_to_binlog()
and binlog_change_to_apply_log()).
During raft role change we take all log releated locks (LOCK_log,
LOCK_index, LOCK_binlog_end_pos, and dump log lock) to serialize it with
other log operations like dumping logs.

Related doc - https://fb.quip.com/oTVAAdgEi4zY

This diff contains below 7 patches:
D23013977
D24766787
D24716539
D24900223
D24955284
D25174166
D25775525

Reviewed By: luqun

Differential Revision: D26141496

-------------------------------------------------------------------------------

Passing raw_log pointer to wait_with_heartbeat() and wait_without_heartbeat()

Summary:
When enable_raft_plugin is OFF Dump_log::lock() is a no-op.
Which means that when enable_raft_plugin is OFF there can be a race
between log switching and dump threads. This could lead to a scenario
where the raw_log that wait_next_event() is working on might be
different than what wait_with_heartbeat()/wait_without_heartbeat() is
working on. This can cause deadlocks because
wait_with_heartbeat()/wait_without_heartbeat()'s mysql_cond_wait would
unlock and then lock a different log's LOCK_binlog_end_pos mutex which
would then never be unlocked by wait_next_event().

Reviewed By: anirbanr-fb

Differential Revision: D32152658

-----------------------------------------------------------------------------------------

Fix rpl_raft_dump_raft_logs

Summary:
This tests completes but fails because the following warning exists:
```
2022-08-30T16:28:00.159525Z 11 [ERROR] [MY-013114] [Repl] Slave I/O for channel '': Got fatal error 1236 from master when reading data from binary log: 'Slave has more GTIDs than the master has, using the master's SERVER_UUID. This may indicate that the end of the binary log was truncated or that the last binary log file was lost, e.g., after a power or disk failure when sync_binlog != 1. The master may or may not have rolled back transactions that were already replicated to the slave. Suggest to replicate any transactions that master has rolled back from slave to master, and/or commit empty transactions on master to account for transactions that have been', Error_code: MY-013114
```
Since the MTR result file is valid, we can suppress this error.

Reviewed By: yichenshen

Differential Revision: D39141846

-------------------------------------------------------------------------------

Fix heap overflow in group_relay_log_name handling

Summary:
We were accessing group_relay_log_name in
Query_log_event::do_apply_event_worker() but it's assigned only after
the coordinator thread encounters an end event (i.e. xid event or a
query event with "COMMIT" or "ROLLBACK" query). This was causing a race
between accessing group_relay_log_name in the worker thread and writing
it on the coordinator thread. We don't need to set transaction position
in events other than end event, so now we set transaction position in
query event only if it's an end event. The race is eliminated because
group_relay_log_name is set before enqueuing the event to the worker
thread (in both dep repl and vanilla mts).

Reviewed By: lth

Differential Revision: D28767430

-------------------------------------------------------------------------------

fix memory during MYSQL_BIN_LOG::open_existing_binlog

Summary:
asandebug complain there are memory leaks during MYSQL_BIN_LOG open

Direct leak of 50 byte(s) in 1 object(s) allocated from:
    #0 0x67460ef in malloc
    #1 0x93f0777 in my_raw_malloc(unsigned long, int)
    #2 0x93f064a in my_malloc(unsigned int, unsigned long, int)
    #3 0x93f0eb0 in my_strdup(unsigned int, char const*, int)
    #4 0x8af01a6 in MYSQL_BIN_LOG::open(unsigned int, char const*, char const*, unsigned int)
    #5 0x8af8064 in MYSQL_BIN_LOG::open_binlog(char const*, char const*, unsigned long, bool, bool, bool, Format_description_log_event*, unsigned int, RaftRotateInfo*, bool)
    #6 0x8b00c00 in MYSQL_BIN_LOG::new_file_impl(bool, Format_description_log_event*, RaftRotateInfo*)
    #7 0x8d65e47 in rotate_relay_log(Master_info*, bool, bool, bool, RaftRotateInfo*)
    #8 0x8d661c0 in rotate_relay_log_for_raft(RaftRotateInfo*)
    #9 0x8c7696a in process_raft_queue
    #10 0xa0fa1fd in pfs_spawn_thread(void*)
    #11 0x7f8c9a12b20b in start_thread

release these memory before assign them

Reviewed By: Pushapgl

Differential Revision: D28819752
inikep pushed a commit that referenced this pull request Jul 29, 2024
Summary:
When COUNT(*) style aggregation appears in queries, the optimizer realizes that a full scan needs to be done. The index doesn't matter - any index will do as it long it satisfies the other query predicates present.

There are two heuristics at play. If there is a choice of multiple indexes, the optimizer will pick the index that has the shortest length since that means less disk scan time, therefore faster query (heuristic #1).

In the case of vector indexes, the current design can have the vector field present in the clustered PK as well as the vector index (eg the flat index). Even though there's no clear advantage, the heuristic is based on INNODB style index layout, and assumes that a secondary index will always be a better choice as long as it does not cover all table columns (heuristic #2). In reality, in both INNODB and MyRocks, the secondary key does contain the PK. But this fact is ignored.

What ends up happening is that by only considered the # of explicitly defined key parts, the vector index ends up getting picked up.

In the case of the IVF index, the vector index may be shorter than the primary clustered index. Thus secondary index will have a shorter key length than the primary, and will end up getting picked up (i.e. if the key part heuristic doesn't already pick the vector index again).

There can be many other index combinations. We want the optimizer to pick another available index. If no other index is available, then pick the primary clustered index.

Differential Revision: D55003211

fbshipit-source-id: 5ce6b34
inikep pushed a commit that referenced this pull request Jul 30, 2024
Summary:
[Porting Notes]
We want to dump raft logs to vanilla async replicas regardless
of whether it's the relay log or binlog. Effectively after this change
we'll dump relay logs on the followers and binlogs on the leader. When
the raft role changes, the logs to the dumped are also changed.
Dump_log class is introduced as a thin wrapper/continer around
mysql_bin_log or rli->relay_log and is inited with mysql_bin_log
to emulate vanilla mysql behavior. Dump threads use the global
dump_log object instead of mysql_bin_log directly. We switch the log
in dump log only when raft role changes (in binlog_change_to_binlog()
and binlog_change_to_apply_log()).
During raft role change we take all log releated locks (LOCK_log,
LOCK_index, LOCK_binlog_end_pos, and dump log lock) to serialize it with
other log operations like dumping logs.

Related doc - https://fb.quip.com/oTVAAdgEi4zY

This diff contains below 7 patches:
D23013977
D24766787
D24716539
D24900223
D24955284
D25174166
D25775525

Reviewed By: luqun

Differential Revision: D26141496

-------------------------------------------------------------------------------

Passing raw_log pointer to wait_with_heartbeat() and wait_without_heartbeat()

Summary:
When enable_raft_plugin is OFF Dump_log::lock() is a no-op.
Which means that when enable_raft_plugin is OFF there can be a race
between log switching and dump threads. This could lead to a scenario
where the raw_log that wait_next_event() is working on might be
different than what wait_with_heartbeat()/wait_without_heartbeat() is
working on. This can cause deadlocks because
wait_with_heartbeat()/wait_without_heartbeat()'s mysql_cond_wait would
unlock and then lock a different log's LOCK_binlog_end_pos mutex which
would then never be unlocked by wait_next_event().

Reviewed By: anirbanr-fb

Differential Revision: D32152658

-----------------------------------------------------------------------------------------

Fix rpl_raft_dump_raft_logs

Summary:
This tests completes but fails because the following warning exists:
```
2022-08-30T16:28:00.159525Z 11 [ERROR] [MY-013114] [Repl] Slave I/O for channel '': Got fatal error 1236 from master when reading data from binary log: 'Slave has more GTIDs than the master has, using the master's SERVER_UUID. This may indicate that the end of the binary log was truncated or that the last binary log file was lost, e.g., after a power or disk failure when sync_binlog != 1. The master may or may not have rolled back transactions that were already replicated to the slave. Suggest to replicate any transactions that master has rolled back from slave to master, and/or commit empty transactions on master to account for transactions that have been', Error_code: MY-013114
```
Since the MTR result file is valid, we can suppress this error.

Reviewed By: yichenshen

Differential Revision: D39141846

-------------------------------------------------------------------------------

Fix heap overflow in group_relay_log_name handling

Summary:
We were accessing group_relay_log_name in
Query_log_event::do_apply_event_worker() but it's assigned only after
the coordinator thread encounters an end event (i.e. xid event or a
query event with "COMMIT" or "ROLLBACK" query). This was causing a race
between accessing group_relay_log_name in the worker thread and writing
it on the coordinator thread. We don't need to set transaction position
in events other than end event, so now we set transaction position in
query event only if it's an end event. The race is eliminated because
group_relay_log_name is set before enqueuing the event to the worker
thread (in both dep repl and vanilla mts).

Reviewed By: lth

Differential Revision: D28767430

-------------------------------------------------------------------------------

fix memory during MYSQL_BIN_LOG::open_existing_binlog

Summary:
asandebug complain there are memory leaks during MYSQL_BIN_LOG open

Direct leak of 50 byte(s) in 1 object(s) allocated from:
    #0 0x67460ef in malloc
    #1 0x93f0777 in my_raw_malloc(unsigned long, int)
    #2 0x93f064a in my_malloc(unsigned int, unsigned long, int)
    #3 0x93f0eb0 in my_strdup(unsigned int, char const*, int)
    #4 0x8af01a6 in MYSQL_BIN_LOG::open(unsigned int, char const*, char const*, unsigned int)
    #5 0x8af8064 in MYSQL_BIN_LOG::open_binlog(char const*, char const*, unsigned long, bool, bool, bool, Format_description_log_event*, unsigned int, RaftRotateInfo*, bool)
    #6 0x8b00c00 in MYSQL_BIN_LOG::new_file_impl(bool, Format_description_log_event*, RaftRotateInfo*)
    #7 0x8d65e47 in rotate_relay_log(Master_info*, bool, bool, bool, RaftRotateInfo*)
    #8 0x8d661c0 in rotate_relay_log_for_raft(RaftRotateInfo*)
    #9 0x8c7696a in process_raft_queue
    #10 0xa0fa1fd in pfs_spawn_thread(void*)
    #11 0x7f8c9a12b20b in start_thread

release these memory before assign them

Reviewed By: Pushapgl

Differential Revision: D28819752
inikep pushed a commit that referenced this pull request Jul 30, 2024
Summary:
When COUNT(*) style aggregation appears in queries, the optimizer realizes that a full scan needs to be done. The index doesn't matter - any index will do as it long it satisfies the other query predicates present.

There are two heuristics at play. If there is a choice of multiple indexes, the optimizer will pick the index that has the shortest length since that means less disk scan time, therefore faster query (heuristic #1).

In the case of vector indexes, the current design can have the vector field present in the clustered PK as well as the vector index (eg the flat index). Even though there's no clear advantage, the heuristic is based on INNODB style index layout, and assumes that a secondary index will always be a better choice as long as it does not cover all table columns (heuristic #2). In reality, in both INNODB and MyRocks, the secondary key does contain the PK. But this fact is ignored.

What ends up happening is that by only considered the # of explicitly defined key parts, the vector index ends up getting picked up.

In the case of the IVF index, the vector index may be shorter than the primary clustered index. Thus secondary index will have a shorter key length than the primary, and will end up getting picked up (i.e. if the key part heuristic doesn't already pick the vector index again).

There can be many other index combinations. We want the optimizer to pick another available index. If no other index is available, then pick the primary clustered index.

Differential Revision: D55003211

fbshipit-source-id: 5ce6b34
inikep pushed a commit that referenced this pull request Jul 31, 2024
Summary:
This is an upstream bug: https://bugs.mysql.com/bug.php?id=92964

The issue is that if slave instance use sysvar_session_track_gtids == OWN_GTID and also enable MTS, the slave will lag behind master and its performance degrades over time.

The reason for this lag/perf degrade is caused by each slave worker keep adding its own gtid into its own Session_consistency_gtids_ctx.m_gtid_set during each transaction commit. Overtime, some of Session_consistency_gtids_ctx.m_gtid_set may have millions gno_interval and it will take long time to insert 1 gtid into Session_consistency_gtids_ctx.m_gtid_set(time complexity O(n)), see [Gtid_set::add_gno_interval](https://github.com/facebook/mysql-5.6/blob/fb-mysql-8.0.13/sql/rpl_gtid_set.cc#L323).

There are couple related code to this issue:
- During slave worker thread start, there is a [check](https://github.com/facebook/mysql-5.6/blob/fb-mysql-8.0.13/sql/session_tracker.cc#L1560) to not enable tracker for system threads but that check doesn't work---The THD system_thread field is initialized with default value(NON_SYSTEM_THREAD) during THD::THD.
```
  m_enabled = thd->variables.session_track_gtids != OFF &&
              /* No need to track GTIDs for system threads. */
              thd->system_thread == NON_SYSTEM_THREAD;
```
```
Call stack:
```
- in [Session_gtids_track::Update](https://github.com/facebook/mysql-5.6/blob/fb-mysql-8.0.13/sql/session_tracker.cc#L1581), the else code doesn't unregister listener due to mysql/mysql-server@1e0844c try to fix another issue.

Solutions:
1.  [Current]add system threads check during collecting data, since THD will be fully initialized at that time.
 -Pro: simple change(easy to port) and perf is ok
2. add `thd->session_track_gtids=OFF; thd->session_tracker.get_tracker(SESSION_GTIDS_TRACKER)->update(thd);` after [init_slave_thread()](https://github.com/facebook/mysql-5.6/blob/fb-mysql-8.0.13/sql/rpl_slave.cc#L6788)
-Pro: During init_slave_thread(), thd->system_thread will be set correctly value.
-Con: it need to add couple places, such as handle_slave_io, handle_slave_sql, handle_slave_worker
3. add `thd->session_track_gtids=OFF;thd->session_tracker.get_tracker(SESSION_GTIDS_TRACKER)->update(thd);` inside [init_slave_thread()](https://github.com/facebook/mysql-5.6/blob/fb-mysql-8.0.13/sql/rpl_slave.cc#L6788)
   -Pro: only add once for slave/slave worker/IO threads
   -Con: [should_collect](https://github.com/facebook/mysql-5.6/blob/fb-mysql-8.0.13/sql/rpl_context.cc#L53) check still return true, thus it is less perf than #1
4. add `thd->session_track_gtids=OFF;thd->rpl_thd_ctx.session_gtids_ctx().unregister_ctx_change_listener(thd->session_tracker.get_tracker(SESSION_GTIDS_TRACKER));` inside init_slave_thread()
   -Pro: perf is better than #3, on par as #1
   -Con: if there are other non-system threads(except IO/sql threads) do transactions, these threads will still collect its own gtids.
5. add another THD overload constructor which takes 'enum_thread_type' as a parameter whose default value is NON_SYSTEM_THREAD
 -Pro: it will initialize correct value with THD and won't enable those session_tracker
 -Con: similar to #2/#4, need to replace THD() with THD(enum_thread_type) couples places.

Differential Revision: D18072672
inikep pushed a commit that referenced this pull request Jul 31, 2024
…batch #1

Summary:
Disabling tests:
1. Transaction related - those are already disabled in 5.6 as they are not supported
2. hotbackup tests - we need Xtrabackup for MyRocks
3. partition tests - we need Percona's partition patch. Note these are partition specific tests - tests that only use partition are properly fixed up to work without partitions to ensure we have good coverage.
4. tests that are requires authentication plugin
5. Missing patches

In 8.0 the disabled.def aren't being used anymore and instead they are located in collections and needs to be specifically added.

Reviewed By: lloyd

Differential Revision: D17555081
inikep pushed a commit that referenced this pull request Jul 31, 2024
Summary:
MySQL bypass infrastructure change

1. Adds a new hton override `handlerton::handle_single_table_select` to allow hijacking of SELECT statement execution from any storage engine.
2. Add lex parsing for bypass hints in optimizer hint style /*+ bypass */ and /*+ no_bypass */
3. MyRocks overrides handle_single_table_select but does nothing and simply fallback.

This is a port from from bypass feature branch - change #1 of 3. I'm planning to port the changes in the following order:

1. MySQL lexer + SELECT hijacking (this one).
2. SELECT Statement parsing and checking
3. Execution of SELECT statement in bypass

Reference Patch: facebook@2d19dc0

-----
Porting Notes:

* The entry points have changed due to MySQL 8.0 refactoring to SELECT path, and now it is in `Sql_cmd_dml::execute_inner`. I'll also see if overriding the entire execute would be better but this should be good enough for now.
* The check for whether we should go to bypass handler is mostly the same, except st_select_lex_t is now SELECT_LEX.
* The lex parsing for hints was mostly ported as-is. I was also debating whether I should just use optimizer hints instead of hacking the lexer, but decided against using optimizer hint for now so that migration from 5.6 will be easier. We can migrate to optimizer hint once 8.0 migration is complete.

Differential Revision: D22807040

---------------------------------------------------------------------------------------------

[bypass] MySQL bypass change #2 - SELECT parsing [non-RocksDB part]

Summary:
This is change #2 that ports the SELECT statement parsing from feature-myrocks-bypass branch, including:

1. select_parser class that parses SELECT statements and validates the scenarios are supported, and error with proper message if not supported
2. You can control bypass policy with rocksdb_select_bypass_policy variable = 0 (ALWAYS OFF) / 1 (ALWAYS ON) / 2 (OPT IN) / 3 (OPT OUT).
3. A few metrics for number of SELECT bypass statements executed/rejected (due to not supported) /failed (something wrong in execution)

At this point this only dumps the SELECT statement parsed results without doing anything useful. The next change will be porting the query plan execution and point/range query execution part.

Please ignore the unused attributes for now - they'll be removed in next change once the execution part is ported.

Reference Patch:

facebook@e79b20a
facebook@8072e3d

 ---
Porting Notes:

There are a lot of small changes in this area but nothing too bad:
* Some bypass scenarios no longer fail or fail with different error message because of the new parse tree -> AST conversion process. For example, `SELECT *` now works correctly
* st_select_lex_t -> SELECT_LEX, and many members has changed name, such as where -> where_cond()
* protocol usage has changed to start_row, store, end_row in 8.0, which is much better.
* thd->query() is now a LEX_STRING
* SELECT option check (such as SELECT_DISTINCT) now uses active_options() and only check for SELECT_DISTINCT as there seems to be default value being set most likely due to side effect of having more parse tree processing being done. I checked the flags and it looks like only SELECT_DISTINCT are interesting and needs to be blocked.
* SELECT INTO OUTFILE check is changed to use results->needs_file_priviledge
* True/False const are now represented as Item_func_true/Item_func_false. For now I'm over-generalizing to functions that are const, and expect val_int/etc would do the right thing

Reviewed By: luqun

Differential Revision: D22808305

-----------------------------------------------------------------------------------------------

Introduce base select parser class

Summary:
To make it easier to add a new bypass rpc parser into existing bypass implementation (nosql_access.cc),
and make minimum change to bypass engine while adding a bypass rpc parser, I introduced a new base class,
base_select_parser. It includes all essential members and methods for parsing bypass sq/rpc queries.
Existing bypass sql parser (select_parser) and a new bypass rpc parser (not included in this diff)
inherits the base function.

Bypass RPC reference diff: D30493654

Reviewed By: yizhang82

Differential Revision: D31870628
inikep added a commit that referenced this pull request Jul 31, 2024
Summary:
mysqlbinlog expects rows_query event only in RBR. It prints an
unflushed event warning when it parses a transaction which has rows_query but
not a rows events with STMT_END_F flag set. Since there is no direct way to
tell the binlog format in mysqlbinlog, we'll just forgo the warning for
rows_event which contain metadata.

Reference Patch: facebook@89e1c03ac

Reviewed By: lloyd

Differential Revision: D10054923

-----------------------------------------------------------------------

optimize Rows_query_log_event::has_trx_meta_data()

Summary:
Rows_query_log_event::has_trx_meta_data() is expensive as it does a bunch
of string manipulation to figure out if it is prepended by
TRX_META_DATA. In a random profile, it shows to take up 2.7% CPU cycles on
secondaries. The function is called multiple times from different
places during applying a trx from worker threads:
1. Rows_query_log_event::do_apply_event(): The 'apply' of this event
stashes the trx_meta_data_json into 'rli' _only_ if the event contains
trx_meta_data. This is the first invocation of
Rows_query_log_event::has_trx_meta_data()
2. Query_log_event::do_apply_event(): It needs to cleanup
the 'rows_query_ev' stashed in 'rli' _only_ if it has trx_meta_data.
Hence it calls Rows_query_log_event::has_trx_metadata() again
3. Log_event::apply_event(): This extracts the timestamp (for
'milli-secs behind master' calculation) from
trx_meta_data (stashed inside rli in #1) and needs to check if the event
contains trx_meta_data. Hence it calls
Rows_query_log_event::has_trx_metadata() again

It is best to cache this information in the event itself so that the cost
of Rows_query_log_event::has_trx_metadata() is paid only once. This
reduces the cost of this method by 3x.

In a subsequent diff, I intend to cache the result of
Query_log_event::extract_trx_metadata() since this method is also called
multiple times and it is expensive as it does a bunch of json parsing
and string construction/manipulation.

Reviewed By: abhinav04sharma

Differential Revision: D31373879
inikep pushed a commit that referenced this pull request Jul 31, 2024
Summary:
Almost a direct port of the feature in 5.6 except for minor visual changes and
some minor changes related to 8.0 thread handling

Also brings in the basic raft_listener_queue without the implemented functions

Differential Revision: D21670710

Use DBUG_TRACE instead of DBUG_ENTER

Summary:
several testcase failed in asan debug,

==106552==ERROR: AddressSanitizer: stack-buffer-underflow on address 0x7f38f3590078 at pc 0x0000093aa495 bp 0x7f38f358ffc0 sp 0x7f38f358ffb8
READ of size 4 at 0x7f38f3590078 thread T203
    #0 0x93aa494 in _db_enter_(char const*, int, char const*, unsigned int, _db_stack_frame_*)
    #1 0x8d0b9bb in Relay_log_info::remove_logged_gtids
    #2 0x8b1ec3f in trim_logged_gtid
    #3 0x8c767cf in process_raft_queue

If Use DBUG_ENTER, then you need to use DBUG_RETURN to pop current frame in CODE_STATE.
If use DBUG_TRACE, it will pop current frame during .dtor

ps. small refactor changes for sql/rpl_handler.cc

Reviewed By: bhatvinay

Differential Revision: D28809418
inikep pushed a commit that referenced this pull request Jul 31, 2024
Summary:
port D21375044 (facebook@7fc9eaa) (facebook@7fc9eaa) and D17291883 (facebook@6a6a35f) (facebook@6a6a35f)
**Summary**
* As part of leadership changes, we will trim binlogs/relay logs on a
running mysql instance. This needs the ability to remove gtids from executed_gtid set.
This diff adds the ability to do the same. One can enqueue an event in the raft
listener queue and the raft thread in the server will take care of removing and
updating executed_gtid on a running instance.
* When raft logs are trimmed as part of TruncateOpsAfter(), the gtid of the trimmed trxs are also removed from executed/logged gtids. However, when binlog files are rotated, the previous gtid written to the new file depends on what files are rotated. During  binlog rotation the prev gtid are the logged/executed gtids of the instance. During relay log rotation prev gtids are the retrieved gtids of the instance. Hence, in addition to trimming logged gtids/executed gtids, we should also trim retrieved gtids. This prevents creation of holes in the instances executed_gtid set as the replicaset goes through multiple promotions over its lifetime.

Reviewed By: luqun

Differential Revision: D24690007

---------------------------------------------------------------------------------------

Use DBUG_TRACE instead of DBUG_ENTER

Summary:
several testcase failed in asan debug,

==106552==ERROR: AddressSanitizer: stack-buffer-underflow on address 0x7f38f3590078 at pc 0x0000093aa495 bp 0x7f38f358ffc0 sp 0x7f38f358ffb8
READ of size 4 at 0x7f38f3590078 thread T203
    #0 0x93aa494 in _db_enter_(char const*, int, char const*, unsigned int, _db_stack_frame_*)
    #1 0x8d0b9bb in Relay_log_info::remove_logged_gtids
    #2 0x8b1ec3f in trim_logged_gtid
    #3 0x8c767cf in process_raft_queue

If Use DBUG_ENTER, then you need to use DBUG_RETURN to pop current frame in CODE_STATE.
If use DBUG_TRACE, it will pop current frame during .dtor

ps. small refactor changes for sql/rpl_handler.cc

Reviewed By: bhatvinay

Differential Revision: D28809418
inikep pushed a commit that referenced this pull request Jul 31, 2024
Summary:
[Porting Notes]
We want to dump raft logs to vanilla async replicas regardless
of whether it's the relay log or binlog. Effectively after this change
we'll dump relay logs on the followers and binlogs on the leader. When
the raft role changes, the logs to the dumped are also changed.
Dump_log class is introduced as a thin wrapper/continer around
mysql_bin_log or rli->relay_log and is inited with mysql_bin_log
to emulate vanilla mysql behavior. Dump threads use the global
dump_log object instead of mysql_bin_log directly. We switch the log
in dump log only when raft role changes (in binlog_change_to_binlog()
and binlog_change_to_apply_log()).
During raft role change we take all log releated locks (LOCK_log,
LOCK_index, LOCK_binlog_end_pos, and dump log lock) to serialize it with
other log operations like dumping logs.

Related doc - https://fb.quip.com/oTVAAdgEi4zY

This diff contains below 7 patches:
D23013977
D24766787
D24716539
D24900223
D24955284
D25174166
D25775525

Reviewed By: luqun

Differential Revision: D26141496

-------------------------------------------------------------------------------

Passing raw_log pointer to wait_with_heartbeat() and wait_without_heartbeat()

Summary:
When enable_raft_plugin is OFF Dump_log::lock() is a no-op.
Which means that when enable_raft_plugin is OFF there can be a race
between log switching and dump threads. This could lead to a scenario
where the raw_log that wait_next_event() is working on might be
different than what wait_with_heartbeat()/wait_without_heartbeat() is
working on. This can cause deadlocks because
wait_with_heartbeat()/wait_without_heartbeat()'s mysql_cond_wait would
unlock and then lock a different log's LOCK_binlog_end_pos mutex which
would then never be unlocked by wait_next_event().

Reviewed By: anirbanr-fb

Differential Revision: D32152658

-----------------------------------------------------------------------------------------

Fix rpl_raft_dump_raft_logs

Summary:
This tests completes but fails because the following warning exists:
```
2022-08-30T16:28:00.159525Z 11 [ERROR] [MY-013114] [Repl] Slave I/O for channel '': Got fatal error 1236 from master when reading data from binary log: 'Slave has more GTIDs than the master has, using the master's SERVER_UUID. This may indicate that the end of the binary log was truncated or that the last binary log file was lost, e.g., after a power or disk failure when sync_binlog != 1. The master may or may not have rolled back transactions that were already replicated to the slave. Suggest to replicate any transactions that master has rolled back from slave to master, and/or commit empty transactions on master to account for transactions that have been', Error_code: MY-013114
```
Since the MTR result file is valid, we can suppress this error.

Reviewed By: yichenshen

Differential Revision: D39141846

-------------------------------------------------------------------------------

Fix heap overflow in group_relay_log_name handling

Summary:
We were accessing group_relay_log_name in
Query_log_event::do_apply_event_worker() but it's assigned only after
the coordinator thread encounters an end event (i.e. xid event or a
query event with "COMMIT" or "ROLLBACK" query). This was causing a race
between accessing group_relay_log_name in the worker thread and writing
it on the coordinator thread. We don't need to set transaction position
in events other than end event, so now we set transaction position in
query event only if it's an end event. The race is eliminated because
group_relay_log_name is set before enqueuing the event to the worker
thread (in both dep repl and vanilla mts).

Reviewed By: lth

Differential Revision: D28767430

-------------------------------------------------------------------------------

fix memory during MYSQL_BIN_LOG::open_existing_binlog

Summary:
asandebug complain there are memory leaks during MYSQL_BIN_LOG open

Direct leak of 50 byte(s) in 1 object(s) allocated from:
    #0 0x67460ef in malloc
    #1 0x93f0777 in my_raw_malloc(unsigned long, int)
    #2 0x93f064a in my_malloc(unsigned int, unsigned long, int)
    #3 0x93f0eb0 in my_strdup(unsigned int, char const*, int)
    #4 0x8af01a6 in MYSQL_BIN_LOG::open(unsigned int, char const*, char const*, unsigned int)
    #5 0x8af8064 in MYSQL_BIN_LOG::open_binlog(char const*, char const*, unsigned long, bool, bool, bool, Format_description_log_event*, unsigned int, RaftRotateInfo*, bool)
    #6 0x8b00c00 in MYSQL_BIN_LOG::new_file_impl(bool, Format_description_log_event*, RaftRotateInfo*)
    #7 0x8d65e47 in rotate_relay_log(Master_info*, bool, bool, bool, RaftRotateInfo*)
    #8 0x8d661c0 in rotate_relay_log_for_raft(RaftRotateInfo*)
    #9 0x8c7696a in process_raft_queue
    #10 0xa0fa1fd in pfs_spawn_thread(void*)
    #11 0x7f8c9a12b20b in start_thread

release these memory before assign them

Reviewed By: Pushapgl

Differential Revision: D28819752
inikep pushed a commit that referenced this pull request Jul 31, 2024
Summary:
When COUNT(*) style aggregation appears in queries, the optimizer realizes that a full scan needs to be done. The index doesn't matter - any index will do as it long it satisfies the other query predicates present.

There are two heuristics at play. If there is a choice of multiple indexes, the optimizer will pick the index that has the shortest length since that means less disk scan time, therefore faster query (heuristic #1).

In the case of vector indexes, the current design can have the vector field present in the clustered PK as well as the vector index (eg the flat index). Even though there's no clear advantage, the heuristic is based on INNODB style index layout, and assumes that a secondary index will always be a better choice as long as it does not cover all table columns (heuristic #2). In reality, in both INNODB and MyRocks, the secondary key does contain the PK. But this fact is ignored.

What ends up happening is that by only considered the # of explicitly defined key parts, the vector index ends up getting picked up.

In the case of the IVF index, the vector index may be shorter than the primary clustered index. Thus secondary index will have a shorter key length than the primary, and will end up getting picked up (i.e. if the key part heuristic doesn't already pick the vector index again).

There can be many other index combinations. We want the optimizer to pick another available index. If no other index is available, then pick the primary clustered index.

Differential Revision: D55003211

fbshipit-source-id: 5ce6b34
inikep pushed a commit that referenced this pull request Aug 2, 2024
Summary:
MySQL bypass infrastructure change

1. Adds a new hton override `handlerton::handle_single_table_select` to allow hijacking of SELECT statement execution from any storage engine.
2. Add lex parsing for bypass hints in optimizer hint style /*+ bypass */ and /*+ no_bypass */
3. MyRocks overrides handle_single_table_select but does nothing and simply fallback.

This is a port from from bypass feature branch - change #1 of 3. I'm planning to port the changes in the following order:

1. MySQL lexer + SELECT hijacking (this one).
2. SELECT Statement parsing and checking
3. Execution of SELECT statement in bypass

Reference Patch: facebook@2d19dc0

-----
Porting Notes:

* The entry points have changed due to MySQL 8.0 refactoring to SELECT path, and now it is in `Sql_cmd_dml::execute_inner`. I'll also see if overriding the entire execute would be better but this should be good enough for now.
* The check for whether we should go to bypass handler is mostly the same, except st_select_lex_t is now SELECT_LEX.
* The lex parsing for hints was mostly ported as-is. I was also debating whether I should just use optimizer hints instead of hacking the lexer, but decided against using optimizer hint for now so that migration from 5.6 will be easier. We can migrate to optimizer hint once 8.0 migration is complete.

Differential Revision: D22807040
inikep pushed a commit that referenced this pull request Aug 2, 2024
Summary:
some MTR failed in ubsan due to num of rows/records calculated in records_in_range() is a negative values which is caused by m_actual_disk_size is a negative value

```
storage/rocksdb/ha_rocksdb.cc:: runtime error: -56.8272 is outside the range of representable values of type 'unsigned long long'
    #0  myrocks::ha_rocksdb::records_in_range_internal(unsigned int, key_range*, key_range*, long, long, unsigned long long*, unsigned long long*) /data/sandcastle/boxes/trunk-git-mysql/storage/rocksdb/ha_rocksdb.cc:14855:16
    #1  myrocks::ha_rocksdb::records_in_range(unsigned int, key_range*, key_range*) /data/sandcastle/boxes/trunk-git-mysql/storage/rocksdb/ha_rocksdb.cc:14760:3
    #2  handler::multi_range_read_info_const(unsigned int, RANGE_SEQ_IF*, void*, unsigned int, unsigned int*, unsigned int*, Cost_estimate*) /data/sandcastle/boxes/trunk-git-mysql/sql/handler.cc:6608:26
    #3  myrocks::ha_rocksdb::multi_range_read_info_const(unsigned int, RANGE_SEQ_IF*, void*, unsigned int, unsigned int*, unsigned int*, Cost_estimate*) /data/sandcastle/boxes/trunk-git-mysql/storage/rocksdb/ha_rocksdb.cc:19002:18
    #4  check_quick_select(THD*, RANGE_OPT_P
```

Due to m_actual_disk_size is an estimated value, always reset to 0 if it i becomes negative.

Differential Revision: D50531919
inikep pushed a commit that referenced this pull request Aug 2, 2024
Summary:
This is an upstream bug: https://bugs.mysql.com/bug.php?id=92964

The issue is that if slave instance use sysvar_session_track_gtids == OWN_GTID and also enable MTS, the slave will lag behind master and its performance degrades over time.

The reason for this lag/perf degrade is caused by each slave worker keep adding its own gtid into its own Session_consistency_gtids_ctx.m_gtid_set during each transaction commit. Overtime, some of Session_consistency_gtids_ctx.m_gtid_set may have millions gno_interval and it will take long time to insert 1 gtid into Session_consistency_gtids_ctx.m_gtid_set(time complexity O(n)), see [Gtid_set::add_gno_interval](https://github.com/facebook/mysql-5.6/blob/fb-mysql-8.0.13/sql/rpl_gtid_set.cc#L323).

There are couple related code to this issue:
- During slave worker thread start, there is a [check](https://github.com/facebook/mysql-5.6/blob/fb-mysql-8.0.13/sql/session_tracker.cc#L1560) to not enable tracker for system threads but that check doesn't work---The THD system_thread field is initialized with default value(NON_SYSTEM_THREAD) during THD::THD.
```
  m_enabled = thd->variables.session_track_gtids != OFF &&
              /* No need to track GTIDs for system threads. */
              thd->system_thread == NON_SYSTEM_THREAD;
```
```
Call stack:
```
- in [Session_gtids_track::Update](https://github.com/facebook/mysql-5.6/blob/fb-mysql-8.0.13/sql/session_tracker.cc#L1581), the else code doesn't unregister listener due to mysql/mysql-server@1e0844c try to fix another issue.

Solutions:
1.  [Current]add system threads check during collecting data, since THD will be fully initialized at that time.
 -Pro: simple change(easy to port) and perf is ok
2. add `thd->session_track_gtids=OFF; thd->session_tracker.get_tracker(SESSION_GTIDS_TRACKER)->update(thd);` after [init_slave_thread()](https://github.com/facebook/mysql-5.6/blob/fb-mysql-8.0.13/sql/rpl_slave.cc#L6788)
-Pro: During init_slave_thread(), thd->system_thread will be set correctly value.
-Con: it need to add couple places, such as handle_slave_io, handle_slave_sql, handle_slave_worker
3. add `thd->session_track_gtids=OFF;thd->session_tracker.get_tracker(SESSION_GTIDS_TRACKER)->update(thd);` inside [init_slave_thread()](https://github.com/facebook/mysql-5.6/blob/fb-mysql-8.0.13/sql/rpl_slave.cc#L6788)
   -Pro: only add once for slave/slave worker/IO threads
   -Con: [should_collect](https://github.com/facebook/mysql-5.6/blob/fb-mysql-8.0.13/sql/rpl_context.cc#L53) check still return true, thus it is less perf than #1
4. add `thd->session_track_gtids=OFF;thd->rpl_thd_ctx.session_gtids_ctx().unregister_ctx_change_listener(thd->session_tracker.get_tracker(SESSION_GTIDS_TRACKER));` inside init_slave_thread()
   -Pro: perf is better than #3, on par as #1
   -Con: if there are other non-system threads(except IO/sql threads) do transactions, these threads will still collect its own gtids.
5. add another THD overload constructor which takes 'enum_thread_type' as a parameter whose default value is NON_SYSTEM_THREAD
 -Pro: it will initialize correct value with THD and won't enable those session_tracker
 -Con: similar to #2/#4, need to replace THD() with THD(enum_thread_type) couples places.

Differential Revision: D18072672
inikep pushed a commit that referenced this pull request Aug 2, 2024
…batch #1

Summary:
Disabling tests:
1. Transaction related - those are already disabled in 5.6 as they are not supported
2. hotbackup tests - we need Xtrabackup for MyRocks
3. partition tests - we need Percona's partition patch. Note these are partition specific tests - tests that only use partition are properly fixed up to work without partitions to ensure we have good coverage.
4. tests that are requires authentication plugin
5. Missing patches

In 8.0 the disabled.def aren't being used anymore and instead they are located in collections and needs to be specifically added.

Reviewed By: lloyd

Differential Revision: D17555081
inikep pushed a commit that referenced this pull request Aug 2, 2024
Summary:
MySQL bypass infrastructure change

1. Adds a new hton override `handlerton::handle_single_table_select` to allow hijacking of SELECT statement execution from any storage engine.
2. Add lex parsing for bypass hints in optimizer hint style /*+ bypass */ and /*+ no_bypass */
3. MyRocks overrides handle_single_table_select but does nothing and simply fallback.

This is a port from from bypass feature branch - change #1 of 3. I'm planning to port the changes in the following order:

1. MySQL lexer + SELECT hijacking (this one).
2. SELECT Statement parsing and checking
3. Execution of SELECT statement in bypass

Reference Patch: facebook@2d19dc0

-----
Porting Notes:

* The entry points have changed due to MySQL 8.0 refactoring to SELECT path, and now it is in `Sql_cmd_dml::execute_inner`. I'll also see if overriding the entire execute would be better but this should be good enough for now.
* The check for whether we should go to bypass handler is mostly the same, except st_select_lex_t is now SELECT_LEX.
* The lex parsing for hints was mostly ported as-is. I was also debating whether I should just use optimizer hints instead of hacking the lexer, but decided against using optimizer hint for now so that migration from 5.6 will be easier. We can migrate to optimizer hint once 8.0 migration is complete.

Differential Revision: D22807040

---------------------------------------------------------------------------------------------

[bypass] MySQL bypass change #2 - SELECT parsing [non-RocksDB part]

Summary:
This is change #2 that ports the SELECT statement parsing from feature-myrocks-bypass branch, including:

1. select_parser class that parses SELECT statements and validates the scenarios are supported, and error with proper message if not supported
2. You can control bypass policy with rocksdb_select_bypass_policy variable = 0 (ALWAYS OFF) / 1 (ALWAYS ON) / 2 (OPT IN) / 3 (OPT OUT).
3. A few metrics for number of SELECT bypass statements executed/rejected (due to not supported) /failed (something wrong in execution)

At this point this only dumps the SELECT statement parsed results without doing anything useful. The next change will be porting the query plan execution and point/range query execution part.

Please ignore the unused attributes for now - they'll be removed in next change once the execution part is ported.

Reference Patch:

facebook@e79b20a
facebook@8072e3d

 ---
Porting Notes:

There are a lot of small changes in this area but nothing too bad:
* Some bypass scenarios no longer fail or fail with different error message because of the new parse tree -> AST conversion process. For example, `SELECT *` now works correctly
* st_select_lex_t -> SELECT_LEX, and many members has changed name, such as where -> where_cond()
* protocol usage has changed to start_row, store, end_row in 8.0, which is much better.
* thd->query() is now a LEX_STRING
* SELECT option check (such as SELECT_DISTINCT) now uses active_options() and only check for SELECT_DISTINCT as there seems to be default value being set most likely due to side effect of having more parse tree processing being done. I checked the flags and it looks like only SELECT_DISTINCT are interesting and needs to be blocked.
* SELECT INTO OUTFILE check is changed to use results->needs_file_priviledge
* True/False const are now represented as Item_func_true/Item_func_false. For now I'm over-generalizing to functions that are const, and expect val_int/etc would do the right thing

Reviewed By: luqun

Differential Revision: D22808305

-----------------------------------------------------------------------------------------------

Introduce base select parser class

Summary:
To make it easier to add a new bypass rpc parser into existing bypass implementation (nosql_access.cc),
and make minimum change to bypass engine while adding a bypass rpc parser, I introduced a new base class,
base_select_parser. It includes all essential members and methods for parsing bypass sq/rpc queries.
Existing bypass sql parser (select_parser) and a new bypass rpc parser (not included in this diff)
inherits the base function.

Bypass RPC reference diff: D30493654

Reviewed By: yizhang82

Differential Revision: D31870628
inikep added a commit that referenced this pull request Aug 2, 2024
Summary:
mysqlbinlog expects rows_query event only in RBR. It prints an
unflushed event warning when it parses a transaction which has rows_query but
not a rows events with STMT_END_F flag set. Since there is no direct way to
tell the binlog format in mysqlbinlog, we'll just forgo the warning for
rows_event which contain metadata.

Reference Patch: facebook@89e1c03ac

Reviewed By: lloyd

Differential Revision: D10054923

-----------------------------------------------------------------------

optimize Rows_query_log_event::has_trx_meta_data()

Summary:
Rows_query_log_event::has_trx_meta_data() is expensive as it does a bunch
of string manipulation to figure out if it is prepended by
TRX_META_DATA. In a random profile, it shows to take up 2.7% CPU cycles on
secondaries. The function is called multiple times from different
places during applying a trx from worker threads:
1. Rows_query_log_event::do_apply_event(): The 'apply' of this event
stashes the trx_meta_data_json into 'rli' _only_ if the event contains
trx_meta_data. This is the first invocation of
Rows_query_log_event::has_trx_meta_data()
2. Query_log_event::do_apply_event(): It needs to cleanup
the 'rows_query_ev' stashed in 'rli' _only_ if it has trx_meta_data.
Hence it calls Rows_query_log_event::has_trx_metadata() again
3. Log_event::apply_event(): This extracts the timestamp (for
'milli-secs behind master' calculation) from
trx_meta_data (stashed inside rli in #1) and needs to check if the event
contains trx_meta_data. Hence it calls
Rows_query_log_event::has_trx_metadata() again

It is best to cache this information in the event itself so that the cost
of Rows_query_log_event::has_trx_metadata() is paid only once. This
reduces the cost of this method by 3x.

In a subsequent diff, I intend to cache the result of
Query_log_event::extract_trx_metadata() since this method is also called
multiple times and it is expensive as it does a bunch of json parsing
and string construction/manipulation.

Reviewed By: abhinav04sharma

Differential Revision: D31373879
inikep pushed a commit that referenced this pull request Aug 2, 2024
Summary:
Almost a direct port of the feature in 5.6 except for minor visual changes and
some minor changes related to 8.0 thread handling

Also brings in the basic raft_listener_queue without the implemented functions

Differential Revision: D21670710

Use DBUG_TRACE instead of DBUG_ENTER

Summary:
several testcase failed in asan debug,

==106552==ERROR: AddressSanitizer: stack-buffer-underflow on address 0x7f38f3590078 at pc 0x0000093aa495 bp 0x7f38f358ffc0 sp 0x7f38f358ffb8
READ of size 4 at 0x7f38f3590078 thread T203
    #0 0x93aa494 in _db_enter_(char const*, int, char const*, unsigned int, _db_stack_frame_*)
    #1 0x8d0b9bb in Relay_log_info::remove_logged_gtids
    #2 0x8b1ec3f in trim_logged_gtid
    #3 0x8c767cf in process_raft_queue

If Use DBUG_ENTER, then you need to use DBUG_RETURN to pop current frame in CODE_STATE.
If use DBUG_TRACE, it will pop current frame during .dtor

ps. small refactor changes for sql/rpl_handler.cc

Reviewed By: bhatvinay

Differential Revision: D28809418
inikep pushed a commit that referenced this pull request Aug 2, 2024
Summary:
port D21375044 (facebook@7fc9eaa) (facebook@7fc9eaa) and D17291883 (facebook@6a6a35f) (facebook@6a6a35f)
**Summary**
* As part of leadership changes, we will trim binlogs/relay logs on a
running mysql instance. This needs the ability to remove gtids from executed_gtid set.
This diff adds the ability to do the same. One can enqueue an event in the raft
listener queue and the raft thread in the server will take care of removing and
updating executed_gtid on a running instance.
* When raft logs are trimmed as part of TruncateOpsAfter(), the gtid of the trimmed trxs are also removed from executed/logged gtids. However, when binlog files are rotated, the previous gtid written to the new file depends on what files are rotated. During  binlog rotation the prev gtid are the logged/executed gtids of the instance. During relay log rotation prev gtids are the retrieved gtids of the instance. Hence, in addition to trimming logged gtids/executed gtids, we should also trim retrieved gtids. This prevents creation of holes in the instances executed_gtid set as the replicaset goes through multiple promotions over its lifetime.

Reviewed By: luqun

Differential Revision: D24690007

---------------------------------------------------------------------------------------

Use DBUG_TRACE instead of DBUG_ENTER

Summary:
several testcase failed in asan debug,

==106552==ERROR: AddressSanitizer: stack-buffer-underflow on address 0x7f38f3590078 at pc 0x0000093aa495 bp 0x7f38f358ffc0 sp 0x7f38f358ffb8
READ of size 4 at 0x7f38f3590078 thread T203
    #0 0x93aa494 in _db_enter_(char const*, int, char const*, unsigned int, _db_stack_frame_*)
    #1 0x8d0b9bb in Relay_log_info::remove_logged_gtids
    #2 0x8b1ec3f in trim_logged_gtid
    #3 0x8c767cf in process_raft_queue

If Use DBUG_ENTER, then you need to use DBUG_RETURN to pop current frame in CODE_STATE.
If use DBUG_TRACE, it will pop current frame during .dtor

ps. small refactor changes for sql/rpl_handler.cc

Reviewed By: bhatvinay

Differential Revision: D28809418
inikep pushed a commit that referenced this pull request Aug 2, 2024
Summary:
[Porting Notes]
We want to dump raft logs to vanilla async replicas regardless
of whether it's the relay log or binlog. Effectively after this change
we'll dump relay logs on the followers and binlogs on the leader. When
the raft role changes, the logs to the dumped are also changed.
Dump_log class is introduced as a thin wrapper/continer around
mysql_bin_log or rli->relay_log and is inited with mysql_bin_log
to emulate vanilla mysql behavior. Dump threads use the global
dump_log object instead of mysql_bin_log directly. We switch the log
in dump log only when raft role changes (in binlog_change_to_binlog()
and binlog_change_to_apply_log()).
During raft role change we take all log releated locks (LOCK_log,
LOCK_index, LOCK_binlog_end_pos, and dump log lock) to serialize it with
other log operations like dumping logs.

Related doc - https://fb.quip.com/oTVAAdgEi4zY

This diff contains below 7 patches:
D23013977
D24766787
D24716539
D24900223
D24955284
D25174166
D25775525

Reviewed By: luqun

Differential Revision: D26141496

-------------------------------------------------------------------------------

Passing raw_log pointer to wait_with_heartbeat() and wait_without_heartbeat()

Summary:
When enable_raft_plugin is OFF Dump_log::lock() is a no-op.
Which means that when enable_raft_plugin is OFF there can be a race
between log switching and dump threads. This could lead to a scenario
where the raw_log that wait_next_event() is working on might be
different than what wait_with_heartbeat()/wait_without_heartbeat() is
working on. This can cause deadlocks because
wait_with_heartbeat()/wait_without_heartbeat()'s mysql_cond_wait would
unlock and then lock a different log's LOCK_binlog_end_pos mutex which
would then never be unlocked by wait_next_event().

Reviewed By: anirbanr-fb

Differential Revision: D32152658

-----------------------------------------------------------------------------------------

Fix rpl_raft_dump_raft_logs

Summary:
This tests completes but fails because the following warning exists:
```
2022-08-30T16:28:00.159525Z 11 [ERROR] [MY-013114] [Repl] Slave I/O for channel '': Got fatal error 1236 from master when reading data from binary log: 'Slave has more GTIDs than the master has, using the master's SERVER_UUID. This may indicate that the end of the binary log was truncated or that the last binary log file was lost, e.g., after a power or disk failure when sync_binlog != 1. The master may or may not have rolled back transactions that were already replicated to the slave. Suggest to replicate any transactions that master has rolled back from slave to master, and/or commit empty transactions on master to account for transactions that have been', Error_code: MY-013114
```
Since the MTR result file is valid, we can suppress this error.

Reviewed By: yichenshen

Differential Revision: D39141846

-------------------------------------------------------------------------------

Fix heap overflow in group_relay_log_name handling

Summary:
We were accessing group_relay_log_name in
Query_log_event::do_apply_event_worker() but it's assigned only after
the coordinator thread encounters an end event (i.e. xid event or a
query event with "COMMIT" or "ROLLBACK" query). This was causing a race
between accessing group_relay_log_name in the worker thread and writing
it on the coordinator thread. We don't need to set transaction position
in events other than end event, so now we set transaction position in
query event only if it's an end event. The race is eliminated because
group_relay_log_name is set before enqueuing the event to the worker
thread (in both dep repl and vanilla mts).

Reviewed By: lth

Differential Revision: D28767430

-------------------------------------------------------------------------------

fix memory during MYSQL_BIN_LOG::open_existing_binlog

Summary:
asandebug complain there are memory leaks during MYSQL_BIN_LOG open

Direct leak of 50 byte(s) in 1 object(s) allocated from:
    #0 0x67460ef in malloc
    #1 0x93f0777 in my_raw_malloc(unsigned long, int)
    #2 0x93f064a in my_malloc(unsigned int, unsigned long, int)
    #3 0x93f0eb0 in my_strdup(unsigned int, char const*, int)
    #4 0x8af01a6 in MYSQL_BIN_LOG::open(unsigned int, char const*, char const*, unsigned int)
    #5 0x8af8064 in MYSQL_BIN_LOG::open_binlog(char const*, char const*, unsigned long, bool, bool, bool, Format_description_log_event*, unsigned int, RaftRotateInfo*, bool)
    #6 0x8b00c00 in MYSQL_BIN_LOG::new_file_impl(bool, Format_description_log_event*, RaftRotateInfo*)
    #7 0x8d65e47 in rotate_relay_log(Master_info*, bool, bool, bool, RaftRotateInfo*)
    #8 0x8d661c0 in rotate_relay_log_for_raft(RaftRotateInfo*)
    #9 0x8c7696a in process_raft_queue
    #10 0xa0fa1fd in pfs_spawn_thread(void*)
    #11 0x7f8c9a12b20b in start_thread

release these memory before assign them

Reviewed By: Pushapgl

Differential Revision: D28819752
inikep pushed a commit that referenced this pull request Aug 2, 2024
Summary:
When COUNT(*) style aggregation appears in queries, the optimizer realizes that a full scan needs to be done. The index doesn't matter - any index will do as it long it satisfies the other query predicates present.

There are two heuristics at play. If there is a choice of multiple indexes, the optimizer will pick the index that has the shortest length since that means less disk scan time, therefore faster query (heuristic #1).

In the case of vector indexes, the current design can have the vector field present in the clustered PK as well as the vector index (eg the flat index). Even though there's no clear advantage, the heuristic is based on INNODB style index layout, and assumes that a secondary index will always be a better choice as long as it does not cover all table columns (heuristic #2). In reality, in both INNODB and MyRocks, the secondary key does contain the PK. But this fact is ignored.

What ends up happening is that by only considered the # of explicitly defined key parts, the vector index ends up getting picked up.

In the case of the IVF index, the vector index may be shorter than the primary clustered index. Thus secondary index will have a shorter key length than the primary, and will end up getting picked up (i.e. if the key part heuristic doesn't already pick the vector index again).

There can be many other index combinations. We want the optimizer to pick another available index. If no other index is available, then pick the primary clustered index.

Differential Revision: D55003211

fbshipit-source-id: 5ce6b34
inikep added a commit that referenced this pull request Aug 6, 2024
Summary:
mysqlbinlog expects rows_query event only in RBR. It prints an
unflushed event warning when it parses a transaction which has rows_query but
not a rows events with STMT_END_F flag set. Since there is no direct way to
tell the binlog format in mysqlbinlog, we'll just forgo the warning for
rows_event which contain metadata.

Reference Patch: facebook@89e1c03ac

Reviewed By: lloyd

Differential Revision: D10054923

-----------------------------------------------------------------------

optimize Rows_query_log_event::has_trx_meta_data()

Summary:
Rows_query_log_event::has_trx_meta_data() is expensive as it does a bunch
of string manipulation to figure out if it is prepended by
TRX_META_DATA. In a random profile, it shows to take up 2.7% CPU cycles on
secondaries. The function is called multiple times from different
places during applying a trx from worker threads:
1. Rows_query_log_event::do_apply_event(): The 'apply' of this event
stashes the trx_meta_data_json into 'rli' _only_ if the event contains
trx_meta_data. This is the first invocation of
Rows_query_log_event::has_trx_meta_data()
2. Query_log_event::do_apply_event(): It needs to cleanup
the 'rows_query_ev' stashed in 'rli' _only_ if it has trx_meta_data.
Hence it calls Rows_query_log_event::has_trx_metadata() again
3. Log_event::apply_event(): This extracts the timestamp (for
'milli-secs behind master' calculation) from
trx_meta_data (stashed inside rli in #1) and needs to check if the event
contains trx_meta_data. Hence it calls
Rows_query_log_event::has_trx_metadata() again

It is best to cache this information in the event itself so that the cost
of Rows_query_log_event::has_trx_metadata() is paid only once. This
reduces the cost of this method by 3x.

In a subsequent diff, I intend to cache the result of
Query_log_event::extract_trx_metadata() since this method is also called
multiple times and it is expensive as it does a bunch of json parsing
and string construction/manipulation.

Reviewed By: abhinav04sharma

Differential Revision: D31373879
inikep pushed a commit that referenced this pull request Aug 6, 2024
Summary:
Almost a direct port of the feature in 5.6 except for minor visual changes and
some minor changes related to 8.0 thread handling

Also brings in the basic raft_listener_queue without the implemented functions

Differential Revision: D21670710

Use DBUG_TRACE instead of DBUG_ENTER

Summary:
several testcase failed in asan debug,

==106552==ERROR: AddressSanitizer: stack-buffer-underflow on address 0x7f38f3590078 at pc 0x0000093aa495 bp 0x7f38f358ffc0 sp 0x7f38f358ffb8
READ of size 4 at 0x7f38f3590078 thread T203
    #0 0x93aa494 in _db_enter_(char const*, int, char const*, unsigned int, _db_stack_frame_*)
    #1 0x8d0b9bb in Relay_log_info::remove_logged_gtids
    #2 0x8b1ec3f in trim_logged_gtid
    #3 0x8c767cf in process_raft_queue

If Use DBUG_ENTER, then you need to use DBUG_RETURN to pop current frame in CODE_STATE.
If use DBUG_TRACE, it will pop current frame during .dtor

ps. small refactor changes for sql/rpl_handler.cc

Reviewed By: bhatvinay

Differential Revision: D28809418
inikep pushed a commit that referenced this pull request Aug 6, 2024
Summary:
port D21375044 (facebook@7fc9eaa) (facebook@7fc9eaa) and D17291883 (facebook@6a6a35f) (facebook@6a6a35f)
**Summary**
* As part of leadership changes, we will trim binlogs/relay logs on a
running mysql instance. This needs the ability to remove gtids from executed_gtid set.
This diff adds the ability to do the same. One can enqueue an event in the raft
listener queue and the raft thread in the server will take care of removing and
updating executed_gtid on a running instance.
* When raft logs are trimmed as part of TruncateOpsAfter(), the gtid of the trimmed trxs are also removed from executed/logged gtids. However, when binlog files are rotated, the previous gtid written to the new file depends on what files are rotated. During  binlog rotation the prev gtid are the logged/executed gtids of the instance. During relay log rotation prev gtids are the retrieved gtids of the instance. Hence, in addition to trimming logged gtids/executed gtids, we should also trim retrieved gtids. This prevents creation of holes in the instances executed_gtid set as the replicaset goes through multiple promotions over its lifetime.

Reviewed By: luqun

Differential Revision: D24690007

---------------------------------------------------------------------------------------

Use DBUG_TRACE instead of DBUG_ENTER

Summary:
several testcase failed in asan debug,

==106552==ERROR: AddressSanitizer: stack-buffer-underflow on address 0x7f38f3590078 at pc 0x0000093aa495 bp 0x7f38f358ffc0 sp 0x7f38f358ffb8
READ of size 4 at 0x7f38f3590078 thread T203
    #0 0x93aa494 in _db_enter_(char const*, int, char const*, unsigned int, _db_stack_frame_*)
    #1 0x8d0b9bb in Relay_log_info::remove_logged_gtids
    #2 0x8b1ec3f in trim_logged_gtid
    #3 0x8c767cf in process_raft_queue

If Use DBUG_ENTER, then you need to use DBUG_RETURN to pop current frame in CODE_STATE.
If use DBUG_TRACE, it will pop current frame during .dtor

ps. small refactor changes for sql/rpl_handler.cc

Reviewed By: bhatvinay

Differential Revision: D28809418
inikep pushed a commit that referenced this pull request Aug 6, 2024
Summary:
[Porting Notes]
We want to dump raft logs to vanilla async replicas regardless
of whether it's the relay log or binlog. Effectively after this change
we'll dump relay logs on the followers and binlogs on the leader. When
the raft role changes, the logs to the dumped are also changed.
Dump_log class is introduced as a thin wrapper/continer around
mysql_bin_log or rli->relay_log and is inited with mysql_bin_log
to emulate vanilla mysql behavior. Dump threads use the global
dump_log object instead of mysql_bin_log directly. We switch the log
in dump log only when raft role changes (in binlog_change_to_binlog()
and binlog_change_to_apply_log()).
During raft role change we take all log releated locks (LOCK_log,
LOCK_index, LOCK_binlog_end_pos, and dump log lock) to serialize it with
other log operations like dumping logs.

Related doc - https://fb.quip.com/oTVAAdgEi4zY

This diff contains below 7 patches:
D23013977
D24766787
D24716539
D24900223
D24955284
D25174166
D25775525

Reviewed By: luqun

Differential Revision: D26141496

-------------------------------------------------------------------------------

Passing raw_log pointer to wait_with_heartbeat() and wait_without_heartbeat()

Summary:
When enable_raft_plugin is OFF Dump_log::lock() is a no-op.
Which means that when enable_raft_plugin is OFF there can be a race
between log switching and dump threads. This could lead to a scenario
where the raw_log that wait_next_event() is working on might be
different than what wait_with_heartbeat()/wait_without_heartbeat() is
working on. This can cause deadlocks because
wait_with_heartbeat()/wait_without_heartbeat()'s mysql_cond_wait would
unlock and then lock a different log's LOCK_binlog_end_pos mutex which
would then never be unlocked by wait_next_event().

Reviewed By: anirbanr-fb

Differential Revision: D32152658

-----------------------------------------------------------------------------------------

Fix rpl_raft_dump_raft_logs

Summary:
This tests completes but fails because the following warning exists:
```
2022-08-30T16:28:00.159525Z 11 [ERROR] [MY-013114] [Repl] Slave I/O for channel '': Got fatal error 1236 from master when reading data from binary log: 'Slave has more GTIDs than the master has, using the master's SERVER_UUID. This may indicate that the end of the binary log was truncated or that the last binary log file was lost, e.g., after a power or disk failure when sync_binlog != 1. The master may or may not have rolled back transactions that were already replicated to the slave. Suggest to replicate any transactions that master has rolled back from slave to master, and/or commit empty transactions on master to account for transactions that have been', Error_code: MY-013114
```
Since the MTR result file is valid, we can suppress this error.

Reviewed By: yichenshen

Differential Revision: D39141846

-------------------------------------------------------------------------------

Fix heap overflow in group_relay_log_name handling

Summary:
We were accessing group_relay_log_name in
Query_log_event::do_apply_event_worker() but it's assigned only after
the coordinator thread encounters an end event (i.e. xid event or a
query event with "COMMIT" or "ROLLBACK" query). This was causing a race
between accessing group_relay_log_name in the worker thread and writing
it on the coordinator thread. We don't need to set transaction position
in events other than end event, so now we set transaction position in
query event only if it's an end event. The race is eliminated because
group_relay_log_name is set before enqueuing the event to the worker
thread (in both dep repl and vanilla mts).

Reviewed By: lth

Differential Revision: D28767430

-------------------------------------------------------------------------------

fix memory during MYSQL_BIN_LOG::open_existing_binlog

Summary:
asandebug complain there are memory leaks during MYSQL_BIN_LOG open

Direct leak of 50 byte(s) in 1 object(s) allocated from:
    #0 0x67460ef in malloc
    #1 0x93f0777 in my_raw_malloc(unsigned long, int)
    #2 0x93f064a in my_malloc(unsigned int, unsigned long, int)
    #3 0x93f0eb0 in my_strdup(unsigned int, char const*, int)
    #4 0x8af01a6 in MYSQL_BIN_LOG::open(unsigned int, char const*, char const*, unsigned int)
    #5 0x8af8064 in MYSQL_BIN_LOG::open_binlog(char const*, char const*, unsigned long, bool, bool, bool, Format_description_log_event*, unsigned int, RaftRotateInfo*, bool)
    #6 0x8b00c00 in MYSQL_BIN_LOG::new_file_impl(bool, Format_description_log_event*, RaftRotateInfo*)
    #7 0x8d65e47 in rotate_relay_log(Master_info*, bool, bool, bool, RaftRotateInfo*)
    #8 0x8d661c0 in rotate_relay_log_for_raft(RaftRotateInfo*)
    #9 0x8c7696a in process_raft_queue
    #10 0xa0fa1fd in pfs_spawn_thread(void*)
    #11 0x7f8c9a12b20b in start_thread

release these memory before assign them

Reviewed By: Pushapgl

Differential Revision: D28819752
inikep pushed a commit that referenced this pull request Aug 6, 2024
Summary:
When COUNT(*) style aggregation appears in queries, the optimizer realizes that a full scan needs to be done. The index doesn't matter - any index will do as it long it satisfies the other query predicates present.

There are two heuristics at play. If there is a choice of multiple indexes, the optimizer will pick the index that has the shortest length since that means less disk scan time, therefore faster query (heuristic #1).

In the case of vector indexes, the current design can have the vector field present in the clustered PK as well as the vector index (eg the flat index). Even though there's no clear advantage, the heuristic is based on INNODB style index layout, and assumes that a secondary index will always be a better choice as long as it does not cover all table columns (heuristic #2). In reality, in both INNODB and MyRocks, the secondary key does contain the PK. But this fact is ignored.

What ends up happening is that by only considered the # of explicitly defined key parts, the vector index ends up getting picked up.

In the case of the IVF index, the vector index may be shorter than the primary clustered index. Thus secondary index will have a shorter key length than the primary, and will end up getting picked up (i.e. if the key part heuristic doesn't already pick the vector index again).

There can be many other index combinations. We want the optimizer to pick another available index. If no other index is available, then pick the primary clustered index.

Differential Revision: D55003211

fbshipit-source-id: 5ce6b34
inikep pushed a commit that referenced this pull request Aug 12, 2024
Summary:
When COUNT(*) style aggregation appears in queries, the optimizer realizes that a full scan needs to be done. The index doesn't matter - any index will do as it long it satisfies the other query predicates present.

There are two heuristics at play. If there is a choice of multiple indexes, the optimizer will pick the index that has the shortest length since that means less disk scan time, therefore faster query (heuristic #1).

In the case of vector indexes, the current design can have the vector field present in the clustered PK as well as the vector index (eg the flat index). Even though there's no clear advantage, the heuristic is based on INNODB style index layout, and assumes that a secondary index will always be a better choice as long as it does not cover all table columns (heuristic #2). In reality, in both INNODB and MyRocks, the secondary key does contain the PK. But this fact is ignored.

What ends up happening is that by only considered the # of explicitly defined key parts, the vector index ends up getting picked up.

In the case of the IVF index, the vector index may be shorter than the primary clustered index. Thus secondary index will have a shorter key length than the primary, and will end up getting picked up (i.e. if the key part heuristic doesn't already pick the vector index again).

There can be many other index combinations. We want the optimizer to pick another available index. If no other index is available, then pick the primary clustered index.

Differential Revision: D55003211

fbshipit-source-id: 5ce6b34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants