-
Notifications
You must be signed in to change notification settings - Fork 714
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MyRocks] Fixing index_merge query returning wrong results (#624) #626
Conversation
@yoshinorim has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
Summary: This diff fixed MyRocks returning missing rows if index merge query plan was used. When index merge is selected, table->read_map is cleared at QUICK_RANGE_SELECT::init_ror_merged_scan(). Then ha_rocksdb::setup_read_decoders() wrongly decided not to decode all fields, which made MySQL layer decide that keys did not match. This diff changes ha_rocksdb::setup_read_decoders() to always decode if read_map is cleared. A side effect is decoding all fetched fields on index merge. There is a slight performance penalty but much better than returning wrong results. This diff reuses index_merge_ror.inc and index_merge_2sweeps.inc test cases for MyRocks. Since MyRocks query plan is less stable than InnoDB, it skips using explain and just verifies data correctness.
@yoshinorim updated the pull request - view changes - changes since last import |
@yoshinorim has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
1 similar comment
@yoshinorim has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
…sults (facebook#624) Summary: This diff fixed MyRocks returning missing rows if index merge query plan was used. When index merge is selected, table->read_map is cleared at QUICK_RANGE_SELECT::init_ror_merged_scan(). Then ha_rocksdb::setup_read_decoders() wrongly decided not to decode all fields, which made MySQL layer decide that keys did not match. This diff changes ha_rocksdb::setup_read_decoders() to always decode if read_map is cleared. A side effect is decoding all fetched fields on index merge. There is a slight performance penalty but much better than returning wrong results. This diff reuses index_merge_ror.inc and index_merge_2sweeps.inc test cases for MyRocks. Since MyRocks query plan is less stable than InnoDB, it skips using explain and just verifies data correctness. Closes facebook#626 Github Author: Yoshinori Matsunobu <yoshinori@fb.com> Github PR Sync: {sync, type="child", parent="github", parentrepo="facebook/mysql-5.6", parentprnum="626", parentprfbid="174177239779441"} Test Plan: Imported from GitHub, without a `Test Plan:` line. Reviewers: herman Reviewed By: herman Subscribers: herman, webscalesql-eng@fb.com Differential Revision: https://phabricator.intern.facebook.com/D5088547 Signature: t1:5088547:1495137850:d336321f11952106f380e06cc34f56330b3ee02c
Summary: This diff fixed MyRocks returning missing rows if index merge query plan was used. When index merge is selected, table->read_map is cleared at QUICK_RANGE_SELECT::init_ror_merged_scan(). Then ha_rocksdb::setup_read_decoders() wrongly decided not to decode all fields, which made MySQL layer decide that keys did not match. This diff changes ha_rocksdb::setup_read_decoders() to always decode if read_map is cleared. A side effect is decoding all fetched fields on index merge. There is a slight performance penalty but much better than returning wrong results. This diff reuses index_merge_ror.inc and index_merge_2sweeps.inc test cases for MyRocks. Since MyRocks query plan is less stable than InnoDB, it skips using explain and just verifies data correctness. Closes facebook/mysql-5.6#626 Reviewed By: hermanlee Differential Revision: D5088547 Pulled By: yoshinorim fbshipit-source-id: f025617
Summary: This diff fixed MyRocks returning missing rows if index merge query plan was used. When index merge is selected, table->read_map is cleared at QUICK_RANGE_SELECT::init_ror_merged_scan(). Then ha_rocksdb::setup_read_decoders() wrongly decided not to decode all fields, which made MySQL layer decide that keys did not match. This diff changes ha_rocksdb::setup_read_decoders() to always decode if read_map is cleared. A side effect is decoding all fetched fields on index merge. There is a slight performance penalty but much better than returning wrong results. This diff reuses index_merge_ror.inc and index_merge_2sweeps.inc test cases for MyRocks. Since MyRocks query plan is less stable than InnoDB, it skips using explain and just verifies data correctness. Closes facebook/mysql-5.6#626 Reviewed By: hermanlee Differential Revision: D5088547 Pulled By: yoshinorim fbshipit-source-id: f025617
Upstream commit ID : fb-mysql-5.6.35/ee503a86f470f89e9e43d5d8d39f142fd310940c Summary: This diff fixed MyRocks returning missing rows if index merge query plan was used. When index merge is selected, table->read_map is cleared at QUICK_RANGE_SELECT::init_ror_merged_scan(). Then ha_rocksdb::setup_read_decoders() wrongly decided not to decode all fields, which made MySQL layer decide that keys did not match. This diff changes ha_rocksdb::setup_read_decoders() to always decode if read_map is cleared. A side effect is decoding all fetched fields on index merge. There is a slight performance penalty but much better than returning wrong results. This diff reuses index_merge_ror.inc and index_merge_2sweeps.inc test cases for MyRocks. Since MyRocks query plan is less stable than InnoDB, it skips using explain and just verifies data correctness. Closes facebook/mysql-5.6#626 Reviewed By: hermanlee Differential Revision: D5088547 Pulled By: yoshinorim fbshipit-source-id: f025617
Upstream commit ID : fb-mysql-5.6.35/ee503a86f470f89e9e43d5d8d39f142fd310940c Summary: This diff fixed MyRocks returning missing rows if index merge query plan was used. When index merge is selected, table->read_map is cleared at QUICK_RANGE_SELECT::init_ror_merged_scan(). Then ha_rocksdb::setup_read_decoders() wrongly decided not to decode all fields, which made MySQL layer decide that keys did not match. This diff changes ha_rocksdb::setup_read_decoders() to always decode if read_map is cleared. A side effect is decoding all fetched fields on index merge. There is a slight performance penalty but much better than returning wrong results. This diff reuses index_merge_ror.inc and index_merge_2sweeps.inc test cases for MyRocks. Since MyRocks query plan is less stable than InnoDB, it skips using explain and just verifies data correctness. Closes facebook/mysql-5.6#626 Reviewed By: hermanlee Differential Revision: D5088547 Pulled By: yoshinorim fbshipit-source-id: f025617
Upstream commit ID : fb-mysql-5.6.35/ee503a86f470f89e9e43d5d8d39f142fd310940c Summary: This diff fixed MyRocks returning missing rows if index merge query plan was used. When index merge is selected, table->read_map is cleared at QUICK_RANGE_SELECT::init_ror_merged_scan(). Then ha_rocksdb::setup_read_decoders() wrongly decided not to decode all fields, which made MySQL layer decide that keys did not match. This diff changes ha_rocksdb::setup_read_decoders() to always decode if read_map is cleared. A side effect is decoding all fetched fields on index merge. There is a slight performance penalty but much better than returning wrong results. This diff reuses index_merge_ror.inc and index_merge_2sweeps.inc test cases for MyRocks. Since MyRocks query plan is less stable than InnoDB, it skips using explain and just verifies data correctness. Closes facebook/mysql-5.6#626 Reviewed By: hermanlee Differential Revision: D5088547 Pulled By: yoshinorim fbshipit-source-id: f025617
Summary: This diff fixed MyRocks returning missing rows if index merge query plan was used. When index merge is selected, table->read_map is cleared at QUICK_RANGE_SELECT::init_ror_merged_scan(). Then ha_rocksdb::setup_read_decoders() wrongly decided not to decode all fields, which made MySQL layer decide that keys did not match. This diff changes ha_rocksdb::setup_read_decoders() to always decode if read_map is cleared. A side effect is decoding all fetched fields on index merge. There is a slight performance penalty but much better than returning wrong results. This diff reuses index_merge_ror.inc and index_merge_2sweeps.inc test cases for MyRocks. Since MyRocks query plan is less stable than InnoDB, it skips using explain and just verifies data correctness. Closes #626 Reviewed By: hermanlee Differential Revision: D5088547 Pulled By: yoshinorim fbshipit-source-id: 3cc770c
…book#626) Summary: This diff fixed MyRocks returning missing rows if index merge query plan was used. When index merge is selected, table->read_map is cleared at QUICK_RANGE_SELECT::init_ror_merged_scan(). Then ha_rocksdb::setup_read_decoders() wrongly decided not to decode all fields, which made MySQL layer decide that keys did not match. This diff changes ha_rocksdb::setup_read_decoders() to always decode if read_map is cleared. A side effect is decoding all fetched fields on index merge. There is a slight performance penalty but much better than returning wrong results. This diff reuses index_merge_ror.inc and index_merge_2sweeps.inc test cases for MyRocks. Since MyRocks query plan is less stable than InnoDB, it skips using explain and just verifies data correctness. Closes facebook#626 Reviewed By: hermanlee Differential Revision: D5088547 Pulled By: yoshinorim fbshipit-source-id: 3cc770c
…book#626) Summary: This diff fixed MyRocks returning missing rows if index merge query plan was used. When index merge is selected, table->read_map is cleared at QUICK_RANGE_SELECT::init_ror_merged_scan(). Then ha_rocksdb::setup_read_decoders() wrongly decided not to decode all fields, which made MySQL layer decide that keys did not match. This diff changes ha_rocksdb::setup_read_decoders() to always decode if read_map is cleared. A side effect is decoding all fetched fields on index merge. There is a slight performance penalty but much better than returning wrong results. This diff reuses index_merge_ror.inc and index_merge_2sweeps.inc test cases for MyRocks. Since MyRocks query plan is less stable than InnoDB, it skips using explain and just verifies data correctness. Closes facebook#626 Reviewed By: hermanlee Differential Revision: D5088547 Pulled By: yoshinorim fbshipit-source-id: 3cc770c
…book#626) Summary: This diff fixed MyRocks returning missing rows if index merge query plan was used. When index merge is selected, table->read_map is cleared at QUICK_RANGE_SELECT::init_ror_merged_scan(). Then ha_rocksdb::setup_read_decoders() wrongly decided not to decode all fields, which made MySQL layer decide that keys did not match. This diff changes ha_rocksdb::setup_read_decoders() to always decode if read_map is cleared. A side effect is decoding all fetched fields on index merge. There is a slight performance penalty but much better than returning wrong results. This diff reuses index_merge_ror.inc and index_merge_2sweeps.inc test cases for MyRocks. Since MyRocks query plan is less stable than InnoDB, it skips using explain and just verifies data correctness. Closes facebook#626 Reviewed By: hermanlee Differential Revision: D5088547 Pulled By: yoshinorim fbshipit-source-id: 3cc770c
…book#626) Summary: This diff fixed MyRocks returning missing rows if index merge query plan was used. When index merge is selected, table->read_map is cleared at QUICK_RANGE_SELECT::init_ror_merged_scan(). Then ha_rocksdb::setup_read_decoders() wrongly decided not to decode all fields, which made MySQL layer decide that keys did not match. This diff changes ha_rocksdb::setup_read_decoders() to always decode if read_map is cleared. A side effect is decoding all fetched fields on index merge. There is a slight performance penalty but much better than returning wrong results. This diff reuses index_merge_ror.inc and index_merge_2sweeps.inc test cases for MyRocks. Since MyRocks query plan is less stable than InnoDB, it skips using explain and just verifies data correctness. Closes facebook#626 Reviewed By: hermanlee Differential Revision: D5088547 Pulled By: yoshinorim fbshipit-source-id: 3cc770c
…book#626) Summary: This diff fixed MyRocks returning missing rows if index merge query plan was used. When index merge is selected, table->read_map is cleared at QUICK_RANGE_SELECT::init_ror_merged_scan(). Then ha_rocksdb::setup_read_decoders() wrongly decided not to decode all fields, which made MySQL layer decide that keys did not match. This diff changes ha_rocksdb::setup_read_decoders() to always decode if read_map is cleared. A side effect is decoding all fetched fields on index merge. There is a slight performance penalty but much better than returning wrong results. This diff reuses index_merge_ror.inc and index_merge_2sweeps.inc test cases for MyRocks. Since MyRocks query plan is less stable than InnoDB, it skips using explain and just verifies data correctness. Closes facebook#626 Reviewed By: hermanlee Differential Revision: D5088547 Pulled By: yoshinorim fbshipit-source-id: 3cc770c
Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](70f2bcd), and [issue 624](#624) and [PR 626](#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of #3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reviewed By: lth Differential Revision: D25521518 fbshipit-source-id: a46ed3e71fa
Summary: This diff fixed MyRocks returning missing rows if index merge query plan was used. When index merge is selected, table->read_map is cleared at QUICK_RANGE_SELECT::init_ror_merged_scan(). Then ha_rocksdb::setup_read_decoders() wrongly decided not to decode all fields, which made MySQL layer decide that keys did not match. This diff changes ha_rocksdb::setup_read_decoders() to always decode if read_map is cleared. A side effect is decoding all fetched fields on index merge. There is a slight performance penalty but much better than returning wrong results. This diff reuses index_merge_ror.inc and index_merge_2sweeps.inc test cases for MyRocks. Since MyRocks query plan is less stable than InnoDB, it skips using explain and just verifies data correctness. Closes #626 Differential Revision: D5088547 (3b2cc9f) Pulled By: yoshinorim fbshipit-source-id: 9f0aee0da20
Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](70f2bcd), and [issue 624](#624) and [PR 626](#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of #3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reference Patch: 44d6a8d --------- Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra. Reviewed By: Pushapgl Differential Revision: D26265470 fbshipit-source-id: f142be681ab
Upstream commit ID : fb-mysql-5.6.35/e025cf1c47e63aada985d78e4083f2e02fba434f PS-7731 : Merge percona-202102 Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), and [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of #3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reference Patch: facebook/mysql-5.6@44d6a8d --------- Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra. Reviewed By: Pushapgl Differential Revision: D26265470 fbshipit-source-id: f142be681ab
Upstream commit ID : fb-mysql-5.6.35/e025cf1c47e63aada985d78e4083f2e02fba434f PS-7731 : Merge percona-202102 Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), and [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of #3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reference Patch: facebook/mysql-5.6@44d6a8d --------- Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra. Reviewed By: Pushapgl Differential Revision: D26265470 fbshipit-source-id: f142be681ab
Fix buffer management issues during online alter Upstream commit ID : fb-mysql-5.6.35/edce7e9f7518a0a9159a2cf5c63e8e8f340a0481 PS-7731 : Merge percona-202102 Summary: This fixes two issues related to online alter found by the rocksdb.rqg_runtime test. 1. A buffer overflow would happen occasionally, and this was caused by the fact that if an online alter fails for whatever reason (eg. duplicate key error), then we never restore the old buffer from the old schema, and if the old buffers were larger than the new buffers, you could run into buffer overflow in later queries. The fix is to reallocate the old buffers. 2. Memory allocated for holding blobs was sometimes leaked during online alter. This is because in `Rdb_key_def::unpack_record`, the blob is allocated from the handler on the `TABLE` object passed in, and this is usually usually the correct handler (ie. `table->file` matches the handler object that called the unpack function). However, during online alter, we pass in `altered_table` instead, so the buffer ends up being allocated on `altered_table->file`. That handler object was never opened with `ha_rocksdb::open`, so it never gets closed and the buffer leaks. To fix this, call we free the buffers after online alter. Reviewed By: Pushapgl Differential Revision: D26891145 fbshipit-source-id: 2a38e485cd1 Fix a memory leak in "rocksdb_init_internal" fbshipit-source-id: 07ffe574130 Remove some dead code and unused variables Upstream commit ID : fb-mysql-5.6.35/99affe6c5677b2dbdc0aaeb31b8e5182c622fb6d PS-7731 : Merge percona-202102 Summary: 1. dup_ref is unused unless the storage engine supports HA_DUPLICATE_POS as a table flag. This means that m_pk_tuple is also useless, and is removed. 2. Remove unused parameter end_key_packed_size on calc_eq_cond_len. 3. Remove unused m_pk_key_parts member. 4. Remove unused is_ascending declaration. 5. Remove unused parameter on read_key_exact 6. The ha_rocksdb::read_range_first override was only useful when index_read_map_impl required an explicit end_key parameter. However, there is now code to fallback to using end_range, this override is no longer useful. This also obviates the distinction between index_read_map_impl and index_read_map. 7. Remove unused function remove_rows. Reviewed By: luqun Differential Revision: D23300062 fbshipit-source-id: 0220b839690 2x integer key decoding perf Upstream commit ID : fb-mysql-5.6.35/0bf45d644729f4beed6280b38c55a557fbad5698 PS-7731 : Merge percona-202102 Summary: In my testing this reduces the overhead of integer decoding by 2x. The idea of the fix is straight-forward: * Don't call Field* to get information, instead cache as much information as possible in our Rdb_field_packing, similar to InnoDB's row_prebuilt_t. This has a few advantages: * Avoid expensive virtual function calls * More cache friendly * Similarly, don't use field->move_field / move_field_offset to move field to the target buffer and then back - just write to the buffer using pre-calculated field offset directly * Use template integer unpacking function to enable compiler to unroll expensive loops into a constant number of mov instructions. NOTE: FDO data needs to be updated for this diff in order to take full advantage of this change. Reference Patch: facebook/mysql-5.6@9af9aa5 --- Porting Notes: There are some additional changes required to account for covering TEXT/BLOB changes, especially for blob I need to be able to call into ha_rocksdb->get_blob_buffer. I've added a new struct Rdb_unpack_func_context to be able to pass any additional required arguments without having to update all the unpack functions. For now I'm only passing in the table that we can use to get the ha_rocksdb handler. Reviewed By: Pushapgl Differential Revision: D25800694 fbshipit-source-id: d5f347d0a98 Faster value decoding Upstream commit ID : fb-mysql-5.6.35/0e45b20a6a7e205f230314e1c6d1d91b752ac372 PS-7731 : Merge percona-202102 Summary: Similar to integer key decoding improvement, but for values: * Remove dependency to field* and cache those information in Rdb_field_encoder * No more Field::move_field - just write to buffer directly We were originally planning to use Rdb_protocol_value_decoder for MySQL bypass project to eliminate overhead of decoding to integer then convert to string, but in practice it probably makes little difference and it is not going to be needed when we go with bypass RPC (which is thrift binary protocol). So this changes removes Rdb_protocol_value_decoder completely. One thing that worth mentioning is I've unified pack_length and pack_length_in_rec because they are essentially identical in our scenarios and in the case they could be different (more packed BITS format that borrows bits from null byte when storage engine supports it) we don't really support it at all, so unifying it to avoid confusion and assert it. With this MyRocks is now either faster or on par with InnoDB when it comes to SELECT integer columns stored in value. NOTE:FDO data needs to be updated for this diff in order to take full advantage of this change. Reference Patch: facebook/mysql-5.6@f4cefba Reviewed By: Pushapgl Differential Revision: D25880543 fbshipit-source-id: 19faf536554 Improve count(*) performance and fix index merge bug Upstream commit ID : fb-mysql-5.6.35/e025cf1c47e63aada985d78e4083f2e02fba434f PS-7731 : Merge percona-202102 Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), and [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of #3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reference Patch: facebook/mysql-5.6@44d6a8d --------- Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra. Reviewed By: Pushapgl Differential Revision: D26265470 fbshipit-source-id: f142be681ab Skip decoding completely during count and improve SK count perf by 20x Upstream commit ID : fb-mysql-5.6.35/c059dde397f327982bc6abe555ca63756f4b9fa9 PS-7731 : Merge percona-202102 Summary: 8.0 is taking a new approach with doing counts - MySQL will call handler::records / records_with_index which will then use rnd_init+rnd_next / index_init+index_next to iterate through everything. If you try count(*) you'll see MyRocks 8.0 is actually much slower when iterating the table with secondary keys than 5.6 (the optimizer picks secondary key for whatever reason). This is because MySQL no longer calls `handler::extra(KEY_READ)` so we no longer get the hints to only decode the key - instead we'll see that the SK is not covering so we fall back to PK, which dramatically slows down by *a lot*. This fixes it by adding a m_iteration_only = true hint - with this hint we'll just quickly iterate through everything without any decoding nor falling back to PK. Note even with this fix, if you compare MyRocks 8.0 with InnoDB 8.0, you'll see that InnoDB 8.0 is also much faster - this is because InnoDB overrides the implementation and iterate the records directly in parallel (default to 4). This is something we can do as well. Reviewed By: Pushapgl Differential Revision: D26272733 fbshipit-source-id: 53b1ae328f6 Bump rocksdb to 6.19 Upstream commit ID : fb-mysql-5.6.35/a69277edd3e9766aead9634dc09253fdff2df908 PS-7731 : Merge percona-202102 Summary: This re-introduces Luqun's get_rocksdb_files.sh (which was reverted due to RocksDB perf regression in 6.17) to support building RocksDB 6.17+ and updates RocksDB to 6.19. The regression in 6.17 was reverted in RocksDB 6.18 so we should be able to proceed with new version. update-submodule: rocksdb Reviewed By: Pushapgl Differential Revision: D27471365 fbshipit-source-id: 75a1ff9b4f3 Integrate rocksdb fault injection framework with myrocks Upstream commit ID : fb-mysql-5.6.35/136b9ecdb2427b7468f832be5ef98a8ed057e2fc PS-7731 : Merge percona-202102 Summary: The change integrates rocksdb::FaultInjectionTestFS with MyRocks code. MyRocks exposes system variable rocksdb_fault_injection_options that takes in options in json format. This is first iteration which exposes write error injection. The injection framework will end up supporting more functionality in the future. ``` rocksdb_fault_injection_options={"retry":true,"onein":1000,"filetypes":["kTableFile","kDescriptorFile"]} ``` Reviewed By: yizhang82 Differential Revision: D27477107 fbshipit-source-id: eff000c3e93 Check for errors during inplace_populate_sk Upstream commit ID : fb-mysql-5.6.35/c7c590acbcbfd60b1afbbc90c8162d8b5cadbc67 PS-7731 : Merge percona-202102 Summary: In `inplace_populate_sk`, `ha_index_init` can return an error if the query has been killed, and if that happens, it's not safe to continue with calling `index_first`/`index_next`. The fix is to check for errors and return early if needed. Reviewed By: luqun Differential Revision: D26960027 (facebook/mysql-5.6@f12eea3) fbshipit-source-id: eafc511dce4 Return stats from storage engine directly without table open Upstream commit ID : fb-mysql-5.6.35/18d7f5ba8304c3bf470b8fc27de601eafe467910 PS-7731 : Merge percona-202102 Summary: In 8.0 query from i_s.tables are *much* slower (~5x) most likely due to table open/close are much slower - profiling data showed that 74% time spent on opening tables and 19.71% time spent on reading histograms, only 2.41% spent on the actual stats inside MyRocks. 8.0 provides a new callback get_table_statistics and get_index_column_cardinality that when "overridden", signal MySQL to go call those functions to fill in stats directly, which should be much faster, with some caveats: * Our stats code path need to work without table open - this fix updates our stat code path to strictly use Rdb_tbl_def and Rdb_key_def. One catch is that we won't be able to update the auto_inc value correctly because we won't know if the table actually has AUTO_INCREMENT without opening it, so there are some corner cases the value might be NULL (such as immediately after creating a table with AUTO_INCREMENT). This actually aligns with ROCKSDB_DDL table behavior and InnoDB behavior as well. Once you open the table, the cached Rdb_tbl_def::m_auto_inc_val should be updated properly. * We also need to make sure the lifetime / locking works properly with regarding to DDL and special cases like drop cf. Fortunately the table_stats code path does take MDL_EXPLICIT lock with the db_name.table_name, so correctness should be straight-forward. One special case if DROP TABLE is running at the same time, the MDL_EXPLICIT lock may succeed *after* drop table succeed, and Rdbl_ddl_manager::find would return null, and we would give default values in the columns and with a error message saying table cannot be found. Note this version only overrides get_table_statistics (and it updates SQL layer to treat them separately) and we can add get_index_column_cardinality in the future if there is a need - right now there is little evidence that we are querying information_schema.statistics much (if not at all). Reviewed By: hermanlee Differential Revision: D26624028 fbshipit-source-id: 3b41f25d61c Fix rows_read and queries_range stats in bypass Upstream commit ID : fb-mysql-5.6.35/50bfa5b89b2240eb84ad81f6c0e90bf8e0cbae60 PS-7767 : Merge percona-202103 Summary: Bypass updates rows_sent and rows_examined but didn't update rows_read. This fixes that and adds tests. We also get rocksdb_queries_point and rocksdb_queries_range for free as we are using MyRocks API to access data. Note I didn't include rows_examined in the test because querying unrelated information_schema tables will affect rows_examined and it is kinda flaky depending on many factors like perf schema. I did found a double counting issue with rocksdb_queries_range and fixed it as well. Reference Patch: facebook/mysql-5.6@6106eba ---- Porting Notes: Removed updates to table stats which is not ported to 8.0. Differential Revision: D25809584 fbshipit-source-id: 4cbb31e5b35 Refactoring index covering check to match MyRocks and add non-covering SK tests and rocksdb_covering_secondary_key stats Upstream commit ID : fb-mysql-5.6.35/7194dfd7679aa70bda91236d0d6455d5324e615b PS-7731 : Merge percona-202102 Summary: When considering when a secondary key is covering, we need to also consider lookup bitmap for some special cases like using covering bitmap w/ prefix keys. This isn't much of an issue with 5.6 today because we don't actually enable covering bitmap format in prod but in 8.0 it might be an issue. Either way this refactors the covering check to be exactly the same as MyRocks does. Also adds more tests for covering / non covering prefix tests. Also in order to catch cases where we incorrectly consider covering secondary keys to be non-covering, I've added rocksdb_covering_secondary_key_count support into bypass (should've done that earlier) and added checks in secondary range key tests to ensure covering check is done correctly. Reference Patch: facebook/mysql-5.6@30e9039 ---- Porting Notes: * Fixed an issue that we need to explicitly compare with Rdb_key_def::KEY_COVERED * The scenario with prefix key is now correctly handled and re-enabled in bypass_select_scenarios.inc Reviewed By: Pushapgl Differential Revision: D26566848 fbshipit-source-id: f149f119da4
Upstream commit ID : fb-mysql-5.6.35/25b42b5d4365b6d2deb6bf40da9e776cd2e56698 PS-7780 : Merge fb-prod202101 Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), and [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of #3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reviewed By: lth Differential Revision: D25521518 fbshipit-source-id: a46ed3e71fa
…book#626) (facebook#626) Summary: This diff fixed MyRocks returning missing rows if index merge query plan was used. When index merge is selected, table->read_map is cleared at QUICK_RANGE_SELECT::init_ror_merged_scan(). Then ha_rocksdb::setup_read_decoders() wrongly decided not to decode all fields, which made MySQL layer decide that keys did not match. This diff changes ha_rocksdb::setup_read_decoders() to always decode if read_map is cleared. A side effect is decoding all fetched fields on index merge. There is a slight performance penalty but much better than returning wrong results. This diff reuses index_merge_ror.inc and index_merge_2sweeps.inc test cases for MyRocks. Since MyRocks query plan is less stable than InnoDB, it skips using explain and just verifies data correctness. Closes facebook#626 Differential Revision: D5088547 (facebook@3b2cc9f) Pulled By: yoshinorim fbshipit-source-id: 9f0aee0da20
…book#626) [non-RocksDB part] Summary: This diff fixed MyRocks returning missing rows if index merge query plan was used. When index merge is selected, table->read_map is cleared at QUICK_RANGE_SELECT::init_ror_merged_scan(). Then ha_rocksdb::setup_read_decoders() wrongly decided not to decode all fields, which made MySQL layer decide that keys did not match. This diff changes ha_rocksdb::setup_read_decoders() to always decode if read_map is cleared. A side effect is decoding all fetched fields on index merge. There is a slight performance penalty but much better than returning wrong results. This diff reuses index_merge_ror.inc and index_merge_2sweeps.inc test cases for MyRocks. Since MyRocks query plan is less stable than InnoDB, it skips using explain and just verifies data correctness. Closes facebook#626 Differential Revision: D5088547 Pulled By: yoshinorim
Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook@70f2bcd), and [issue 624](facebook#624) and [PR 626](facebook#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of #3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reference Patch: facebook@44d6a8d --------- Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra. Reviewed By: Pushapgl Differential Revision: D26265470
…book#626) [non-RocksDB part] Summary: This diff fixed MyRocks returning missing rows if index merge query plan was used. When index merge is selected, table->read_map is cleared at QUICK_RANGE_SELECT::init_ror_merged_scan(). Then ha_rocksdb::setup_read_decoders() wrongly decided not to decode all fields, which made MySQL layer decide that keys did not match. This diff changes ha_rocksdb::setup_read_decoders() to always decode if read_map is cleared. A side effect is decoding all fetched fields on index merge. There is a slight performance penalty but much better than returning wrong results. This diff reuses index_merge_ror.inc and index_merge_2sweeps.inc test cases for MyRocks. Since MyRocks query plan is less stable than InnoDB, it skips using explain and just verifies data correctness. Closes facebook#626 Differential Revision: D5088547 Pulled By: yoshinorim
Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook@70f2bcd), and [issue 624](facebook#624) and [PR 626](facebook#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of #3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reference Patch: facebook@44d6a8d --------- Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra. Reviewed By: Pushapgl Differential Revision: D26265470
…book#626) [non-RocksDB part] Summary: This diff fixed MyRocks returning missing rows if index merge query plan was used. When index merge is selected, table->read_map is cleared at QUICK_RANGE_SELECT::init_ror_merged_scan(). Then ha_rocksdb::setup_read_decoders() wrongly decided not to decode all fields, which made MySQL layer decide that keys did not match. This diff changes ha_rocksdb::setup_read_decoders() to always decode if read_map is cleared. A side effect is decoding all fetched fields on index merge. There is a slight performance penalty but much better than returning wrong results. This diff reuses index_merge_ror.inc and index_merge_2sweeps.inc test cases for MyRocks. Since MyRocks query plan is less stable than InnoDB, it skips using explain and just verifies data correctness. Closes facebook#626 Differential Revision: D5088547 Pulled By: yoshinorim
Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook@70f2bcd), and [issue 624](facebook#624) and [PR 626](facebook#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of #3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reference Patch: facebook@44d6a8d --------- Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra. Reviewed By: Pushapgl Differential Revision: D26265470
Upstream commit ID : fb-mysql-5.6.35/e025cf1c47e63aada985d78e4083f2e02fba434f PS-7731 : Merge percona-202102 Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), and [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of percona#3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reference Patch: facebook/mysql-5.6@44d6a8d --------- Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra. Reviewed By: Pushapgl Differential Revision: D26265470 fbshipit-source-id: f142be681ab
Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook@70f2bcd), and [issue 624](facebook#624) and [PR 626](facebook#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of #3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reference Patch: facebook@44d6a8d --------- Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra. Reviewed By: Pushapgl Differential Revision: D26265470
Upstream commit ID : fb-mysql-5.6.35/e025cf1c47e63aada985d78e4083f2e02fba434f PS-7731 : Merge percona-202102 Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), and [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of percona#3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reference Patch: facebook/mysql-5.6@44d6a8d --------- Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra. Reviewed By: Pushapgl Differential Revision: D26265470 fbshipit-source-id: f142be681ab
…book#626) [non-RocksDB part] Summary: This diff fixed MyRocks returning missing rows if index merge query plan was used. When index merge is selected, table->read_map is cleared at QUICK_RANGE_SELECT::init_ror_merged_scan(). Then ha_rocksdb::setup_read_decoders() wrongly decided not to decode all fields, which made MySQL layer decide that keys did not match. This diff changes ha_rocksdb::setup_read_decoders() to always decode if read_map is cleared. A side effect is decoding all fetched fields on index merge. There is a slight performance penalty but much better than returning wrong results. This diff reuses index_merge_ror.inc and index_merge_2sweeps.inc test cases for MyRocks. Since MyRocks query plan is less stable than InnoDB, it skips using explain and just verifies data correctness. Closes facebook#626 Differential Revision: D5088547 Pulled By: yoshinorim
Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook@70f2bcd), and [issue 624](facebook#624) and [PR 626](facebook#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of #3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reference Patch: facebook@44d6a8d --------- Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra. Reviewed By: Pushapgl Differential Revision: D26265470
…book#626) [non-RocksDB part] Summary: This diff fixed MyRocks returning missing rows if index merge query plan was used. When index merge is selected, table->read_map is cleared at QUICK_RANGE_SELECT::init_ror_merged_scan(). Then ha_rocksdb::setup_read_decoders() wrongly decided not to decode all fields, which made MySQL layer decide that keys did not match. This diff changes ha_rocksdb::setup_read_decoders() to always decode if read_map is cleared. A side effect is decoding all fetched fields on index merge. There is a slight performance penalty but much better than returning wrong results. This diff reuses index_merge_ror.inc and index_merge_2sweeps.inc test cases for MyRocks. Since MyRocks query plan is less stable than InnoDB, it skips using explain and just verifies data correctness. Closes facebook#626 Differential Revision: D5088547 Pulled By: yoshinorim
Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook@70f2bcd), and [issue 624](facebook#624) and [PR 626](facebook#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of #3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reference Patch: facebook@44d6a8d --------- Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra. Reviewed By: Pushapgl Differential Revision: D26265470
Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook@70f2bcd), and [issue 624](facebook#624) and [PR 626](facebook#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of #3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reference Patch: facebook@44d6a8d --------- Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra. Reviewed By: Pushapgl Differential Revision: D26265470
Upstream commit ID : fb-mysql-5.6.35/e025cf1c47e63aada985d78e4083f2e02fba434f PS-7731 : Merge percona-202102 Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), and [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of percona#3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reference Patch: facebook/mysql-5.6@44d6a8d --------- Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra. Reviewed By: Pushapgl Differential Revision: D26265470 fbshipit-source-id: f142be681ab
Upstream commit ID : fb-mysql-5.6.35/e025cf1c47e63aada985d78e4083f2e02fba434f PS-7731 : Merge percona-202102 Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), and [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of percona#3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reference Patch: facebook/mysql-5.6@44d6a8d --------- Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra. Reviewed By: Pushapgl Differential Revision: D26265470 fbshipit-source-id: f142be681ab
Upstream commit ID : fb-mysql-5.6.35/e025cf1c47e63aada985d78e4083f2e02fba434f PS-7731 : Merge percona-202102 Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), and [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of percona#3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reference Patch: facebook/mysql-5.6@44d6a8d --------- Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra. Reviewed By: Pushapgl Differential Revision: D26265470 fbshipit-source-id: f142be681ab
Upstream commit ID : fb-mysql-5.6.35/e025cf1c47e63aada985d78e4083f2e02fba434f PS-7731 : Merge percona-202102 Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), and [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of #3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reference Patch: facebook/mysql-5.6@44d6a8d --------- Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra. Reviewed By: Pushapgl Differential Revision: D26265470 fbshipit-source-id: f142be681ab
Upstream commit ID : fb-mysql-5.6.35/e025cf1c47e63aada985d78e4083f2e02fba434f PS-7731 : Merge percona-202102 Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), and [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of #3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reference Patch: facebook/mysql-5.6@44d6a8d --------- Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra. Reviewed By: Pushapgl Differential Revision: D26265470 fbshipit-source-id: f142be681ab
Upstream commit ID : fb-mysql-5.6.35/e025cf1c47e63aada985d78e4083f2e02fba434f PS-7731 : Merge percona-202102 Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), and [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of #3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reference Patch: facebook/mysql-5.6@44d6a8d --------- Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra. Reviewed By: Pushapgl Differential Revision: D26265470 fbshipit-source-id: f142be681ab
Upstream commit ID : fb-mysql-5.6.35/e025cf1c47e63aada985d78e4083f2e02fba434f PS-7731 : Merge percona-202102 Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), and [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of percona#3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reference Patch: facebook/mysql-5.6@44d6a8d --------- Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra. Reviewed By: Pushapgl Differential Revision: D26265470 fbshipit-source-id: f142be681ab
Upstream commit ID : fb-mysql-5.6.35/e025cf1c47e63aada985d78e4083f2e02fba434f PS-7731 : Merge percona-202102 Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), and [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of percona#3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reference Patch: facebook/mysql-5.6@44d6a8d --------- Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra. Reviewed By: Pushapgl Differential Revision: D26265470 fbshipit-source-id: f142be681ab
Upstream commit ID : fb-mysql-5.6.35/e025cf1c47e63aada985d78e4083f2e02fba434f PS-7731 : Merge percona-202102 Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), and [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of percona#3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reference Patch: facebook/mysql-5.6@44d6a8d --------- Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra. Reviewed By: Pushapgl Differential Revision: D26265470 fbshipit-source-id: f142be681ab
Upstream commit ID : fb-mysql-5.6.35/e025cf1c47e63aada985d78e4083f2e02fba434f PS-7731 : Merge percona-202102 Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), and [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of #3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reference Patch: facebook/mysql-5.6@44d6a8d --------- Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra. Reviewed By: Pushapgl Differential Revision: D26265470 fbshipit-source-id: f142be681ab
Upstream commit ID : fb-mysql-5.6.35/e025cf1c47e63aada985d78e4083f2e02fba434f PS-7731 : Merge percona-202102 Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), and [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of #3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reference Patch: facebook/mysql-5.6@44d6a8d --------- Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra. Reviewed By: Pushapgl Differential Revision: D26265470 fbshipit-source-id: f142be681ab
Upstream commit ID : fb-mysql-5.6.35/e025cf1c47e63aada985d78e4083f2e02fba434f PS-7731 : Merge percona-202102 Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), and [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of #3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reference Patch: facebook/mysql-5.6@44d6a8d --------- Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra. Reviewed By: Pushapgl Differential Revision: D26265470 fbshipit-source-id: f142be681ab
Upstream commit ID : fb-mysql-5.6.35/e025cf1c47e63aada985d78e4083f2e02fba434f PS-7731 : Merge percona-202102 Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), and [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of #3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reference Patch: facebook/mysql-5.6@44d6a8d --------- Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra. Reviewed By: Pushapgl Differential Revision: D26265470 fbshipit-source-id: f142be681ab
Upstream commit ID : fb-mysql-5.6.35/e025cf1c47e63aada985d78e4083f2e02fba434f PS-7731 : Merge percona-202102 Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), and [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of #3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reference Patch: facebook/mysql-5.6@44d6a8d --------- Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra. Reviewed By: Pushapgl Differential Revision: D26265470 fbshipit-source-id: f142be681ab
Upstream commit ID : fb-mysql-5.6.35/e025cf1c47e63aada985d78e4083f2e02fba434f PS-7731 : Merge percona-202102 Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), and [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of #3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reference Patch: facebook/mysql-5.6@44d6a8d --------- Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra. Reviewed By: Pushapgl Differential Revision: D26265470 fbshipit-source-id: f142be681ab
Upstream commit ID : fb-mysql-5.6.35/e025cf1c47e63aada985d78e4083f2e02fba434f PS-7731 : Merge percona-202102 Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), and [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of #3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reference Patch: facebook/mysql-5.6@44d6a8d --------- Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra. Reviewed By: Pushapgl Differential Revision: D26265470 fbshipit-source-id: f142be681ab
Summary: This diff fixed MyRocks returning missing rows if index
merge query plan was used. When index merge is selected,
table->read_map is cleared at
QUICK_RANGE_SELECT::init_ror_merged_scan(). Then
ha_rocksdb::setup_read_decoders() wrongly decided not to decode
all fields, which made MySQL layer decide that keys did not match.
This diff changes ha_rocksdb::setup_read_decoders() to
always decode if read_map is cleared. A side effect is decoding
all fetched fields on index merge. There is a slight performance
penalty but much better than returning wrong results.
This diff reuses index_merge_ror.inc and index_merge_2sweeps.inc
test cases for MyRocks. Since MyRocks query plan is less stable than
InnoDB, it skips using explain and just verifies data correctness.