Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PS-7596 [5.7]: MyRocks: Port skipped write_unprepared commits #4227

Merged
merged 4 commits into from
May 28, 2021

Conversation

inikep
Copy link
Collaborator

@inikep inikep commented Mar 10, 2021

  1. Implement write_unprepared feature which was skipped during fb-prod202001 to fb-prod202005 merges.
  2. Updated RocksDB submodule pointer to facebook/rocksdb@e32a64aa54 (v6.16.3)

lth and others added 2 commits April 10, 2021 11:11
Summary:
This adds write unprepared as part of the combinations tested in mtr.

A few things were needed for write unprepared:
1. A new variable `rocksdb_write_batch_flush_threshold` was added that controls how large a write batch needs to be before the transaction starts writing to memtables. This is set to 1 in mtr, so that a flush is forced for every single write, just to stress this functionality.
2. XIDs need to be known at transaction start time, so that writes to the WAL from the same transaction can be correlated together on recovery. There is a debug check now at prepare time to verify that the XID hasn't somehow changed between transaction start and prepare.
3. Some minor test fixes that are documented inline. `rpl_gtid_crash_safe_wal_corrupt.inc` had to be updated because the WAL format is different.

update-submodule: rocksdb

Reviewed By: luqun

Differential Revision: D19909832

fbshipit-source-id: be3f71a73f8
Summary: In a previous change (D19909832), we needed the transaction to know its xid before prepare time for write unprepared, so a call was added in `rocksdb_register_tx` to set the transaction name. However, we don't need this for read only transactions, so we were doing a bit of extra work. To fix this, only call SetName on the first write we make.

Reviewed By: atish2196

Differential Revision: D21666856

fbshipit-source-id: d6e1459ab79

-----------------------

Fix CommitTimeWriteBatch without prepare error in MyRocks DDL

Summary:
This is discovered with rqg_runtime test - it issues a DDL `alter table t1 add column a int` operation that implicit commits current transaction and starts a new transaction for the table copy operation with temp tables.

In this case, the following event happens:
1. Start of every command we reset binlog position in THD to NULL/0
2. DDL calls trans_commit_implicit to implicitly commit current on-going transaction. The transaction go through 2pc as usual, updates binlog position in THD, and writes it to RocksDB as part of CommitTimeWriteBatch during commit
3. DDL starts a new transaction for the in-memory temp table copy operation that doesn't go through binlog (which makes sense since it is purely in memory).
4. DDL commits this transaction and go to ha_commit_low directly without going through ordered_commit
5. As a result, rocksdb_prepare is skippped, since this isn't a MySQL multi-engine 2pc transaction
6. As part of rocksdb_commit, we get the binlog position from earlier transaction, and tries to update with CommitTimeWriteBatch.
7. When RocksDB commits, it notices that this isn't a RocksDB 2pc (prepare isn't called) and yet CommitTimeWriteBatch has non-0 items, and returns a failure

I can see 3 issues here:
1. We should not leak binlog position in THD when a statement implicitly commits current transaction. Usually this is OK if the next transaction is a MySQL 2pc transaction since it'll get overwritten
2. In a non MySQL 2pc transaction where binlog isn't updated and prepare haven't been called, we should never return and write to a CommitTimeWriteBatch.
3. We should not update binlog position in a non-2pc operation with leaked binlog position

This fixes 1) & 3) by clearing binlog position in implicit commit case. 2) should be always correct once 1) is fixed, and if ever there is an issue RocksDB would report an error.

Reference patch: facebook/mysql-5.6@24579b3

Porting Notes:
--------
I had to enable binlog as 5.6 by default has no binlog, in order for this to repro.

Reviewed By: george-reynya

Differential Revision: D21717223

fbshipit-source-id: 4b0b4ba12c1

------------------

Revert "Call SetName only if the transaction is doing writes"

Summary:
This reverts an earlier optimization to only call `SetName` for transactions that do writes, by instrumenting all the places where writes were happening to call `SetName` on demand. The concern is that we might be missing places, so it was just safer to set the name at the start of the transaction.

This revert adds 2 additional things:
1. Calling `SetName` at the beginning only happens for write unprepared, so that we limit the read query regression to that write policy. For other policies, we still call `SetName` at prepare time.
2. Some internal operations also start transactions but without a valid xid (this was exposed after master-info-repository was set to TABLE). These don't go through 2pc though, so it's okay to skip setting name.

Reviewed By: hermanlee

Differential Revision: D22801035

fbshipit-source-id: 2ff451c0f3a
@inikep
Copy link
Collaborator Author

inikep commented Apr 12, 2021

Update submodule pointer to fix issues with `rocksdb.index_merge_rocksdb`.
@inikep inikep changed the title [WIP] PS-7596 [5.7]: MyRocks: Port skipped write_unprepared commits PS-7596 [5.7]: MyRocks: Port skipped write_unprepared commits Apr 28, 2021
@inikep
Copy link
Collaborator Author

inikep commented May 7, 2021

@inikep inikep requested a review from george-lorch May 7, 2021 10:14
Copy link
Contributor

@george-lorch george-lorch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@inikep inikep merged commit 1ed216f into percona:5.7 May 28, 2021
@inikep inikep deleted the PS-7596-5.7 branch May 28, 2021 09:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants