Use non-atomic flushing with block replay #2207

furszy · 2021-02-16T14:38:31Z

This patch adds an extra "head blocks" record to the chainstate, which gives the range of blocks for which writes may be incomplete. At the start of a flush, we write this record, write the dirty dbcache entries in 16 MiB batches, and at the end we remove the heads record again. If the record is present at startup, it means we crashed during a flush, and we roll back/roll forward the blocks inside that range to get a consistent tip on disk before proceeding.
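The flush sequence described above can be sketched with a toy in-memory key-value store. This is only an illustration under stated assumptions, not the code in this patch: `ToyDb`, `HEAD_BLOCKS_KEY`, `FlushNonAtomic` and `NeedsReplay` are hypothetical names, batches are counted in entries rather than bytes, and the "crash" is just an early return.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the on-disk chainstate: a plain key-value map.
using ToyDb = std::map<std::string, std::string>;

// Hypothetical key for the "head blocks" marker record.
static const std::string HEAD_BLOCKS_KEY = "head_blocks";

// Write dirty entries in several batches, bracketed by the marker record.
// If crash_after_batches >= 0, stop after that many batches to mimic a crash.
// Returns true if the flush ran to completion.
bool FlushNonAtomic(ToyDb& db,
                    const std::vector<std::pair<std::string, std::string>>& dirty,
                    const std::string& old_tip, const std::string& new_tip,
                    std::size_t batch_size, int crash_after_batches = -1)
{
    assert(batch_size > 0);
    // 1. Record the range of blocks for which writes may be incomplete.
    db[HEAD_BLOCKS_KEY] = new_tip + "," + old_tip;

    // 2. Write the dirty entries batch by batch.
    int batches_written = 0;
    for (std::size_t i = 0; i < dirty.size(); i += batch_size) {
        if (crash_after_batches >= 0 && batches_written == crash_after_batches) {
            return false; // simulated crash: the marker stays behind
        }
        const std::size_t end = std::min(i + batch_size, dirty.size());
        for (std::size_t j = i; j < end; ++j) db[dirty[j].first] = dirty[j].second;
        ++batches_written;
    }

    // 3. Flush finished: record the new tip and remove the marker.
    db["best_block"] = new_tip;
    db.erase(HEAD_BLOCKS_KEY);
    return true;
}

// At startup, a leftover marker means the previous flush was interrupted.
bool NeedsReplay(const ToyDb& db)
{
    return db.count(HEAD_BLOCKS_KEY) > 0;
}
```

In this sketch an interrupted flush leaves both the marker and a partial set of entries behind, which is the state the startup replay is meant to repair.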

If a flush completes successfully, the resulting database is compatible with previous versions. If the node crashes in the middle of a flush, a version of the code with this patch is needed to recover.

This is an adaptation of the following PRs, with further modifications to the feature_dbcrash.py test to bring it up to date with upstream and to fix RPC-related bugs.

This requires that we not access pcoinsTip in InitBlockIndex's FlushStateToDisk (so we just skip it until later in AppInitMain) or in the LoadChainTip call in LoadBlockIndex (there is already another call later in AppInitMain, after ReplayBlocks, so skipping it there is fine). Includes some simplifications by Suhas Daftuar and Pieter Wuille.

Adaptation of btc@176c021d085f5a45bc9e038e760942aa648dd797 up to the present.

Adds new functional test, dbcrash.py, which uses -dbcrashratio to exercise the logic for recovering from a crash during chainstate flush. dbcrash.py is added to the extended tests, as it may take ~10 minutes to run.

Use _Exit() instead of exit() for crash simulation. This eliminates stderr output such as:

`terminate called without an active exception`

or

`Assertion failed: (!pthread_mutex_destroy(&m)), function ~recursive_mutex, file /usr/local/include/boost/thread/pthread/recursive_mutex.hpp, line 104.`

Eliminating the stderr output on crash simulation allows testing with test_runner.py, which reports a test as failed if stderr is produced.
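A rough sketch of how a ratio-driven crash simulator could be wired up follows. The helper names (`ShouldSimulateCrash`, `SimulateCrash`, `MaybeCrash`) are hypothetical, and treating the ratio as a "1 in N" chance per write is an assumption about -dbcrashratio, not something taken from this patch. Only `std::_Exit` is a real standard library call.

```cpp
#include <cassert>
#include <cstdlib>
#include <random>

// Decide whether this write should trigger a simulated crash.
// crash_ratio == 0 disables simulation; otherwise the chance is 1/crash_ratio
// (assumed semantics).
template <typename Rng>
bool ShouldSimulateCrash(Rng& rng, unsigned int crash_ratio)
{
    if (crash_ratio == 0) return false;
    std::uniform_int_distribution<unsigned int> dist(0, crash_ratio - 1);
    return dist(rng) == 0;
}

// Terminate immediately. Unlike exit(), _Exit() does not run atexit handlers
// or static destructors, so teardown code gets no chance to write to stderr.
[[noreturn]] void SimulateCrash()
{
    std::_Exit(0);
}

// Hypothetical hook, called after each partial batch write.
template <typename Rng>
void MaybeCrash(Rng& rng, unsigned int crash_ratio)
{
    if (ShouldSimulateCrash(rng, crash_ratio)) SimulateCrash();
}
```

Keeping the random decision separate from the termination call makes the probability logic testable without actually killing the process.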

This should fix a very rare travis failure in zapwallettxes, but is also more correct, as you can currently race ReacceptWalletTransactions with stop RPC calls to get bitcoind to (IMO) erroneously return a non-0 exit code.

A rare race condition may trigger while awaiting the body of a message; see upstream commit 5ff8eb26371c4dc56f384b2de35bea2d87814779 for details. This may fix some reported RPC hangs/crashes.

The bug was introduced in 2.1.6-beta; versions before that don't need the workaround.

This prevents a potential race condition if control flow ends up in `ShutdownHTTPServer` before the thread gets to `queue->Run()`, deleting the work queue while workers are still going to use it. Meant to fix bitcoin#12362. Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
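The shutdown ordering behind this fix can be illustrated with a small stand-alone sketch: interrupt the queue, join every worker, and only then destroy the queue. `ToyWorkQueue` and `RunAndShutdown` are hypothetical and much simpler than the project's HTTP server code. They only show why joining first makes deletion safe even if a worker has not yet reached `Run()`.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal work queue; not the project's WorkQueue class.
class ToyWorkQueue
{
public:
    void Enqueue(std::function<void()> task)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_tasks.push_back(std::move(task));
        }
        m_cond.notify_one();
    }

    // Worker loop: run tasks until interrupted and the queue is drained.
    void Run()
    {
        while (true) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_mutex);
                m_cond.wait(lock, [this] { return !m_running || !m_tasks.empty(); });
                if (m_tasks.empty()) return; // interrupted and nothing left
                task = std::move(m_tasks.front());
                m_tasks.pop_front();
            }
            task();
        }
    }

    void Interrupt()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_running = false;
        }
        m_cond.notify_all();
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cond;
    std::deque<std::function<void()>> m_tasks;
    bool m_running{true};
};

// Start workers, submit tasks, then shut down in the safe order.
// Returns the number of tasks that were executed.
int RunAndShutdown(int num_workers, int num_tasks)
{
    assert(num_workers > 0);
    std::atomic<int> done{0};
    auto queue = std::make_unique<ToyWorkQueue>();
    std::vector<std::thread> workers;
    for (int i = 0; i < num_workers; ++i) {
        // A worker may not have reached Run() yet when shutdown begins.
        workers.emplace_back([q = queue.get()] { q->Run(); });
    }
    for (int i = 0; i < num_tasks; ++i) queue->Enqueue([&done] { ++done; });

    queue->Interrupt();
    for (std::thread& t : workers) t.join(); // wait for every worker first
    queue.reset();                           // only now is deletion safe
    return done.load();
}
```

If `queue.reset()` ran before the joins, a late-starting worker would call `Run()` on a destroyed object, which is the kind of use-after-free this commit message describes.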

This function, which waits for all threads to exit, is no longer needed now that threads are joined instead. Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>

The HTTP worker thread counter, as well as the RAII object that was used to maintain it, is unused now, so can be removed. Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>

Adaptation from btc@fa5b440971a0dfdd64c1b86748a573fcd7dc65d3

furszy · 2021-02-18T13:17:43Z

Added two more commits solving the RPC timeout, GA should be good now. Ready for review.

random-zebra

Good stuff. Code review ACK with some points.

src/init.cpp

src/validation.cpp

src/chain.cpp

src/chain.h

src/validation.cpp

furszy · 2021-02-19T21:06:42Z

Done @random-zebra, commit cherry-picked.

random-zebra

ACK aab15d7

Fuzzbawls

ACK aab15d7

sipa and others added 5 commits February 16, 2021 11:33

[MOVEONLY] Move LastCommonAncestor to chain

8d6625f

Non-atomic flushing using the blockchain as replay journal

8540113

Adapt memory usage estimation for flushing

72f3b17

Random db flush crash simulator

93f2b15

furszy self-assigned this Feb 16, 2021

furszy force-pushed the 2020_feature_dbcrash branch from b392971 to be3da45 Compare February 16, 2021 14:43

furszy added Backport Bug Upstream UTXO DBs and Indexes and removed Backport labels Feb 16, 2021

furszy and others added 9 commits February 18, 2021 10:03

Always return true if AppInitMain got to the end

50e5833

This should fix a very rare travis failure in zapwallettxes, but is also more correct, as you can currently race ReacceptWalletTransactions with stop RPC calls to get bitcoind to (IMO) erroneously return a non-0 exit code.

rpc: work-around an upstream libevent bug

75af065

A rare race condition may trigger while awaiting the body of a message; see upstream commit 5ff8eb26371c4dc56f384b2de35bea2d87814779 for details. This may fix some reported RPC hangs/crashes.

rpc: further constrain the libevent workaround

7d68769

The bug was introduced in 2.1.6-beta; versions before that don't need the workaround.

http: Remove WaitExit from WorkQueue

e24c710

This function, which waits for all threads to exit, is no longer needed now that threads are joined instead. Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>

http: Remove numThreads and ThreadCounter

67aebbf

The HTTP worker thread counter, as well as the RAII object that was used to maintain it, is unused now, so can be removed. Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>

shutdown: Stop threads before resetting ptrs

0f832e3

qa: Extract rpc_timewait as test param

c76fa04

Adaptation from btc@fa5b440971a0dfdd64c1b86748a573fcd7dc65d3

furszy force-pushed the 2020_feature_dbcrash branch from be3da45 to c76fa04 Compare February 18, 2021 13:03

furszy changed the title ~~[WIP] Use non-atomic flushing with block replay~~ Use non-atomic flushing with block replay Feb 18, 2021

furszy added this to the 5.1.0 milestone Feb 18, 2021

random-zebra reviewed Feb 19, 2021

[Refactoring] Use const CBlockIndex* where appropriate

e898353

ReplayBlocks: use find instead of brackets operator to access to the element.

aab15d7
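For context on why `find` is preferable to the brackets operator for lookups: on standard associative containers, `operator[]` inserts a default-constructed value when the key is missing, whereas `find` leaves the container untouched. The sketch below uses hypothetical stand-in types (`ToyBlockIndex`, `ToyBlockMap`) rather than the project's block index map.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for the block index types.
struct ToyBlockIndex { int height; };
using ToyBlockMap = std::unordered_map<std::string, ToyBlockIndex*>;

// operator[] silently inserts a null entry when the hash is unknown.
ToyBlockIndex* LookupWithBrackets(ToyBlockMap& map, const std::string& hash)
{
    return map[hash];
}

// find() leaves the map untouched and makes the missing case explicit.
// It also works on a const map, which operator[] does not.
const ToyBlockIndex* LookupWithFind(const ToyBlockMap& map, const std::string& hash)
{
    const auto it = map.find(hash);
    return it == map.end() ? nullptr : it->second;
}
```

Both helpers return a null pointer for an unknown hash, but only the `find` version avoids growing the map as a side effect.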

furszy force-pushed the 2020_feature_dbcrash branch from 45aa5b7 to aab15d7 Compare February 19, 2021 21:44

furszy requested review from Fuzzbawls and random-zebra February 20, 2021 12:36

random-zebra approved these changes Feb 20, 2021

Fuzzbawls approved these changes Feb 21, 2021

random-zebra merged commit ac52366 into PIVX-Project:master Feb 21, 2021

furszy deleted the 2020_feature_dbcrash branch November 29, 2022 14:19
