dash 20.0.1 stop sync at verifying last block #5741

Closed
lcgogo opened this issue Nov 28, 2023 · 6 comments

Comments

lcgogo commented Nov 28, 2023

Expected behavior

I run several Dash nodes in Docker. After a docker stop/start, some of them stop syncing and hang at the log output below. The others restarted fine, but I am not sure they will still be fine on the next restart.
I never saw this issue with Dash 19.x.x.

2023-11-28T14:34:09Z LoadBlockIndexDB: timestamp index disabled
2023-11-28T14:34:09Z LoadBlockIndexDB: spent index disabled
2023-11-28T14:34:09Z Opening LevelDB in /data/dash/chainstate
2023-11-28T14:34:09Z Opened LevelDB successfully
2023-11-28T14:34:09Z Using obfuscation key for /data/dash/chainstate: f6f230e5bd669995
2023-11-28T14:34:10Z Loaded best chain: hashBestChain=000000000000002cb95fcd940f3fd3e9aa0648b39e3bb2630a0d93fa52885eb8 height=1978082 date=2023-11-27T02:15:08Z progress=0.999671
2023-11-28T14:34:10Z CDeterministicMNManager::MigrateDBIfNeeded -- upgrading DB to migrate MN type
2023-11-28T14:34:10Z CDeterministicMNManager::MigrateDBIfNeeded -- migration already done. skipping.
2023-11-28T14:34:10Z CDeterministicMNManager::MigrateDBIfNeeded2 -- upgrading DB to migrate MN state bls version
2023-11-28T14:34:10Z CDeterministicMNManager::MigrateDBIfNeeded2 -- migration already done. skipping.
2023-11-28T14:34:10Z init message: Verifying blocks...
2023-11-28T14:34:10Z AppInitMain: bls_legacy_scheme=0
2023-11-28T14:34:10Z Verifying last 6 blocks at level 3
^C
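
When many containers are involved, a small check can flag nodes that look stuck at this point. The sketch below is only an illustration: it assumes the log lines keep the timestamp format shown above, and the 10-minute threshold and the function names are made up for this example, not anything Dash Core provides.

```python
import re
from datetime import datetime, timezone

# Matches the last line seen in the log excerpt above.
VERIFY_RE = re.compile(r"Verifying last \d+ blocks at level \d+")


def parse_ts(line):
    """Parse the leading UTC timestamp, e.g. 2023-11-28T14:34:10Z."""
    stamp = line.split(" ", 1)[0]
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def stuck_in_verify(lines, now, max_age_seconds=600):
    """Return True if the newest log line is the 'Verifying last N blocks'
    message and it is older than max_age_seconds (an assumed threshold)."""
    lines = [line for line in lines if line.strip()]
    if not lines:
        return False
    last = lines[-1]
    if not VERIFY_RE.search(last):
        return False
    return (now - parse_ts(last)).total_seconds() > max_age_seconds
```

Feeding it the tail of each node's debug.log together with `datetime.now(timezone.utc)` would give a rough list of nodes worth inspecting by hand.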

To reproduce

System information

lcgogo commented Nov 30, 2023

@PastaPastaPasta Could you please offer some suggestions?

qwizzie commented Nov 30, 2023

If this is Dashmate related (you mentioned running Dash in Docker), you could contact pshenmic on dash.org/forum via 'conversation', or post your Dashmate issue here: https://www.dash.org/forum/index.php?threads/dashmate-discussion.53951/#post-235627

Use this if you don't get a timely response here.

UdjinM6 commented Nov 30, 2023

@lcgogo I believe I had an issue like this once, but I couldn't reproduce it. Please try removing the following folders from .dashcore/llmq one by one, in exactly the order listed below. Start the node after deleting each folder to check which one is potentially causing the issue:

  1. isdb
  2. dkgdb
  3. recsigdb

UdjinM6 commented Nov 30, 2023

Ideally, if you want to help debug this, back up the .dashcore/llmq folder before removing its subfolders so you can restore it later, then test whether removing the subfolders in a different order also solves the issue on that exact same node (if removal solves it at all, of course).
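
The backup-then-remove procedure suggested in this thread could be scripted roughly as follows. This is a hedged sketch, not an official tool: the `llmq.bak` backup name and the function names are assumptions for illustration, the data directory is whatever your node uses, and the node is assumed to be stopped while files are touched.

```python
import shutil
from pathlib import Path

# Sub-folders named in this thread, in the suggested removal order.
SUBDIRS = ("isdb", "dkgdb", "recsigdb")


def backup_llmq(datadir: Path) -> Path:
    """Copy <datadir>/llmq to <datadir>/llmq.bak (the .bak name is an assumption)."""
    backup = datadir / "llmq.bak"
    # copytree raises FileExistsError if the backup already exists,
    # so an earlier backup is never silently overwritten.
    shutil.copytree(datadir / "llmq", backup)
    return backup


def remove_subdir(datadir: Path, name: str) -> bool:
    """Delete one llmq sub-folder; return True if it existed."""
    if name not in SUBDIRS:
        raise ValueError(f"unexpected sub-folder: {name}")
    target = datadir / "llmq" / name
    if not target.is_dir():
        return False
    shutil.rmtree(target)
    return True


def restore_llmq(datadir: Path) -> None:
    """Replace <datadir>/llmq with the backup made by backup_llmq()."""
    shutil.rmtree(datadir / "llmq", ignore_errors=True)
    shutil.copytree(datadir / "llmq.bak", datadir / "llmq")
```

A possible sequence: stop the node, call `backup_llmq(datadir)`, call `remove_subdir(datadir, "isdb")`, start the node and check whether it syncs; if not, stop it and continue with the next name in `SUBDIRS`. `restore_llmq(datadir)` brings back the original state for trying a different order.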

lcgogo commented Nov 30, 2023

> @lcgogo I believe I had an issue like this once, but I couldn't reproduce it. Please try removing the following folders from .dashcore/llmq one by one, in exactly the order listed below. Start the node after deleting each folder to check which one is potentially causing the issue:
>
>   1. isdb
>   2. dkgdb
>   3. recsigdb

Thanks, I will try it. I am not sure I can reproduce it reliably, though: I manually restarted some Dash containers and did not hit the issue.

lcgogo commented Dec 1, 2023

> Ideally, if you want to help debug this, back up the .dashcore/llmq folder before removing its subfolders so you can restore it later, then test whether removing the subfolders in a different order also solves the issue on that exact same node (if removal solves it at all, of course).

I hit the issue again. After running rm -rf llmq/dkgdb/ and starting the node, Dash continued syncing. Many thanks!

PastaPastaPasta pushed a commit that referenced this issue Dec 5, 2023
…atures in coinbase (#5752)

## Issue being fixed or feature implemented
Now that we have ChainLock sigs in coinbase, `VerifyDB()` has to process
them. It works most of the time because usually we simply read
contributions from the quorum db
https://github.com/dashpay/dash/blob/develop/src/llmq/quorums.cpp#L385.
However, sometimes these contributions aren't available, so we try to
re-build them
https://github.com/dashpay/dash/blob/develop/src/llmq/quorums.cpp#L388.
But by the time we call `VerifyDB()`, the bls worker threads aren't started
yet, so we keep pushing jobs into the worker's queue, nothing processes
them, and everything halts.
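
The failure mode described above can be reproduced in miniature. The following is a toy Python model, not Dash code: `Worker` is a hypothetical stand-in for a worker pool whose thread must be started explicitly, and a timeout is used only so the demonstration returns instead of blocking forever.

```python
import queue
import threading
from concurrent.futures import Future, TimeoutError


class Worker:
    """Toy worker with an explicit start(); illustrative only."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._thread = None

    def start(self):
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self):
        if self._thread is not None:
            self._jobs.put(None)  # sentinel tells the thread to exit
            self._thread.join()
            self._thread = None

    def _run(self):
        while True:
            item = self._jobs.get()
            if item is None:
                return
            fn, fut = item
            try:
                fut.set_result(fn())
            except Exception as exc:
                fut.set_exception(exc)

    def submit(self, fn):
        # Jobs are accepted even when no thread is running to consume them.
        fut = Future()
        self._jobs.put((fn, fut))
        return fut


def verify(worker, timeout):
    """Queue a job and wait for it; return None if nothing ever ran it."""
    fut = worker.submit(lambda: 6 * 7)
    try:
        return fut.result(timeout=timeout)
    except TimeoutError:
        return None
```

Calling `verify()` before `start()` never gets a result (without the timeout it would wait indefinitely); calling it after `start()` returns promptly.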

backtrace:
```
  * frame #0: 0x00007fdd85a2873d libc.so.6`syscall at syscall.S:38
    frame #1: 0x0000555c41152921 dashd_testnet`std::__atomic_futex_unsigned_base::_M_futex_wait_until(unsigned int*, unsigned int, bool, std::chrono::duration<long, std::ratio<1l, 1l> >, std::chrono::duration<long, std::ratio<1l, 1000000000l> >) + 225
    frame #2: 0x0000555c40e22bd2 dashd_testnet`CBLSWorker::BuildQuorumVerificationVector(Span<std::shared_ptr<std::vector<CBLSPublicKey, std::allocator<CBLSPublicKey> > > >, bool) at atomic_futex.h:102:36
    frame #3: 0x0000555c40d35567 dashd_testnet`llmq::CQuorumManager::BuildQuorumContributions(std::unique_ptr<llmq::CFinalCommitment, std::default_delete<llmq::CFinalCommitment> > const&, std::shared_ptr<llmq::CQuorum> const&) const at quorums.cpp:419:65
    frame #4: 0x0000555c40d3b9d1 dashd_testnet`llmq::CQuorumManager::BuildQuorumFromCommitment(Consensus::LLMQType, gsl::not_null<CBlockIndex const*>) const at quorums.cpp:388:37
    frame #5: 0x0000555c40d3c415 dashd_testnet`llmq::CQuorumManager::GetQuorum(Consensus::LLMQType, gsl::not_null<CBlockIndex const*>) const at quorums.cpp:588:37
    frame #6: 0x0000555c40d406a9 dashd_testnet`llmq::CQuorumManager::ScanQuorums(Consensus::LLMQType, CBlockIndex const*, unsigned long) const at quorums.cpp:545:64
    frame #7: 0x0000555c40937629 dashd_testnet`llmq::CSigningManager::SelectQuorumForSigning(Consensus::LLMQParams const&, llmq::CQuorumManager const&, uint256 const&, int, int) at signing.cpp:1038:90
    frame #8: 0x0000555c40937d34 dashd_testnet`llmq::CSigningManager::VerifyRecoveredSig(Consensus::LLMQType, llmq::CQuorumManager const&, int, uint256 const&, uint256 const&, CBLSSignature const&, int) at signing.cpp:1061:113
    frame #9: 0x0000555c408e2d43 dashd_testnet`llmq::CChainLocksHandler::VerifyChainLock(llmq::CChainLockSig const&) const at chainlocks.cpp:559:53
    frame #10: 0x0000555c40c8b09e dashd_testnet`CheckCbTxBestChainlock(CBlock const&, CBlockIndex const*, llmq::CChainLocksHandler const&, BlockValidationState&) at cbtx.cpp:368:47
    frame #11: 0x0000555c40cf75db dashd_testnet`ProcessSpecialTxsInBlock(CBlock const&, CBlockIndex const*, CMNHFManager&, llmq::CQuorumBlockProcessor&, llmq::CChainLocksHandler const&, Consensus::Params const&, CCoinsViewCache const&, bool, bool, BlockValidationState&, std::optional<MNListUpdates>&) at specialtxman.cpp:202:60
    frame #12: 0x0000555c40c00a47 dashd_testnet`CChainState::ConnectBlock(CBlock const&, BlockValidationState&, CBlockIndex*, CCoinsViewCache&, bool) at validation.cpp:2179:34
    frame #13: 0x0000555c40c0e593 dashd_testnet`CVerifyDB::VerifyDB(CChainState&, CChainParams const&, CCoinsView&, CEvoDB&, int, int) at validation.cpp:4789:41
    frame #14: 0x0000555c40851627 dashd_testnet`AppInitMain(std::variant<std::nullopt_t, std::reference_wrapper<NodeContext>, std::reference_wrapper<WalletContext>, std::reference_wrapper<CTxMemPool>, std::reference_wrapper<ChainstateManager>, std::reference_wrapper<CBlockPolicyEstimator>, std::reference_wrapper<LLMQContext> > const&, NodeContext&, interfaces::BlockAndHeaderTipInfo*) at init.cpp:2098:50
    frame #15: 0x0000555c4082fe11 dashd_testnet`AppInit(int, char**) at bitcoind.cpp:145:54
    frame #16: 0x0000555c40823c64 dashd_testnet`main at bitcoind.cpp:173:20
    frame #17: 0x00007fdd85934083 libc.so.6`__libc_start_main(main=(dashd_testnet`main at bitcoind.cpp:160:1), argc=3, argv=0x00007ffcb8ca5b88, init=<unavailable>, fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007ffcb8ca5b78) at libc-start.c:308:16
    frame #18: 0x0000555c4082f27e dashd_testnet`_start + 46
```

Fixes #5741

## What was done?
Start LLMQContext early. An alternative solution could be moving the bls
worker Start/Stop into the llmq context ctor/dtor.
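
The alternative mentioned here (tying worker Start/Stop to the owning object's lifetime) might look like the sketch below. It is written in Python purely for illustration; `WorkerContext` is a hypothetical name and does not mirror the real `LLMQContext` interface.

```python
import queue
import threading
from concurrent.futures import Future


class WorkerContext:
    """Toy context that owns its worker thread for its whole lifetime."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()  # started in the constructor, never "too late"

    def _run(self):
        while True:
            item = self._jobs.get()
            if item is None:
                return
            fn, fut = item
            try:
                fut.set_result(fn())
            except Exception as exc:
                fut.set_exception(exc)

    def submit(self, fn):
        fut = Future()
        self._jobs.put((fn, fut))
        return fut

    def close(self):
        self._jobs.put(None)  # sentinel tells the thread to exit
        self._thread.join()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()


def verify_with_context():
    # Any code holding a context can rely on the worker already running.
    with WorkerContext() as ctx:
        return ctx.submit(lambda: sum(range(5))).result(timeout=2.0)
```

With this shape there is no window in which the object exists but its worker has not been started, which is the ordering problem the patch addresses by starting the context earlier.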

## How Has This Been Tested?
I had a node with that issue. This patch fixed it.

## Breaking Changes
Not sure, hopefully none.

## Checklist:
- [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas
- [ ] I have added or updated relevant unit/integration/functional/e2e
tests
- [ ] I have made corresponding changes to the documentation
- [x] I have assigned this pull request to a milestone _(for repository
code-owners and collaborators only)_
UdjinM6 closed this as completed in 02c5edc Dec 6, 2023