Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

indexes: Read the locator's top block during init, allow interaction with reindex-chainstate #25193

Merged

Conversation

mzumsande
Copy link
Contributor

@mzumsande mzumsande commented May 23, 2022

This makes two improvements to the index init phase:

1) Prevent index corruption in case a reorg happens when the index was switched off:
This is done by reading in the top block stored in the locator instead of looking for a fork point already in BaseIndex::Init().
Before, we'd just go back to the fork point by calling FindForkInGlobalIndex(), which would have corrupted the coinstatsindex because its saved muhash needs to be reverted step by step by un-applying all blocks in between, which wasn't done before. This is now being done a bit later in ThreadSync(), which has existing logic to call the custom Rewind() method when going back along the chain to the forking point (thanks ryanofsky for pointing this out to me!).

2) Allow using the -reindex-chainstate option without needing to disabling indexes:
With BaseIndex::Init() not calling FindForkInGlobalIndex() anymore, we can allow reindex-chainstate with active indexes. reindex-chainstate deletes the chain and rebuilds it later in ThreadImport, so there is no chain available during BaseIndex::Init(), which would lead to problems (see #24789).
But now we'll only need the chain a bit later in BaseIndex::ThreadSync, which will wait for the reindex-chainstate in ThreadImport to finish and will continue syncing after that.

@mzumsande
Copy link
Contributor Author

One thing I'm unsure about is that part 1) of this PR will now make it impossible to go back if we don't know the top block of the locator for some reason - although I don't know how this would be possible because I don't know of a process that would prune stale blocks from the Block Index (except the contrib/linearize script).

The old code could recover an index with a no-longer existing best block from that (at the cost of corrupting the coinstatsindex), but I wonder if it would be good to still call FindForkInGlobalIndex(), maybe as a fallback, for the other indexes.

@DrahtBot
Copy link
Contributor

DrahtBot commented May 23, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK ryanofsky
Concept ACK jonatack
Stale ACK willcl-ark, pinheadmz

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #27607 (init: verify blocks data existence only once for all the indexers by furszy)
  • #27596 (assumeutxo (2) by jamesob)
  • #27125 (refactor, kernel: Decouple ArgsManager from blockstorage by TheCharlatan)
  • #25302 (build: Check usages of #if defined(...) by brokenprogrammer)
  • #24230 (indexes: Stop using node internal types and locking cs_main, improve sync logic by ryanofsky)
  • #19792 (rpc: Add dumpcoinstats by fjahr)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@jonatack
Copy link
Member

jonatack commented May 23, 2022

Concept ACK

src/index/base.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Light code review ACK 1e598cf. First commit looks good. I had some questions and suggestions about some things in the second and third commits, but all the code looked ok and seemed like it should work

@@ -885,6 +886,7 @@ void ThreadImport(ChainstateManager& chainman, std::vector<fs::path> vImportFile
StartShutdown();
return;
}
fReindexChainState = false;
Copy link
Contributor

@ryanofsky ryanofsky Jun 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "node: add fReindexChainState flag to node" (8898261)

I understand that what this is doing with the new fReindexChainState global is similar to what happens with the existing fReindex global, but I think what happens with the fReindex global is not ideal. I think it's unnecessarily fragile how the value starts off false, then switches to true, then switches back to false again, and think a one way latch would be better. I think it's bad the the variable doesn't have any straightforward meaning but is some combination "was reindexing requested?" and "is reindexing in progress?". Also in this commit, fReindexChainState is different local variables subtly replaced by a global so you can't tell just looking at a diff whether all the existing fReindexChainState references were updated correctly.

I'd suggest leaving fReindexChainState alone, and adding a simpler atomic_bool g_indexes_ready_to_sync = false global variable, and setting it to true here, and in the normal code path. I think this would make the diff and the overall initialization sequence simpler and easier to understand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. I dropped the commit that made fReindexChainState a global and added g_indexes_ready_to_sync as suggested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making g_indexes_ready_to_sync a one-way switch requires to manually set it to true in the unit tests though. I did that in TestingSetup::TestingSetup() to avoid setting it in each unit tests that uses indexes separately.

@@ -76,7 +78,7 @@ bool BaseIndex::Init()
SetBestBlockIndex(locator_index);
}
m_synced = m_best_block_index.load() == active_chain.Tip();
if (!m_synced) {
if (!m_synced && fPruneMode) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "index: Enable reindex-chainstate with active indexes" (1e598cf)

This is pretty opaque and could use a comment. I'm also not sure it is better to be looking at fPruneMode here, than just checking whether reindexing is happening, when it sounds like reindexing is the real problem, and checking for fPruneMode is just a proxy for avoiding the real source of the problem?

Also it is not obvious to me that just because pruning isn't currently enabled doesn't mean pruning wasn't previously enabled and there couldn't be missing blocks worth checking for here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it is not obvious to me that just because pruning isn't currently enabled doesn't mean pruning wasn't previously enabled and there couldn't be missing blocks worth checking for here.

I think that this shouldn't be possible: We check in Init when loading the chainstate (chainman.LoadBlockIndex()) that we either have all the blocks from genesis, or allow for pruning - otherwise, we abort with an InitError and never get to the point where the indexes are started.

I replaced the check with the indirect g_indexes_ready_to_sync as suggested and added a comment. This way, the pruning check will continue to be executed regardless of pruning status (unless -reindex is specified).

What I don't like is that this would throw a confusing InitError message if it somehow failed on a non-pruning node ("block of the index goes beyond pruned data"). Maybe the check should assert instead if fPrune==false?

SetBestBlockIndex(m_chainstate->FindForkInGlobalIndex(locator));
// Setting the best block to the locator's top block. If it is not part of the
// best chain, we will rewind to the fork point during index sync
const CBlockIndex* locator_index{m_chainstate->m_blockman.LookupBlockIndex(locator.vHave.front())};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just leaving as a note, but I think this allows the if (!active_chain.Contains(block_to_test)) { case to be hit. Previously FindForkInGlobalIndex would return a CBlockIndex only in the current chain and so active_chain.Contains would always be true.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #25193 (comment)

In commit "index: Use first block from locator instead of looking for fork point" (20bd221)

Yes, this seems to be the case. This is a good indication the original author or this code expecting was expecting to FindForkInGlobalIndex to just return a block based on the locator, not necessarily a block on the current chain. So new code is probably closer to intent of original code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So new code is probably closer to intent of original code.

True. Although to be fair, this also changes the reading of the best block such that we don't use anything but the top block of the locator, which probably wasn't the original intent because otherwise there would have been no need to store a locator in the first place instead of a single block hash.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this patch, is it possible that the old code goes back to genesis in this rare, kind of contrived scenario?:

  • index best block committed at height h
  • a reorg occurs starting at h-1 to h+1
  • notifications are queued
  • chain state happens to be flushed (say the coins cache size is critical)
  • node crashes after flush so notifications aren't delivered
  • on startup, Tip=h+1, so FindForkInGlobalIndex for the index best block returns genesis

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #25193 (comment)

Without this patch, is it possible that the old code goes back to genesis in this rare, kind of contrived scenario?:

Maybe I need to think about this more, but I don't think it can go back to genesis. It can just go back to the last common block before the reorg. Also, the problem which this patch fixes isn't going backwards in general, but going backwards without rewinding. The problem with the FindForkInGlobalIndex is that it goes backwards without making needed Rewind() call to update indexes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my reading, if FindForkInGlobalIndex doesn't find pindex in the chain or doesn't find Tip() as an ancestor of pindex, it returns genesis. But yeah a bit orthogonal to the issue at hand

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #25193 (comment)

From my reading, if FindForkInGlobalIndex doesn't find pindex in the chain or doesn't find Tip() as an ancestor of pindex, it returns genesis. But yeah a bit orthogonal to the issue at hand

Wow, you are right. The FindForkInGlobalIndex behavior is much different than the FindFork behavior. I assumed FindForkInGlobalIndex would try to find the last common ancestor between the locator block and the chain like FindFork does, but it doesn't even try to do this. Instead it will literally only return the exact locator block, or the chain tip, or the genesis block. It doesn't make any sense to me why the function would be implemented this way, but I guess it's good that this PR removes one usage of it...

@mzumsande mzumsande force-pushed the 202205_index_allow_reindex_chainstate branch from 1e598cf to 9bcd930 Compare June 23, 2022 19:31
@mzumsande
Copy link
Contributor Author

mzumsande commented Jun 23, 2022

1e598cf to 91947e4
addressed feedback by @ryanofsky - thanks!

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 91947e4. In first commit vector front() call is replaced by at(0) call to avoid undefined behavior in case null locator is loaded. In second commit fReindexChainState variable is replaced by simpler g_indexes_ready_to_sync that only changes from false to true

src/init.cpp Outdated Show resolved Hide resolved
SetBestBlockIndex(m_chainstate->FindForkInGlobalIndex(locator));
// Setting the best block to the locator's top block. If it is not part of the
// best chain, we will rewind to the fork point during index sync
const CBlockIndex* locator_index{m_chainstate->m_blockman.LookupBlockIndex(locator.vHave.front())};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #25193 (comment)

Without this patch, is it possible that the old code goes back to genesis in this rare, kind of contrived scenario?:

Maybe I need to think about this more, but I don't think it can go back to genesis. It can just go back to the last common block before the reorg. Also, the problem which this patch fixes isn't going backwards in general, but going backwards without rewinding. The problem with the FindForkInGlobalIndex is that it goes backwards without making needed Rewind() call to update indexes

@mzumsande
Copy link
Contributor Author

91947e4 to 702f481:
rebased and addressed comment by @ryanofsky

@mzumsande
Copy link
Contributor Author

0577bd6 to 974140f: rebased due to conflict with #25781

Copy link
Member

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of the global flag that requires manual sets at different locations and the indexes threads active wait, what if we move the indexes threads start after the loading process? e.g. furszy@1525e0a.

It makes code shorter and more robust. Plus, it let us keep the pruning checks as well.

Copy link
Member

@willcl-ark willcl-ark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 974140f

Can confirm via testing that this fixes the majority of #27558, although I was not able to reliably abort my node during an invalidateblock call as OP did in that issue...

I also extracted the feature_coinstatsindex test from 5fafeec and checked that it did indeed fail without the corresponding changes to BaseIndex::Init().

Comment on lines +98 to +103
// Setting the best block to the locator's top block. If it is not part of the
// best chain, we will rewind to the fork point during index sync
const CBlockIndex* locator_index{m_chainstate->m_blockman.LookupBlockIndex(locator.vHave.at(0))};
if (!locator_index) {
return InitError(strprintf(Untranslated("%s: best block of the index not found. Please rebuild the index."), GetName()));
}
SetBestBlockIndex(locator_index);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another side effect of this PR is I don't think we use the locator at all anymore besides its tip hash ?!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct! I didn't want to change the db format though to not break compatibility.

@pinheadmz
Copy link
Member

In first commit vector front() call is replaced by at(0) call to avoid undefined behavior in case null locator is loaded.

This happens again later too:

const CBlockIndex* locator_index{m_chainstate->m_blockman.LookupBlockIndex(locator.vHave.at(0))};

Copy link
Member

@pinheadmz pinheadmz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 974140f

code review and local testing. verified the tests fail without the patches. great bug catch on the rewinding muhash! I also like @furszy idea about dropping the global atomic bool for a rerranged init sequence. I'll be happy to re-review if you included that.

Show Signature
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

ACK 974140f9e721740f857b45d10d7dbab62fdbbe53
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEE5hdzzW4BBA4vG9eM5+KYS2KJyToFAmRdKuQACgkQ5+KYS2KJ
yTrWzRAAi1TzWndTKGoM6WwPvnXacA3OkluidZqIoa6zkeEnYzE7voAkRs2ZVmZp
T9FJ90ouWfkIfoaC1SewYGpJ2dykVSP41dmCgo0F06ceigiOAHTENDqJxkjFNL3e
+3sau/iPaGSOw8LU863kfABCI0KSqffnW9drQPySCt6lhCqSEhUJBpdBHHtaXS2z
/pF7nTUbvhV7hKbUKWB6bdLLkDO8w54yZXQkt3iNQDFWATFmutqxc7nhpdNe1fQW
vOqAOdw3ZdmPVNJB5H9nzmx46HDJemEr/w/8jy+6Hl6RAaxlhIs2a4gR64jLEeg6
8OD69S4XpCxu4FosZwoJyp7FmHL0ZINh4tR6bAPDO9Oevg5SFtsRP3zy01NI0PRW
wS3drLRG7PVkuFuwUHcExhPQyQIUoGnhLopfPjtngRTjfEuNUlUoYaZHzzO7Yjl6
eR6WG16vgtCHilnV1vypBX06U5gnq2Lu5VHoNTFuh9cveaIxXBKXtnJVLPhgaAmI
mMTfl+2A/2rvS3PgVXRBa+16e42h8y6rJB+6oHghhAD4Arz6sfn48K1blWk1Sp1j
NgNtREdWeqMmRVPTutn7Aoxa3/k7NeBC5qeipZdviSOqTusvuzt2mYZsIzVw7Aie
bwoX0quWu8tftAI4oEfGnRjXmJ9+XTJourAfZ9RDCJWzdYG6PXo=
=HWSL
-----END PGP SIGNATURE-----

pinheadmz's public key is on keybase

@mzumsande
Copy link
Contributor Author

Thanks! I'll rebase and address furszy's comments next week!

@ryanofsky
Copy link
Contributor

Thanks! I'll rebase and address furszy's comments next week!

I think it'd be good to just rebase this and merge it and not try to do the "move the indexes threads start after the loading process" change here. This PR is pretty simple, has had a good amount of review and testing, and I think that change would make it more complicated. furszy also implemented that change separately in #27607, and it should simplify both PRs to base that change on top of this one.

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 974140f. Confirmed this is just a clean rebase since my last review. This needs another rebase now, but after that I would like to merge it.

mzumsande added 2 commits May 17, 2023 11:14
The index sync code has logic to go back the chain to the forking point, while
also updating index-specific state, which is necessary to prevent
possible corruption of the coinstatsindex.

Also add a test for this (a reorg happens while the index is deactivated)
that would not pass before this change.
This is achieved by letting the index sync thread wait until
reindex-chainstate is finished.

This also disables the pruning check when reindexing the chainstate (which is
incompatible with prune mode) because there would be no chain at this point
in init.
@mzumsande mzumsande force-pushed the 202205_index_allow_reindex_chainstate branch from 974140f to 97844d9 Compare May 17, 2023 15:31
@mzumsande
Copy link
Contributor Author

974140f to 97844d9: rebased

I think it'd be good to just rebase this and merge it and not try to do the "move the indexes threads start after the loading process" change here.

Ok, I only rebased.

@furszy I like your suggestion and will review/test it when you include them in #27607, which I believe will change init order more anyway.

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 97844d9. Just simple rebase since last review

@DrahtBot DrahtBot requested review from pinheadmz and willcl-ark May 17, 2023 17:22
@ryanofsky ryanofsky merged commit 4e8a765 into bitcoin:master May 17, 2023
@mzumsande mzumsande deleted the 202205_index_allow_reindex_chainstate branch May 17, 2023 19:17
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request May 18, 2023
@bitcoin bitcoin locked and limited conversation to collaborators Sep 15, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Coinstats index corrupted after invalidateblock and clean shutdown
9 participants