-
Notifications
You must be signed in to change notification settings - Fork 36.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
indexes: Read the locator's top block during init, allow interaction with reindex-chainstate #25193
indexes: Read the locator's top block during init, allow interaction with reindex-chainstate #25193
Conversation
One thing I'm unsure about is that part 1) of this PR will now make it impossible to go back if we don't know the top block of the locator for some reason - although I don't know how this would be possible because I don't know of a process that would prune stale blocks from the Block Index (except the contrib/linearize script). The old code could recover an index with a no-longer existing best block from that (at the cost of corrupting the coinstatsindex), but I wonder if it would be good to still call |
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
Concept ACK |
c7c3494
to
1e598cf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Light code review ACK 1e598cf. First commit looks good. I had some questions and suggestions about some things in the second and third commits, but all the code looked ok and seemed like it should work
src/node/blockstorage.cpp
Outdated
@@ -885,6 +886,7 @@ void ThreadImport(ChainstateManager& chainman, std::vector<fs::path> vImportFile | |||
StartShutdown(); | |||
return; | |||
} | |||
fReindexChainState = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "node: add fReindexChainState flag to node" (8898261)
I understand that what this is doing with the new fReindexChainState
global is similar to what happens with the existing fReindex
global, but I think what happens with the fReindex
global is not ideal. I think it's unnecessarily fragile how the value starts off false, then switches to true, then switches back to false again, and think a one way latch would be better. I think it's bad the the variable doesn't have any straightforward meaning but is some combination "was reindexing requested?" and "is reindexing in progress?". Also in this commit, fReindexChainState
is different local variables subtly replaced by a global so you can't tell just looking at a diff whether all the existing fReindexChainState
references were updated correctly.
I'd suggest leaving fReindexChainState
alone, and adding a simpler atomic_bool g_indexes_ready_to_sync = false
global variable, and setting it to true here, and in the normal code path. I think this would make the diff and the overall initialization sequence simpler and easier to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. I dropped the commit that made fReindexChainState
a global and added g_indexes_ready_to_sync
as suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making g_indexes_ready_to_sync
a one-way switch requires to manually set it to true in the unit tests though. I did that in TestingSetup::TestingSetup()
to avoid setting it in each unit tests that uses indexes separately.
src/index/base.cpp
Outdated
@@ -76,7 +78,7 @@ bool BaseIndex::Init() | |||
SetBestBlockIndex(locator_index); | |||
} | |||
m_synced = m_best_block_index.load() == active_chain.Tip(); | |||
if (!m_synced) { | |||
if (!m_synced && fPruneMode) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "index: Enable reindex-chainstate with active indexes" (1e598cf)
This is pretty opaque and could use a comment. I'm also not sure it is better to be looking at fPruneMode
here, than just checking whether reindexing is happening, when it sounds like reindexing is the real problem, and checking for fPruneMode
is just a proxy for avoiding the real source of the problem?
Also it is not obvious to me that just because pruning isn't currently enabled doesn't mean pruning wasn't previously enabled and there couldn't be missing blocks worth checking for here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also it is not obvious to me that just because pruning isn't currently enabled doesn't mean pruning wasn't previously enabled and there couldn't be missing blocks worth checking for here.
I think that this shouldn't be possible: We check in Init when loading the chainstate (chainman.LoadBlockIndex()
) that we either have all the blocks from genesis, or allow for pruning - otherwise, we abort with an InitError and never get to the point where the indexes are started.
I replaced the check with the indirect g_indexes_ready_to_sync
as suggested and added a comment. This way, the pruning check will continue to be executed regardless of pruning status (unless -reindex is specified).
What I don't like is that this would throw a confusing InitError message if it somehow failed on a non-pruning node ("block of the index goes beyond pruned data"). Maybe the check should assert instead if fPrune==false
?
src/index/base.cpp
Outdated
SetBestBlockIndex(m_chainstate->FindForkInGlobalIndex(locator)); | ||
// Setting the best block to the locator's top block. If it is not part of the | ||
// best chain, we will rewind to the fork point during index sync | ||
const CBlockIndex* locator_index{m_chainstate->m_blockman.LookupBlockIndex(locator.vHave.front())}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just leaving as a note, but I think this allows the if (!active_chain.Contains(block_to_test)) {
case to be hit. Previously FindForkInGlobalIndex
would return a CBlockIndex
only in the current chain and so active_chain.Contains
would always be true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: #25193 (comment)
In commit "index: Use first block from locator instead of looking for fork point" (20bd221)
Yes, this seems to be the case. This is a good indication the original author or this code expecting was expecting to FindForkInGlobalIndex to just return a block based on the locator, not necessarily a block on the current chain. So new code is probably closer to intent of original code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So new code is probably closer to intent of original code.
True. Although to be fair, this also changes the reading of the best block such that we don't use anything but the top block of the locator, which probably wasn't the original intent because otherwise there would have been no need to store a locator in the first place instead of a single block hash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this patch, is it possible that the old code goes back to genesis in this rare, kind of contrived scenario?:
- index best block committed at height
h
- a reorg occurs starting at
h-1
toh+1
- notifications are queued
- chain state happens to be flushed (say the coins cache size is critical)
- node crashes after flush so notifications aren't delivered
- on startup,
Tip=h+1
, soFindForkInGlobalIndex
for the index best block returns genesis
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: #25193 (comment)
Without this patch, is it possible that the old code goes back to genesis in this rare, kind of contrived scenario?:
Maybe I need to think about this more, but I don't think it can go back to genesis. It can just go back to the last common block before the reorg. Also, the problem which this patch fixes isn't going backwards in general, but going backwards without rewinding. The problem with the FindForkInGlobalIndex
is that it goes backwards without making needed Rewind()
call to update indexes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my reading, if FindForkInGlobalIndex
doesn't find pindex
in the chain or doesn't find Tip()
as an ancestor of pindex
, it returns genesis. But yeah a bit orthogonal to the issue at hand
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: #25193 (comment)
From my reading, if
FindForkInGlobalIndex
doesn't findpindex
in the chain or doesn't findTip()
as an ancestor ofpindex
, it returns genesis. But yeah a bit orthogonal to the issue at hand
Wow, you are right. The FindForkInGlobalIndex
behavior is much different than the FindFork
behavior. I assumed FindForkInGlobalIndex
would try to find the last common ancestor between the locator block and the chain like FindFork
does, but it doesn't even try to do this. Instead it will literally only return the exact locator block, or the chain tip, or the genesis block. It doesn't make any sense to me why the function would be implemented this way, but I guess it's good that this PR removes one usage of it...
1e598cf
to
9bcd930
Compare
1e598cf to 91947e4 |
9bcd930
to
91947e4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 91947e4. In first commit vector front()
call is replaced by at(0)
call to avoid undefined behavior in case null locator is loaded. In second commit fReindexChainState
variable is replaced by simpler g_indexes_ready_to_sync
that only changes from false to true
src/index/base.cpp
Outdated
SetBestBlockIndex(m_chainstate->FindForkInGlobalIndex(locator)); | ||
// Setting the best block to the locator's top block. If it is not part of the | ||
// best chain, we will rewind to the fork point during index sync | ||
const CBlockIndex* locator_index{m_chainstate->m_blockman.LookupBlockIndex(locator.vHave.front())}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: #25193 (comment)
Without this patch, is it possible that the old code goes back to genesis in this rare, kind of contrived scenario?:
Maybe I need to think about this more, but I don't think it can go back to genesis. It can just go back to the last common block before the reorg. Also, the problem which this patch fixes isn't going backwards in general, but going backwards without rewinding. The problem with the FindForkInGlobalIndex
is that it goes backwards without making needed Rewind()
call to update indexes
91947e4
to
702f481
Compare
91947e4 to 702f481: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of the global flag that requires manual sets at different locations and the indexes threads active wait, what if we move the indexes threads start after the loading process? e.g. furszy@1525e0a.
It makes code shorter and more robust. Plus, it let us keep the pruning checks as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 974140f
Can confirm via testing that this fixes the majority of #27558, although I was not able to reliably abort my node during an invalidateblock
call as OP did in that issue...
I also extracted the feature_coinstatsindex test from 5fafeec and checked that it did indeed fail without the corresponding changes to BaseIndex::Init()
.
// Setting the best block to the locator's top block. If it is not part of the | ||
// best chain, we will rewind to the fork point during index sync | ||
const CBlockIndex* locator_index{m_chainstate->m_blockman.LookupBlockIndex(locator.vHave.at(0))}; | ||
if (!locator_index) { | ||
return InitError(strprintf(Untranslated("%s: best block of the index not found. Please rebuild the index."), GetName())); | ||
} | ||
SetBestBlockIndex(locator_index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another side effect of this PR is I don't think we use the locator
at all anymore besides its tip hash ?!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct! I didn't want to change the db format though to not break compatibility.
This happens again later too: Line 100 in 974140f
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 974140f
code review and local testing. verified the tests fail without the patches. great bug catch on the rewinding muhash! I also like @furszy idea about dropping the global atomic bool for a rerranged init sequence. I'll be happy to re-review if you included that.
Show Signature
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
ACK 974140f9e721740f857b45d10d7dbab62fdbbe53
-----BEGIN PGP SIGNATURE-----
iQIzBAEBCAAdFiEE5hdzzW4BBA4vG9eM5+KYS2KJyToFAmRdKuQACgkQ5+KYS2KJ
yTrWzRAAi1TzWndTKGoM6WwPvnXacA3OkluidZqIoa6zkeEnYzE7voAkRs2ZVmZp
T9FJ90ouWfkIfoaC1SewYGpJ2dykVSP41dmCgo0F06ceigiOAHTENDqJxkjFNL3e
+3sau/iPaGSOw8LU863kfABCI0KSqffnW9drQPySCt6lhCqSEhUJBpdBHHtaXS2z
/pF7nTUbvhV7hKbUKWB6bdLLkDO8w54yZXQkt3iNQDFWATFmutqxc7nhpdNe1fQW
vOqAOdw3ZdmPVNJB5H9nzmx46HDJemEr/w/8jy+6Hl6RAaxlhIs2a4gR64jLEeg6
8OD69S4XpCxu4FosZwoJyp7FmHL0ZINh4tR6bAPDO9Oevg5SFtsRP3zy01NI0PRW
wS3drLRG7PVkuFuwUHcExhPQyQIUoGnhLopfPjtngRTjfEuNUlUoYaZHzzO7Yjl6
eR6WG16vgtCHilnV1vypBX06U5gnq2Lu5VHoNTFuh9cveaIxXBKXtnJVLPhgaAmI
mMTfl+2A/2rvS3PgVXRBa+16e42h8y6rJB+6oHghhAD4Arz6sfn48K1blWk1Sp1j
NgNtREdWeqMmRVPTutn7Aoxa3/k7NeBC5qeipZdviSOqTusvuzt2mYZsIzVw7Aie
bwoX0quWu8tftAI4oEfGnRjXmJ9+XTJourAfZ9RDCJWzdYG6PXo=
=HWSL
-----END PGP SIGNATURE-----
pinheadmz's public key is on keybase
Thanks! I'll rebase and address furszy's comments next week! |
I think it'd be good to just rebase this and merge it and not try to do the "move the indexes threads start after the loading process" change here. This PR is pretty simple, has had a good amount of review and testing, and I think that change would make it more complicated. furszy also implemented that change separately in #27607, and it should simplify both PRs to base that change on top of this one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 974140f. Confirmed this is just a clean rebase since my last review. This needs another rebase now, but after that I would like to merge it.
The index sync code has logic to go back the chain to the forking point, while also updating index-specific state, which is necessary to prevent possible corruption of the coinstatsindex. Also add a test for this (a reorg happens while the index is deactivated) that would not pass before this change.
This is achieved by letting the index sync thread wait until reindex-chainstate is finished. This also disables the pruning check when reindexing the chainstate (which is incompatible with prune mode) because there would be no chain at this point in init.
974140f
to
97844d9
Compare
Ok, I only rebased. @furszy I like your suggestion and will review/test it when you include them in #27607, which I believe will change init order more anyway. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 97844d9. Just simple rebase since last review
This makes two improvements to the index init phase:
1) Prevent index corruption in case a reorg happens when the index was switched off:
This is done by reading in the top block stored in the locator instead of looking for a fork point already in
BaseIndex::Init()
.Before, we'd just go back to the fork point by calling
FindForkInGlobalIndex()
, which would have corrupted the coinstatsindex because its saved muhash needs to be reverted step by step by un-applying all blocks in between, which wasn't done before. This is now being done a bit later inThreadSync()
, which has existing logic to call the customRewind()
method when going back along the chain to the forking point (thanks ryanofsky for pointing this out to me!).2) Allow using the
-reindex-chainstate
option without needing to disabling indexes:With
BaseIndex::Init()
not callingFindForkInGlobalIndex()
anymore, we can allowreindex-chainstate
with active indexes.reindex-chainstate
deletes the chain and rebuilds it later inThreadImport
, so there is no chain available duringBaseIndex::Init()
, which would lead to problems (see #24789).But now we'll only need the chain a bit later in
BaseIndex::ThreadSync
, which will wait for the reindex-chainstate inThreadImport
to finish and will continue syncing after that.