Improve Receipt indexing and lookup performance in Lotus #19
Comments
This path is taken when servicing
|
I think there is a “quick” fix to slow This means the If we somehow maintained a mapping of tx_hashes/msg_CIDs to epoch/tipsetkey and were able to pass in the correct tipset or one much closer to the target one I think that would speed things up dramatically without having to change any of the internals of We can test this hypothesis quickly by hardcoding passing in the correct or a much closer tipsetkey to |
@i-norden - FYI - filecoin-project/lotus#10807 - The Lotus/FVM team also identified this problem and plan to implement these additional indexes. However their ETA for an MVP fix is 2-3 weeks. If you're able to propose any nearer-term improvements that could be released as a patch that'd be incredibly helpful, as this is currently blocking a number of users of archival nodes in our ecosystem. |
We definitely need to have better guarantees that the relevant sql indices are properly maintained. For example, the logic for backfilling the sqlite index relevant to |
Thanks @eshon and @fridrik01! I think guaranteeing the Loose proposal for quick fixes for this and #20:
|
Clear the We need two operating modes: one that lets the node operate optimistically and fall back to the slower head-back search when the index is missed, and one that ensures the indexes are properly maintained so that this pathway is never taken (the node falls over into a "maintenance" mode and then repairs the indexes; if the index is missed, enter said mode, and also have some background process that monitors it?). The moment the cache is known or hinted to be inconsistent, the node moves into a mode that focuses on rebuilding the cache/index. A flag turns the above property on or off. |
A message can be in the local blockstore but not in the chain, so we can't simply revert to maintenance mode when we can find a message in the blockstore but not in the messages index.
DO THE BELOW: Hashing/snapshotting/checkpointing the database periodically so that we can check if it is correct/complete? Create a new empty db, attach to current db, pop regions that are predefined for hashing over into the new DB, hash the new DB, should get the same hash. Need some way to store and expose these checkpoints, canonical hash tree? Would this be a sidecar process or part of Lotus? Eventually add Lotus API to support this? Map epoch boundaries to the canonical hash. Need to define an attestation schema. ETL sql statement to take msgindex and populate the new schema. Run hash over these segments. Service runs alongside Lotus and performs this algo, checks the hashes it generates vs the ones exposed by other SPs. If it is not a canonical hash, stop the node, fix the index, restart it, re-run test. Service needs to run this algo every "period". "period": 2880 epochs (1 day). Trailing process that probes the index in the background and checks that the messages it contains are sufficient to satisfy state transitions? |
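The periodic checkpoint idea above could look roughly like the following. This is only a sketch under assumptions: the `row` type, the `epoch|cid` serialization, and the use of SHA-256 are all hypothetical choices made for illustration, not the msgindex schema or an agreed attestation format. Only the 2880-epoch period comes from the notes above. It groups index rows by period, hashes each group in a canonical order, and reports the periods whose hash differs from a reference set.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// period is the checkpoint length in epochs (about one day), per the notes above.
const period = 2880

// row is a hypothetical index entry; the real msgindex schema may differ.
type row struct {
	msgCID string
	epoch  int64
}

// checkpoints returns one hex-encoded SHA-256 digest per period, computed over
// that period's rows sorted by (epoch, msgCID) so insertion order does not matter.
func checkpoints(rows []row) map[int64]string {
	byPeriod := map[int64][]row{}
	for _, r := range rows {
		p := r.epoch / period
		byPeriod[p] = append(byPeriod[p], r)
	}
	out := map[int64]string{}
	for p, rs := range byPeriod {
		sort.Slice(rs, func(i, j int) bool {
			if rs[i].epoch != rs[j].epoch {
				return rs[i].epoch < rs[j].epoch
			}
			return rs[i].msgCID < rs[j].msgCID
		})
		h := sha256.New()
		for _, r := range rs {
			// Assumed serialization: "epoch|cid\n" per row.
			fmt.Fprintf(h, "%d|%s\n", r.epoch, r.msgCID)
		}
		out[p] = hex.EncodeToString(h.Sum(nil))
	}
	return out
}

// mismatched lists, in ascending order, the periods whose local digest
// differs from (or is missing relative to) the reference digest.
func mismatched(local, reference map[int64]string) []int64 {
	var bad []int64
	for p, want := range reference {
		if local[p] != want {
			bad = append(bad, p)
		}
	}
	sort.Slice(bad, func(i, j int) bool { return bad[i] < bad[j] })
	return bad
}

func main() {
	full := []row{{"msgA", 10}, {"msgB", 2879}, {"msgC", 2880}, {"msgD", 5000}}
	// Same rows in a different order, but msgD is missing from period 1.
	partial := []row{{"msgB", 2879}, {"msgA", 10}, {"msgC", 2880}}
	fmt.Println("periods needing repair:", mismatched(checkpoints(partial), checkpoints(full)))
}
```

Hashing per period rather than over the whole database keeps a repair local: only the periods that disagree with the reference need to be rebuilt.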
if you already have the state but not the index |
Thanks @jennijuju, we did but we were running off of the latest release which didn't include msgindex support 🙃. We have switched to building off of main, in order to also capture the new shed backfill commands. We have run the backfill on one of our nodes that was synced off of a filesystem transfer we received from a partner. Unfortunately, this node appears to have been synced starting from a light snapshot above the FEVM activation height and so does not have all the messages necessary to completely rebuild the index. But this has allowed us to move forward with some testing. We are simultaneously syncing new nodes from light snapshots right below the FEVM activation blockheight in order to capture the full FEVM msg history. These syncs were (re)started about 2 weeks ago but have not yet completed (around epoch 2750000 now; at the current rate they will take 5-6 more weeks). If you are aware of anybody who could provide a full snapshot or transfer the data dir for a non-pruning node synced from a light snapshot below 2683348, we would be very interested in that. The machines for the light snapshot syncs are NVMe with 10Gib ethernet.
|
While syncing from the light snapshots starting below 2683348, we have noticed at startup that the nodes struggle to find peers. They eventually do, but then continue to run into these issues with a high frequency:
and
and
Have not dug into these messages in depth yet, but on the surface this behavior suggests to me that the nodes are struggling to find peers that can provide the historical data needed. This would sort of make sense to me at a high level, since the standard way of syncing a node is to sync from the latest official light snapshots that are provided; such peers won't have historical transaction or state data below the snapshot they started at. It seems this could have been exacerbated further by the introduction of the SplitStore, since the official docs recommend migrating to the SplitStore by throwing away your current datadir and starting an entirely new sync from a new light snapshot with Note: we are not running the node using systemd, as we were unable to get it to sync at all in that context. Related: filecoin-project/lotus#8001. |
There is one potential provider AFAIK & they are doing some testing. @eshon might have more updated info to share here. |
Small good news I hope: I expect the sync to be faster after passing epoch 2809800 due to a protocol improvement we shipped to the network! |
Thanks @jennijuju, we successfully transferred an archive node. Between what you shared here and the recent work that has been done on the skip list and otherwise by @fridrik01 I'm closing this. Thanks!! |
Background:
Lotus maintains sqlite indexes for fast transaction lookup: https://github.com/filecoin-project/lotus/blob/master/chain/index/msgindex.go
We can see this index being leveraged in the SearchForMessages method: https://github.com/filecoin-project/lotus/blob/master/chain/stmgr/searchwait.go#L140

In SearchForMessages we first load the message directly from the blockstore using the provided CID: https://github.com/filecoin-project/lotus/blob/master/chain/store/messages.go#L43.

Once we have the message, we attempt to find the corresponding receipt in the head-1 ("head" being the tipset provided to the method) tipset by calling tipsetExecutedMessage. This method loads the parent-of-head tipset, loads the messages for the parent tipset, scans backwards through these messages (in order to account for replaceable message CIDs). Once it has identified the correct (potentially replaced) message through this process, it takes the index of that message and uses it to lookup the corresponding receipt. The receipt lookup itself involves loading and searching through the AMT for the receipts.

If we couldn't find the receipt in the head-1 tipset using the above process, then we attempt to lookup msgInfo in the sqlite message index by calling searchForIndexedMsg. This method searches the index for the messageCID and the index maps this CID to the appropriate tipset to lookup the receipts with. We then call the tipsetExecutedMessage method again, this time using the tipset that we identified from the index.

If msgInfo cannot be found in the index, then we fall through to the most pessimistic search path which involves calling the searchBackForMsg. As the name implies, this method involves iterating backwards over tipsets from the provided head, searching through all the messages until the target message and corresponding receipt are found (or until the search limit is reached).

Ideas (WIP):
[]byte("r")) to a message CID to directly lookup the corresponding receipt or receipt CID.
searchBackForMsg path.
allowReplaced. Even when allowReplaced is false, the access pattern proceeds down a non-optimal path in order to accommodate the possibility for it to be true. If we know it can't be replaced, then we should be able to avoid some of the work done in tipsetExecutedMessage. Need to investigate this further.
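
To make the lookup order from the Background section concrete, here is a toy model of the three tiers: check the parent of head, then consult the index, then walk back from head up to a search limit. It is a minimal sketch, not the Lotus implementation. The `chain` and `tipset` types, the `search` function, and the in-memory `index` map are all hypothetical stand-ins, and the model deliberately ignores receipts, the AMT, and replaced messages.

```go
package main

import (
	"errors"
	"fmt"
)

// tipset is a toy stand-in for a tipset: an epoch and the message CIDs executed in it.
type tipset struct {
	epoch    int
	messages []string
}

// chain is a toy chain where tipsets[i].epoch == i. The index maps a message CID
// to the epoch it was executed in and may be incomplete, like an un-backfilled index.
type chain struct {
	tipsets []tipset
	index   map[string]int
}

var errNotFound = errors.New("message not found within search limit")

// executedIn reports whether cid was executed in the tipset at the given epoch.
func (c *chain) executedIn(epoch int, cid string) bool {
	if epoch < 0 || epoch >= len(c.tipsets) {
		return false
	}
	for _, m := range c.tipsets[epoch].messages {
		if m == cid {
			return true
		}
	}
	return false
}

// search returns the epoch where cid was executed and which path found it.
func (c *chain) search(head int, cid string, limit int) (int, string, error) {
	// Tier 1: the parent of the provided head.
	if c.executedIn(head-1, cid) {
		return head - 1, "parent", nil
	}
	// Tier 2: the index points directly at the right tipset.
	if e, ok := c.index[cid]; ok && c.executedIn(e, cid) {
		return e, "index", nil
	}
	// Tier 3: pessimistic walk back from head, bounded by limit.
	for e, steps := head-2, 0; e >= 0 && steps < limit; e, steps = e-1, steps+1 {
		if c.executedIn(e, cid) {
			return e, "walk-back", nil
		}
	}
	return 0, "", errNotFound
}

// demoChain builds ten tipsets; msgB is indexed, msgC is not.
func demoChain() *chain {
	c := &chain{index: map[string]int{"msgB": 3}}
	for e := 0; e < 10; e++ {
		c.tipsets = append(c.tipsets, tipset{epoch: e})
	}
	c.tipsets[8].messages = []string{"msgA"}
	c.tipsets[3].messages = []string{"msgB"}
	c.tipsets[1].messages = []string{"msgC"}
	return c
}

func main() {
	c := demoChain()
	for _, cid := range []string{"msgA", "msgB", "msgC"} {
		e, path, err := c.search(9, cid, 100)
		fmt.Println(cid, e, path, err)
	}
}
```

In this model an index miss for an old message costs a walk proportional to its distance from head, which is why a complete index (or a closer starting tipset) matters for archival lookups.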