Enable efficient indexing of historical chain data #10807

fridrik01 · 2023-05-03T12:58:12Z

See https://github.com/filecoin-project/fvm-pm/issues/299

Lotus currently uses the following sqlite databases:

- sqlite/events.db: Stores events emitted by actors in the FVM
- sqlite/txhash.db: Stores mappings of Eth tx hashes to Filecoin message CIDs
- sqlite/msgindex.db: Stores block message CIDs and their tipset CIDs for faster lookup

We should try to unify these databases into one, which should make it simpler to maintain correctness and to do recovery/backfilling.
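To make the unification idea concrete, here is a minimal sketch of what a single database could look like. It is an illustration only: the table names, column names, and helper functions (`open_index`, `index_message`, `tipset_for_tx`) are hypothetical and are not taken from the Lotus codebase. It also simplifies by keying events on the message CID alone. The point is that one message's rows for all three indices can be written in a single transaction, so they cannot drift apart.

```python
import sqlite3

# Hypothetical unified schema; table and column names are illustrative only
# and do not reflect the actual Lotus schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tipset_message (
    message_cid TEXT NOT NULL,
    tipset_cid  TEXT NOT NULL,
    epoch       INTEGER NOT NULL,
    PRIMARY KEY (message_cid, tipset_cid)
);
CREATE TABLE IF NOT EXISTS eth_tx_hash (
    tx_hash     TEXT PRIMARY KEY,
    message_cid TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS event (
    message_cid TEXT NOT NULL,
    event_index INTEGER NOT NULL,
    emitter     TEXT NOT NULL,
    PRIMARY KEY (message_cid, event_index)
);
"""


def open_index(path=":memory:"):
    """Open (or create) the single index database and ensure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def index_message(conn, message_cid, tipset_cid, epoch, tx_hash=None, emitters=()):
    """Write every index row for one message in a single transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute(
            "INSERT OR IGNORE INTO tipset_message VALUES (?, ?, ?)",
            (message_cid, tipset_cid, epoch),
        )
        if tx_hash is not None:
            conn.execute(
                "INSERT OR REPLACE INTO eth_tx_hash VALUES (?, ?)",
                (tx_hash, message_cid),
            )
        for i, emitter in enumerate(emitters):
            conn.execute(
                "INSERT OR REPLACE INTO event VALUES (?, ?, ?)",
                (message_cid, i, emitter),
            )


def tipset_for_tx(conn, tx_hash):
    """Resolve an Eth tx hash to a tipset CID with one join, or None if unknown."""
    row = conn.execute(
        "SELECT m.tipset_cid FROM eth_tx_hash h "
        "JOIN tipset_message m ON m.message_cid = h.message_cid "
        "WHERE h.tx_hash = ?",
        (tx_hash,),
    ).fetchone()
    return row[0] if row else None
```

With separate database files, the lookup in `tipset_for_tx` would need two queries against two connections, and a crash between the writes could leave the files inconsistent with each other.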

We should also make sure we handle the following:

- Handle all edge cases (forks, reverts, config changes that require pruning, etc.)
- Allow enabling/disabling what to index
- ~~Be able to configure lookback so these indices don't grow endlessly~~ Not needed, as the indexes are tiny compared to chain data (less than 0.25%)
- Enable automatic backfilling of sqlite databases (msgindex, events, txhash) #11007
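As a rough illustration of the first two items, the sketch below marks rows as reverted rather than deleting them when a tipset is reverted, and skips indices that are switched off. Everything here is an assumption made for the example: the `ENABLED` toggles are not real Lotus config keys, and the `entry` table and function names are hypothetical.

```python
import sqlite3

# Hypothetical per-index toggles; these are NOT real Lotus config keys.
ENABLED = {"msgindex": True, "txhash": True, "events": False}


def open_db():
    """Create an in-memory database with one illustrative index table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entry ("
        " index_name TEXT NOT NULL,"
        " tipset_cid TEXT NOT NULL,"
        " key        TEXT NOT NULL,"
        " reverted   INTEGER NOT NULL DEFAULT 0,"
        " PRIMARY KEY (index_name, tipset_cid, key))"
    )
    return conn


def apply_tipset(conn, tipset_cid, rows, enabled=ENABLED):
    """Index (index_name, key) pairs for a tipset, skipping disabled indices.

    Re-applying a previously reverted tipset clears its reverted flag.
    """
    with conn:
        for index_name, key in rows:
            if not enabled.get(index_name, False):
                continue
            conn.execute(
                "INSERT OR REPLACE INTO entry VALUES (?, ?, ?, 0)",
                (index_name, tipset_cid, key),
            )


def revert_tipset(conn, tipset_cid):
    """Mark every row of a reverted tipset instead of deleting it."""
    with conn:
        conn.execute("UPDATE entry SET reverted = 1 WHERE tipset_cid = ?", (tipset_cid,))


def lookup(conn, index_name, key):
    """Return the tipset CIDs where the key is currently indexed and not reverted."""
    cur = conn.execute(
        "SELECT tipset_cid FROM entry"
        " WHERE index_name = ? AND key = ? AND reverted = 0"
        " ORDER BY tipset_cid",
        (index_name, key),
    )
    return [r[0] for r in cur]
```

Keeping reverted rows around makes it cheap to re-apply a tipset if the chain switches back to that fork. The cost is that every read has to filter on `reverted`.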

raulk · 2023-05-03T16:21:23Z

We should also populate these on snapshot import, if we aren't already doing so.

BigLep · 2024-08-06T19:20:19Z

Newbie question: what was the reason for having 3 separate databases to begin with?

Stebalien · 2024-08-07T20:19:27Z

I believe the idea was that it made it easy to tell if one was getting large and remove/disable it. I'd kind of like to introduce (even in shed) some form of GC command before we unify them, but I do think unifying them is the way to go (that and re-organizing our tables to massively reduce the amount of duplicate data).

Stebalien · 2024-08-07T20:22:57Z

In terms of: why multiple observers? We wanted to keep these subsystems separate. But I wouldn't be opposed to a new architecture here (that, e.g., lets us cleanly keep track of what we've indexed and what we haven't) as long as we can make it somewhat pluggable. HOWEVER, if we want to be able to enable/disable these indices independently... we'll need to track what has been indexed and what has not been indexed independently as well.
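A minimal sketch of what "tracking independently" could mean: record, per index, which epochs have been processed, so that enabling an index later (or recovering after a gap) reveals exactly which ranges still need backfilling. The `IndexProgress` class and its method names are hypothetical. A real implementation would presumably persist this in the database rather than in memory.

```python
from typing import Dict, List, Set, Tuple


class IndexProgress:
    """Tracks, separately for each index, which epochs have been indexed."""

    def __init__(self) -> None:
        self._done: Dict[str, Set[int]] = {}

    def mark_indexed(self, index_name: str, epoch: int) -> None:
        """Record that one index has processed one epoch."""
        self._done.setdefault(index_name, set()).add(epoch)

    def missing_ranges(self, index_name: str, start: int, end: int) -> List[Tuple[int, int]]:
        """Return inclusive (lo, hi) epoch ranges in [start, end] not yet indexed."""
        done = self._done.get(index_name, set())
        gaps: List[Tuple[int, int]] = []
        gap_start = None
        for epoch in range(start, end + 1):
            if epoch in done:
                if gap_start is not None:
                    # The gap ended at the previous epoch.
                    gaps.append((gap_start, epoch - 1))
                    gap_start = None
            elif gap_start is None:
                gap_start = epoch
        if gap_start is not None:
            gaps.append((gap_start, end))
        return gaps
```

For example, if msgindex was enabled for epochs 10 through 20 but events only saw epochs 10, 11, and 15, then `missing_ranges("events", 10, 20)` reports the two gaps to backfill while `missing_ranges("msgindex", 10, 20)` reports none.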

fridrik01 self-assigned this May 3, 2023

eshon mentioned this issue May 24, 2023

Improve Receipt indexing and lookup performance in Lotus vulcanize/filecoin-indexing#19

Closed

fridrik01 mentioned this issue May 25, 2023

Improve eth_getBlockByNumber query performance. vulcanize/filecoin-indexing#20

Closed

jennijuju added this to the Lotus Scalability milestone May 29, 2023

jennijuju linked a pull request Jun 1, 2023 that will close this issue

fix: improve perf of msgindex backfill #10941

Merged

jennijuju removed a link to a pull request Jun 1, 2023

fix: improve perf of msgindex backfill #10941

Merged

fridrik01 mentioned this issue Jun 23, 2023

Add support to backfill actor events + resuming when left off #10939

Closed

fridrik01 mentioned this issue Jul 20, 2023

Add new lotus-shed command for backfilling actor events #11088

Merged

aarshkshah1992 added the area/eth-api label Jul 23, 2024

BigLep mentioned this issue Aug 6, 2024

Meta Issue: Fixing high impact correctness and performance problems in ETH RPC API for snapshot synced nodes #12293

Open

akaladarshi mentioned this issue Aug 12, 2024

Implement a lotus-shed command to garbage collect all the indices not available in the state store #12377

Closed

9 tasks
