Enable efficient indexing of historical chain data #10807

Open
fridrik01 opened this issue May 3, 2023 · 4 comments

@fridrik01
Contributor

fridrik01 commented May 3, 2023

See https://github.com/filecoin-project/fvm-pm/issues/299

Lotus currently uses the following sqlite databases:

  • sqlite/events.db: Stores events sent by actors in the FVM
  • sqlite/txhash.db: Stores mappings of Eth tx hash to Filecoin message cid
  • sqlite/msgindex.db: Stores block messages cid and their tipset cid for faster lookup

We should try to unify all the different databases into one which should make maintaining correctness and doing recovery/backfilling simpler.
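To make the unification idea concrete, here is a minimal sketch of what a single database covering all three roles could look like. Everything in it is an assumption for illustration: the table names, column names, and the `open_index` / `index_message` helpers are hypothetical and do not reflect the actual Lotus schemas. The point it tries to show is that with one database, a message, its Eth tx hash mapping, and its events can be written (or rolled back) in a single transaction, which is what would make correctness and backfilling easier to reason about.

```python
import sqlite3

# Hypothetical unified schema: names are illustrative, not the Lotus schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tipset_message (
    id          INTEGER PRIMARY KEY,
    tipset_cid  TEXT NOT NULL,
    message_cid TEXT NOT NULL,
    height      INTEGER NOT NULL,
    UNIQUE (tipset_cid, message_cid)
);
CREATE TABLE IF NOT EXISTS eth_tx_hash (
    tx_hash     TEXT PRIMARY KEY,
    message_cid TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS event (
    id         INTEGER PRIMARY KEY,
    message_id INTEGER NOT NULL REFERENCES tipset_message(id) ON DELETE CASCADE,
    emitter    TEXT NOT NULL,
    payload    BLOB
);
"""


def open_index(path=":memory:"):
    """Open (or create) the single index database and apply the schema."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def index_message(conn, tipset_cid, message_cid, height, tx_hash=None, events=()):
    """Write a message, its optional Eth tx hash, and its events atomically."""
    with conn:  # commits on success, rolls back if any statement raises
        cur = conn.execute(
            "INSERT INTO tipset_message (tipset_cid, message_cid, height) "
            "VALUES (?, ?, ?)",
            (tipset_cid, message_cid, height),
        )
        message_id = cur.lastrowid
        if tx_hash is not None:
            conn.execute(
                "INSERT OR REPLACE INTO eth_tx_hash (tx_hash, message_cid) "
                "VALUES (?, ?)",
                (tx_hash, message_cid),
            )
        conn.executemany(
            "INSERT INTO event (message_id, emitter, payload) VALUES (?, ?, ?)",
            [(message_id, emitter, payload) for emitter, payload in events],
        )
    return message_id
```

With three separate database files, the same write would span three independent transactions, so a crash between them could leave the indices out of sync with each other.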

Also, we should make sure we handle the following:

@fridrik01 fridrik01 self-assigned this May 3, 2023
@raulk
Member

raulk commented May 3, 2023

  • Populating the index on snapshot import, if we are not already doing so

@BigLep
Member

BigLep commented Aug 6, 2024

Newbie question: what was the reason for having 3 separate databases to begin with?

@Stebalien
Member

I believe the idea was that it made it easy to tell if one was getting large and remove/disable it. I'd kind of like to introduce (even in shed) some form of GC command before we unify them, but I do think unifying them is the way to go (that and re-organizing our tables to massively reduce the amount of duplicate data).

@Stebalien
Member

In terms of: why multiple observers? We wanted to keep these subsystems separate. But I wouldn't be opposed to a new architecture here (that, e.g., lets us cleanly keep track of what we've indexed and what we haven't) as long as we can make it somewhat pluggable. HOWEVER, if we want to be able to enable/disable these indices independently... we'll need to track what has been indexed and what has not been indexed independently as well.
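As a rough illustration of tracking indexed state independently per index, the sketch below keeps one progress row per (index, height) pair. The `index_progress` table, the index names, and the helper functions are all hypothetical and are not taken from Lotus. It also assumes, for simplicity, that every height in a range is expected to be indexed, which ignores null rounds.

```python
import sqlite3

# Hypothetical index identifiers, loosely mirroring the three existing databases.
INDEXES = ("events", "txhash", "msgindex")


def open_tracker(path=":memory:"):
    """Open a database holding per-index progress records."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS index_progress (
               index_name TEXT NOT NULL,
               height     INTEGER NOT NULL,
               PRIMARY KEY (index_name, height)
           )"""
    )
    return conn


def mark_indexed(conn, index_name, height):
    """Record that one index has processed one height (idempotent)."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO index_progress (index_name, height) VALUES (?, ?)",
            (index_name, height),
        )


def missing_heights(conn, index_name, start, end):
    """Return heights in [start, end] that the given index has not processed."""
    rows = conn.execute(
        "SELECT height FROM index_progress "
        "WHERE index_name = ? AND height BETWEEN ? AND ?",
        (index_name, start, end),
    )
    done = {height for (height,) in rows}
    return [h for h in range(start, end + 1) if h not in done]
```

Under this scheme, an index that was disabled for a while would simply show gaps in `missing_heights`, and a backfill could target those gaps without touching the other indices.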
