Off-chain database migration #1619

Closed
xgreenx opened this issue Jan 24, 2024 · 3 comments
Assignees
Labels
epic An epic is a high-level master issue for large pieces of work. upgradability

Comments

@xgreenx
Collaborator

xgreenx commented Jan 24, 2024

No description provided.

@xgreenx xgreenx added the epic An epic is a high-level master issue for large pieces of work. label Feb 4, 2024
@xgreenx xgreenx assigned MujkicA and Dentosal and unassigned bvrooman Mar 13, 2024
@bvrooman bvrooman mentioned this issue Mar 21, 2024
5 tasks
@MitchTurner
Member

I'm not sure if this is an issue or a feature. I think we should make a clearer distinction between those. Typically Scrum has Issues, Features, and Epics, where issues are the smallest, features capture a group of issues, and epics capture a long-term business objective.

@xgreenx xgreenx assigned segfault-magnet and unassigned MujkicA Mar 22, 2024
@xgreenx
Collaborator Author

xgreenx commented Mar 22, 2024

This is a master issue to track progress on the migration of the off-chain table. Since we decided to split the migration into two parts, on-chain and off-chain, the off-chain part has its own deliverables. The whole database migration belongs to the regenesis feature, but because the regenesis feature itself took 4 months, we need smaller tickets to track the work =)

xgreenx added a commit that referenced this issue Mar 29, 2024
Mostly refactoring to allow for #1619. Not final but close. Some
cleanup remains here or there. That will be bundled with the actual
feature itself.

The `SnapshotReader` and `SnapshotWriter` (previously the `StateReader`
and `StateWriter`) now:
* have a single `read` (or `write`) method generic over the table type
* no longer use `*Config` structs in the interface -- all reading and
writing is to be done over tables
* `ChainConfig` is now written and read by the `SnapshotWriter` and
`SnapshotReader` (since it is actually part of the snapshot).
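A minimal sketch of what "a single `read` (or `write`) method generic over the table type" could look like. Everything here is a hypothetical illustration: the `Table` trait, the `InMemorySnapshot` type, and the `Coins`/`Messages` markers are assumed names, not the actual fuel-core API.

```rust
use std::collections::HashMap;

/// Hypothetical table description: a name plus an entry codec.
trait Table {
    const NAME: &'static str;
    type Entry;
    fn encode(entry: &Self::Entry) -> Vec<u8>;
    fn decode(bytes: &[u8]) -> Self::Entry;
}

/// Illustrative stand-in for a table whose entries are plain amounts.
struct Coins;
impl Table for Coins {
    const NAME: &'static str = "coins";
    type Entry = u64;
    fn encode(entry: &u64) -> Vec<u8> {
        entry.to_be_bytes().to_vec()
    }
    fn decode(bytes: &[u8]) -> u64 {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(bytes);
        u64::from_be_bytes(buf)
    }
}

/// Illustrative stand-in for a table whose entries are text.
struct Messages;
impl Table for Messages {
    const NAME: &'static str = "messages";
    type Entry = String;
    fn encode(entry: &String) -> Vec<u8> {
        entry.as_bytes().to_vec()
    }
    fn decode(bytes: &[u8]) -> String {
        String::from_utf8_lossy(bytes).into_owned()
    }
}

/// In-memory snapshot: one "file" (a list of encoded rows) per table.
#[derive(Default)]
struct InMemorySnapshot {
    files: HashMap<&'static str, Vec<Vec<u8>>>,
}

impl InMemorySnapshot {
    /// Single write method, generic over the table type.
    fn write<T: Table>(&mut self, entries: &[T::Entry]) {
        let file = self.files.entry(T::NAME).or_default();
        file.extend(entries.iter().map(T::encode));
    }

    /// Single read method, generic over the table type.
    fn read<T: Table>(&self) -> Vec<T::Entry> {
        self.files
            .get(T::NAME)
            .map(|rows| rows.iter().map(|row| T::decode(row)).collect())
            .unwrap_or_default()
    }
}
```

With this shape, callers pick the table at the call site (`snapshot.write::<Coins>(..)`, `snapshot.read::<Messages>()`), so the interface needs no per-table `*Config` struct.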

The genesis progress is now a generic `String` -> `u64` mapping with no
enumerations for the key (previously we had an enum with a variant for
each table).
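As a rough illustration of a `String` -> `u64` progress mapping, here is a sketch. The `GenesisProgress` type, its methods, and the meaning of the value (index of the last imported batch) are assumptions made for the example; the commit only states that the key is a plain string rather than an enum.

```rust
use std::collections::BTreeMap;

/// Hypothetical progress tracker keyed by an arbitrary string
/// (for example a table name) instead of an enum variant per table.
#[derive(Default)]
struct GenesisProgress {
    // Assumed meaning of the value: index of the last imported batch.
    last_imported: BTreeMap<String, u64>,
}

impl GenesisProgress {
    /// Record that `batch_index` has been fully imported for `key`.
    fn record(&mut self, key: &str, batch_index: u64) {
        self.last_imported.insert(key.to_owned(), batch_index);
    }

    /// Batch index to resume from: 0 if nothing was imported yet,
    /// otherwise the batch after the last recorded one.
    fn resume_from(&self, key: &str) -> u64 {
        self.last_imported.get(key).map_or(0, |last| last + 1)
    }
}
```

Because the key is just a string, adding a table does not require touching a shared enum.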

Every table gets its own Parquet file. JSON is still in a single file
with the `StateConfig` schema. Depending on whether we want the off-chain
tables in the `StateConfig`, we can either drop them or include them in
the next PR.

I avoided enumerating the tables as much as possible to lessen coupling.
A new table should ideally require:
1. a call to `write::<NewTable>` when generating the snapshot
2. an implementation of `ProcessState<NewTable>` to describe how to
import it on regenesis
3. an implementation of `AsTable` to describe how (if at all) this table
can be extracted from an in-memory `StateConfig`
4. a call to `workers.spawn::<NewTable>` to run a worker to import it
from the snapshot.
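Steps 2 to 4 of the checklist above can be sketched roughly as follows. The trait names `ProcessState` and `AsTable` and the `workers.spawn` call are taken from this message, but their signatures here are assumed and simplified; the `Table` trait, `Importer`, and the `new_table_rows` field are hypothetical and do not reproduce the real fuel-core code.

```rust
use std::thread;

/// Hypothetical table marker: ties a table type to its row type.
trait Table {
    type Row: Send + 'static;
}

struct NewTable;
impl Table for NewTable {
    type Row = u64;
}

/// Simplified stand-in for the in-memory config.
#[derive(Default)]
struct StateConfig {
    new_table_rows: Vec<u64>,
}

/// Assumed shape: how a table is extracted from a `StateConfig`.
trait AsTable<T: Table> {
    fn as_table(&self) -> Vec<T::Row>;
}

impl AsTable<NewTable> for StateConfig {
    fn as_table(&self) -> Vec<u64> {
        self.new_table_rows.clone()
    }
}

/// Assumed shape: how a batch of rows is imported on regenesis.
trait ProcessState<T: Table> {
    fn process(&mut self, batch: Vec<T::Row>) -> usize;
}

#[derive(Default)]
struct Importer {
    imported: Vec<u64>,
}

impl ProcessState<NewTable> for Importer {
    fn process(&mut self, batch: Vec<u64>) -> usize {
        let count = batch.len();
        self.imported.extend(batch);
        count
    }
}

/// Hypothetical worker pool: one thread per spawned table import.
#[derive(Default)]
struct Workers {
    handles: Vec<thread::JoinHandle<usize>>,
}

impl Workers {
    fn spawn<T, H>(&mut self, rows: Vec<T::Row>, mut handler: H)
    where
        T: Table + 'static,
        H: ProcessState<T> + Send + 'static,
    {
        self.handles
            .push(thread::spawn(move || handler.process(rows)));
    }

    /// Wait for all workers and return the total number of imported rows.
    fn join(self) -> usize {
        self.handles
            .into_iter()
            .map(|handle| handle.join().expect("worker panicked"))
            .sum()
    }
}
```

In this sketch a new table only adds a marker type plus two trait implementations; the worker pool itself never enumerates tables.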

After this PR we'll add in the off-chain tables and identify and
regenerate dependent tables (ideally with batching + resumability).

---------

Co-authored-by: xgreenx <xgreenx9999@gmail.com>
segfault-magnet added a commit that referenced this issue Mar 29, 2024
related to: #1619 

Doesn't close the issue, still need to migrate a few tables.

The following tables are now part of regenesis:
OnChain:
* Transactions (saved in snapshot)

OffChain:
* TransactionStatuses (saved in snapshot)
* OwnedTransactions (saved in snapshot)
* OwnedMessageIds (derived from Messages in snapshot)
* OwnedCoins (derived from Coins in snapshot)
* ContractsInfo (derived from Transactions in snapshot)

We have open questions for @xgreenx:
1. Should we regenesize `FuelBlockIdsToHeights`? We attempted it but it
caused issues with the "don't commit changes related to more than one
block" guard.
2. Also, what about restoring the following tables:
* Metadata
* Statistics
* All relayer tables
* ProcessedTransactions

There are opportunities for optimization, namely we're reading some
snapshot data twice (e.g. Transactions are read once to restore the
`Transactions` table and once to derive the `ContractsInfo` table). That
could probably be done in one go, writing to both on-chain and off-chain
tables at once.
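A rough sketch of the single-pass idea, using the Coins -> OwnedCoins derivation because its shape is easy to guess. The `Coin` fields, the `Databases` struct, and `import_coins` are hypothetical and only illustrate reading each snapshot entry once while filling both an on-chain table and a derived off-chain index.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, simplified coin entry as read from the snapshot.
#[derive(Clone, Debug, PartialEq)]
struct Coin {
    id: u32,
    owner: String,
    amount: u64,
}

/// Stand-ins for the on-chain and off-chain databases.
#[derive(Default)]
struct Databases {
    /// On-chain table: coin id -> coin.
    coins: BTreeMap<u32, Coin>,
    /// Off-chain index derived from coins: (owner, coin id).
    owned_coins: BTreeSet<(String, u32)>,
}

/// Reads every snapshot entry exactly once and updates both tables.
/// Returns the number of entries read.
fn import_coins(db: &mut Databases, snapshot: impl IntoIterator<Item = Coin>) -> usize {
    let mut reads = 0;
    for coin in snapshot {
        reads += 1;
        // Derive the off-chain index entry first, then move the coin
        // into the on-chain table.
        db.owned_coins.insert((coin.owner.clone(), coin.id));
        db.coins.insert(coin.id, coin);
    }
    reads
}
```

The trade-off is tighter coupling between the on-chain and off-chain importers in exchange for halving the snapshot reads.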

---------

Co-authored-by: Hannes Karppila <hannes.karppila@gmail.com>
Co-authored-by: xgreenx <xgreenx9999@gmail.com>
@xgreenx
Collaborator Author

xgreenx commented May 2, 2024

Done as part of #1545.

@xgreenx xgreenx closed this as completed May 2, 2024

6 participants