[Tracking Issue] Flat storage for resharding V3 #12174

Closed
31 of 33 tasks
Trisfald opened this issue Sep 30, 2024 · 2 comments

Trisfald commented Sep 30, 2024

Part of #11881

Goal

Support resharding V3 for Flat Storage. On a shard layout change, Flat Storage must be able to take a shard and split it into two children.
The entire process will happen in the background with no strict time requirement. Additional resource consumption should be minimal because, in the meantime, the node will continue to apply blocks and chunks.
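
As an illustration of the goal, here is a minimal sketch of splitting one shard into two children. It assumes a heavily simplified model: `FlatState`, `split_shard`, and the plain account-id keys are hypothetical names for this example and are not the nearcore API. Real flat storage keys are trie keys from which the account must first be extracted. The sketch also assumes that the boundary account itself belongs to the right child.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for the flat storage of one shard:
/// account id -> serialized value.
type FlatState = BTreeMap<String, Vec<u8>>;

/// Splits `parent` into (left, right) children around `boundary_account`.
/// Assumption: accounts strictly below the boundary go to the left child,
/// every other account (including the boundary) goes to the right child.
fn split_shard(parent: &FlatState, boundary_account: &str) -> (FlatState, FlatState) {
    let mut left = FlatState::new();
    let mut right = FlatState::new();
    for (account, value) in parent {
        if account.as_str() < boundary_account {
            left.insert(account.clone(), value.clone());
        } else {
            right.insert(account.clone(), value.clone());
        }
    }
    (left, right)
}
```

The parent is left untouched here, which mirrors the idea that the parent shard is only deleted after the children have been created.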

Sub-tasks

  • Implementation
    • Support for splitting a Flat storage shard
      • Handle all Flat storage key-value pairs
        • Simple keys
        • Receipts-like keys
      • Possibility to interrupt resharding
      • Background processing
      • Creating children shards and applying deltas
      • Handle parent shard deletion
    • Add observability
      • Log progress of background task
      • Log progress of catchup
      • Time shard deletion operations
      • Metric for status of flat storage (?)
      • Metric for progress for split / catchup
    • Integration with ReshardingManager
    • Handle parent flat storage deltas split
    • Better limits on background task iteration, like key size
    • Trigger memtrie rebuild
    • Configuration option to control shard catchup batch size
    • Delay shard split until resharding block is final
    • Chain forks handling
      • Forks
      • [ ] Double signing corner cases
        We decided not to implement any safeguard. The scenario is very rare, and an impacted node can be recovered from a healthy snapshot.
    • Tune resharding config for mainnet load
  • Tests

    • Unit tests
      • Basic flat storage resharder functionality
      • Splitting simple keys
      • More complex resharding with other types of keys
      • Catchup children
      • Parent flat state deltas handling
      • Test for multiple resharding events
      • Test forks and double signing
    • Integration tests
      • To be covered with test loop resharding test
    • Test with forknet: resharding, restart
      • To be done for resharding as a whole
  • Nice to have

    • Refactoring to avoid sending an instance of FlatStorageResharder inside actor messages.
    • Refactoring: merging FlatStorageResharder and ReshardingActor
@Trisfald

merged #12164 tackling the introduction of FlatStorageResharder and simple key splitting

@Trisfald

merged #12223 to handle all key types

github-merge-queue bot pushed a commit that referenced this issue Oct 17, 2024
Adding log entries telling the time spent on the longer operations
happening during flat storage resharding.

Part of flat storage resharding issue (#12174).
github-merge-queue bot pushed a commit that referenced this issue Oct 18, 2024
…2246)

This PR improves the way batches are handled in the background task that
splits a shard. In particular, I'm re-using the good old `batch_size`
and `batch_delay` to throttle processing.

Part of #12174
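
A rough sketch of this kind of throttling follows. The names `Throttle` and `process_in_batches` are made up for the example, and `batch_size` is assumed to be a number of items; the actual configuration in nearcore may measure batches differently (for instance in bytes).

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical throttling settings for a background task.
struct Throttle {
    /// Maximum number of items handled per batch (assumed unit: items).
    batch_size: usize,
    /// Pause inserted between two consecutive batches.
    batch_delay: Duration,
}

/// Feeds `items` to `process` in batches, sleeping between batches so the
/// background work does not starve block and chunk processing.
/// Returns the number of batches processed.
fn process_in_batches<I, F>(items: I, throttle: &Throttle, mut process: F) -> usize
where
    I: IntoIterator,
    F: FnMut(Vec<I::Item>),
{
    let mut iter = items.into_iter().peekable();
    // Guard against a zero batch size, which would never make progress.
    let batch_size = throttle.batch_size.max(1);
    let mut batches = 0;
    while iter.peek().is_some() {
        let batch: Vec<_> = iter.by_ref().take(batch_size).collect();
        process(batch);
        batches += 1;
        // Only sleep when there is more work left.
        if iter.peek().is_some() {
            sleep(throttle.batch_delay);
        }
    }
    batches
}
```

A larger `batch_delay` trades a longer total split time for lower interference with the rest of the node.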
github-merge-queue bot pushed a commit that referenced this issue Oct 21, 2024
Implementation of flat storage deltas split for flat storage resharding.

Before this change, only flat storage key pairs at flat head height were
split during resharding. Now flat storage deltas are part of the
splitting process as well.
At a high level, it works by concatenating the iterator over flat
storage and the iterator over deltas for all block heights between chain
head and flat storage head.

Part of #12174
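
The concatenation idea can be sketched as below. This is an illustration under assumptions, not the code from the PR: `Delta`, `entries_to_split`, and `apply` are hypothetical names, a delta is modelled as a map from key to `Option<value>` where `None` marks a deletion, and deltas are keyed by block height only.

```rust
use std::collections::BTreeMap;

type Key = Vec<u8>;
type Value = Vec<u8>;
/// Hypothetical delta: the changes of one block; `None` marks a deletion.
type Delta = BTreeMap<Key, Option<Value>>;

/// Chains the entries stored at flat head with the entries of every delta,
/// in ascending block height order (BTreeMap iterates keys in order).
fn entries_to_split<'a>(
    flat_state: &'a BTreeMap<Key, Value>,
    deltas_by_height: &'a BTreeMap<u64, Delta>,
) -> impl Iterator<Item = (Key, Option<Value>)> + 'a {
    let base = flat_state
        .iter()
        .map(|(k, v)| (k.clone(), Some(v.clone())));
    let changes = deltas_by_height
        .values()
        .flat_map(|delta| delta.iter().map(|(k, v)| (k.clone(), v.clone())));
    base.chain(changes)
}

/// Applies the entries in order; later entries override earlier ones.
fn apply(entries: impl Iterator<Item = (Key, Option<Value>)>) -> BTreeMap<Key, Value> {
    let mut out = BTreeMap::new();
    for (key, change) in entries {
        match change {
            Some(value) => {
                out.insert(key, value);
            }
            None => {
                out.remove(&key);
            }
        }
    }
    out
}
```

Because the deltas come after the base entries, a single pass over the chained iterator yields the state at the last delta height.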
github-merge-queue bot pushed a commit that referenced this issue Nov 4, 2024
PR to add children catchup step for flat storages created as a result of
a parent shard split.

In previous iterations, the two children shards were populated in a
background task from the flat storage of the parent at height `last
block of old shard layout` (post-processing).

Since the task mentioned above takes a long time, and the children are
active shards from the `first block of the new shard layout`, their flat
storage accumulates a lot of deltas.

The catchup step applies these deltas in the background, then finalizes
creation of the child flat storage, and triggers a possible memtrie rebuild.

Part of #12174
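
The catchup step could look roughly like the sketch below. All names (`ChildFlatStorage`, `ChildStatus`, `catch_up`, `target_height`) are hypothetical and chosen for this example. It assumes deltas are indexed by block height and that the child starts at the flat head inherited from the parent split; memtrie rebuild and batching are left out.

```rust
use std::collections::BTreeMap;

type Delta = BTreeMap<Vec<u8>, Option<Vec<u8>>>;

/// Hypothetical lifecycle of a child flat storage after a split.
#[derive(Debug, PartialEq)]
enum ChildStatus {
    CatchingUp { flat_head: u64 },
    Ready { flat_head: u64 },
}

struct ChildFlatStorage {
    state: BTreeMap<Vec<u8>, Vec<u8>>,
    status: ChildStatus,
}

/// Applies every delta in (flat_head, target_height] and marks the child ready.
/// Returns the number of deltas applied.
fn catch_up(
    child: &mut ChildFlatStorage,
    deltas: &BTreeMap<u64, Delta>,
    target_height: u64,
) -> usize {
    let flat_head = match child.status {
        ChildStatus::CatchingUp { flat_head } => flat_head,
        // Nothing to do for a child that already finished catchup.
        ChildStatus::Ready { .. } => return 0,
    };
    let mut applied = 0;
    if target_height > flat_head {
        for (_height, delta) in deltas.range(flat_head + 1..=target_height) {
            for (key, change) in delta {
                match change {
                    Some(value) => {
                        child.state.insert(key.clone(), value.clone());
                    }
                    None => {
                        child.state.remove(key);
                    }
                }
            }
            applied += 1;
        }
    }
    // Finalize: the child flat storage can now serve reads at the new head.
    child.status = ChildStatus::Ready { flat_head: target_height.max(flat_head) };
    applied
}
```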
github-merge-queue bot pushed a commit that referenced this issue Nov 11, 2024
…ock is final (#12415)

Contents of this PR:
- New features
  - Actual splitting of flat storage is delayed until the target
    resharding block becomes final
  - Scheduled resharding event can be overridden. This makes resharding
    work in many chain fork scenarios (not all of them though)
  - Added `FlatStorageReshardingTaskSchedulingStatus` to express the
    current state of scheduled tasks waiting for resharding block finality
- Changes
  - Shard catchup doesn't wait anymore for resharding block finality. It
    is now a consequence of the fact that the shard split happens on a final
    block.
  - `FlatStorageReshardingTaskStatus` renamed into
    `FlatStorageReshardingTaskResult` for clarity
  - `ReshardingActor` now takes care of re-trying "postponed" tasks.



Part of #12174
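
To make the scheduling behaviour more concrete, here is a small sketch of a split that waits for finality and can be overridden by a later event. The types `ScheduledSplit`, `SchedulingStatus`, and `SplitScheduler` are invented for this example and deliberately do not reuse the real enum names, whose variants are not listed here. Finality is assumed to be available through a caller-supplied predicate on block hashes.

```rust
/// Hypothetical description of a scheduled shard split.
#[derive(Debug, Clone, PartialEq)]
struct ScheduledSplit {
    resharding_block_hash: String,
}

/// Hypothetical scheduling outcome (not the real nearcore enum).
#[derive(Debug, PartialEq)]
enum SchedulingStatus {
    NothingScheduled,
    /// The resharding block is not final yet; the caller should retry later.
    Postponed,
    /// The resharding block is final; the split may start.
    CanStart,
}

#[derive(Default)]
struct SplitScheduler {
    scheduled: Option<ScheduledSplit>,
}

impl SplitScheduler {
    /// Schedules a split, overriding any previously scheduled event
    /// (for example one that targeted a block on an abandoned fork).
    /// Returns the event that was overridden, if any.
    fn schedule(&mut self, event: ScheduledSplit) -> Option<ScheduledSplit> {
        self.scheduled.replace(event)
    }

    /// Reports whether the scheduled split can start, given a finality check.
    fn status(&self, is_final: impl Fn(&str) -> bool) -> SchedulingStatus {
        match &self.scheduled {
            None => SchedulingStatus::NothingScheduled,
            Some(event) if is_final(&event.resharding_block_hash) => SchedulingStatus::CanStart,
            Some(_) => SchedulingStatus::Postponed,
        }
    }
}
```

In this model, a component playing the role of `ReshardingActor` would call `status` again after a delay whenever it sees `Postponed`.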
github-merge-queue bot pushed a commit that referenced this issue Dec 11, 2024
Adding two new metrics to monitor resharding:
- `near_flat_storage_resharding_status`
- `near_flat_storage_resharding_split_shard_processed_batches`

Reusing the existing metric `near_flat_storage_head_height` to monitor
shard catchup.

Part of #12174