feat(resharding): flat storage resharding mvp #12164
Conversation
@@ -176,6 +177,8 @@ pub struct Client {
/// Cached precomputed set of TIER1 accounts.
/// See send_network_chain_info().
tier1_accounts_cache: Option<(EpochId, Arc<AccountKeys>)>,
/// Takes care of performing resharding on the flat storage.
pub flat_storage_resharder: FlatStorageResharder,
This may be moved inside the resharder manager @shreyan-gupta
Yeah, agree if it's simple enough
/// UId of the right child shard. Will contain everything greater or equal than boundary account.
pub right_child_shard: ShardUId,
/// The new shard layout.
pub shard_layout: ShardLayout,
I was on the fence about storing shard_layout here, but it makes things so much more self-contained in FlatStorageResharder and testing easier as well.
/// * `block_hash`: block hash of the block in which the split happens
/// * `scheduler`: component used to schedule the background tasks
/// * `controller`: manages the execution of the background tasks
pub fn start_resharding_from_new_shard_layout(
API can be improved if event_type_from_shard_layout or an equivalent is moved into the resharding manager
/// Struct used to destructure a new shard layout definition into the resulting resharding event.
#[cfg_attr(test, derive(PartialEq, Eq))]
enum ReshardingEventType {
I've found this object useful to convey the essence of the resharding event. Yes, only Split is available... for now.
Maybe this and event_type_from_shard_layout could be moved into the resharding manager as they aren't really flat storage specific @shreyan-gupta
On one hand I love the idea. On the other hand, I feel it only makes sense if we do it everywhere. It would be strange to have it in flat storage but not in memtrie etc. I'm undecided, so I'll leave it to you ;)
/// Helps control the flat storage resharder operation. More specifically,
/// it has a way to know when the background task is done or to interrupt it.
#[derive(Clone)]
pub struct FlatStorageResharderController {
Extends ReshardingHandle with notification capability. If it doesn't prove useful it can easily be swapped with the handle directly.
}

/// Represent the capability of scheduling the background tasks spawned by flat storage resharding.
pub trait FlatStorageResharderScheduler {
I came up with this trait to abstract the task scheduling. It works well in tests, I hope it can be plugged nicely with actors or whatever else we will end up using.
Just grepping for the "scheduler" keyword it seems that those are typically Senders e.g.:
state_parts_task_scheduler: &Sender<ApplyStatePartsRequest>,
load_memtrie_scheduler: &Sender<LoadMemtrieRequest>,
block_catch_up_task_scheduler: &Sender<BlockCatchUpRequest>,
This API should work out of the box in TestLoop
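As a sketch of how the trait could line up with that convention, here is a minimal version where a plain channel Sender implements the scheduler. Names and signatures are assumed for illustration, not taken from the PR; std's mpsc stands in for whatever channel type the actors use.

```rust
use std::sync::mpsc::{channel, Sender};

// Hypothetical stand-in for the real resharding task payload.
struct ReshardingRequest(String);

// Minimal sketch of the trait from the diff (the real signature may differ).
trait FlatStorageResharderScheduler {
    fn schedule(&self, request: ReshardingRequest);
}

// A plain channel Sender can implement the trait, matching the convention of
// the other schedulers (state_parts_task_scheduler, load_memtrie_scheduler, ...).
impl FlatStorageResharderScheduler for Sender<ReshardingRequest> {
    fn schedule(&self, request: ReshardingRequest) {
        // In a real actor setup the receiver side would run the task.
        self.send(request).expect("scheduler receiver dropped");
    }
}

fn main() {
    let (sender, receiver) = channel();
    sender.schedule(ReshardingRequest("split parent shard".to_string()));
    assert_eq!(receiver.recv().unwrap().0, "split parent shard");
}
```

With such a blanket impl, tests can plug in a synchronous fake while production code passes the actor's Sender unchanged.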
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #12164 +/- ##
==========================================
+ Coverage 71.76% 71.83% +0.07%
==========================================
Files 825 827 +2
Lines 165868 166626 +758
Branches 165868 166626 +758
==========================================
+ Hits 119028 119700 +672
- Misses 41647 41716 +69
- Partials 5193 5210 +17
LGTM, great stuff!
I left a few comments but nothing major. I'll have another look and approve once it all settles in my head.
// TODO(Trisfald): call resume
// flat_storage_resharder.resume(shard_uid, &status, ...)?;
Is this a todo for this PR or later? I'm fine with later but I just saw that the resume method is already implemented so might as well call it.
It is meant for a second PR, I'll need to see how to link everything together
#[derive(Debug)]
#[cfg_attr(test, derive(PartialEq, Eq))]
struct ReshardingSplitParams {
I'm a bit surprised that the boundary account is not part of it. Do you get it from elsewhere?
ReshardingSplitParams is the output of processing the shard layout, and the latter already has the boundary. Conceptually it would be nice to have the boundary in ReshardingSplitParams (I started with that), but I found the boundary to be pretty useless on its own: one needs the shard layout and the account-id to get the destination shard.
Looking at the split method, it takes the params and the shard layout. Would it make sense to shove the shard layout inside the params? You can then get rid of left/right child because they can be obtained from the shard layout.
BorshSerialize,
BorshDeserialize,
Why do you need borsh serialize and protocol schema?
Borsh because I'm storing ShardLayout inside the flat storage resharding status, and ProtocolSchema as a consequence, to make devs more aware of changes in serialization.
There are alternatives such as retrieving ShardLayout from the epoch manager. In that case I'd need to add a dependency on EpochManager in FlatStorageResharder.
Personally, borsh always scares me but we should be good since ShardLayout is already versioned
EpochManager is everywhere anyway so I wouldn't mind using it.
I'm slightly unclear about what exactly the flow of resharding the flat storage would look like, i.e. how the different components (flat storage creator, resharding manager, any potential actors, etc.) work together. We can discuss this during the offsite or in the resharding meeting.
CreatingChild,
/// We apply deltas from disk until the head reaches final head.
/// Includes block hash of flat storage head.
CatchingUp(CryptoHash),
Umm, I'm curious if we need a specific catching up phase or whether FlatStorage should automatically be able to handle this after unlocking?
It depends on what we want to do.
Option 1: do a catchup phase for the children; their flat storage will be created at chain head height or very close to it.
Option 2: no catchup; create children at the parent flat storage height. This means the first block postprocessing has to go through quite a lot of deltas.
Any opinions?
In option 2, would it happen in the main client thread? I don't think this would be a good idea.
I think option 2 is "rate limited" in implementation so should be fine if it simplifies code.
(Note: We may need to do catchup manually in case we are looking to load memtrie from flat storage; to be checked)
In that case I'm happy with either.
@@ -266,6 +270,12 @@ impl<'a> FlatStoreUpdateAdapter<'a> {
self.remove_range_by_shard_uid(shard_uid, DBCol::FlatStateDeltaMetadata);
}

pub fn remove_flat_storage(&mut self, shard_uid: ShardUId) {
Could we fix the nomenclature a bit? remove_all and remove_flat_storage can get confusing.
&self,
parent_shard: ShardUId,
status: &SplittingParentStatus,
scheduler: &dyn FlatStorageResharderScheduler,
nothing seems to implement FlatStorageResharderScheduler right now?
Yes, for now it's used only in tests.
I could make this into a sender or have the actor impl FlatStorageResharderScheduler 🤔
pub fn new(
    epoch_manager: Arc<dyn EpochManagerAdapter>,
    runtime: Arc<dyn RuntimeAdapter>,
    chain_store: &ChainStore,
    flat_storage_resharder: &FlatStorageResharder,
I didn't completely understand why flat_storage_resharder is a part of FlatStorageCreator. The FlatStorageCreator is created at the time of calling new() in client, so Self::create_flat_storage_for_current_epoch is also called only in the beginning, right? How do we later set the status of some flat storage to FlatStorageStatus::Resharding?
Basically the FlatStorageResharder is part of the client.
FlatStorageCreator uses a reference to the resharder to handle the case when the node restarts while flat storage resharding was in progress. I did it like this because the creator already has logic to handle resuming flat storage from the previous state.
In the normal flow the resharding of flat storage would be initiated by the resharding manager, but that part is not done yet.
/// * `status`: resharding status of the shard
/// * `scheduler`: component used to schedule the background tasks
/// * `controller`: manages the execution of the background tasks
pub fn resume(
Could you remind me when resume is called and by whom? Also, when is start_resharding_from_new_shard_layout called?
resume will be called by the flat storage creator at node restart, if necessary.
start_resharding_from_new_shard_layout will be called by the resharding manager.
// Prepare the store object for commits and the iterator over parent's flat storage.
let flat_store = resharder.runtime.store().flat_store();
let mut iter = flat_store.iter(parent_shard);
I think what Wac means here is that we need to check the following:
- Flat head is at the height of the first block of the new epoch with the new shard layout
- Flat head is indeed locked/frozen
Btw, I don't think the assumption that the flat head is the same as the chain head holds. The flat head, when locked, can be behind the chain head.
store_update: &mut FlatStoreUpdateAdapter,
status: &SplittingParentStatus,
) -> Result<(), std::io::Error> {
if key.is_empty() {
might as well just panic here, this should never ever ever happen.
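To illustrate the suggestion, here is a small sketch (illustrative names, not the PR's actual code) of asserting on the impossible empty key instead of threading an error upward:

```rust
// Sketch of failing fast on a state that should be impossible, instead of
// returning a recoverable error. Names are illustrative, not the PR's code.
fn copy_entry_to_children(key: &[u8]) -> &'static str {
    // An empty key can never appear in flat storage; reaching this branch
    // means a bug, so a panic with a clear message beats a Result.
    assert!(!key.is_empty(), "flat storage key must not be empty");
    // Hypothetical routing: keys below an assumed boundary go left.
    if key[0] < b'm' { "left child" } else { "right child" }
}

fn main() {
    assert_eq!(copy_entry_to_children(b"alice.near"), "left child");
    assert_eq!(copy_entry_to_children(b"zoe.near"), "right child");
}
```

The assert keeps the function's signature free of an error variant that no caller could meaningfully handle.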
value: FlatStateValue,
store_update: &mut FlatStoreUpdateAdapter,
account_id_parser: impl FnOnce(&[u8]) -> Result<AccountId, std::io::Error>,
) -> Result<(), std::io::Error> {
Unsure why we are using std::io::Error instead of a more concrete type like the resharding error. If we really do want a generic error type, we can go for anyhow::Error here instead.
I used the error type of the parse_account_id_ functions, but I'm fine with using the resharding error.
/// A value of `true` means that the operation completed successfully.
completion_sender: Sender<FlatStorageReshardingTaskStatus>,
/// Corresponding receiver for `completion_sender`.
pub completion_receiver: Receiver<FlatStorageReshardingTaskStatus>,
Umm, why exactly would the FlatStorageResharderController hold the completion_receiver? Shouldn't that be sent to someone who is waiting for the resharding of flat storage to finish? Someone in client?
Also, is this the pattern we would like to follow? I thought the convention was to send actix messages once jobs were done. Btw, where is the job of flat storage resharding being done? As in, which actor is it a part of?
The idea was that anyone interested could clone the controller or the receiver.
The actor is not done yet. I planned to add that part in a second PR, maybe reusing a shared resharding actor. The actor should be a very thin layer anyway, I hope.
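As a rough sketch of the controller idea under discussion: field and type names below are assumed, and std's mpsc is used only to keep the sketch dependency-free (the real struct derives Clone, which would need a clonable channel such as crossbeam's).

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Arc;

#[derive(Clone, Copy, Debug, PartialEq)]
enum TaskStatus {
    Successful,
    Failed,
}

// Hypothetical controller pairing a cancellation flag, in the spirit of
// ReshardingHandle, with a completion notification channel.
struct FlatStorageResharderController {
    cancelled: Arc<AtomicBool>,
    completion_sender: Sender<TaskStatus>,
    // Interested parties read this to learn when the background task finished.
    completion_receiver: Receiver<TaskStatus>,
}

impl FlatStorageResharderController {
    fn new() -> Self {
        let (completion_sender, completion_receiver) = channel();
        Self {
            cancelled: Arc::new(AtomicBool::new(false)),
            completion_sender,
            completion_receiver,
        }
    }

    // Ask the background task to stop at the next checkpoint.
    fn cancel(&self) {
        self.cancelled.store(true, Ordering::Relaxed);
    }

    fn is_cancelled(&self) -> bool {
        self.cancelled.load(Ordering::Relaxed)
    }

    // Called by the background task when it finishes or is interrupted.
    fn notify_completion(&self, status: TaskStatus) {
        let _ = self.completion_sender.send(status);
    }
}

fn main() {
    let controller = FlatStorageResharderController::new();
    controller.cancel();
    assert!(controller.is_cancelled());
    controller.notify_completion(TaskStatus::Failed);
    assert_eq!(controller.completion_receiver.recv().unwrap(), TaskStatus::Failed);
}
```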
Would we reload memtrie immediately after FS is ready or delay until node restart?
The current understanding is to do the memtrie reload once FS is ready. Not set in stone though.
I did some refactoring aside from applying suggestions. Main change is that now
Cool, please re-request review once it's ready.
#[derive(
    BorshSerialize, BorshDeserialize, Clone, Debug, PartialEq, Eq, serde::Serialize, ProtocolSchema,
)]
pub struct SplittingParentStatus {
nit: It took me a while to figure out that this is to describe a status of a shard's flat storage rather than the status of resharding. Can you explain that in the comment please?
..
} = split_params;
info!(target: "resharding", ?split_params, "initiating flat storage shard split");
self.check_no_resharding_in_progress()?;
not for this PR
Eventually we'd want to handle the case when there is a fork at the end of the epoch and resharding is triggered multiple times. In that case you may want to clean up everything and start over.
Yeah, it could be done by interrupting through ReshardingHandle, or automatically depending on the block hash.
let flat_head = if let FlatStorageStatus::Ready(FlatStorageReadyStatus { flat_head }) =
    store
        .get_flat_storage_status(parent_shard)
        .map_err(|err| Into::<StorageError>::into(err))?
{
    flat_head
} else {
    let err_msg = "flat storage parent shard is not ready!";
    error!(target: "resharding", ?parent_shard, err_msg);
    return Err(Error::ReshardingError(err_msg.to_owned()));
};
nit: Can you move it to a helper method and multi-line it a bit?
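A sketch of what the suggested helper could look like, with the nearcore types replaced by simplified stand-ins (the real FlatStorageStatus and CryptoHash differ):

```rust
// Simplified stand-ins for the nearcore originals, for illustration only.
#[derive(Clone, Debug, PartialEq)]
struct CryptoHash(u64);

enum FlatStorageStatus {
    Ready { flat_head: CryptoHash },
    Creating,
    Resharding,
}

// The helper: yield the ready flat head or error out, replacing the
// inline if-let/else block at the call site.
fn get_ready_flat_head(status: FlatStorageStatus) -> Result<CryptoHash, String> {
    match status {
        FlatStorageStatus::Ready { flat_head } => Ok(flat_head),
        _ => Err("flat storage parent shard is not ready!".to_owned()),
    }
}

fn main() {
    let ready = FlatStorageStatus::Ready { flat_head: CryptoHash(42) };
    assert_eq!(get_ready_flat_head(ready).unwrap(), CryptoHash(42));
    assert!(get_ready_flat_head(FlatStorageStatus::Creating).is_err());
}
```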
/// Struct used to destructure a new shard layout definition into the resulting resharding event.
#[derive(Debug)]
#[cfg_attr(test, derive(PartialEq, Eq))]
Why do you need the conditional derive? Are those not used outside of tests for now?
Only in tests for now :)
LGTM, mostly just styling comments, for the rest feel free to follow up separately
Self { runtime, resharding_event }
}

/// Resumes a resharding event that was in progress.
nit: ... that was interrupted
It feels strange to be resuming something that is in progress.
Can you also add what can cause resharding to be halted and resumed? Is node restart / crashing the only event that may cause this?
Answering my own question - I think a fork may also cause interruption.
I made the distinction clearer between cancelling (to be used in the case of forks) and interrupting (when the node crashes).
) -> Result<(), Error> {
match status {
    FlatStorageReshardingStatus::CreatingChild => {
        // Nothing to do here because the parent will take care of resuming work.
I wouldn't be surprised to see some cleanup logic here but it's also fine to do it from parent.
#[derive(
    BorshSerialize, BorshDeserialize, Clone, Debug, PartialEq, Eq, serde::Serialize, ProtocolSchema,
)]
pub enum FlatStorageReshardingStatus {
Thinking about the future, the next event that we'll be adding will be merging. Can you make the names of events future proof - e.g. think what will be good names for the merging parents and children and make it so that it's consistent when we add those.
I was thinking about MergingShards, but it's still a very far away concept in my mind.
/// Hash of the first block having the new shard layout.
pub block_hash: CryptoHash,
The hash of the first block with new shard layout is not known at the time when the flat storage resharding is triggered (unless maybe I'm missing something). My thinking was that this will be triggered together with all of the rest of resharding logic in post process of the last block of old shard layout.
I'll revisit this comment in a following PR!
}
// Parent shard is no longer part of this shard layout.
let parent_shard =
    ShardUId { version: shard_layout.version(), shard_id: *parent_id as u32 };
We're relying here on the version not changing starting from the ShardLayoutV2. Just saying because it looked sus at first.
Yeah indeed
// Find the boundary account between the two children.
let Some(boundary_account_index) =
    shard_layout.shard_ids().position(|id| id == left_child_shard.shard_id())
nit: I would move that to a shard layout method, maybe even the entire method. Fine to do later.
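For illustration, a simplified sketch of such a shard layout method: the lookup keys off the left child's position in the shard id ordering. Types and field names here are invented; the real ShardLayout API differs.

```rust
// Invented, simplified stand-in for the real ShardLayout.
struct ShardLayout {
    shard_ids: Vec<u64>,
    // boundary_accounts[i] separates shard_ids[i] from shard_ids[i + 1].
    boundary_accounts: Vec<String>,
}

// The boundary account between a split's two children sits at the left
// child's position in the layout's shard id order.
fn boundary_between_children(layout: &ShardLayout, left_child: u64) -> Option<&str> {
    let index = layout.shard_ids.iter().position(|&id| id == left_child)?;
    layout.boundary_accounts.get(index).map(|s| s.as_str())
}

fn main() {
    let layout = ShardLayout {
        shard_ids: vec![3, 4, 5],
        boundary_accounts: vec!["apple.near".to_string(), "banana.near".to_string()],
    };
    assert_eq!(boundary_between_children(&layout, 3).unwrap(), "apple.near");
    // The last shard has no boundary to its right.
    assert!(boundary_between_children(&layout, 5).is_none());
}
```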
return Ok(());
}
assert_eq!(children_shard_uids.len(), 2);
let Ok(Some(ReshardingEventType::SplitShard(split_shard_event))) = resharding_event_type
You are not handling the error case here - Err and Ok(None) are treated the same.
- if intentional - please comment
- if accidental - please add ? or handle it explicitly
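To make the distinction concrete, here is a small sketch (placeholder types, not the PR's code) of matching the three cases explicitly instead of collapsing them in one let-else arm:

```rust
// Placeholder event type standing in for ReshardingEventType::SplitShard.
#[derive(Debug, PartialEq)]
struct SplitShardEvent(u64);

// Explicit match: Err and Ok(None) get their own paths rather than both
// falling into a silent early return.
fn handle_event(
    resharding_event_type: Result<Option<SplitShardEvent>, String>,
) -> Result<Option<SplitShardEvent>, String> {
    match resharding_event_type {
        // A split event: proceed with resharding.
        Ok(Some(event)) => Ok(Some(event)),
        // No resharding needed for this shard layout: a legitimate no-op.
        Ok(None) => Ok(None),
        // A real error: propagate instead of silently returning Ok.
        Err(err) => Err(err),
    }
}

fn main() {
    assert_eq!(handle_event(Ok(Some(SplitShardEvent(1)))), Ok(Some(SplitShardEvent(1))));
    assert_eq!(handle_event(Ok(None)), Ok(None));
    assert!(handle_event(Err("boom".to_string())).is_err());
}
```

In the actual function the Err arm would typically be replaced by the `?` operator, which is the one-character fix the comment suggests.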
The PR adds early MVP capabilities for resharding the flat storage (V3).
Main addition is FlatStorageResharder and all the tooling around it. Also, you can see traces of an early attempt to tie the resharder into existing flat storage code, mainly the flat storage creator. FlatStorageResharder takes care of everything related to resharding the flat storage. ReshardingEventType is a utility enum to represent the types of resharding. There's only one for now, but it makes adding more easier.
Achievements
Missing pieces
Missing pieces will likely be done in another PR.
EDIT: integrated with ShardLayoutV2, fixed all unit tests, re-arranged description.