Move ShardsManager chunk management logic to a separate Actix thread #7662

robin-near · 2022-09-21T20:46:34Z

Right now, all chunk management logic is handled by the ClientActor thread, including the handling of partial encoded chunk messages, sending partial encoded chunk requests, responding to partial encoded chunk requests, handling partial encoded chunk responses, decoding chunks, etc. This unnecessarily clogs the ClientActor thread. We should move these to a separate actor.

matklad · 2022-09-22T17:09:54Z

Consider re-using existing sync jobs actor:

nearcore/chain/client/src/client_actor.rs

Lines 149 to 156 in dc404fe

    
           let sync_jobs_actor_addr = SyncJobsActor::start_in_arbiter( 
        
               &state_parts_arbiter.handle(), 
        
               move |ctx: &mut Context<SyncJobsActor>| -> SyncJobsActor { 
        
                   ctx.set_mailbox_capacity(SyncJobsActor::MAILBOX_CAPACITY); 
        
                   SyncJobsActor { client_addr: self_addr } 
        
               }, 
        
           ); 
        
           wait_until_genesis(&chain_genesis.time);

That is, I think we can re-name this actor as "stuff we do in background actor".

I don't have enough context to understand whether that 's actually a good idea or not, but seems worth consideration

robin-near · 2022-09-22T18:22:05Z

Hmm, I also don't have much context on what sync jobs actor does, but chunks management is also a critical task (as it affects validator latency) and not less important than ClientActor, so I don't think it should share with a generic background queue.

This is a big refactoring, mostly due to tests. Some things that are worth mentioning: - The ShardsManagerActor is a new actor that takes the ShardsManagerRequest message and returns nothing. - This does mean that integration tests that penetrate the ShardsManager abstraction would now struggle to assert some detailed results. Luckily there aren't really many legitimate ones. - The head and header_head are now supplied via the ShardsManagerRequest::UpdateChainHeads message, and no longer supplied via the request_chunks_* calls or the resend_chunk_requests call. The latter is no longer originated by a timer in the Client which necessitates maintaining the heads in the ShardsManager, and then the former is no longer required as a result. This shouldn't be a problem, because the head is only used to validate signatures or decide forwarding recipients in the case where the prev_hash isn't available (i.e. it's best-effort to begin with), and the header_head is used to decide who exactly to fetch a chunk from (whether from archival, etc.), which is inherently also a best-effort determination. Besides, the updating of heads is serialized with all other client-initiated calls, which means that the only case when we might have a behavior change is when the head or header_head isn't exactly the same in the resend_chunk_requests call (which is initiated by a timer in ShardsManagerActor now). - Because the header and header_head are needed, I've included them as parameters in the ShardsManager constructor. (The alternative would be to keep them optional until we first hear from the client, but that adds complexity to handle the None case which is obscure and hard to test). This means that the store must be initialized with at least one block (from genesis) before ShardsManager is constructed, which in today's code means we must construct the Client before ShardsManager (since Client constructs Chain). In a future refactoring, the Chain should be constructed before the Client and everything else, so then initialization order wouldn't be a problem. In the meantime, we ensure the construction order, by moving Client::new call out of ClientActor constructor so it is synchronous, (and for testing ensure this on a framework-by-framework basis). - Much of the refactoring is also due to replacing (Addr<ClientActor>, Addr<ViewClientActor>) with ActorHandlesForNetworkMessageMocking, so that we can add a third, Arc<dyn ShardsManagerAdapter>. This is pretty simple. - For synchronous tests like TestEnv and setup_client, instead of constructing a ShardsManager and then propagating ShardsManagerRequest messages to it, I've made a SynchronousShardsManagerAdapter which just calls ShardsManager directly when it is called, as opposed to posting a message to a queue (like MockPeerManagerAdapter). This makes us mostly not have to change anything in these test cases, because the behavior is identical to before the refactoring. #7662

This is a big refactoring, mostly due to tests. Some things that are worth mentioning: - The ShardsManagerActor is a new actor that takes the ShardsManagerRequest message and returns nothing. - This does mean that integration tests that penetrate the ShardsManager abstraction would now struggle to assert some detailed results. Luckily there aren't really many legitimate ones. - The head and header_head are now supplied via the ShardsManagerRequest::UpdateChainHeads message, and no longer supplied via the request_chunks_* calls or the resend_chunk_requests call. The latter is no longer originated by a timer in the Client which necessitates maintaining the heads in the ShardsManager, and then the former is no longer required as a result. This shouldn't be a problem, because the head is only used to validate signatures or decide forwarding recipients in the case where the prev_hash isn't available (i.e. it's best-effort to begin with), and the header_head is used to decide who exactly to fetch a chunk from (whether from archival, etc.), which is inherently also a best-effort determination. Besides, the updating of heads is serialized with all other client-initiated calls, which means that the only case when we might have a behavior change is when the head or header_head isn't exactly the same in the resend_chunk_requests call (which is initiated by a timer in ShardsManagerActor now). - Because the header and header_head are needed, I've included them as parameters in the ShardsManager constructor. (The alternative would be to keep them optional until we first hear from the client, but that adds complexity to handle the None case which is obscure and hard to test). This means that the store must be initialized with at least one block (from genesis) before ShardsManager is constructed, which in today's code means we must construct the Client before ShardsManager (since Client constructs Chain). In a future refactoring, the Chain should be constructed before the Client and everything else, so then initialization order wouldn't be a problem. In the meantime, we ensure the construction order, by moving Client::new call out of ClientActor constructor so it is synchronous, (and for testing ensure this on a framework-by-framework basis). - Much of the refactoring is also due to replacing (Addr<ClientActor>, Addr<ViewClientActor>) with ActorHandlesForNetworkMessageMocking, so that we can add a third, Arc<dyn ShardsManagerAdapter>. This is pretty simple. - For synchronous tests like TestEnv and setup_client, instead of constructing a ShardsManager and then propagating ShardsManagerRequest messages to it, I've made a SynchronousShardsManagerAdapter which just calls ShardsManager directly when it is called, as opposed to posting a message to a queue (like MockPeerManagerAdapter). This makes us mostly not have to change anything in these test cases, because the behavior is identical to before the refactoring. near#7662

robin-near linked a pull request Sep 21, 2022 that will close this issue

[chunks] Move chunk completion out of ShardsManager #7622

Merged

matklad added the T-core Team: issues relevant to the core team label Sep 22, 2022

robin-near closed this as completed in #7622 Sep 24, 2022

mzhangmzz reopened this Oct 6, 2022

This was referenced Oct 6, 2022

[chunks] Move tx_pool out of ShardsManager. #7783

Merged

[chunks] Move chain_head maintenance into ShardsManager #7784

Merged

[chunks] Move maintenance of ready-to-include chunks out of ShardsManager #7785

Merged

robin-near mentioned this issue Oct 10, 2022

[chunks] Fix bug that produced partial chunks were not inserted into cache #7795

Merged

robin-near mentioned this issue Dec 2, 2022

Tracking issue for codebase modularization and other dev velocity improvements #8153

Open

12 tasks

robin-near mentioned this issue Jan 11, 2023

[ShardsManager] Move ShardsManager to its own actor #8329

Merged

robin-near closed this as completed Feb 6, 2023

robin-near mentioned this issue Feb 6, 2023

Tracking issue: Codebase allows engineers to write tests quickly and has 80% unit test coverage for chunk distribution logic #8512

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Move ShardsManager chunk management logic to a separate Actix thread #7662

Move ShardsManager chunk management logic to a separate Actix thread #7662

robin-near commented Sep 21, 2022

matklad commented Sep 22, 2022 •

edited

Loading

robin-near commented Sep 22, 2022

Move ShardsManager chunk management logic to a separate Actix thread #7662

Move ShardsManager chunk management logic to a separate Actix thread #7662

Comments

robin-near commented Sep 21, 2022

matklad commented Sep 22, 2022 • edited Loading

robin-near commented Sep 22, 2022

matklad commented Sep 22, 2022 •

edited

Loading