Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ShardsManager] Move ShardsManager to its own actor #8329

Merged
merged 2 commits into from
Jan 31, 2023

Conversation

robin-near
Copy link
Contributor

This is a big refactoring, mostly due to tests.

Some things that are worth mentioning:

  • The ShardsManagerActor is a new actor that takes the ShardsManagerRequest message and returns nothing.
    • This does mean that integration tests that penetrate the ShardsManager abstraction would now struggle to assert some detailed results. Luckily there aren't really many legitimate ones.
  • The head and header_head are now supplied via the ShardsManagerRequest::UpdateChainHeads message, and no longer supplied via the request_chunks_* calls or the resend_chunk_requests call. The latter is no longer originated by a timer in the Client which necessitates maintaining the heads in the ShardsManager, and then the former is no longer required as a result. This shouldn't be a problem, because the head is only used to validate signatures or decide forwarding recipients in the case where the prev_hash isn't available (i.e. it's best-effort to begin with), and the header_head is used to decide who exactly to fetch a chunk from (whether from archival, etc.), which is inherently also a best-effort determination. Besides, the updating of heads is serialized with all other client-initiated calls, which means that the only case when we might have a behavior change is when the head or header_head isn't exactly the same in the resend_chunk_requests call (which is initiated by a timer in ShardsManagerActor now).
  • Because the header and header_head are needed, I've included them as parameters in the ShardsManager constructor. (The alternative would be to keep them optional until we first hear from the client, but that adds complexity to handle the None case which is obscure and hard to test). This means that the store must be initialized with at least one block (from genesis) before ShardsManager is constructed, which in today's code means we must construct the Client before ShardsManager (since Client constructs Chain). In a future refactoring, the Chain should be constructed before the Client and everything else, so then initialization order wouldn't be a problem. In the meantime, we ensure the construction order, by moving Client::new call out of ClientActor constructor so it is synchronous, (and for testing ensure this on a framework-by-framework basis).
  • Much of the refactoring is also due to replacing (Addr, Addr) with ActorHandlesForNetworkMessageMocking, so that we can add a third, Arc. This is pretty simple.
  • For synchronous tests like TestEnv and setup_client, instead of constructing a ShardsManager and then propagating ShardsManagerRequest messages to it, I've made a SynchronousShardsManagerAdapter which just calls ShardsManager directly when it is called, as opposed to posting a message to a queue (like MockPeerManagerAdapter). This makes us mostly not have to change anything in these test cases, because the behavior is identical to before the refactoring.

#7662

@robin-near robin-near requested a review from mzhangmzz January 11, 2023 06:18
@robin-near robin-near requested a review from a team as a code owner January 11, 2023 06:18
chain/client/src/client.rs Show resolved Hide resolved
chain/client/src/client.rs Outdated Show resolved Hide resolved
@@ -1253,9 +1246,11 @@ impl Client {
apply_chunks_done_callback: DoneApplyChunkCallback,
) {
let chunk_header = partial_chunk.cloned_header();
self.chain.blocks_delay_tracker.mark_chunk_completed(&chunk_header, Clock::utc());
self.block_production_info
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I think this is a bug fix. It used to be called whenever we receive a PartialEncodedChunk, but that's not the same as collecting the chunk for block production purposes.

@robin-near robin-near force-pushed the sm branch 4 times, most recently from b612511 to 4b6bc1a Compare January 11, 2023 22:42
chain/chunks/src/adapter.rs Show resolved Hide resolved
);
fn process_partial_encoded_chunk_request(
&self,
partial_encoded_chunk_request: PartialEncodedChunkRequestMsg,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add docs for the functions here since they are the interface of ShardsManager

}
}

pub struct ShardsManagerAdapterAsAdapterForNetwork {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comments

}
}

impl ShardsManagerAdapterForNetwork for ShardsManagerAdapterAsAdapterForNetwork {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does ShardsManagerAdapterForNetwork need to be a trait? Can you just implement ShardsManagerForNetwork as a struct and avoid having both ShardsManagerAdapterForNetwork and ShardsManagerAdapterAsAdapterForNetwork? I think having too many adapters makes the code convoluted unnecessarily.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would advocate for just making network use ShardsManagerAdapter directly instead of create a new adapter for it. The advantage of creating a different interface for network is to restrict the proper usage of network code of ShardsManager. The disadvantage is the code gets too complicated with some many adapters. Since the APIs and how it can be used at call sites are very simple, I don't think there is much risk of network code misusing ShardsManager. I would prefer to have simpler code here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeaaaaa... the proliferation of adapters is seriously confusing. Well, the immediate problem with having network use ShardsManagerAdapter is that near-chunks depends on near-network, so that would create a circular dependency. It can be avoided by moving ShardsManagerAdapter (or perhaps PeerManagerAdapter) into a lower crate, but I'm not sure where that should be - doesn't feel like it should be in primitives - maybe we don't have such a crate today.

Copy link
Contributor

@mzhangmzz mzhangmzz Jan 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. That is indeed annoying. What if we create two adapters, ShardsManagerAdapterForNetwork and ShardsManagerAdapterForClient and implement

impl<A: MsgRecipient<ShardsManagerRequest>> ShardsManagerAdapterForNetwork for A

and

impl<A: MsgRecipient<ShardsManagerRequest>> ShardsManagerAdapterForClient for A

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So... as we discussed offline, I really like this suggestion, and I think this is the direction we want to go. But I ran into some rather subtle roadblocks when trying to make this work.

The most problematic issue is that for testing (at least the current framework), it would be convenient to have a single interface that supports both of these adapter interfaces. For that I need a third trait ShardsManagerAdapterForTest that has both of these traits as bounds, and have the A:MsgRecipient implement ShardsManagerAdapterForTest as well - all of that is fine until I need to take a Arc and convert it to an Arc - I was surprised that this is actually an experimental feature in Rust.

Hmm, I don't think this is going to be easy to solve within this PR, so let me try a couple more things, but in the meantime let's just defer this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I've pushed a revision where I found a workaround to this. Let me know what you think. It's basically what you suggested, plus the following:

  • ShardsManagerAdapterForTest that I mentioned above, plus for_client and for_network to help with trait upcasting
  • I also split the Request enum. This is due to a subtle technicality too, because there's no way to do impl<A: MsgRecipient<ShardsManagerRequest>> ShardsManagerAdapterForNetwork for A in the near-chunks crate, because none of these traits belong to near-chunks. So, I have to do that impl in the near-network crate, but then I need the ShardsManagerRequest to also be referenceable from that crate, and therefore I moved that part of the request enum to that crate. This doesn't have much of an effect - what this means is basically that ShardsManager now has handle_client_request and handle_network_request. Nothing fancy.

chain/client/src/test_utils.rs Outdated Show resolved Hide resolved
chain/client/src/test_utils.rs Outdated Show resolved Hide resolved
@robin-near robin-near force-pushed the sm branch 6 times, most recently from 5b49db5 to e80db4e Compare January 25, 2023 04:31
Copy link
Contributor

@mzhangmzz mzhangmzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you for the change.

Please also run nayduck before merging this since this is a large PR.

@robin-near
Copy link
Contributor Author

Nayduck has passed: https://nayduck.near.org/#/run/2840

@near-bulldozer near-bulldozer bot merged commit 2a80275 into near:master Jan 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants