Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce messages size used to communicate between subsystems #617

Open
vstakhov opened this issue Jun 8, 2023 · 6 comments · May be fixed by #4814
Open

Reduce messages size used to communicate between subsystems #617

vstakhov opened this issue Jun 8, 2023 · 6 comments · May be fixed by #4814
Assignees
Labels
R0-silent Changes should not be mentioned in any release notes

Comments

@vstakhov
Copy link
Contributor

vstakhov commented Jun 8, 2023

Orchestra joins all message types into a single enumeration AllMessages. It means that all channels will have element size equal to the maximum size of any message in this enumeration. With async_channel it also means that we preallocate memory rings equal to channel_capacity * element_size. Currently, the size of AllMessages is equal to 1008 bytes, that leads to huge overhead in terms of memory usage, latency and channels efficiency. We need to Box messages where their size is more than, e.g. a cache line (64 bytes).

Here is a list of message sizes:

message type: polkadot_node_subsystem_types::messages::ApprovalDistributionMessage; size: 1008
message type: polkadot_node_subsystem_types::messages::ApprovalVotingMessage; size: 152
message type: polkadot_node_subsystem_types::messages::AvailabilityDistributionMessage; size: 112
message type: polkadot_node_subsystem_types::messages::AvailabilityRecoveryMessage; size: 344
message type: polkadot_node_subsystem_types::messages::AvailabilityStoreMessage; size: 120
message type: polkadot_node_subsystem_types::messages::BitfieldDistributionMessage; size: 216
message type: polkadot_node_subsystem_types::messages::CandidateBackingMessage; size: 512
message type: polkadot_node_subsystem_types::messages::CandidateValidationMessage; size: 432
message type: polkadot_node_subsystem_types::messages::ChainApiMessage; size: 56
message type: polkadot_node_subsystem_types::messages::ChainSelectionMessage; size: 48
message type: polkadot_node_subsystem_types::messages::CollationGenerationMessage; size: 280
message type: polkadot_node_subsystem_types::messages::CollatorProtocolMessage; size: 592
message type: polkadot_node_subsystem_types::messages::DisputeCoordinatorMessage; size: 368
message type: polkadot_node_subsystem_types::messages::DisputeDistributionMessage; size: 500
message type: polkadot_node_subsystem_types::messages::GossipSupportMessage; size: 136
message type: polkadot_node_subsystem_types::messages::NetworkBridgeRxMessage; size: 64
message type: polkadot_node_subsystem_types::messages::NetworkBridgeTxMessage; size: 544
message type: polkadot_node_subsystem_types::messages::ProvisionerMessage; size: 1008
message type: polkadot_node_subsystem_types::messages::RuntimeApiMessage; size: 152
message type: polkadot_node_subsystem_types::messages::StatementDistributionMessage; size: 592
@vstakhov vstakhov added A3-in_progress R0-silent Changes should not be mentioned in any release notes labels Jun 8, 2023
@vstakhov vstakhov self-assigned this Jun 8, 2023
@rphmeier
Copy link
Contributor

rphmeier commented Jun 8, 2023

We could even provide something like a fast arena allocator for elements which must be boxed, so the subsystem processing the incoming messages mostly stays within the same memory pages.

@vstakhov
Copy link
Contributor Author

vstakhov commented Jun 9, 2023

I think it is automatically done by jemalloc.

@rphmeier
Copy link
Contributor

rphmeier commented Jun 26, 2023

Let's add a test as well that asserts mem::size_of::<AllMessages>() < SOME_CONSTANT_TARGET so we can prevent regressions here once the implementation is updated.

@vstakhov
Copy link
Contributor Author

In the current approach, orchestra can just transparently box all messages that pass over channels, so the size of AllMessages is not important. Furthermore, the actual channels are actually created over the concrete types of messages (I was under wrong impression that they are not), and the AllMessage wrapper is used merely to pass a message into the orchestra.

In our case, we still have ApprovalDistributionMessage that takes 1008 bytes each and the size of the corresponding channel is 64000 of messages, so the ring uses around 64Mb with unboxed message channel and around 512k with boxed messages. Unfortunately, boxing of messages does not fix the issue with subsystems stall, however, such a boxing has clearly improved large messages performance in conjunction with large channels in the stress tests assuming that jemalloc is used. I'm trying to figure out other possible issues.

@eskimor
Copy link
Member

eskimor commented Mar 1, 2024

@sandreim status on this? Was this done?

@sandreim
Copy link
Contributor

sandreim commented Mar 1, 2024

Orchestra support is implemented, we just need to pass boxed_messages=true to orchestra proc-macro but this is not implemented in polkadot yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
R0-silent Changes should not be mentioned in any release notes
Projects
Status: Backlog
Development

Successfully merging a pull request may close this issue.

6 participants