Compress the State Response Message in State Sync #112

Open. Wants to merge 1 commit into main.
Conversation

liuchengxu

This RFC proposes compressing the state response message during the state syncing process to reduce the amount of data transferred. For chains with a huge state, gigabytes of state sync data can be expected to be saved.

@alindima (Contributor) left a comment

While I'm not familiar with this part of the protocol, I've left some general thoughts.


None.

### Compatibility
Contributor

It looks to me like this needs some network protocol versioning. For example, what happens if the receiver is not expecting compressed data? Or if the receiver expects compressed data but gets uncompressed data?

Author

This is a valid concern, and I agree that network versioning would be the ideal solution. However, it seems that Substrate doesn't currently support network protocol versioning, which might deserve another RFC.

In the meantime, we can introduce a backward-compatible workaround. For example, we could define a new StateRequestV1 with an additional compress_response: bool field. This field would default to false if not provided. Here's how it would work in practice, assuming an old node (OldNode) and an upgraded node (NewNode):

  • OldNode to NewNode (OldNode => NewNode):
    When OldNode sends a StateRequest payload to NewNode, NewNode interprets it as StateRequestV1 { ..., compress_response: false }. Based on this field, NewNode decides whether to compress the response, ensuring that OldNode receives the data uncompressed and compatibility is maintained.

  • NewNode to OldNode (NewNode => OldNode):
    If NewNode sends a StateRequestV1 payload to OldNode, OldNode will fail to decode it, leading to an aborted sync. This is because OldNode does not recognize the new StateRequestV1 structure.

  • OldNode to OldNode and NewNode to NewNode:
    Syncing between nodes of the same version (OldNode => OldNode or NewNode => NewNode) will proceed as expected, with OldNodes using uncompressed data and NewNodes using compressed data when possible.

The main consequence of this approach is that NewNodes won't be able to sync state from OldNodes, but they can still sync with other NewNodes. This workaround isn't perfect, but it ensures some level of compatibility until a more comprehensive network versioning solution can be implemented in Substrate.
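
For illustration, here is a minimal Rust sketch of that workaround using the zstd crate. The type, field, and function names are assumptions made for this example, not the actual Substrate definitions; the key point is that a request sent by an old node never carries the new field, so it decodes with compress_response equal to false and that node keeps receiving uncompressed data.

#[derive(Debug, Default)]
struct StateRequestV1 {
    // Block header hash.
    block: Vec<u8>,
    // Keys to start from; multiple keys are used for nested state.
    start: Vec<Vec<u8>>,
    // If true, the response contains raw key-values rather than a proof.
    no_proof: bool,
    // New field: absent in payloads sent by old nodes, so it decodes to `false`.
    compress_response: bool,
}

// Responder side: compress only when the requester explicitly opted in.
fn encode_response(req: &StateRequestV1, response_bytes: Vec<u8>) -> std::io::Result<Vec<u8>> {
    if req.compress_response {
        // Level 0 selects zstd's default compression level.
        zstd::encode_all(&response_bytes[..], 0)
    } else {
        Ok(response_bytes)
    }
}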

@alindima (Contributor), Aug 16, 2024

I added some support for req/response protocol versioning in paritytech/polkadot-sdk#2771 and used it for the implementation of #47, although it's not optimal: it first tries a request on the new version and, if that's not supported, retries on the old protocol (so making two round trips if the remote does not support the new version). It was implemented this way mostly because libp2p doesn't have the most flexible interfaces.
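
For context, that fallback flow could be sketched roughly as below. The protocol names and function shapes are assumptions for illustration only, not the actual sc-network API.

#[derive(Debug)]
enum RequestError {
    // The remote does not support the requested protocol.
    UnsupportedProtocol,
    Other(String),
}

// Try the new protocol name first; retry on the old one only if the remote
// does not support it, which is what causes the extra round trip.
fn request_with_fallback<F>(mut send: F) -> Result<Vec<u8>, RequestError>
where
    F: FnMut(&str) -> Result<Vec<u8>, RequestError>,
{
    match send("/sync/state/2") {
        Err(RequestError::UnsupportedProtocol) => send("/sync/state/1"),
        other => other,
    }
}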

In any case, the RFC should clearly state the interactions between new and old nodes.

CC: @lexnv

Author

@bkchr Any more comments apart from the interactions between the old and new nodes? If the workaround I proposed in my last comment is fine, I can update the RFC.


This optimization makes sense from my perspective. We are trading off bandwidth for CPU cycles in hopes that this will yield an overall faster warp sync.

It would be interesting to see a comparison of sync times as well.
For example, the sub-triage-logs tool has a newly added command that measures the warp sync time:

# Provide the path to the logs outputted by a substrate-based node
cargo run -- warp-time --file ../substrate-log.txt

Indeed, we'd need to think about versioning as well. We'd like to deprecate the older request-response protocols, so it's worth introducing as few breaking changes as possible and having a deprecation plan in mind:

  • When NewNode sends the V2 format to OldNode, OldNode will fail to decode it. After a decoding error, OldNode will change NewNode's reputation and possibly disconnect it from its network.
  • NewNode can retry with the V1 format after the V2 request is dropped, but ideally we should avoid the first case.

Offhand, it looks like an approach similar to paritytech/polkadot-sdk#2771 should solve this issue. Then, if we are to introduce a new request-response version, should we also consider other compression algorithms? I think this complicates things, but it would be interesting to see a comparison between different compression algorithms on Polkadot-specific data (i.e. maybe we end up saving 20% bandwidth instead of 33% but compress faster).

- Node Operators.
- Substrate Users.

## Explanation
Contributor

I think this requires some implementation details, like which compression algorithm is going to be used.

Author

As per the code-changes link in L42, we'll use zstd as the compression tool, which is already used in other places in Substrate. I can add a sentence to make this more explicit.
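
To make the intended usage concrete, a zstd round trip over the encoded response bytes could look like the sketch below, using the zstd crate's convenience helpers. This is an illustration only, not the exact code from the linked patch.

fn main() -> std::io::Result<()> {
    // Stand-in for the encoded state response payload.
    let response_bytes: Vec<u8> = vec![0u8; 64 * 1024];

    // Sender: compress before putting the message on the wire.
    // Level 0 selects zstd's default compression level.
    let compressed = zstd::encode_all(&response_bytes[..], 0)?;

    // Receiver: decompress before decoding the response.
    let decompressed = zstd::decode_all(&compressed[..])?;
    assert_eq!(decompressed, response_bytes);

    println!("{} -> {} bytes", response_bytes.len(), compressed.len());
    Ok(())
}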


This RFC proposes compressing the state response message during the state syncing process to reduce the amount of data transferred.

## Motivation


Is the motivation to send less data or to improve state sync performance? Based on what you're saying, I think that implementing a persistence feature would be enough to solve this issue on slower network connections, as it allows you to request subsequent portions of the state incrementally.

Perhaps a better way to solve it is a combination of a persistence layer that stores the state you are retrieving (so you can stop and restart your node) and changing the request to ask for a certain amount of state.

Currently, the state request looks like this:

message StateRequest {
    // Block header hash.
    bytes block = 1;
    // Start from this key.
    // Multiple keys used for nested state start.
    repeated bytes start = 2; // optional
    // if 'true' indicates that response should contain raw key-values, rather than proof.
    bool no_proof = 3;
}

What I'm suggesting is to add a new field with a limit:

message StateRequest {
    // Block header hash.
    bytes block = 1;
    // Start from this key.
    // Multiple keys used for nested state start.
    repeated bytes start = 2; // optional
    // if 'true' indicates that response should contain raw key-values, rather than proof.
    bool no_proof = 3;
    // amount of keys we want to receive in the response
    uint32 max_keys = 4;
}

I know this could be considered a separate proposal, but I think it could be a better way to solve what you are describing without relying on compression algorithms. In the worst case, you could combine both: ask for partial state and compress the response.

The trade-off might be that the number of requests could increase. wdyt?

Author

Thanks for your feedback. There is already a persistent state sync tracking issue, paritytech/polkadot-sdk#4, that I'm actively working on. I initiated this RFC because it is low-hanging fruit for reducing bandwidth and can lead to immediate improvements in state sync performance, while persistent state sync requires much more effort and complexity (it may take too long to wrap up). As you can see from the patch, a few changed lines significantly reduce bandwidth consumption. The proposed changes in the RFC are minimal yet impactful in terms of bandwidth savings.

In contrast, your suggestion to add a max_keys field, while valid, does not contribute to reducing any bandwidth. It may even lead to increased request frequency, which could negate some of the performance gains we seek.

Labels: Proposed (is awaiting 3 formal reviews)
5 participants