RPC Websocket buffer overflowing #966
Comments
@msgmaxim can you give us an example of the packets (I'm mostly interested in their sizes) sent at each stage of the Keygen / Signing processes? We determined yesterday that a KeygenRequested event with 150 * 32-byte Validator Ids is nearly 5
Don't you mean 5KB?
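A quick back-of-envelope check of that figure, using only the counts quoted above (the constant names are mine):

```rust
fn main() {
    // Figures quoted above: 150 validator ids at 32 bytes each.
    const VALIDATOR_COUNT: usize = 150;
    const VALIDATOR_ID_BYTES: usize = 32;

    let payload = VALIDATOR_COUNT * VALIDATOR_ID_BYTES;
    // 4800 bytes, i.e. roughly 4.7 KiB -- "nearly 5KB".
    println!("{} bytes (~{:.1} KiB)", payload, payload as f64 / 1024.0);
}
```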
#954 actually mitigates the problem somewhat, as it reduces the serialized sizes of all crypto primitives by a factor of 2 to 3. Pre-#954, each Point on the curve is 144 bytes, while a Scalar is 71 bytes; with the new serialization scheme a Point is 41 bytes and a Scalar is 40 bytes. (Annoyingly, each point and scalar is serialized with a "purpose" text field, which doesn't seem all that useful to me and adds about 10 bytes to every point/scalar, so we could probably drop it to optimise further.) To give some idea, the message sent in Stage 1 of keygen has […]. Signing seems to be much more manageable: […]
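A small sketch of those savings, using only the byte counts quoted above (constant names are mine):

```rust
fn main() {
    // Serialized sizes quoted in the comment (bytes).
    const POINT_PRE: usize = 144;  // Point, pre-#954
    const SCALAR_PRE: usize = 71;  // Scalar, pre-#954
    const POINT_POST: usize = 41;  // Point, new scheme
    const SCALAR_POST: usize = 40; // Scalar, new scheme
    const PURPOSE_BYTES: usize = 10; // rough cost of the "purpose" field

    println!("Point:  {:.1}x smaller", POINT_PRE as f64 / POINT_POST as f64);
    println!("Scalar: {:.1}x smaller", SCALAR_PRE as f64 / SCALAR_POST as f64);
    // Dropping "purpose" would shave roughly another 10 bytes off each primitive:
    println!("Point without purpose:  ~{} bytes", POINT_POST - PURPOSE_BYTES);
    println!("Scalar without purpose: ~{} bytes", SCALAR_POST - PURPOSE_BYTES);
}
```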
I merged a possible fix into the merge-config-option branch, so we can try it out.
Thank you @msgmaxim! That makes sense. So during a broadcast verification stage, we have 150 * 2.2MB (330MB) of both outgoing and incoming messages, totalling 660MB. I can see why that would cause the 15MB buffer to have a bad time. We'll get your PR merged today and give it a crack, ideally with some profiling of the bandwidth used during the ceremony if possible @tomjohnburton.
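For concreteness, the arithmetic behind that estimate, using the figures from the comments above:

```rust
fn main() {
    // Figures from the comments above.
    const PARTICIPANTS: f64 = 150.0;
    const MESSAGE_MB: f64 = 2.2;  // per broadcast-verification message
    const BUFFER_MB: f64 = 15.0;  // current websocket buffer limit

    let one_way = PARTICIPANTS * MESSAGE_MB; // 330 MB outgoing, and the same incoming
    let total = 2.0 * one_way;               // 660 MB across both directions
    // ~44x over the 15 MB buffer, hence the overflow.
    println!("total: {:.0} MB (~{:.0}x the {} MB buffer)", total, total / BUFFER_MB, BUFFER_MB);
}
```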
I think the only currently feasible change to further reduce this problem is to batch the p2p messages, which would make all the broadcast stages very small; only the single private stage would be large. This is a refactor I would like to do anyway. Also, due to #983, I think we would like to move to handling the network connections ourselves instead of using Substrate, so we wouldn't use the RPC for p2p messages at all anymore.
I think so. I would like to make a ticket for the p2p message batching refactor as well (low priority). It's basically an interface change between the p2p and the multisig code: instead of the multisig passing each message to be sent individually on the channel to the p2p code, it could give the p2p code all the messages to be sent for a single stage at once, as a single item in the channel. This lets us avoid duplicating the message in a broadcast (see the sketch below). It will also simplify the multisig tests, since a test can retrieve all the output messages from a given stage with a single wait on the channel, instead of having to wait on the channel once for each expected message separately. There probably needs to be a ticket for what @msgmaxim is doing as well, if there isn't one.
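A minimal sketch of that interface change, assuming tokio mpsc channels; the type and function names here are hypothetical, not the CFE's actual ones:

```rust
use tokio::sync::mpsc::{unbounded_channel, UnboundedSender};

// Hypothetical types for illustration; the real CFE types will differ.
type AccountId = [u8; 32];

/// One channel item carrying a whole stage's worth of outgoing messages.
struct OutgoingStageMessages {
    recipients: Vec<AccountId>,
    payload: Vec<u8>,
}

// Before: one channel item per recipient, duplicating the payload n times
// for an n-party broadcast.
fn send_unbatched(
    tx: &UnboundedSender<(AccountId, Vec<u8>)>,
    recipients: &[AccountId],
    payload: &[u8],
) {
    for &to in recipients {
        let _ = tx.send((to, payload.to_vec()));
    }
}

// After: the whole broadcast is a single channel item; the p2p layer fans
// the single payload out to each peer, and a test can observe a stage's
// entire output with one recv().
fn send_batched(
    tx: &UnboundedSender<OutgoingStageMessages>,
    recipients: &[AccountId],
    payload: &[u8],
) {
    let _ = tx.send(OutgoingStageMessages {
        recipients: recipients.to_vec(),
        payload: payload.to_vec(),
    });
}

fn main() {
    let recipients = [[0u8; 32], [1u8; 32]];

    let (tx_old, mut rx_old) = unbounded_channel();
    send_unbatched(&tx_old, &recipients, b"stage-1 message");
    // Two items queued: one copy of the payload per recipient.
    assert!(rx_old.try_recv().is_ok());

    let (tx_new, mut rx_new) = unbounded_channel();
    send_batched(&tx_new, &recipients, b"stage-1 message");
    // One item queued for the whole stage.
    assert!(rx_new.try_recv().is_ok());
}
```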
Ok cheers - feel free to make the ticket and the plan 👍 |
Description
Currently it appears as though there's a potential buffer overflow in our usage of the Websocket RPC connection between the CFE and the State Chain Node. The issue becomes visible when attempting Keygen with a large enough set of participants (~40+).
Others have reported issues arising due to the size of RPC responses.
The max buffer size was recently added as a config option to Substrate; we should cherry-pick this into our version of Substrate to see whether it works as a band-aid solution for now.
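For illustration only, the shape such an option might take; the actual name, default, and plumbing come from the upstream Substrate change being cherry-picked:

```rust
/// Hypothetical config shape; the real option lives in the upstream change.
struct WsServerConfig {
    /// Maximum outgoing message buffer, in bytes.
    max_out_buffer_capacity: usize,
}

impl Default for WsServerConfig {
    fn default() -> Self {
        // 15 MB -- the limit being hit in this issue.
        Self { max_out_buffer_capacity: 15 * 1024 * 1024 }
    }
}

fn main() {
    // Raising the cap as a stopgap while message sizes are reduced:
    let cfg = WsServerConfig { max_out_buffer_capacity: 64 * 1024 * 1024 };
    assert!(cfg.max_out_buffer_capacity > WsServerConfig::default().max_out_buffer_capacity);
}
```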
As a long term solution, we should evaluate the amount of data we're trying to push over the RPC connection at any one time, and determine whether we can decrease this.
@AlastairHolmes giving this a P0 since we want to test the band-aid solution ASAP.
cc @dandanlen @msgmaxim