Limit message sizes in multisig stages #1540
Labels: CFE, discussion (Discussions about proposed changes or ideas), effort-1 (Easy, but maybe tedious), p2-somedaysoon
Description
Recent discoveries of high memory usage by the CFE (causing crashes and ceremony failures) made me think about what's causing it, but also about whether this is something that can be exploited by an adversary. The first thing that comes to mind is how we store all stage messages in memory before we are ready to process them (either when we get a message from every peer, or on timeout).
Based on my rough estimates in #966 (comment), it looks like we could be storing on the order of 100MB for the "heaviest" stage in keygen (this does not include next-stage messages that we delay processing of). I'm not sure how much this contributed to the high memory usage observed on testnet, but I realised that we can end up storing much larger messages in the presence of an adversary: the size of messages for some stages isn't bounded.
Every `BroadcastVerificationMessage`, for example, can contain millions of elements and still be parsed as valid by `bincode` (and thus be accepted initially). Note that we only process the messages once we have received all of them (which is often necessary), so a party could send us 1GB+ messages (unless there is some limit in the p2p layer to prevent this), causing us to run out of memory before we even get a chance to discard them as invalid.
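A minimal sketch of the problem, assuming bincode 1.x with serde derive (the `VerificationMessage` struct below is a hypothetical stand-in, not the actual CFE type):

```rust
use std::collections::HashMap;

use serde::{Deserialize, Serialize};

// Hypothetical stand-in for a stage message like BroadcastVerificationMessage.
#[derive(Serialize, Deserialize)]
struct VerificationMessage {
    // Keyed by party index; an adversary controls how many entries there are.
    data: HashMap<usize, Vec<u8>>,
}

fn main() {
    // An "adversarial" message with far more entries than any real ceremony has parties.
    let malicious = VerificationMessage {
        data: (0..1_000_000).map(|i| (i, vec![0u8; 32])).collect(),
    };
    let bytes = bincode::serialize(&malicious).unwrap();

    // Deserialization succeeds: bincode only checks that the bytes are
    // well-formed, not that the element count makes sense for the ceremony.
    let parsed: VerificationMessage = bincode::deserialize(&bytes).unwrap();
    println!(
        "parsed {} entries ({} bytes on the wire)",
        parsed.data.len(),
        bytes.len()
    );
}
```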
Proposed solution
We should be able to perform simple pre-validation of all received messages, for example by comparing the number of elements in the `HashMap`/`Vec` with what we expect for the given ceremony, as sketched below.
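A minimal sketch of such a check, reusing the hypothetical `VerificationMessage` type from above (the function name and shape of the check are assumptions, not the actual CFE API):

```rust
use std::collections::HashMap;

struct VerificationMessage {
    data: HashMap<usize, Vec<u8>>,
}

/// Returns true if the message is structurally plausible for a ceremony with
/// `num_parties` participants. Anything else is discarded immediately instead
/// of being held in memory until the stage completes.
fn pre_validate(message: &VerificationMessage, num_parties: usize) -> bool {
    // For a broadcast-verification stage we expect exactly one entry per party.
    message.data.len() == num_parties
        // Party indices must be in range, so an adversary can't pad the map.
        && message.data.keys().all(|&idx| idx < num_parties)
}

fn main() {
    let expected = 150;
    let valid = VerificationMessage {
        data: (0..expected).map(|i| (i, vec![0u8; 32])).collect(),
    };
    assert!(pre_validate(&valid, expected));

    // A message with more entries than the ceremony has parties is rejected up front.
    let padded = VerificationMessage {
        data: (0..1_000).map(|i| (i, vec![0u8; 32])).collect(),
    };
    assert!(!pre_validate(&padded, expected));
}
```

Because the check only inspects collection lengths and keys, it is cheap enough to run on every incoming message before buffering it.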
Alternatives
Set a hard limit in bytes on the size of every message somewhere higher in the stack (e.g. the p2p layer), so we don't need a separate pre-validation step for each type of message. I prefer not to do this as it is too crude and can lead to difficult-to-identify bugs, e.g. if we change the contents of stage messages in the protocol but forget to update the size limit in the p2p layer.
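For comparison, the alternative would look roughly like this (the constant and function are hypothetical, not the actual p2p code):

```rust
// Assumed budget; this is exactly the constant that can silently fall out of
// sync with the real maximum size of stage messages as the protocol evolves.
const MAX_MESSAGE_SIZE_BYTES: usize = 1024 * 1024;

/// Crude but cheap: reject oversized payloads outright, before any
/// deserialization is attempted.
fn accept_incoming(payload: &[u8]) -> Option<&[u8]> {
    if payload.len() > MAX_MESSAGE_SIZE_BYTES {
        None
    } else {
        Some(payload)
    }
}
```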