Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trial parsing is not the most efficient #474

Closed
huitema opened this issue Jun 24, 2024 · 12 comments
Closed

Trial parsing is not the most efficient #474

huitema opened this issue Jun 24, 2024 · 12 comments
Assignees
Labels
Needs PR Wire Format Related to how messages are serialized and parsed

Comments

@huitema
Copy link
Contributor

huitema commented Jun 24, 2024

The MoQ transport messages are encoded as a combination of varints, bit strings, and t-l-v objects. In the absence of a "header length" field, applications have to resort to trial parsing: accumulate some incoming bytes from the stream, try to parse those bytes as a message, and if that fail accumulate more bytes and retry. This may not be a huge problem in normal operation. The peer endpoint will compose a message and typically send those bytes all at once; the QUIC stack will deliver them all at once, but even so there may be cases where the QUIC stack cannot send them in a single STREAM_DATA messages, so the logic for "accumulate and try" has to be present.

Suppose a peer sending its data one byte at a time, because it is either malicious or buggy. The length of the message cannot be computed before all components are parsed: the first byte of each varint defines its length, the length field of bits or bytes strings need to be parsed in full, then number of parameters or versions needs to be known. To obtain the length of a component, all components before it need to be parsed, which means that the number of trial parsing is typically a function of the number of components -- maybe an O(N2) function since the cost of each trial depends on the number of components before it. That number can be very large for messages including parameters, because the number of parameters cannot be known in advance -- the peer could send an arbitrary number of them in an attempt to "greasing". Parsing is a somewhat CPU intensive operation, which implies that trial parsing can be exploited for a mild DOS attack.

Prepend a length field to messages would mitigate trial parsing, but I am not sure that we need to fix this. Having limits on message size would impose a de-facto limit on the processing cost of each message. A malicious attackers could send series of short messages such as "subscribe updates" to trigger the same CPU consumption. In typical QUIC stacks, the main CPU load comes form passing messages through the OS/app boundary in socket calls, and an attacker could DOS the target by sending series of short messages. The next component is the cost of encryption and decryption, compared to which the cost of trial decryption is probably small. We may decide to do nothing, but it would be nice in that case to have a documentation of the reasoning (as in this issue), and maybe a discussion of why the attack does not matter in the security section.

Addressing the "limits" issue (#473) would also somewhat mitigate the problem, as limiting the "number of parameters" appears to be the main way to increase the number of components in messages.

@kixelated
Copy link
Collaborator

MoQ's message encoding is similar to QUIC's, but the difference is that QUIC datagrams have a finite size and are not susceptible to a trickle attack.

I've implemented two types of MoQ parsers:

  1. Async state machine.
  2. Trial parser.

The async parser is easy in some languages (JavaScript, Go). There could be excessive wake ups if the stream is written one byte at a time but I that's a fundamental QUIC problem (userspace wakeup).

My trial parser (Rust) returns a More(min_bytes_needed) error to help mitigate this attack. You can return the remaining size (ex. string/VarInt) in addition to the minimum size of any remaining fields (usually 1 byte).

You could implement a state machine manually too, but a trial parser will be fine if you don't allocate on the heap.

I'm on the fence if we should length prefix or not. I've implemented both and think streaming is easier in general across multiple languages, but I do think this decoding gotcha is real.

@huitema
Copy link
Contributor Author

huitema commented Jun 24, 2024

I wrote a trial parser in C that also returns the min number of additional bytes needed, like your Rust parser, but there can be a big difference between the min number and the max number: for a varint, 1 byte versus 8 bytes; for a TLV, 2 bytes for 2 varints versus 8+8+length. That's how I came to the estimate O(number_of_components^2).

I am also on the fence, as I said in the top post. The simplest mitigation would be to limit the number of parameters allowed in Setup or Subscribe parameters. Currently, the lists contain one or two values, but given greasing the max is effectively unlimited.

@LPardue
Copy link
Contributor

LPardue commented Jun 24, 2024

This sounds like what HTTP/3 frame and stream parsing already has to deal with, or am I overlooking something unique to MoQT?

@suhasHere
Copy link
Collaborator

suhasHere commented Jun 24, 2024

I would like to put a limit on number of parameters per version. For the draft-04 it is 2. If in the future drafts we add more, we can increase the max as needed. Any one using more than max for a given version will result in protocol error.

@afrind
Copy link
Collaborator

afrind commented Jun 24, 2024

This sounds like what HTTP/3 frame and stream parsing already has to deal with, or am I overlooking something unique to MoQT?

H3 has an upfront frame length, so you can avoid dribbling byte/reparsing attacks

@LPardue
Copy link
Contributor

LPardue commented Jun 24, 2024

OK, I think this is a real problem then. +1 to fixing it

@afrind afrind added Wire Format Related to how messages are serialized and parsed labels Jun 25, 2024
@fluffy
Copy link
Contributor

fluffy commented Jul 4, 2024

+1 on we should fix this.

My proposal would be a that message other than object have a one new length field at start of message and object have a length of all the headers not including the object data plus just before the object data the length of the object payload

@kixelated
Copy link
Collaborator

kixelated commented Jul 6, 2024

A size prefix can be annoying for encoding because you need* to buffer messages to compute their total size. In TLS it's okay since the size is a uint32 and you can populate it afterwards. For MoQ it's more annoying since the size itself is a VarInt and takes up 1-8 bytes.

I think a size prefix for each message is fine so long as we don't nest messages like TLS. That means no upfront size for params, although a count is fine. Or we can save a byte and have params continue until the end of a message.

@vasilvv
Copy link
Collaborator

vasilvv commented Jul 7, 2024

For control messages on bidi streams, we could just use fixed-size 32-bit length prefix, e.g.:

Control Message {
  Message Size Including Type (32),
  Type (i),
  Payload (...)
}

Object messages shouldn't really be called messages anyways, those are effectively data stream headers.

@kixelated
Copy link
Collaborator

kixelated commented Jul 7, 2024

For control messages on bidi streams, we could just use fixed-size 32-bit length prefix

I think we keep it a VarInt. If an encoder wants to use a fixed size length, it can use a 30-bit varint (first two bits are 10).

I would instead allocate/reuse a buffer for each message with 8 bytes at the start, skipping the first N bytes based on the final varint size. It's a little more complicated but worth the savings since most of these messages will need a 1 byte size anyway.

I just really want to avoid nested messages like MP4 and TLS, where the majority of the bytes used are 32-bit id/size pairs. It's both a waste and annoying to encode id1, size, id2, size-4, id3, size-8 .... It's also the reason why the fMP4 overhead is not suitable for audio; MoQ can't repeat the same mistake.

@fluffy
Copy link
Contributor

fluffy commented Jul 8, 2024

+1 luke

@ianswett
Copy link
Collaborator

I believe this was fixed by #520

@github-project-automation github-project-automation bot moved this from Needs PR to Done in MoQT Tracking Board Sep 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs PR Wire Format Related to how messages are serialized and parsed
Projects
Development

No branches or pull requests

8 participants