
[WIP] Add "read with limit" functionality to DataStream #990

Closed
wants to merge 1 commit

Conversation

@Nilix007 commented May 6, 2019

Hey there,

as suggested in #973 (comment), Rocket should provide an easy way to read from the HTTP body with an upper limit on the body size. (As this is needed for Form, JSON, etc.)

Let's start off with this proof of concept which adds DataStream::read_to_string_with_limit and DataStream::read_to_end_with_limit together with a new error type LimitReadError.

@jebrosen: What do you think?

@jebrosen (Collaborator) left a comment

I've implemented but not fully committed to something similar before (https://git.jebrosen.com/jeb/sirus/src/commit/73faadda126187d88836e865dd8b4138742c71df/sirus-server/src/utils/limited_data.rs). I think doing something like this in Data or DataStream is a good idea, and potentially using it in Form and JSON as well.

core/lib/src/data/data_stream.rs (excerpt):

        limit: usize,
        f: F,
    ) -> Result<T, LimitReadError> {
        let mut r = self.by_ref().take(limit as u64 + 1);

Using take with 1 more than the real limit is clever. Maybe a bit too clever -- I want to double check the edge cases.
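The trick can be checked with a small, self-contained sketch using only std::io (read_with_limit is a hypothetical helper written for illustration, not code from this PR):

```rust
use std::io::{Cursor, Read};

// Hypothetical helper mirroring the trick under review: read at most
// `limit + 1` bytes, then treat "more than `limit` bytes arrived" as the
// over-limit condition. Returns Ok(None) when the limit is exceeded.
fn read_with_limit<R: Read>(reader: R, limit: u64) -> std::io::Result<Option<Vec<u8>>> {
    let mut buf = Vec::new();
    reader.take(limit + 1).read_to_end(&mut buf)?;
    if buf.len() as u64 > limit {
        Ok(None)
    } else {
        Ok(Some(buf))
    }
}

fn main() -> std::io::Result<()> {
    // Exactly at the limit is accepted.
    assert_eq!(read_with_limit(Cursor::new(&b"12345"[..]), 5)?, Some(b"12345".to_vec()));
    // One byte over the limit is rejected.
    assert_eq!(read_with_limit(Cursor::new(&b"123456"[..]), 5)?, None);
    // A zero limit accepts only an empty body.
    assert_eq!(read_with_limit(Cursor::new(&b""[..]), 0)?, Some(Vec::new()));
    assert_eq!(read_with_limit(Cursor::new(&b"x"[..]), 0)?, None);
    Ok(())
}
```

One edge case worth noting: `limit + 1` overflows when `limit == u64::MAX`, so a real implementation would want a saturating or checked add there.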

@Nilix007 (Author) commented May 8, 2019

Let me add some motivation for my API proposal:

  • All the code I've seen so far that deals with body data in Rocket uses either Read::read_to_end or Read::read_to_string. Also, all of these apply some kind of size limit (for obvious reasons).
  • Probably most of the code wants to handle the specific error case of a too-large payload, e.g. for returning a 413 (Payload Too Large) status code. io::Error does not seem like a good error type for this case because API consumers would need to check the error kind and try to downcast the inner error. A separate enum-based error type is IMHO more convenient to use.

All in all, I'd start off with just the two functions and keep do_with_limit private. If different use cases emerge, we can add a more comprehensive API.
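A minimal sketch of what such an enum-based error type could look like (illustrative only; the exact names and the status-code mapping in status_for are assumptions, not this PR's code):

```rust
use std::{fmt, io};

// Illustrative shape for the proposed error type: a dedicated variant for
// the over-limit case, so callers can match on it directly instead of
// downcasting an io::Error's inner error.
#[derive(Debug)]
pub enum LimitReadError {
    /// The body exceeded the configured size limit.
    LimitExceeded,
    /// Any other I/O failure while reading the body.
    Io(io::Error),
}

impl fmt::Display for LimitReadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LimitReadError::LimitExceeded => write!(f, "body exceeded the size limit"),
            LimitReadError::Io(e) => write!(f, "i/o error while reading body: {}", e),
        }
    }
}

impl From<io::Error> for LimitReadError {
    fn from(e: io::Error) -> Self {
        LimitReadError::Io(e)
    }
}

// A caller can branch on the variant directly, e.g. to pick a status code
// (413 Payload Too Large for the over-limit case).
pub fn status_for(err: &LimitReadError) -> u16 {
    match err {
        LimitReadError::LimitExceeded => 413,
        LimitReadError::Io(_) => 400,
    }
}

fn main() {
    assert_eq!(status_for(&LimitReadError::LimitExceeded), 413);
    let io_err: LimitReadError = io::Error::new(io::ErrorKind::UnexpectedEof, "eof").into();
    assert_eq!(status_for(&io_err), 400);
}
```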

@Nilix007 (Author) commented May 8, 2019

@jebrosen: If you're OK with this approach, I would continue by adding documentation and some tests for the edge cases, and updating all the body data consumers in Rocket and contrib.

(I'm also unsure whether take(limit + 1).read_to_end(&mut buf) might overallocate by a factor of 2 when the input data exceeds the limit.)

@jebrosen jebrosen self-assigned this May 9, 2019
@jebrosen (Collaborator)

I think it's a good idea to add this in some form, maybe with some name/API changes. Users have often wanted Vec<u8> or String in release builds; this will provide that flexibility without the same degree of risk and also make it easier to write certain FromData/FromDataSimple implementations with limit support.

In the documentation, I would definitely want to 1) encourage any potential users of these functions to use a user-configurable Limit and 2) strongly caution against using something like usize::max_value() as the limit because it's a trivial denial of service vector.

@jebrosen (Collaborator)

I want to revisit this, and I have some quick thoughts on the API:

  • I think the methods should be on Data and take self by move
  • I also think these methods should allocate a new String or Vec themselves

For example, rocket_contrib::msgpack's FromData implementation:

    let size_limit = r.limits().get("msgpack").unwrap_or(LIMIT);
    let mut buf = Vec::new();
    let mut reader = d.open().take(size_limit);
    match reader.read_to_end(&mut buf).await {
        Ok(_) => Borrowed(Success(buf)),
        Err(e) => Borrowed(Failure((Status::BadRequest, Error::InvalidDataRead(e)))),
    }

becomes

    let size_limit = r.limits().get("msgpack").unwrap_or(LIMIT);
    match d.read_to_end_with_limit(size_limit).await {
        Ok(buf) => Borrowed(Success(buf)),
        Err(e) => Borrowed(Failure((Status::BadRequest, Error::InvalidDataRead(e))))
    }

This way the methods would do what a lot of users do already, and anyone who needs to do something more complicated can still call open and read_to_end manually.

@SergioBenitez any thoughts on the above?

@SergioBenitez (Member)

I think we should go even further. Here's my proposed API:

impl Data {
    fn open(self, limit: usize) -> DataStream;
    fn peek(&self) -> &[u8];
    fn peek_complete(&self) -> bool;
}

impl DataStream {
    fn stream_to<W: Write>(self, writer: &mut W) -> io::Result<u64>;
    fn stream_to_file<P: AsRef<Path>>(self, path: P) -> io::Result<u64>;
    fn stream_to_string(self) -> io::Result<String>;
    fn stream_to_vec(self) -> io::Result<Vec<u8>>;
}

impl Read for DataStream { /* ... */ }

I can't think of a valid reason to ever read without a limit. So, let's enforce it.

We might even consider abusing unsafe by adding an open_unsafe() method that sets no limit on the returned DataStream.
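A minimal sketch of how the proposed shape could behave over any Read source (the names mirror the proposal above, but the wrapper, its construction, and the semantics shown are assumptions for illustration, not Rocket's implementation):

```rust
use std::io::{self, Cursor, Read, Write};

// Illustrative stand-in for Data/DataStream: the limit is baked in at
// `open` time by wrapping the source in `Take`, so no code path can
// read without one.
struct DataStream<R: Read> {
    inner: io::Take<R>,
}

impl<R: Read> DataStream<R> {
    fn open(body: R, limit: u64) -> Self {
        DataStream { inner: body.take(limit) }
    }

    fn stream_to<W: Write>(mut self, writer: &mut W) -> io::Result<u64> {
        io::copy(&mut self.inner, writer)
    }

    fn stream_to_vec(mut self) -> io::Result<Vec<u8>> {
        let mut buf = Vec::new();
        self.inner.read_to_end(&mut buf)?;
        Ok(buf)
    }

    fn stream_to_string(mut self) -> io::Result<String> {
        let mut s = String::new();
        self.inner.read_to_string(&mut s)?;
        Ok(s)
    }
}

fn main() -> io::Result<()> {
    // Reads are capped at the limit; excess input is simply not consumed.
    let body = Cursor::new(&b"hello world"[..]);
    assert_eq!(DataStream::open(body, 5).stream_to_vec()?, b"hello".to_vec());

    let body = Cursor::new(&b"hi"[..]);
    assert_eq!(DataStream::open(body, 64).stream_to_string()?, "hi");

    let mut out = Vec::new();
    let n = DataStream::open(Cursor::new(&b"abcdef"[..]), 4).stream_to(&mut out)?;
    assert_eq!((n, out.as_slice()), (4, &b"abcd"[..]));
    Ok(())
}
```

Note that with these `open(limit)` semantics the stream silently ends at the limit rather than erroring; whether that, or an explicit over-limit error like the one discussed earlier in this thread, is the right behavior is exactly the open design question here.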

@jebrosen (Collaborator)

I like those methods as well. I suspect open(usize::MAX) (please never do that) is close enough to open_unsafe that it's not worth a separate method.

The name stream_to_string feels a bit wrong to me compared to read_to_string -- but I do like that it gives four methods whose names all start with stream_to_.

@SergioBenitez (Member) commented Feb 28, 2020

The name stream_to_string feels a bit wrong to me compared to read_to_string -- but I do like that it gives four methods whose names all start with stream_to_.

Given that DataStream implements Read, it'll also have the read_to methods. I wanted to avoid the naming conflict.

Another thing to consider here is that limit is not a sufficient...limit...to prevent DoS attacks. While it helps mitigate memory-exhaustion-based attacks, it does nothing to prevent slow-loris style attacks. On sync, this means completely tying up resources, though there we have a read timeout, which helps but doesn't solve the problem. On async, this translates to a bunch of idling futures, which in turn means consuming memory, which takes us right back to memory exhaustion.

I think what we'd really like to do is expose an API that requires limits to be set on several properties, with a hard, indelible limit. In particular:

  • read timeouts - how long we're willing to wait between byte reads
  • data limits - how many bytes we're willing to read in all, irrespective of time
  • connection timeouts - how long we're willing to keep the connection open, irrespective of whether data is being received or not.

My guess is that a bandwidth-minimum approach to the connection timeout might actually make a bit more sense, especially when we consider very purposefully long-lived connections. That is, after a chosen period of time, assuming the other timeouts/limits haven't been exceeded, require that the bandwidth over a sliding window of time exceeds some minimum. Otherwise, kill the connection, ideally in a graceful manner.

This approach seems fairly easy to implement in async. Combined with the API I proposed above, this should significantly decrease the opportunity for DoS based attacks on Rocket applications, and in the common case, make them impossible.
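The bandwidth-minimum idea above can be sketched as pure bookkeeping, independent of any I/O (the type name, window length, and byte floor here are all invented for illustration):

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

// Track (timestamp, byte count) samples and flag connections whose
// throughput over the most recent window falls below a minimum.
struct BandwidthWindow {
    window: Duration,
    min_bytes: u64,
    samples: VecDeque<(Instant, u64)>,
}

impl BandwidthWindow {
    fn new(window: Duration, min_bytes: u64) -> Self {
        BandwidthWindow { window, min_bytes, samples: VecDeque::new() }
    }

    // Record bytes received at `now` and evict samples older than the window.
    fn record(&mut self, now: Instant, bytes: u64) {
        self.samples.push_back((now, bytes));
        while let Some(&(t, _)) = self.samples.front() {
            if now.duration_since(t) > self.window {
                self.samples.pop_front();
            } else {
                break;
            }
        }
    }

    // True if the connection should be (gracefully) killed: total bytes
    // received within the window are below the configured floor.
    fn below_minimum(&self, now: Instant) -> bool {
        let total: u64 = self
            .samples
            .iter()
            .filter(|&&(t, _)| now.duration_since(t) <= self.window)
            .map(|&(_, b)| b)
            .sum();
        total < self.min_bytes
    }
}

fn main() {
    let start = Instant::now();
    let mut w = BandwidthWindow::new(Duration::from_secs(10), 100);
    w.record(start, 60);
    w.record(start + Duration::from_secs(1), 60);
    // 120 bytes within the last 10s: healthy.
    assert!(!w.below_minimum(start + Duration::from_secs(2)));
    // 14s later, both samples have aged out of the window: kill it.
    assert!(w.below_minimum(start + Duration::from_secs(15)));
}
```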

@SergioBenitez (Member)

I'd like a comprehensive approach to this. I've opened #1325 to track such an effort. Let's move the discussion there until we have a concrete plan.

Labels
pr: closed This pull request was not merged
4 participants