
[WIP] Add "read with limit" functionality to DataStream #990

Closed
wants to merge 1 commit

Conversation

@Nilix007 commented May 6, 2019

Hey there,

as suggested in #973 (comment), Rocket should provide an easy way to read from the HTTP body with an upper limit on the body size. (As this is needed for Form, JSON, etc.)

Let's start off with this proof of concept which adds DataStream::read_to_string_with_limit and DataStream::read_to_end_with_limit together with a new error type LimitReadError.

@jebrosen: What do you think?

@jebrosen (Collaborator) left a comment

I've implemented but not fully committed to something similar before (https://git.jebrosen.com/jeb/sirus/src/commit/73faadda126187d88836e865dd8b4138742c71df/sirus-server/src/utils/limited_data.rs). I think doing something like this in Data or DataStream is a good idea, and potentially using it in Form and JSON as well.

core/lib/src/data/data_stream.rs (excerpt):

        limit: usize,
        f: F,
    ) -> Result<T, LimitReadError> {
        let mut r = self.by_ref().take(limit as u64 + 1);

Using take with 1 more than the real limit is clever. Maybe a bit too clever -- I want to double check the edge cases.
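The trick can be checked with a small, self-contained sketch using only std::io (read_with_limit is a hypothetical helper written for illustration, not code from this PR):

```rust
use std::io::{Cursor, Read};

// Hypothetical helper mirroring the trick under review: read at most
// `limit + 1` bytes, then treat "more than `limit` bytes arrived" as the
// over-limit condition. Returns Ok(None) when the limit is exceeded.
fn read_with_limit<R: Read>(reader: R, limit: u64) -> std::io::Result<Option<Vec<u8>>> {
    let mut buf = Vec::new();
    reader.take(limit + 1).read_to_end(&mut buf)?;
    if buf.len() as u64 > limit {
        Ok(None)
    } else {
        Ok(Some(buf))
    }
}

fn main() -> std::io::Result<()> {
    // Exactly at the limit is accepted.
    assert_eq!(read_with_limit(Cursor::new(&b"12345"[..]), 5)?, Some(b"12345".to_vec()));
    // One byte over the limit is rejected.
    assert_eq!(read_with_limit(Cursor::new(&b"123456"[..]), 5)?, None);
    // A zero limit accepts only an empty body.
    assert_eq!(read_with_limit(Cursor::new(&b""[..]), 0)?, Some(Vec::new()));
    assert_eq!(read_with_limit(Cursor::new(&b"x"[..]), 0)?, None);
    Ok(())
}
```

One edge case worth noting: `limit + 1` overflows when `limit == u64::MAX`, so a real implementation would want a saturating or checked add there.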

@Nilix007 (Author) commented May 8, 2019

Let me add some motivation for my API proposal:

  • All the code I've seen so far that deals with body data in Rocket uses either Read::read_to_end or Read::read_to_string. Also, all of these apply some kind of size limit (for obvious reasons).
  • Probably most of the code wants to handle the specific error case of a too-large payload, e.g. for returning a 413 (Payload Too Large) status code. io::Error does not seem like a good error type for this case because API consumers would need to check the error kind and try to downcast the inner error. A separate enum-based error type is IMHO more convenient to use.

All in all, I'd start off with just the two functions and keep do_with_limit private. If different use cases emerge, we can add a more comprehensive API.
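A minimal sketch of what such an enum-based error type could look like (illustrative only; the exact names and the status-code mapping in status_for are assumptions, not this PR's code):

```rust
use std::{fmt, io};

// Illustrative shape for the proposed error type: a dedicated variant for
// the over-limit case, so callers can match on it directly instead of
// downcasting an io::Error's inner error.
#[derive(Debug)]
pub enum LimitReadError {
    /// The body exceeded the configured size limit.
    LimitExceeded,
    /// Any other I/O failure while reading the body.
    Io(io::Error),
}

impl fmt::Display for LimitReadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LimitReadError::LimitExceeded => write!(f, "body exceeded the size limit"),
            LimitReadError::Io(e) => write!(f, "i/o error while reading body: {}", e),
        }
    }
}

impl From<io::Error> for LimitReadError {
    fn from(e: io::Error) -> Self {
        LimitReadError::Io(e)
    }
}

// A caller can branch on the variant directly, e.g. to pick a status code
// (413 Payload Too Large for the over-limit case).
pub fn status_for(err: &LimitReadError) -> u16 {
    match err {
        LimitReadError::LimitExceeded => 413,
        LimitReadError::Io(_) => 400,
    }
}

fn main() {
    assert_eq!(status_for(&LimitReadError::LimitExceeded), 413);
    let io_err: LimitReadError = io::Error::new(io::ErrorKind::UnexpectedEof, "eof").into();
    assert_eq!(status_for(&io_err), 400);
}
```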

@Nilix007 (Author) commented May 8, 2019

@jebrosen: If you're OK with this approach, I would continue by adding documentation and some tests for the edge cases, and updating all the body data consumers in Rocket and contrib.

(I'm also unsure whether take(limit + 1).read_to_end(&mut buf) might overallocate by a factor of 2 when the input data exceeds the limit.)

@jebrosen jebrosen self-assigned this May 9, 2019
@jebrosen (Collaborator)

I think it's a good idea to add this in some form, maybe with some name/API changes. Users have often wanted Vec<u8> or String in release builds; this will provide that flexibility without the same degree of risk and also make it easier to write certain FromData/FromDataSimple implementations with limit support.

In the documentation, I would definitely want to 1) encourage any potential users of these functions to use a user-configurable Limit and 2) strongly caution against using something like usize::max_value() as the limit because it's a trivial denial of service vector.

@jebrosen (Collaborator)

I want to revisit this, and I have some quick thoughts on the API:

  • I think the methods should be on Data and take self by move
  • I also think these methods should allocate a new String or Vec themselves

For example, rocket_contrib::msgpack's FromData implementation:

    let size_limit = r.limits().get("msgpack").unwrap_or(LIMIT);
    let mut buf = Vec::new();
    let mut reader = d.open().take(size_limit);
    match reader.read_to_end(&mut buf).await {
        Ok(_) => Borrowed(Success(buf)),
        Err(e) => Borrowed(Failure((Status::BadRequest, Error::InvalidDataRead(e)))),
    }

becomes

    let size_limit = r.limits().get("msgpack").unwrap_or(LIMIT);
    match d.read_to_end_with_limit(size_limit).await {
        Ok(buf) => Borrowed(Success(buf)),
        Err(e) => Borrowed(Failure((Status::BadRequest, Error::InvalidDataRead(e))))
    }

This way the methods would do what a lot of users do already, and anyone who needs to do something more complicated can still call open and read_to_end manually.

@SergioBenitez any thoughts on the above?

@SergioBenitez (Member)

I think we should go even further. Here's my proposed API:

impl Data {
    fn open(self, limit: usize) -> DataStream;
    fn peek(&self) -> &[u8];
    fn peek_complete(&self) -> bool;
}

impl DataStream {
    fn stream_to<W: Write>(self, writer: &mut W) -> io::Result<u64>;
    fn stream_to_file<P: AsRef<Path>>(self, path: P) -> io::Result<u64>;
    fn stream_to_string(self) -> io::Result<String>;
    fn stream_to_vec(self) -> io::Result<Vec<u8>>;
}

impl Read for DataStream { /* ... */ }

I can't think of a valid reason to ever read without a limit. So, let's enforce it.

We might even consider abusing unsafe by adding an open_unsafe() method that sets no limit on the returned DataStream.
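A minimal sketch of how the proposed shape could behave over any Read source (the names mirror the proposal above, but the wrapper, its construction, and the semantics shown are assumptions for illustration, not Rocket's implementation):

```rust
use std::io::{self, Cursor, Read, Write};

// Illustrative stand-in for Data/DataStream: the limit is baked in at
// `open` time by wrapping the source in `Take`, so no code path can
// read without one.
struct DataStream<R: Read> {
    inner: io::Take<R>,
}

impl<R: Read> DataStream<R> {
    fn open(body: R, limit: u64) -> Self {
        DataStream { inner: body.take(limit) }
    }

    fn stream_to<W: Write>(mut self, writer: &mut W) -> io::Result<u64> {
        io::copy(&mut self.inner, writer)
    }

    fn stream_to_vec(mut self) -> io::Result<Vec<u8>> {
        let mut buf = Vec::new();
        self.inner.read_to_end(&mut buf)?;
        Ok(buf)
    }

    fn stream_to_string(mut self) -> io::Result<String> {
        let mut s = String::new();
        self.inner.read_to_string(&mut s)?;
        Ok(s)
    }
}

fn main() -> io::Result<()> {
    // Reads are capped at the limit; excess input is simply not consumed.
    let body = Cursor::new(&b"hello world"[..]);
    assert_eq!(DataStream::open(body, 5).stream_to_vec()?, b"hello".to_vec());

    let body = Cursor::new(&b"hi"[..]);
    assert_eq!(DataStream::open(body, 64).stream_to_string()?, "hi");

    let mut out = Vec::new();
    let n = DataStream::open(Cursor::new(&b"abcdef"[..]), 4).stream_to(&mut out)?;
    assert_eq!((n, out.as_slice()), (4, &b"abcd"[..]));
    Ok(())
}
```

Note that with these `open(limit)` semantics the stream silently ends at the limit rather than erroring; whether that, or an explicit over-limit error like the one discussed earlier in this thread, is the right behavior is exactly the open design question here.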

@jebrosen (Collaborator)

I like those methods as well. I suspect open(usize::MAX) (please never do that) is close enough to open_unsafe that it's not worth a separate method.

The name stream_to_string feels a bit wrong to me compared to read_to_string -- but I do like that it gives four methods whose names all start with stream_to_.

@SergioBenitez (Member) commented Feb 28, 2020

The name stream_to_string feels a bit wrong to me compared to read_to_string -- but I do like that it gives four methods whose names all start with stream_to_.

Given that DataStream implements Read, it'll also have the read_to methods. I wanted to avoid the naming conflict.

Another thing to consider here is that limit is not a sufficient...limit...to prevent DoS attacks. While it helps mitigate memory-exhaustion-based attacks, it does nothing to prevent slow-loris style attacks. On sync, this means completely tying up resources, though there we have a read timeout, which helps but doesn't solve the problem. On async, this translates to a bunch of idling futures, which in turn means consuming memory, which takes us right back to memory exhaustion.

I think what we'd really like to do is expose an API that requires limits to be set on several properties, with a hard, indelible limit. In particular:

  • read timeouts - how long we're willing to wait between byte reads
  • data limits - how many bytes we're willing to read in all, irrespective of time
  • connection timeouts - how long we're willing to keep the connection open, irrespective of whether data is being received or not.

My guess is that a bandwidth-minimum approach to the connection timeout might actually make a bit more sense, especially when we consider very purposefully long-lived connections. That is, after a chosen period of time, assuming the other timeouts/limits haven't been exceeded, require that the bandwidth over a sliding window of time exceeds some minimum. Otherwise, kill the connection, ideally in a graceful manner.

This approach seems fairly easy to implement in async. Combined with the API I proposed above, this should significantly decrease the opportunity for DoS based attacks on Rocket applications, and in the common case, make them impossible.
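The bandwidth-minimum idea above can be sketched as pure bookkeeping, independent of any I/O (the type name, window length, and byte floor here are all invented for illustration):

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

// Track (timestamp, byte count) samples and flag connections whose
// throughput over the most recent window falls below a minimum.
struct BandwidthWindow {
    window: Duration,
    min_bytes: u64,
    samples: VecDeque<(Instant, u64)>,
}

impl BandwidthWindow {
    fn new(window: Duration, min_bytes: u64) -> Self {
        BandwidthWindow { window, min_bytes, samples: VecDeque::new() }
    }

    // Record bytes received at `now` and evict samples older than the window.
    fn record(&mut self, now: Instant, bytes: u64) {
        self.samples.push_back((now, bytes));
        while let Some(&(t, _)) = self.samples.front() {
            if now.duration_since(t) > self.window {
                self.samples.pop_front();
            } else {
                break;
            }
        }
    }

    // True if the connection should be (gracefully) killed: total bytes
    // received within the window are below the configured floor.
    fn below_minimum(&self, now: Instant) -> bool {
        let total: u64 = self
            .samples
            .iter()
            .filter(|&&(t, _)| now.duration_since(t) <= self.window)
            .map(|&(_, b)| b)
            .sum();
        total < self.min_bytes
    }
}

fn main() {
    let start = Instant::now();
    let mut w = BandwidthWindow::new(Duration::from_secs(10), 100);
    w.record(start, 60);
    w.record(start + Duration::from_secs(1), 60);
    // 120 bytes within the last 10s: healthy.
    assert!(!w.below_minimum(start + Duration::from_secs(2)));
    // 14s later, both samples have aged out of the window: kill it.
    assert!(w.below_minimum(start + Duration::from_secs(15)));
}
```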

@SergioBenitez (Member)

I'd like a comprehensive approach to this. I've opened #1325 to track such an effort. Let's move the discussion there until we have a concrete plan.

Labels
pr: closed This pull request was not merged
4 participants