Implement automatic decompression of responses #125

Merged · 18 commits · Feb 9, 2022
Conversation

@acw (Contributor) commented Feb 8, 2022

Fixes #124.

The flag marking this use case is held inside the request's extensions, in the ViceroyRequestMetadata structure (which can be extended for other use cases in the future), and is then queried in the final parts of send_request in src/upstream.rs. The test cases work through an echo server, which bounces each request straight back; that seemed the easiest way to exercise the various cases.

The error test case -- in which the response comes back marked as gzip-encoded but isn't -- matches the implementation in C@E.
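The extension pattern described above can be sketched with a minimal stand-in for the type-keyed extensions map that the `http` crate attaches to requests. The `Extensions` type, the `ViceroyRequestMetadata` field name, and `main` below are illustrative assumptions, not Viceroy's exact definitions:

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

// Hypothetical stand-in for `http::Extensions`: a map keyed by type,
// so each metadata struct can be stored and retrieved without string keys.
#[derive(Default)]
struct Extensions(HashMap<TypeId, Box<dyn Any>>);

impl Extensions {
    fn insert<T: Any>(&mut self, val: T) {
        self.0.insert(TypeId::of::<T>(), Box::new(val));
    }
    fn get<T: Any>(&self) -> Option<&T> {
        self.0.get(&TypeId::of::<T>()).and_then(|b| b.downcast_ref())
    }
}

// Per-request metadata carried through the request's extensions; more
// fields can be piled onto this struct later without new types.
#[derive(Clone, Copy, Default)]
struct ViceroyRequestMetadata {
    auto_decompress_gzip: bool,
}

fn main() {
    let mut ext = Extensions::default();
    ext.insert(ViceroyRequestMetadata { auto_decompress_gzip: true });

    // Later (e.g. in the final parts of send_request), the flag is queried:
    let wants_gunzip = ext
        .get::<ViceroyRequestMetadata>()
        .map(|m| m.auto_decompress_gzip)
        .unwrap_or(false);
    assert!(wants_gunzip);
    println!("auto_decompress_gzip = {}", wants_gunzip);
}
```

Because lookup is by type, absent metadata simply yields `None`, which avoids both the fake-header and the new-Session-field approaches mentioned below.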

acw added 3 commits February 7, 2022 16:22
These tests are currently broken, but should succeed once decompression
works. They should also catch a couple error cases, although it's
unclear if the behavior I'm testing for in the case that someone sends
a bad response (one marked as compressed that actually isn't) is ideal.
These will probably need to be extended to support streaming / async
cases in the future.
Doing this as an extension on the request avoids the need to either (a)
manufacture a fake header, or (b) add a new field to the Session type.
Thanks to @aturon for the pointer!
…he fly.

I don't love the buffering behavior of this, and it almost certainly
does a completely unnecessary copy that should be easy to remove; that
being said, this is where I got to before I closed up for the night.
@@ -658,6 +658,9 @@ impl Session {
}
}

#[derive(Clone, Copy, Eq, Hash, PartialEq)]
pub struct AutoDecompressResponse {}
Contributor:
This is a cool way to represent the flag, but there is a drawback for the future: we'll need to add additional metadata to requests and responses to keep up with C@E proper. (I'm doing some work that expands those out right now).

I think we should consider more closely following C@E's approach of introducing FastlyRequestMetadata/FastlyResponseMetadata structs with fields representing the various knobs.

Contributor (Author):
Done in 7aa1e6d! Except that I called it ViceroyRequestMetadata.

lib/src/body.rs Outdated
}
Poll::Ready(Some(item)) => {
// put the body back, so we can poll it again next time
self.chunks.push_front(body.into());
Contributor:
I think we need to push the entire CompressedHttpBody chunk to the front of the queue, to continue reading into it.

Contributor (Author):
Ah, yes. Good catch. Should be addressed in 88146fb.
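The re-queuing pattern under discussion can be sketched with plain std types; `Chunk` and `poll_next` here are illustrative stand-ins, not Viceroy's actual body types. The key point from the review: when a chunk yields data but is not exhausted, the *same* chunk goes back on the front of the deque so the next poll continues reading into it.

```rust
use std::collections::VecDeque;

// Illustrative chunk with a read cursor; real code would hold a
// decompressing body rather than a byte buffer.
struct Chunk {
    data: Vec<u8>,
    pos: usize,
}

// Yield one byte per poll (a real body would yield larger buffers).
fn poll_next(chunks: &mut VecDeque<Chunk>) -> Option<Vec<u8>> {
    let mut chunk = chunks.pop_front()?;
    let item = vec![chunk.data[chunk.pos]];
    chunk.pos += 1;
    if chunk.pos < chunk.data.len() {
        // Put the whole chunk back, so we can poll it again next time.
        chunks.push_front(chunk);
    }
    Some(item)
}

fn main() {
    let mut q = VecDeque::new();
    q.push_back(Chunk { data: vec![1, 2, 3], pos: 0 });
    let mut out = Vec::new();
    while let Some(item) = poll_next(&mut q) {
        out.extend(item);
    }
    assert_eq!(out, vec![1, 2, 3]);
    println!("{:?}", out);
}
```

Pushing only the partially-consumed inner buffer to the front (instead of the whole chunk) would drop the rest of the chunk's data, which is the bug the review comment flags.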

Comment on lines 1 to 68
use fastly::{Backend, Request};
use fastly::http::request::SendError;

static HELLO_WORLD: &'static str = include_str!("../../data/hello_world");
static HELLO_WORLD_GZ: &'static [u8] = include_bytes!("../../data/hello_world.gz");

fn main() -> Result<(), SendError> {
let echo_server = Backend::from_name("echo").expect("Could not find echo backend?");

// Test framework sanity check: a request without the auto_decompress flag
// should bounce back to us unchanged.
let standard_echo = Request::put("http://127.0.0.1:9000")
.with_header("Content-Encoding", "gzip")
.with_body_octet_stream(HELLO_WORLD_GZ)
.send(echo_server.clone())?;

assert_eq!(standard_echo.get_header_str("Content-Encoding"), Some("gzip"));
assert_eq!(standard_echo.get_content_length(), Some(HELLO_WORLD_GZ.len()));

// Similarly, if we set the auto_decompress flag to false, it should also
// bounce back to us unchanged.
let explicit_no = Request::put("http://127.0.0.1:9000")
.with_header("Content-Encoding", "gzip")
.with_body_octet_stream(HELLO_WORLD_GZ)
.with_auto_decompress_gzip(false)
.send(echo_server.clone())?;

assert_eq!(explicit_no.get_header_str("Content-Encoding"), Some("gzip"));
assert_eq!(explicit_no.get_content_length(), Some(HELLO_WORLD_GZ.len()));

// But if we set the auto_decompress flag to true, and send a compressed
// file, we should get the uncompressed version back
let mut unpacked_echo = Request::put("http://127.0.0.1:9000")
.with_header("Content-Encoding", "gzip")
.with_body_octet_stream(HELLO_WORLD_GZ)
.with_auto_decompress_gzip(true)
.send(echo_server.clone())?;

assert!(unpacked_echo.get_header("Content-Encoding").is_none());
assert!(unpacked_echo.get_content_length().is_none());
let hopefully_unpacked = unpacked_echo.take_body_str();
assert_eq!(HELLO_WORLD, &hopefully_unpacked);

// That being said, if we set the flag to true and send it a text file,
// we should just get it back unchanged.
let mut yes_but_uncompressed = Request::put("http://127.0.0.1:9000")
.with_body_octet_stream(HELLO_WORLD.as_bytes())
.with_auto_decompress_gzip(true)
.send(echo_server.clone())?;

let still_unpacked = yes_but_uncompressed.take_body_str();
assert_eq!(HELLO_WORLD, &still_unpacked);

// A slightly odder case: We set everything up for unpacking, but we
// don't actually send a gzip'd file. In this case, we're going to say
// that we should get back exactly the input, even with the bogus
// content encoding.
let bad_gzip = Request::put("http://127.0.0.1:9000")
.with_header("Content-Encoding", "gzip")
.with_body_octet_stream(HELLO_WORLD.as_bytes())
.with_auto_decompress_gzip(true)
.send(echo_server.clone())?;

assert_eq!(bad_gzip.get_header_str("Content-Encoding"), Some("gzip"));
assert_eq!(bad_gzip.get_content_length(), Some(HELLO_WORLD.len()));

Ok(())
}
Contributor:
This is lovely! As we mentioned yesterday, we'll want to expand out with some uses of append operations and streaming as well.

Contributor (Author):
👍

acw added 3 commits February 8, 2022 09:38
This appears to be a bit more standardized, and provides a nicer way to
carve off early chunks, as well. Based on testing, it's possible that
this will consistently generate an empty chunk at the end of the
decompression run, but that should be managed correctly by other parts
of the system.
This is relatively straightforward, but more thought should go into the
error case in which someone marks data as gzip-compressed but it isn't.
The system remains whole after this, but the observable behavior is odd
in that it gets marked as StatusCode::OK but taking the body leads to
uncatchable failures (I think).
…`hyper::Body`.

A bit of an oops on the previous implementations. This requires slightly
more careful handling of lifetimes to avoid copying. It also only puts
the current subchunk back into the queue in the non-error cases, as
opposed to previous versions which would put error cases back up to get
more errors.
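The trailing-empty-chunk behavior noted in the first commit message above can be handled by filtering zero-length chunks before they reach consumers. A toy sketch, with `Vec<Vec<u8>>` standing in for an async body stream (the real code would filter inside the poll loop):

```rust
// Skip zero-length chunks, such as the empty chunk a decoder may emit
// at the end of a decompression run, so downstream consumers never see them.
fn non_empty_chunks(chunks: Vec<Vec<u8>>) -> Vec<Vec<u8>> {
    chunks.into_iter().filter(|c| !c.is_empty()).collect()
}

fn main() {
    let decoded = vec![b"hello".to_vec(), b" world".to_vec(), Vec::new()];
    let cleaned = non_empty_chunks(decoded);
    assert_eq!(cleaned.len(), 2);
    println!("{} chunks", cleaned.len());
}
```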
@acw acw marked this pull request as draft February 8, 2022 18:06
acw added 4 commits February 8, 2022 10:52
…nsion data.

Other folks should be able to just pile on to this structure to add
additional information in the future, without much pain, and without
having to add a new type for every item. There's a benign clippy warning
in `req_impl.rs` that'll be nicely resolved when someone does so, too.
This test shouldn't actually be very interesting, and wasn't. Yay! But,
in the future, if `send`, `send_async`, and `send_async_streaming` stop
going through the same output code path and someone forgets to move the
decompression code with it, we should be able to detect that.
Just in case there was a funny code path in there, or one gets
introduced, this makes sure that you can properly read your compressed
data from the client side, as well.
@acw acw changed the title WIP: Implement automatic decompression of responses Implement automatic decompression of responses Feb 8, 2022
@acw acw marked this pull request as ready for review February 9, 2022 01:22
@aturon (Contributor) left a comment:

Looking really good! Just a couple minor nits!

Comment on lines 196 to 197
&& basic_response.headers().get(header::CONTENT_ENCODING)
== Some(&HeaderValue::from_static("gzip"))
Contributor:
We should also allow for x-gzip per https://httpwg.org/http-core/draft-ietf-httpbis-semantics-latest.html#gzip.coding (and the C@E implementation).
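A sketch of the suggested check, with plain `&str` standing in for hyper's `HeaderValue`. Per the HTTP semantics draft, `x-gzip` is an alias for `gzip`, and content-coding names are case-insensitive; the helper name is illustrative:

```rust
// Treat both "gzip" and "x-gzip" as the gzip content coding,
// matching case-insensitively as the spec requires.
fn is_gzip_coding(content_encoding: Option<&str>) -> bool {
    matches!(
        content_encoding.map(|v| v.trim()),
        Some(v) if v.eq_ignore_ascii_case("gzip") || v.eq_ignore_ascii_case("x-gzip")
    )
}

fn main() {
    assert!(is_gzip_coding(Some("gzip")));
    assert!(is_gzip_coding(Some("x-gzip")));
    assert!(!is_gzip_coding(Some("br")));
    assert!(!is_gzip_coding(None));
    println!("ok");
}
```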

Comment on lines 476 to 478
// NOTE: We're going to hide this flag in the headers of the request in order to decrease
// the book-keeping burden inside Session; there already existed a cleaning pass in the
// output chain, which we extend for this use case.
Contributor:
I think this comment is stale.

@aturon (Contributor) left a comment:

Fantastic work!

@acw acw merged commit 9d4fb8a into main Feb 9, 2022
@acw acw deleted the awick/124-automatic-gzip branch February 9, 2022 23:37
@acfoltzer acfoltzer mentioned this pull request Feb 15, 2022
Development

Successfully merging this pull request may close these issues.

Implement automatic gzip decompression
2 participants