Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(lambda-http): accept http_body::Body in responses #491

Merged
merged 12 commits into from
Jul 13, 2022

Conversation

hugobast
Copy link
Contributor

Issue #, if available:
This PR is meant to continue the work started here: #466

Description of changes:

Original description:
Allows using any http_body::Body implementation in the response for lambda_http.


For context, the outstanding issue on the original PR stems from this comment: #466 (comment)

I picked up this PR and worked on it to the point where I believe that I had a viable testing approach. Ultimately I am blocked from performing the testing that has been requested in the original PR.

Here's why

In order to configure contentHandling we have to put the integration in the aws (ie. not aws-proxy) configuration.

The request is treated as a LambdaRequest::WebSocket by the lambda_http crate, why this is and why the rust code panics when operated in this way, I don't know.

On an aws-proxy configuration, binary formats are dealt with by encoding the body of the response as base64 and setting isBase64Encoded: true (documentation: Working with binary media types for REST APIs - Amazon API Gateway)

I managed to configure an API Gateway + Lambdas to be tested with CONVERT_TO_BINARY and CONVERT_TO_TEXT and it doesn't appear to work. What does work is setting the integration as aws-proxy but then it operates completely differently.

If we test through API gateway we only have two type requests, one where the function returns a Body::Binary and one where it returns a Body::Text (maybe a 3rd one for Body::Empty).

I'm proposing a simpler verification where we rely on the current unit tests (I'm adding impl IntoResponse for binaries + tests) and pasting the Lambda console results (below) that I got while debugging.

Happy to dig in further if I'm wrong and misunderstanding something.


Here's the console output for both a lambda that returns text and one that returns bytes

Bytes

{
  "statusCode": 200,
  "headers": {},
  "multiValueHeaders": {},
  "body": "YmluYXJ5IGRhdGE=",
  "isBase64Encoded": true
}

Text

{
  "statusCode": 200,
  "headers": {},
  "multiValueHeaders": {},
  "body": "text data",
  "isBase64Encoded": false
}

Finally, because I was most of the way there I want to share the commit I removed from this PR where I setup an API gateway (non-proxy) and the makefile changes where I would have added the verifications. hugobast@83fa77d

By submitting this pull request


  • I confirm that my contribution is made under the terms of the Apache 2.0 license.
  • I confirm that I've made a best effort attempt to update all relevant documentation.

@calavera
Copy link
Contributor

Thanks a lot for looking into this @hugobast ! I'm going to take a look at that extra commit to see if I find a way to make the verification work automatically, but the changes look good at first glance. I will run some tests this week.

@david-perez, @bnusunny can you take a look at this PR too?

Co-authored-by: David Calavera <david.calavera@gmail.com>
// Can safely unwrap here as we do type validation in the 'if' statement
Box::pin(ready(*any_self.downcast::<Body>().unwrap()))
} else {
Box::pin(async move { Body::from(to_bytes(self).await.expect("unable to read bytes from body").to_vec()) })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've experimented with this patch and the more I think about this issue, the more I think that it's impossible for aws-lambda-rust-runtime to provide a response body conversion that will work for all cases.

The issue I have with this line in particular is that it's effectively saying "if the user's tower::Service has a Response whose Body is not of type lambda_http::Body, then convert the response body to_bytes, then to Vec<u8>, and use lambda_http::Body::from to yield a response whose body is of the lambda_http::Body::Binary variant".

This is problematic, because lambda_http::Body::Binary is signalling to API Gateway that the response contains binary data, and API Gateway does all kinds of fun stuff before relaying the reponse to the client depending on several factors, including whether we signalled our payload contains binary data or not. For example, it might base64 encode the body before returning the response.

I deployed some Lambdas and was able to reproduce clearly undersirable behavior when the service returns a response body with Data = Bytes that contains a UTF-8 encoded string. Now, one might argue that the user should then use Data = String, or have their Response type be String directly (which would leverage the impl IntoResponse for String in response.rs, that correctly uses lambda_http::Body::Text). The problem is the user might not know or be in control of what body type they're returning because they are relying on a library. For example, Axum uses axum_core::body::BoxBody as the unified response body type of all the routes, regardless of what data type your handler returns. BoxBody is a type-erased boxed body type that holds Bytes.

Here is an example repository that reproduces the problem: https://github.com/david-perez/lambda-pr-491. It has instructions to deploy a Rust Lambda function fronted by both a HTTP API Gateway and a REST API Gateway using CDK.

The Lambda function uses Axum and contains a route that returns JSON:

#[tokio::main]
async fn main() -> Result<(), Error> {
    let lambda_handler = Router::new().route("/prod/hello", get(hello));

    lambda_http::run(lambda_handler)
        .await
        .expect("something went wrong in the Lambda runtime");

    Ok(())
}

async fn hello() -> Json<Value> {
    Json(json!({ "message": "hello stranger!"}))
}

As explained above, this setup makes aws-lambda-rust-runtime return a lambda_http::Body::Binary body to the internal AWS Lambda service, signalling a binary payload.

When the response gets sent to the client, it seems like the behavior of API Gateway is dfferent depending on whether the API type is HTTP or REST.

$ curl https://<http-api-id>.execute-api.eu-west-1.amazonaws.com/prod/hello                             
{"message":"hello stranger!"}%
$ curl https://<rest-api-id>.execute-api.eu-west-1.amazonaws.com/prod/hello
eyJtZXNzYWdlIjoiaGVsbG8gc3RyYW5nZXIhIn0=%

The HTTP API works as we would expect, but the the REST API has "helpfully" base64 encoded the JSON text. In both cases the response has the Content-Type: application/json header.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll continue with the information you've provided, I wasn't able to get to the core of the issue and was focusing too much on the issues I had with coming up with a test setup for the response conversion table, what you provided is straightforward and really helpful. Thank you, David!

Copy link
Contributor Author

@hugobast hugobast Jun 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@david-perez After a few hours of playing with this, the problem as I see it is that axum::body::BoxBody is a downcast (is that the correct term?) of http_body::Body and I don't see how we can get anything more out of it than the bytes it's holding. In other words, is it correct to say that it is essentially "just bytes" at this point and we have no choice but the Body::Binary variant?

That said I had an idea maybe this is what we are looking for

Suggested change
Box::pin(async move { Body::from(to_bytes(self).await.expect("unable to read bytes from body").to_vec()) })
Box::pin(async move {
let bytes = to_bytes(self).await.expect("unable to read bytes from body");
match from_utf8(&bytes) {
Ok(s) => Body::from(s),
Err(_) => Body::from(bytes.to_vec()),
}
})

To summarize the code we are commenting on

  1. lambda_http::Body pass through as they are
  2. Other bytes::Bytes types (of which BoxBody is)
    a. utf8 from bytes -> Body::Text
    b. if not -> Body::Binary

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hugobast That's a neat idea. Unfortunately, the fact that the payload is valid UTF-8 does not imply that its contents are text. A priori the user could be sending a PNG that is valid UTF-8 text, for example.

Looking at the Content-Type header of the response is another approach. But again, it's not an infallible solution. The user might not have set the header. It's also easy to map application/json or text/plain to Body::Text, but we can't possibly handle all standard content types, let alone user-defined custom ones.

There are no "one size fits all" solutions unfortunately. All in all, this is why I think it's impossible to land this feature. We can't guess the user's intention. I think any kind of solution will have to ultimately explicitly involve the user communicating his intention to the runtime. Inserting an http::Extensions in the response that the runtime can then read, for example. That will severely limit the user though.

Copy link
Contributor

@calavera calavera Jun 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think any kind of solution will have to ultimately explicitly involve the user communicating his intention to the runtime

I don't think this is necessarily bad.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bnusunny I think your point needs to be documented in the runtime documentation, but we should not make users depend on specific configurations in another service. binaryMediaTypes are for example not available when you call a function with invoke. People that use that should still be able to use http responses as expected.

Copy link
Contributor

@bnusunny bnusunny Jun 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the rules, I would suggest two updates:

2.i.a If it is equal to text/*, application/json, application/javascript or application/xml, leave the body as is and set isBase64Encoded to false.

content-type may have suffixes, such as 'application/json; charset=UTF-8'. So it is better to use:

2.i.a If the content-type begins with text/*, application/json, application/javascript or application/xml, leave the body as is and set isBase64Encoded to false.

2.ii If the content-type header is not present, leave the body as is and set isBase64Encoded to false.

This is problematic. If a bad client sent a binary http response without the content-type header, we would treate it as text and Lambda service would throw an error. We should default to binary here. ALB also uses binary as default.

2.ii If the content-type header is not present, base64 encode the body and set isBase64Encoded to true.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bnusunny I think your point needs to be documented in the runtime documentation, but we should not make users depend on specific configurations in another service. binaryMediaTypes are for example not available when you call a function with invoke. People that use that should still be able to use http responses as expected.

Yes. We definitely need to document this configuration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2.ii If the content-type header is not present, leave the body as is and set isBase64Encoded to false.

This is problematic. If a bad client sent a binary http response without the content-type header, we would treate it as text and Lambda service would throw an error.

However, if they send a text HTTP response without the content-type header, the client sends e.g. Accept: text/plain and the user has not configured contentHandling (which they shouldn't need to, since they are returning text payloads) we would treat it as binary and return a base64-encoded response. That is, the client would receive eyJtZXNzYWdlIjoiaGVsbG8gc3RyYW5nZXIhIn0= instead of {"message":"hello stranger!"}.

I think APIs that return text are far more common than those that return binary payloads and as such we should default to text in this case where we don't have any information. I also prefer the Lambda service throwing a loud error than the client being puzzled at why they received base64-encoded text.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense. Let's default to text.

By the way. lambda_http crate only supports Lambda Proxy integration with API GW. contentHandling doesn't apply to Lambda Proxy integration. APIGW Developer Guide has the details.

B: HttpBody + Unpin + 'static,
B::Error: fmt::Debug,
{
// assumes utf-8
Copy link
Contributor Author

@hugobast hugobast Jul 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed changes that should be close, at least in spirit, to what was talked about in the last few days in the other thread, I'm not that comfortable with Rust so please make suggestions if the code here is suboptimal.

I have concerns about encoding @david-perez @bnusunny @calavera. I'm guessing that most things will be utf-8 (JSON?) but that's far from saying that everything is utf-8.

@bnusunny commented with an example code that's using reqwest so I looked at how the text fn is implemented on reqwest::Response. We could use the same approach here and parse the charset from the content-type if we'd like I'll defer to your comments/opinions on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great work @hugobast ! If there is a path to check the charset, it might be worth doing it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's optional but here's a commit where encode based on the charset of the content-type, falling back to utf-8 in error cases: 2632aad

Copy link
Contributor

@calavera calavera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very good job handling these changes @hugobast, thanks a lot for your help!

@calavera calavera requested review from bnusunny and david-perez July 7, 2022 14:53
Copy link
Contributor

@calavera calavera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran some tests, including the example in #438, and everything looks to be working as expected 👍

@hugobast
Copy link
Contributor Author

@calavera @bnusunny @david-perez is anything more needed here to get this merged?

@calavera
Copy link
Contributor

calavera commented Jul 11, 2022

I know one of @bnusunny's concerns is whether we need to turn the response into async or not. It's a big non-backwards compatible change. When Nicolas implemented this originally, he was thinking about a future where Lambda could stream responses, but that might be better to put it into a separated PR when/if that happens.

Here's the original conversation:

#466 (comment)

@hugobast
Copy link
Contributor Author

hugobast commented Jul 11, 2022

When Nicolas implemented this originally, he was thinking about a future where Lambda could stream responses, but that might be better to put it into a separated PR when/if that happens.

Right, forgot about this, I can push up a version that removes the async response, I agree that tackling this as a separate PR is gonna be clearer.

Ah, let me back out of this answer it feels more tied up, considering that we use to_bytes here on the binary payload return: https://github.com/awslabs/aws-lambda-rust-runtime/pull/491/files#diff-75fd9bca51c33e5b1d241dff4d920d88255dd4ad11b6bf492064c3f505f5a238R208 to me means we are locked into having the response as a future.

And also here: https://github.com/awslabs/aws-lambda-rust-runtime/pull/491/files#diff-75fd9bca51c33e5b1d241dff4d920d88255dd4ad11b6bf492064c3f505f5a238R228

@hugobast
Copy link
Contributor Author

In my hasty response I'm afraid I won't get the point accross so I'll summarise here in a new comment.

What I mean by this previous reply is reading a body is an asynchronous operation and that's why it looks like Nicolas made the response into a Future.

@bnusunny
Copy link
Contributor

@hugobast You are right. It is an asynchronous operation. Thanks very much for getting this PR done.

@nmoutschen
Copy link
Contributor

In my hasty response I'm afraid I won't get the point accross so I'll summarise here in a new comment.

What I mean by this previous reply is reading a body is an asynchronous operation and that's why it looks like Nicolas made the response into a Future.

Just adding this here: this is correct! 😄 Body is designed to return streaming data by nature. So someone could write a Lambda implementation with Body that calls a third party service and the runtime would receive a stream. Thus to_bytes ensure we collect the entire body before sending it back, which is needed by the runtime API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants