Implement httpLabel with nom #996

Merged
merged 2 commits into from
Jan 5, 2022

Conversation

82marbag
Contributor

Move to nom to implement httpLabel instead of using regexes.

Issue: #938

Signed-off-by: Daniele Ahmed ahmeddan@amazon.com

Motivation and Context

Description

See issue: #938

Testing

I ran gradlew :codegen-server-test:assemble and checked the generated output.
Example output:

#[allow(clippy::unnecessary_wraps)]
pub async fn parse_constant_query_string_request<B>(
    #[allow(unused_variables)] request: &mut axum_core::extract::RequestParts<B>,
) -> std::result::Result<
    crate::input::ConstantQueryStringInput,
    aws_smithy_http_server::rejection::SmithyRejection,
>
where
    B: aws_smithy_http_server::HttpBody + Send,
    B::Data: Send,
    B::Error: Into<aws_smithy_http_server::BoxError>,
    aws_smithy_http_server::rejection::SmithyRejection:
        From<<B as aws_smithy_http_server::HttpBody>::Error>,
{
    Ok({
        #[allow(unused_mut)]
        let mut input = crate::input::constant_query_string_input::Builder::default();
        let input_string = request.uri().path();
        let (input_string, (_tag, _m)) = nom::sequence::tuple::<&str, _, (_, _), _>((
            nom::bytes::complete::tag("/"),
            nom::bytes::complete::tag("ConstantQueryString"),
        ))(input_string)
        .unwrap();
        let (_input_string, (_tag, m)) = nom::sequence::tuple::<&str, _, (_, _), _>((
            nom::bytes::complete::tag("/"),
            nom::branch::alt((
                nom::bytes::complete::take_until1("/"),
                nom::combinator::rest,
            )),
        ))(input_string)
        .unwrap();
        input = input
            .set_hello(crate::operation_deser::parse_str_constant_query_string_input_hello(m)?);
        input.build()?
    })
}

Checklist

  • I have updated CHANGELOG.next.toml if I made changes to the smithy-rs codegen or runtime crates
  • I have updated CHANGELOG.next.toml if I made changes to the AWS SDK, generated SDK code, or SDK runtime crates

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@82marbag 82marbag requested review from a team as code owners December 20, 2021 16:08
@crisidev
Contributor

Wow, nice, thanks a lot for the contribution!

@crisidev
Contributor

If you haven't, you probably want to install pre-commit.ci as described in the readme and add a new commit. It will run a bunch of validations for you.

@crisidev crisidev added enhancement New feature or request server Rust server SDK labels Dec 20, 2021
@crisidev crisidev linked an issue Dec 20, 2021 that may be closed by this pull request
@crisidev
Contributor

Two of the CI steps are failing because the PR comes from a fork, so don't worry about it.

Contributor

@david-perez david-perez left a comment

Thanks for opening the PR, it looks promising.

I've never used nom, but my idea from reading the docs was that we would create a single parser that, given the URI path, will extract all the bound @httpLabel fields in one go with a single call. This PR is creating one parser per binding though, and feeding it and consuming the input little by little. We should be able to combine all these small parsers into one that represents the expected URI path for the Smithy operation.
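
For illustration, here is a minimal sketch of the "one parser, one call" idea described above. It deliberately uses only the standard library rather than nom, so it is not the code this PR generates; the names `Segment` and `extract_labels` are hypothetical, and greedy labels are left out.

```rust
/// One segment of a URI pattern such as `/foo/{label}/bar`.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, PartialEq)]
enum Segment {
    /// A constant that must match exactly.
    Literal(&'static str),
    /// A non-greedy `{label}` that captures one path segment.
    Label(&'static str),
}

/// Matches `path` against the whole pattern in a single call and returns every
/// (label name, captured value) pair, or `None` if the path does not match.
fn extract_labels<'a>(
    pattern: &[Segment],
    path: &'a str,
) -> Option<Vec<(&'static str, &'a str)>> {
    let mut rest = path;
    let mut labels = Vec::new();
    for segment in pattern {
        // Every segment is preceded by exactly one `/`.
        rest = rest.strip_prefix('/')?;
        // A segment runs up to the next `/` or to the end of the path.
        let end = rest.find('/').unwrap_or(rest.len());
        let (value, tail) = rest.split_at(end);
        match segment {
            Segment::Literal(expected) if value == *expected => {}
            Segment::Literal(_) => return None,
            Segment::Label(name) => labels.push((*name, value)),
        }
        rest = tail;
    }
    // Reject trailing input that the pattern does not account for.
    if rest.is_empty() {
        Some(labels)
    } else {
        None
    }
}
```

With the pattern `[Literal("ConstantQueryString"), Label("hello")]`, the path `/ConstantQueryString/world` yields the single pair `("hello", "world")`, while a path with an unexpected extra segment is rejected.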

val binding = bindings[0]
val deserializer = generateParsePercentEncodedStrFn(binding)
rustTemplate(
"""
Contributor

I'd take this opportunity to improve our naming and use binding.memberName as the variable binding name instead of m.

Contributor Author

memberName is in camelCase, but Rust prefers snake_case (and warns otherwise). If it's OK, I'd leave it as is here for simplicity. If you think it's best, I can add a function to convert it to snake_case.

Collaborator

You may want symbolProvider.toMemberName

Note that you can't use memberName directly in code because it may need to be escaped
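
To make the two naming concerns above concrete (camelCase versus snake_case, and names that need escaping), here is a rough sketch of the kind of transformation involved. It is written in Rust purely for illustration; `to_snake_case` and `to_rust_ident` are hypothetical helpers, the keyword list is deliberately partial, and this is not a description of what symbolProvider.toMemberName actually does.

```rust
/// Converts a camelCase member name to snake_case.
/// Naive on purpose: runs of capitals such as "HTTPLabel" are not special-cased.
fn to_snake_case(name: &str) -> String {
    let mut out = String::with_capacity(name.len() + 4);
    for (i, c) in name.chars().enumerate() {
        if c.is_ascii_uppercase() {
            if i != 0 {
                out.push('_');
            }
            out.push(c.to_ascii_lowercase());
        } else {
            out.push(c);
        }
    }
    out
}

/// Escapes names that collide with a (partial, illustrative) list of Rust
/// keywords by turning them into raw identifiers.
fn to_rust_ident(member_name: &str) -> String {
    const KEYWORDS: [&str; 5] = ["type", "match", "fn", "loop", "move"];
    let snake = to_snake_case(member_name);
    if KEYWORDS.contains(&snake.as_str()) {
        format!("r#{}", snake)
    } else {
        snake
    }
}
```

For example, `memberName` becomes `member_name`, and a member called `type` becomes the raw identifier `r#type`, which can be used as a variable name.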

if (it.isGreedyLabel) {
pattern.append(".+")
} else {
pattern.append("[^/]+")
Contributor

Why were we accepting empty path segments? I see the TypeScript sSDK is also stripping away empty path segments.

Contributor

Writing my thoughts down here. I guess it was done this way because, anecdotally, servers accept empty path segments, e.g. https://github.com//awslabs///////smithy-typescript///commits works fine.

But when extracting labels, I'm not sure whether this is the correct behavior, and I don't see it specified in the Smithy spec. I'm thinking of possibly ambiguous scenarios: say the URI pattern is /foo/{label}/bar and the server receives a request with URI path /foo//bar. Should the label field be bound to an empty string? Or should we strip away the empty segment and reject the request?
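
The two readings in question can be sketched side by side. This is a standard-library-only illustration for the pattern /foo/{label}/bar; the function names are hypothetical and neither function is taken from the codebase.

```rust
/// Strict reading: every `/` delimits a segment, so empty segments are kept.
fn segments_strict(path: &str) -> Vec<&str> {
    path.strip_prefix('/').unwrap_or(path).split('/').collect()
}

/// Lenient reading: empty segments are dropped before matching.
fn segments_lenient(path: &str) -> Vec<&str> {
    path.split('/').filter(|s| !s.is_empty()).collect()
}

/// Binds `{label}` for the pattern `/foo/{label}/bar`.
fn bind_label<'a>(segments: &[&'a str]) -> Option<&'a str> {
    match segments {
        ["foo", label, "bar"] => Some(*label),
        _ => None,
    }
}
```

For the request path /foo//bar, the strict reading produces the segments foo, "" and bar, so the label is bound to the empty string; the lenient reading produces only foo and bar, so the pattern no longer matches and the request would be rejected.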

Contributor

My opinion on this and interpretation of the Smithy spec is that we were indeed doing it wrong and empty path segments should not be stripped away, but I've opened an issue with the Smithy team to clarify the spec: smithy-lang/smithy#1024

@82marbag
Contributor Author

82marbag commented Dec 21, 2021

> Thanks for opening the PR, it looks promising.
>
> I've never used nom, but my idea from reading the docs was that we would create a single parser that, given the URI path, will extract all the bound @httpLabel fields in one go with a single call. This PR is creating one parser per binding though, and feeding it and consuming the input little by little. We should be able to combine all these small parsers into one that represents the expected URI path for the Smithy operation.

I see. As far as I know, nom isn't a good fit for this: tuples for parsing in one go are limited to 21 items. The alternative could be to chain maps and set the fields inside them, but that feels like multiple calls anyway. I can't find a better way to do this with nom.

@82marbag
Contributor Author

82marbag commented Dec 23, 2021

I performed some benchmarks manually and with criterion:

  • nom parsing one piece of input at a time: 2.1us
  • nom parsing in a tuple until a greedy label is found: 1.8us
  • regex (current approach): 410us

These were measured over between 20k and 2M iterations.

When a greedy label is found, both nom variants have to parse the string manually, because nom does not support finding the last occurrence of a string (which is needed when the greedy label is followed by another string instead of being at the end).
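
As a rough illustration of that manual step, the sketch below extracts a greedy label that is followed by a constant suffix by searching for the last occurrence of the suffix with `str::rfind`. It assumes a simplified pattern of the form prefix, greedy label, suffix; the function name and parameters are hypothetical and this is not the code from the PR.

```rust
/// Extracts `{greedy+}` for a hypothetical pattern `<prefix>{greedy+}<suffix>`,
/// where `prefix` and `suffix` are constants known ahead of time.
fn extract_greedy<'a>(path: &'a str, prefix: &str, suffix: &str) -> Option<&'a str> {
    // Consume the constant prefix from the front.
    let rest = path.strip_prefix(prefix)?;
    // A greedy label may itself contain the suffix text, so the boundary is
    // the *last* occurrence of the suffix, which `rfind` locates.
    let start_of_suffix = rest.rfind(suffix)?;
    // The suffix must end the path; anything after it is unmatched input.
    if start_of_suffix + suffix.len() != rest.len() {
        return None;
    }
    Some(&rest[..start_of_suffix])
}
```

With prefix `/mg/` and suffix `/z`, the path /mg/a/z/b/z binds the greedy label to a/z/b: the first /z belongs to the label and only the last one is treated as the suffix.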

I believe these results help decide whether we should move to nom. I'm pushing a new revision; if unwrap() is not the right approach, please tell me where I should put the error handling and I will implement it.

Contributor

@david-perez david-perez left a comment

Are you going to try out what I suggested about building a single big parser for the entire URI path and calling it once instead of having one parser per segment?

As of now, because of how you implemented the greedy label case with rfind, I don't think it'd be easy (or even possible) to build one big parser, but a simpler implementation is perhaps as follows:

If the URI pattern has a greedy label, the first thing we do is check for the (possibly empty) suffix using str::ends_with. If the suffix is not there, we can error out early without doing any parsing. If the suffix is there, we can strip it away and work with the string slice input_string[..input_string.len() - s] thereafter, where s is the length of the suffix, which we know at codegen time. It then becomes easy to build one big parser, since the greedy label can be extracted with nom::combinator::rest.
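
The suggestion can be sketched as follows, using only the standard library instead of nom. The pattern /foo/{a}/baz/{b+}/end, the constant `SUFFIX` and the function name are assumptions made for the example; `str::strip_suffix` is used because it combines the `ends_with` check with the slicing.

```rust
/// Sketch for a hypothetical pattern `/foo/{a}/baz/{b+}/end`, where `{b+}` is
/// greedy and `/end` is the constant suffix known at code-generation time.
fn extract<'a>(path: &'a str) -> Option<(&'a str, &'a str)> {
    const SUFFIX: &str = "/end";
    // Step 1: reject early if the suffix is missing, otherwise slice it off.
    let path = path.strip_suffix(SUFFIX)?;
    // Step 2: consume everything before the greedy label in one
    // left-to-right pass.
    let rest = path.strip_prefix("/foo/")?;
    let (a, rest) = rest.split_once('/')?;
    // Step 3: after the last constant, the greedy label is whatever remains.
    let b = rest.strip_prefix("baz/")?;
    Some((a, b))
}
```

Because the suffix is removed up front, the greedy label never needs a backwards search: for /foo/x/baz/p/q/end the function returns a = x and b = p/q.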

} else {
rustTemplate(
"""
let (input_string, ${bindingPrefix}m) = #{Nom}::bytes::complete::tag::<_, &str, #{Nom}::error::Error<&str>>(${segment.content.dq()})(input_string).unwrap();
Contributor

Suggested change
- let (input_string, ${bindingPrefix}m) = #{Nom}::bytes::complete::tag::<_, &str, #{Nom}::error::Error<&str>>(${segment.content.dq()})(input_string).unwrap();
+ let (${inputStringPrefix}input_string, ${bindingPrefix}m) = #{Nom}::bytes::complete::tag::<_, &str, #{Nom}::error::Error<&str>>(${segment.content.dq()})(input_string).unwrap();

I think clippy is failing in CI because of this.

@david-perez
Contributor

Thanks for the benchmarking! Would you mind linking a pastebin to this PR for historical purposes? Thanks!

@82marbag
Contributor Author

82marbag commented Jan 3, 2022

> Thanks for the benchmarking! Would you mind linking a pastebin to this PR for historical purposes? Thanks!

This is the benchmark test: https://pastebin.com/raw/v9r3qwqS

I've changed the code. The only two things left to look at are:

  • a max of 21 labels / strings can be parsed now (agreed it is fine here)
  • with ends_with, some / at the end won't be fine

Example output:

#[allow(clippy::unnecessary_wraps)]
pub async fn parse_http_request_with_greedy_label_in_path_request<B>(
    #[allow(unused_variables)] request: &mut axum_core::extract::RequestParts<B>,
) -> std::result::Result<
    crate::input::HttpRequestWithGreedyLabelInPathInput,
    aws_smithy_http_server::rejection::SmithyRejection,
>
where
    B: aws_smithy_http_server::HttpBody + Send,
    B::Data: Send,
    B::Error: Into<aws_smithy_http_server::BoxError>,
    aws_smithy_http_server::rejection::SmithyRejection:
        From<<B as aws_smithy_http_server::HttpBody>::Error>,
{
    Ok({
        #[allow(unused_mut)]
        let mut input =
            crate::input::http_request_with_greedy_label_in_path_input::Builder::default();
        let input_string = request.uri().path();
        if !input_string.ends_with("/") {
            return std::result::Result::Err(
                aws_smithy_http_server::rejection::SmithyRejection::MissingQueryString(
                    aws_smithy_http_server::rejection::MissingQueryString,
                ),
            );
        }
        let input_string = input_string[..(input_string.len() - "/".len())].into();
        let (_input_string, (_, _, m2, _, m4)) =
            nom::sequence::tuple::<_, _, nom::error::Error<&str>, _>((
                nom::sequence::preceded(
                    nom::bytes::complete::tag("/"),
                    nom::bytes::complete::tag("HttpRequestWithGreedyLabelInPath"),
                ),
                nom::sequence::preceded(
                    nom::bytes::complete::tag("/"),
                    nom::bytes::complete::tag("foo"),
                ),
                nom::sequence::preceded(
                    nom::bytes::complete::tag("/"),
                    nom::branch::alt((
                        nom::bytes::complete::take_until1("/"),
                        nom::combinator::rest,
                    )),
                ),
                nom::sequence::preceded(
                    nom::bytes::complete::tag("/"),
                    nom::bytes::complete::tag("baz"),
                ),
                nom::sequence::preceded(nom::bytes::complete::tag("/"), nom::combinator::rest),
            ))(input_string)?;
        input = input.set_foo(
            crate::operation_deser::parse_str_http_request_with_greedy_label_in_path_input_foo(m2)?,
        );
        input = input.set_baz(
            crate::operation_deser::parse_str_http_request_with_greedy_label_in_path_input_baz(m4)?,
        );
        input.build()?
    })
}

Contributor

@david-perez david-perez left a comment

This looks much better; only minor things left now.

> with ends_with, some / at the end won't be fine

This should be correct: trailing slashes do have meaning. If the URI pattern ends with /, requests must end with /; if it doesn't, requests must not end with /. This might seem very strict, but if the URI pattern is /foo/{label}, then the request /foo/ binds "" to label. The same goes for empty path segments in the middle of the URI, which we were previously accepting in the regex approach; with this PR we will be strict and interpret them. See smithy-lang/smithy#1024.

> a max of 21 labels / strings can be parsed now (agreed it is fine here)

This is fine IMO; I don't foresee users needing so many labels. Leave a comment somewhere in the source code documenting this limitation, lest we forget it or decide to tackle it in the future.
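
A small sketch of the trailing-slash rule for the pattern /foo/{label}, using only the standard library; the function name `bind_foo_label` is hypothetical and the code is illustrative rather than what the codegen emits.

```rust
/// Binds `{label}` for the pattern `/foo/{label}`, treating a trailing `/`
/// as significant.
fn bind_foo_label(path: &str) -> Option<&str> {
    // The constant part of the pattern, including the `/` before the label.
    let label = path.strip_prefix("/foo/")?;
    // A further `/` would start another segment that the pattern lacks.
    if label.contains('/') {
        return None;
    }
    Some(label)
}
```

Under this reading, /foo/bar binds label to bar and /foo/ binds it to the empty string, while /foo (no slash before the label) and /foo/bar/ (an extra trailing slash) are both rejected.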

""
val labeledNames = segments
.mapIndexed { index, segment ->
if (segment.isLabel) { "m$index" } else { "_" }
Contributor

Consider safeName() from RustWriter.kt for this, although it would require rewriting the segments.forEachIndexed block at the end, since that code relies on these indices. One idea could be to zip labeledNames with the corresponding path bindings, but I don't know if it'd end up any cleaner, so feel free to ignore.

Contributor Author

safeName(prefix="var") basically returns var_$i with i always increasing. It would still have to leave _ in place for the labels/constants we don't use, and the two naming problems would remain (camelCase, and which characters are allowed in variable names) if we decide to use any other prefix. It seems to me that it wouldn't necessarily make things simpler, but I can add that too.

segments
.forEachIndexed { index, segment ->
val binding = pathBindings.find { it.memberName == segment.content }
if (binding != null && segment.isLabel) {
Contributor

If segment.isLabel is true, then binding must be non-null, right? Because pathBindings contains all the @httpLabel bindings.

Contributor Author

In some cases, such as /foo/{foo}, the constant segment foo is also found as a binding because I look bindings up by name, so the isLabel check is still needed.
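
The /foo/{foo} case can be reproduced with a small model of the segments. This is a Rust sketch of the idea only (the real code is the Kotlin shown above); `Segment`, `parse_pattern` and `bound_indices` are hypothetical names, and greedy labels such as {foo+} are not handled.

```rust
/// A parsed segment of a URI pattern; `content` is the text without braces.
#[derive(Debug, PartialEq)]
struct Segment<'a> {
    content: &'a str,
    is_label: bool,
}

/// Splits a pattern such as `/foo/{foo}` into its segments.
fn parse_pattern(pattern: &str) -> Vec<Segment<'_>> {
    pattern
        .split('/')
        .skip(1) // the text before the leading `/` is always empty
        .map(|raw| match raw.strip_prefix('{').and_then(|s| s.strip_suffix('}')) {
            Some(name) => Segment { content: name, is_label: true },
            None => Segment { content: raw, is_label: false },
        })
        .collect()
}

/// Indices of the segments that should be bound to the member called `member`.
/// Matching on the name alone is not enough, hence the `is_label` check.
fn bound_indices(segments: &[Segment<'_>], member: &str) -> Vec<usize> {
    segments
        .iter()
        .enumerate()
        .filter(|(_, s)| s.content == member && s.is_label)
        .map(|(i, _)| i)
        .collect()
}
```

For /foo/{foo}, both segments have the content foo, so a lookup by name alone matches twice; only the second segment is a label, and only its index is returned.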

rust-runtime/aws-smithy-http-server/src/rejection.rs Outdated
Move to `nom` to implement httpLabel instead of using regexes.

Issue: smithy-lang#938

Signed-off-by: Daniele Ahmed <ahmeddan@amazon.com>
@82marbag 82marbag force-pushed the main branch 2 times, most recently from 1120c83 to 5bec90c Compare January 4, 2022 14:09
david-perez added a commit that referenced this pull request Jan 4, 2022
In #996 [0], we realized that we were ignoring empty URI path segments
when doing label extraction. It turns out we are also ignoring them when
doing routing, as the TypeScript sSDK currently does [1], from which the
initial implementation was copied.

However, a discussion with the Smithy team in smithy-lang/smithy#1024 [2]
revealed that we _must not_ ignore empty URI path segments when routing
or doing label extraction, since empty strings (`""`) should be assigned
to the labels in those segments.

This commit fixes the behavior so that we don't ignore empty path
segments _when doing routing_. #996 will take care of fixing the behavior
when doing label extraction.

[0]: #996 (comment)
[1]: https://github.com/awslabs/smithy-typescript/blob/d263078b81485a6a2013d243639c0c680343ff47/smithy-typescript-ssdk-libs/server-common/src/httpbinding/mux.ts#L78
[2]: smithy-lang/smithy#1024
@82marbag
Contributor Author

82marbag commented Jan 5, 2022

I saw the linked PR #1029 and I have changed the code here accordingly. Empty labels are accepted now, e.g. the pattern /{foo}/str matches //str.

Contributor

@david-perez david-perez left a comment

Thank you!

@david-perez david-perez merged commit e935fbc into smithy-lang:main Jan 5, 2022
david-perez added a commit that referenced this pull request Jan 12, 2022
…ng (#1029)
aws-sdk-rust-ci pushed a commit to awslabs/aws-sdk-rust that referenced this pull request Jan 12, 2022
… when routing (#1029)
Velfi pushed a commit to awslabs/aws-sdk-rust that referenced this pull request Jan 19, 2022
… when routing (#1029)
Labels
enhancement New feature or request server Rust server SDK
Development

Successfully merging this pull request may close these issues.

[Server] Reimplement httpLabel trait deserializer without regexes
4 participants