Implement httpLabel with nom #996

82marbag · 2021-12-20T16:08:08Z

Move to nom to implement httpLabel instead of using regexes.

Issue: #938

Signed-off-by: Daniele Ahmed ahmeddan@amazon.com

Motivation and Context

Description

See issue: #938

Testing

I ran gradlew :codegen-server-test:assemble and checked the generated output.
Example output:

#[allow(clippy::unnecessary_wraps)]
pub async fn parse_constant_query_string_request<B>(
    #[allow(unused_variables)] request: &mut axum_core::extract::RequestParts<B>,
) -> std::result::Result<
    crate::input::ConstantQueryStringInput,
    aws_smithy_http_server::rejection::SmithyRejection,
>
where
    B: aws_smithy_http_server::HttpBody + Send,
    B::Data: Send,
    B::Error: Into<aws_smithy_http_server::BoxError>,
    aws_smithy_http_server::rejection::SmithyRejection:
        From<<B as aws_smithy_http_server::HttpBody>::Error>,
{
    Ok({
        #[allow(unused_mut)]
        let mut input = crate::input::constant_query_string_input::Builder::default();
        let input_string = request.uri().path();
        let (input_string, (_tag, _m)) = nom::sequence::tuple::<&str, _, (_, _), _>((
            nom::bytes::complete::tag("/"),
            nom::bytes::complete::tag("ConstantQueryString"),
        ))(input_string)
        .unwrap();
        let (_input_string, (_tag, m)) = nom::sequence::tuple::<&str, _, (_, _), _>((
            nom::bytes::complete::tag("/"),
            nom::branch::alt((
                nom::bytes::complete::take_until1("/"),
                nom::combinator::rest,
            )),
        ))(input_string)
        .unwrap();
        input = input
            .set_hello(crate::operation_deser::parse_str_constant_query_string_input_hello(m)?);
        input.build()?
    })
}

Checklist

I have updated CHANGELOG.next.toml if I made changes to the smithy-rs codegen or runtime crates
I have updated CHANGELOG.next.toml if I made changes to the AWS SDK, generated SDK code, or SDK runtime crates

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

codegen/src/main/kotlin/software/amazon/smithy/rust/codegen/rustlang/CargoDependency.kt

crisidev · 2021-12-20T16:15:58Z

Wow, nice, thanks a lot for the contribution!

crisidev · 2021-12-20T16:16:53Z

If you haven't, you probably want to install pre-commit.ci as written in the readme and add a new commit. It will run a bunch of validations for you.

...n/software/amazon/smithy/rust/codegen/server/smithy/protocols/ServerHttpProtocolGenerator.kt

crisidev · 2021-12-20T16:30:48Z

Two of the CI steps are failing because the PR comes from a fork, so don't worry about it.

david-perez

Thanks for opening the PR, it looks promising.

I've never used nom, but my idea from reading the docs was that we would create a single parser that, given the URI path, will extract all the bound @httpLabel fields in one go with a single call. This PR is creating one parser per binding though, and feeding it and consuming the input little by little. We should be able to combine all these small parsers into one that represents the expected URI path for the Smithy operation.

...n/software/amazon/smithy/rust/codegen/server/smithy/protocols/ServerHttpProtocolGenerator.kt

david-perez · 2021-12-20T19:26:51Z

...n/software/amazon/smithy/rust/codegen/server/smithy/protocols/ServerHttpProtocolGenerator.kt

+                    val binding = bindings[0]
+                    val deserializer = generateParsePercentEncodedStrFn(binding)
+                    rustTemplate(
+                        """


I'd take this opportunity to improve our naming and use binding.memberName as the variable binding name instead of m.

memberName is in camelCase, but Rust prefers snake_case (and warns otherwise). If it's OK, I'd leave it like this here for simplicity. If you think it's best, I can add a function to convert to snake case.

You may want symbolProvider.toMemberName

Note that you can't use memberName directly in code because it may need to be escaped
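To illustrate the two concerns raised in this thread (camelCase versus snake_case, and names that need escaping), here is a minimal Rust sketch. Both helpers are hypothetical and are not what symbolProvider.toMemberName does internally; the keyword list is deliberately tiny and non-exhaustive, and the case conversion is naive (it would split an acronym such as "HTTP" letter by letter).

```rust
/// Hypothetical helper: convert a camelCase member name to snake_case.
/// Naive on purpose: every ASCII uppercase letter starts a new word.
fn to_snake_case(name: &str) -> String {
    let mut out = String::with_capacity(name.len() + 4);
    for (i, ch) in name.chars().enumerate() {
        if ch.is_ascii_uppercase() {
            if i != 0 {
                out.push('_');
            }
            out.push(ch.to_ascii_lowercase());
        } else {
            out.push(ch);
        }
    }
    out
}

/// Hypothetical helper: escape a name that collides with a Rust keyword by
/// turning it into a raw identifier. Only a few keywords are checked here.
fn escape_ident(name: &str) -> String {
    const KEYWORDS: [&str; 6] = ["type", "match", "fn", "loop", "move", "ref"];
    if KEYWORDS.contains(&name) {
        format!("r#{}", name)
    } else {
        name.to_string()
    }
}
```

With these assumptions, a member called memberName would become member_name, and a member called type would become r#type before being emitted as a variable name.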

...n/software/amazon/smithy/rust/codegen/server/smithy/protocols/ServerHttpProtocolGenerator.kt

david-perez · 2021-12-20T19:44:13Z

...n/software/amazon/smithy/rust/codegen/server/smithy/protocols/ServerHttpProtocolGenerator.kt

-                if (it.isGreedyLabel) {
-                    pattern.append(".+")
-                } else {
-                    pattern.append("[^/]+")


Why were we accepting empty path segments? I see the TypeScript sSDK is also stripping away empty path segments.

Writing my thoughts down here. I guess it was done this way because, anecdotally, servers accept empty path segments, e.g. https://github.com//awslabs///////smithy-typescript///commits works fine.

But when extracting labels, I'm not sure about whether this behavior should be the correct one, and I don't see it specified in the Smithy spec. I'm thinking of possible ambiguous scenarios e.g. say the URI pattern is /foo/{label}/bar and the server receives a request with URI path /foo//bar. Should the label field be bound to an empty string? Or should we strip away the empty segment and reject the request?

My opinion on this and interpretation of the Smithy spec is that we were indeed doing it wrong and empty path segments should not be stripped away, but I've opened an issue with the Smithy team to clarify the spec: smithy-lang/smithy#1024
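The ambiguity described above can be made concrete with a small standard-library sketch. It hard-codes the example pattern /foo/{label}/bar; the function names are hypothetical, and neither function is the generated code from this PR. The strict variant keeps empty segments, while the lenient variant strips them as the earlier regex-based approach effectively did.

```rust
/// Strict interpretation: empty path segments are significant, so the
/// path `/foo//bar` binds the label to the empty string.
fn extract_label_strict(path: &str) -> Option<&str> {
    let rest = path.strip_prefix('/')?;
    let segments: Vec<&str> = rest.split('/').collect();
    match segments.as_slice() {
        ["foo", label, "bar"] => Some(*label),
        _ => None,
    }
}

/// Lenient interpretation: empty path segments are stripped before
/// matching, so `/foo//bar` has only two segments and is rejected.
fn extract_label_lenient(path: &str) -> Option<&str> {
    let segments: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
    match segments.as_slice() {
        ["foo", label, "bar"] => Some(*label),
        _ => None,
    }
}
```

Under the strict reading, a trailing slash also matters: /foo/x/bar/ produces a fourth, empty segment and no longer matches, whereas the lenient variant still accepts it.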

...n/software/amazon/smithy/rust/codegen/server/smithy/protocols/ServerHttpProtocolGenerator.kt

82marbag · 2021-12-21T21:44:08Z

Thanks for opening the PR, it looks promising.

I've never used nom, but my idea from reading the docs was that we would create a single parser that, given the URI path, will extract all the bound @httpLabel fields in one go with a single call. This PR is creating one parser per binding though, and feeding it and consuming the input little by little. We should be able to combine all these small parsers into one that represents the expected URI path for the Smithy operation.

I see. As far as I know, nom isn't a good choice here: tuples for parsing in one go are restricted to 21 items. The alternative could be to chain maps and set the fields from them, but that feels like multiple calls anyway. I can't find a better way to do this with nom.

82marbag · 2021-12-23T08:48:54Z

I performed some benchmarks manually and with criterion:

nom parsing one piece of input at a time: 2.1us
nom parsing in a tuple until a greedy label is found: 1.8us
regex (current approach): 410us

These were measured over between 20k and 2M iterations.

When a greedy label is found, both tests need to manually parse the string because nom does not support finding the last occurrence of a string (if the greedy label appears before another string and it's not at the end).

I believe these tests help decide whether we should move to nom. I'm pushing a new revision; if unwrap() is not the right approach, please tell me where I can put the error implementation and I will go for it.

david-perez

Are you going to try out what I suggested about building a single big parser for the entire URI path and calling it once instead of having one parser per segment?

As of now because of how you implemented the greedy label case with rfind I don't think it'd be easy/possible to build one big parser, but a simpler implementation is perhaps as follows:

If the URI pattern has a greedy label, the first thing we do at the very beginning is check for the (possibly empty) suffix using str::ends_with. If the suffix is not there, we can error out early without doing any parsing. If the suffix is there, we can strip it away and work with the string slice input_string[..input_string.len() - s] thereafter, where s is the length of the suffix, which we know at codegen time. It then becomes easy to build a big parser, since the greedy label can be extracted with ::combinator::rest.
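A rough sketch of that suffix-first strategy, using only the standard library as a stand-in for the nom parser (this is not the code the PR generates). The pattern /foo/{label}/{greedy+}/baz, the SUFFIX constant, and the function name are assumptions made for illustration; str::strip_suffix plays the role of the ends_with check plus the slice.

```rust
/// Extract `(label, greedy)` from a path matching the hypothetical pattern
/// `/foo/{label}/{greedy+}/baz`. Returns `None` when the path does not match.
fn extract_labels(path: &str) -> Option<(&str, &str)> {
    // The literal suffix after the greedy label is known at codegen time.
    const SUFFIX: &str = "/baz";

    // Step 1: check the suffix first and fail early if it is missing.
    let head = path.strip_suffix(SUFFIX)?;

    // Step 2: parse the remainder left to right. Once the suffix is gone,
    // the greedy label is simply "the rest of the input".
    let head = head.strip_prefix("/foo/")?;
    let (label, greedy) = head.split_once('/')?;
    Some((label, greedy))
}
```

Because the suffix is removed before any left-to-right parsing, a greedy value that itself contains the suffix text (for example /foo/a/b/baz/baz) is handled without searching for the last occurrence of the literal.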

...n/software/amazon/smithy/rust/codegen/server/smithy/protocols/ServerHttpProtocolGenerator.kt

david-perez · 2021-12-23T18:47:59Z

...n/software/amazon/smithy/rust/codegen/server/smithy/protocols/ServerHttpProtocolGenerator.kt

+                } else {
+                    rustTemplate(
+                        """
+                        let (input_string, ${bindingPrefix}m) = #{Nom}::bytes::complete::tag::<_, &str, #{Nom}::error::Error<&str>>(${segment.content.dq()})(input_string).unwrap();


Suggested change

-                        let (input_string, ${bindingPrefix}m) = #{Nom}::bytes::complete::tag::<_, &str, #{Nom}::error::Error<&str>>(${segment.content.dq()})(input_string).unwrap();
+                        let (${inputStringPrefix}input_string, ${bindingPrefix}m) = #{Nom}::bytes::complete::tag::<_, &str, #{Nom}::error::Error<&str>>(${segment.content.dq()})(input_string).unwrap();

I think clippy is failing in CI because of this.

david-perez · 2021-12-23T18:58:03Z

Thanks for the benchmarking! Would you mind linking a pastebin to this PR for historical purposes? Thanks!

82marbag · 2022-01-03T14:40:06Z

Thanks for the benchmarking! Would you mind linking a pastebin to this PR for historical purposes? Thanks!

This is the benchmark test: https://pastebin.com/raw/v9r3qwqS

I've changed the code, the only two things remaining to see are:

a max of 21 labels / strings can be parsed now (agreed it is fine here)
with ends_with, some / at the end won't be fine

Example output:

#[allow(clippy::unnecessary_wraps)]
pub async fn parse_http_request_with_greedy_label_in_path_request<B>(
    #[allow(unused_variables)] request: &mut axum_core::extract::RequestParts<B>,
) -> std::result::Result<
    crate::input::HttpRequestWithGreedyLabelInPathInput,
    aws_smithy_http_server::rejection::SmithyRejection,
>
where
    B: aws_smithy_http_server::HttpBody + Send,
    B::Data: Send,
    B::Error: Into<aws_smithy_http_server::BoxError>,
    aws_smithy_http_server::rejection::SmithyRejection:
        From<<B as aws_smithy_http_server::HttpBody>::Error>,
{
    Ok({
        #[allow(unused_mut)]
        let mut input =
            crate::input::http_request_with_greedy_label_in_path_input::Builder::default();
        let input_string = request.uri().path();
        if !input_string.ends_with("/") {
            return std::result::Result::Err(
                aws_smithy_http_server::rejection::SmithyRejection::MissingQueryString(
                    aws_smithy_http_server::rejection::MissingQueryString,
                ),
            );
        }
        let input_string = input_string[..(input_string.len() - "/".len())].into();
        let (_input_string, (_, _, m2, _, m4)) =
            nom::sequence::tuple::<_, _, nom::error::Error<&str>, _>((
                nom::sequence::preceded(
                    nom::bytes::complete::tag("/"),
                    nom::bytes::complete::tag("HttpRequestWithGreedyLabelInPath"),
                ),
                nom::sequence::preceded(
                    nom::bytes::complete::tag("/"),
                    nom::bytes::complete::tag("foo"),
                ),
                nom::sequence::preceded(
                    nom::bytes::complete::tag("/"),
                    nom::branch::alt((
                        nom::bytes::complete::take_until1("/"),
                        nom::combinator::rest,
                    )),
                ),
                nom::sequence::preceded(
                    nom::bytes::complete::tag("/"),
                    nom::bytes::complete::tag("baz"),
                ),
                nom::sequence::preceded(nom::bytes::complete::tag("/"), nom::combinator::rest),
            ))(input_string)?;
        input = input.set_foo(
            crate::operation_deser::parse_str_http_request_with_greedy_label_in_path_input_foo(m2)?,
        );
        input = input.set_baz(
            crate::operation_deser::parse_str_http_request_with_greedy_label_in_path_input_baz(m4)?,
        );
        input.build()?
    })
}

david-perez

This looks much better; only minor things left now.

with ends_with, some / at the end won't be fine

This should be correct, trailing slashes at the end do have meaning. So if the URI pattern ends with / requests must end with /; if it doesn't, requests must not end with /. This might seem very strict but if the URI pattern is /foo/{label} then the request /foo/ is binding "" to label. Same goes for empty path segments in the middle of the URI, which we were previously accepting in the regex approach, but with this PR we will be strict and interpret them. See this issue smithy-lang/smithy#1024.

a max of 21 labels / strings can be parsed now (agreed it is fine here)

This is fine IMO, I don't foresee users will need so many labels. Leave a comment somewhere in the source code documenting this limitation, lest we forget or decide to tackle it in the future.

...n/software/amazon/smithy/rust/codegen/server/smithy/protocols/ServerHttpProtocolGenerator.kt

david-perez · 2022-01-03T21:40:45Z

...n/software/amazon/smithy/rust/codegen/server/smithy/protocols/ServerHttpProtocolGenerator.kt

+                    ""
+            val labeledNames = segments
+                .mapIndexed { index, segment ->
+                    if (segment.isLabel) { "m$index" } else { "_" }


Consider safeName() from RustWriter.kt for this, although it would require rewriting the segments.forEachIndexed block at the end, since the code relies on these indices. An idea could be to zip labeledNames with the corresponding path bindings, but I don't know if it'd end up any cleaner, so feel free to ignore.

safeName(prefix="var") basically returns var_$i with i always increasing. It'd still have to leave _ at the beginning for those labels/constants we don't use and keep the two problems on the naming (like camel case) and on the possible characters in the variable names if we decide to use any other prefix. It seems to me like it won't make it necessarily simpler, but I can add that too.

...n/software/amazon/smithy/rust/codegen/server/smithy/protocols/ServerHttpProtocolGenerator.kt

david-perez · 2022-01-03T22:07:31Z

...n/software/amazon/smithy/rust/codegen/server/smithy/protocols/ServerHttpProtocolGenerator.kt

+            segments
+                .forEachIndexed { index, segment ->
+                    val binding = pathBindings.find { it.memberName == segment.content }
+                    if (binding != null && segment.isLabel) {


If segment.isLabel is true, then binding must be non-null, right? Because pathBindings contains all the @httpLabel bindings.

In some cases, such as /foo/{foo}, the literal segment foo also matches a binding, because I look bindings up by name.

rust-runtime/aws-smithy-http-server/src/rejection.rs

...n/software/amazon/smithy/rust/codegen/server/smithy/protocols/ServerHttpProtocolGenerator.kt

Move to `nom` to implement httpLabel instead of using regexes. Issue: smithy-lang#938 Signed-off-by: Daniele Ahmed <ahmeddan@amazon.com>

In #996 [0], we realized that we were ignoring empty URI path segments when doing label extraction. It turns out we are also ignoring them when doing routing, as the TypeScript sSDK currently does [1], from which the initial implementation was copied.

However, a discussion with the Smithy team in smithy-lang/smithy#1024 [2] revealed that we _must not_ ignore empty URI path segments when routing or doing label extraction, since empty strings (`""`) should be assigned to the labels in those segments.

This commit fixes the behavior so that we don't ignore empty path segments _when doing routing_. #996 will take care of fixing the behavior when doing label extraction.

[0]: #996 (comment)
[1]: https://github.com/awslabs/smithy-typescript/blob/d263078b81485a6a2013d243639c0c680343ff47/smithy-typescript-ssdk-libs/server-common/src/httpbinding/mux.ts#L78
[2]: smithy-lang/smithy#1024

82marbag · 2022-01-05T09:40:39Z

I saw the linked PR #1029 and I have changed the code here accordingly. Empty labels are now accepted: for example, with the pattern /{foo}/str, the path //str binds foo to an empty string.

david-perez

Thank you!

82marbag requested review from a team as code owners December 20, 2021 16:08

crisidev reviewed Dec 20, 2021

codegen/src/main/kotlin/software/amazon/smithy/rust/codegen/rustlang/CargoDependency.kt

crisidev added enhancement New feature or request server Rust server SDK labels Dec 20, 2021

crisidev linked an issue Dec 20, 2021 that may be closed by this pull request

[Server] Reimplement httpLabel trait deserializer without regexes #938

Closed

rcoh reviewed Dec 20, 2021

...n/software/amazon/smithy/rust/codegen/server/smithy/protocols/ServerHttpProtocolGenerator.kt

82marbag force-pushed the main branch from 624d717 to e6eb95a Compare December 20, 2021 18:22

david-perez reviewed Dec 20, 2021

82marbag force-pushed the main branch from e6eb95a to a430987 Compare December 23, 2021 08:56

david-perez reviewed Dec 23, 2021

guymguym mentioned this pull request Dec 28, 2021

[Server] Unreachable routes in operation_registry when only httpLabels are different #1009

Open

82marbag force-pushed the main branch from cffb556 to f14dfd0 Compare January 3, 2022 14:33

david-perez requested changes Jan 3, 2022

Implement httpLabel with nom

29cbd04

Move to `nom` to implement httpLabel instead of using regexes. Issue: smithy-lang#938 Signed-off-by: Daniele Ahmed <ahmeddan@amazon.com>

82marbag force-pushed the main branch 2 times, most recently from 1120c83 to 5bec90c Compare January 4, 2022 14:09

david-perez mentioned this pull request Jan 4, 2022

aws-smithy-http-server: don't ignore empty path segments when routing #1029

Merged

82marbag force-pushed the main branch from 5bec90c to 29cbd04 Compare January 5, 2022 09:38

Merge branch 'main' into main

bee5b11

david-perez approved these changes Jan 5, 2022

david-perez merged commit e935fbc into smithy-lang:main Jan 5, 2022

david-perez mentioned this pull request Jan 21, 2022

[Server] S3 operations require ?x-id=OPNAME which is not compatible with existing clients #1012

Open

david-perez mentioned this pull request Jul 9, 2024

Avoid regexes when routing in restJson1 and rpcv2Cbor #3748

Open
