-
Notifications
You must be signed in to change notification settings - Fork 196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement httpLabel with nom #996
Conversation
codegen/src/main/kotlin/software/amazon/smithy/rust/codegen/rustlang/CargoDependency.kt
Outdated
Show resolved
Hide resolved
Wow, nice, thanks a lot for the contribution! |
If you haven't, you probably want to install |
...n/software/amazon/smithy/rust/codegen/server/smithy/protocols/ServerHttpProtocolGenerator.kt
Outdated
Show resolved
Hide resolved
The fact 2 of the CI steps are failing because the PR comes from a fork, so don't worry about it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for opening the PR, it looks promising.
I've never used nom
, but my idea from reading the docs was that we would create a single parser that, given the URI path, will extract all the bound @httpLabel
fields in one go with a single call. This PR is creating one parser per binding though, and feeding it and consuming the input little by little. We should be able to combine all these small parsers into one that represents the expected URI path for the Smithy operation.
...n/software/amazon/smithy/rust/codegen/server/smithy/protocols/ServerHttpProtocolGenerator.kt
Outdated
Show resolved
Hide resolved
...n/software/amazon/smithy/rust/codegen/server/smithy/protocols/ServerHttpProtocolGenerator.kt
Outdated
Show resolved
Hide resolved
val binding = bindings[0] | ||
val deserializer = generateParsePercentEncodedStrFn(binding) | ||
rustTemplate( | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd take this opportunity to improve our naming and use binding.memberName
as the variable binding name instead of m
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
memberName is in camelCase, but Rust prefers (warns if not in) snake_case. If it's ok, I'd leave it like this here for simplicity. If you think it's best, I can add a function to convert to snake case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may want symbolProvder.toMemberName
Note that you can't use memberName directly in code because it may need to be escaped
...n/software/amazon/smithy/rust/codegen/server/smithy/protocols/ServerHttpProtocolGenerator.kt
Outdated
Show resolved
Hide resolved
if (it.isGreedyLabel) { | ||
pattern.append(".+") | ||
} else { | ||
pattern.append("[^/]+") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why were we accepting empty path segments? I see the TypeScript sSDK is also stripping away empty path segments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Writing my thoughts down here. I guess done this way because anecdotally servers accept empty path segments e.g. https://github.com//awslabs///////smithy-typescript///commits works fine.
But when extracting labels, I'm not sure about whether this behavior should be the correct one, and I don't see it specified in the Smithy spec. I'm thinking of possible ambiguous scenarios e.g. say the URI pattern is /foo/{label}/bar
and the server receives a request with URI path /foo//bar
. Should the label
field be bound to an empty string? Or should we strip away the empty segment and reject the request?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My opinion on this and interpretation of the Smithy spec is that we were indeed doing it wrong and empty path segments should not be stripped away, but I've opened an issue with the Smithy team to clarify the spec: smithy-lang/smithy#1024
...n/software/amazon/smithy/rust/codegen/server/smithy/protocols/ServerHttpProtocolGenerator.kt
Outdated
Show resolved
Hide resolved
I see. For what I know, nom isn't a good choice here; tuples for parsing in one go are restricted to 21 items. The alternative could be to chain |
I performed some benchmarks manually and with criterion:
These are over between 20k and 2M iterations. When a greedy label is found, both tests need to manually parse the string because nom does not support finding the last occurrence of a string (if the greedy label appears before another string and it's not at the end). I believe these tests help decide whether we should move to nom. I'm pushing a new revision; if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you going to try out what I suggested about building a single big parser for the entire URI path and calling it once instead of having one parser per segment?
As of now because of how you implemented the greedy label case with rfind
I don't think it'd be easy/possible to build one big parser, but a simpler implementation is perhaps as follows:
If the URI pattern has a greedy label, the first thing we do at the very beginning is check for the (possibly empty) suffix using str::ends_with
, if the suffix is not there we can error out early without doing any parsing. If the suffix is there, we can strip it away and work with the string slice input_string[..input_string.len() - s]
thereafter, where s
is the length of the suffix, which we know at codegen time. It then becomes easy to build a big parser, since the greedy label can be extracted with ::combinator::rest
.
...n/software/amazon/smithy/rust/codegen/server/smithy/protocols/ServerHttpProtocolGenerator.kt
Outdated
Show resolved
Hide resolved
...n/software/amazon/smithy/rust/codegen/server/smithy/protocols/ServerHttpProtocolGenerator.kt
Outdated
Show resolved
Hide resolved
...n/software/amazon/smithy/rust/codegen/server/smithy/protocols/ServerHttpProtocolGenerator.kt
Outdated
Show resolved
Hide resolved
...n/software/amazon/smithy/rust/codegen/server/smithy/protocols/ServerHttpProtocolGenerator.kt
Outdated
Show resolved
Hide resolved
} else { | ||
rustTemplate( | ||
""" | ||
let (input_string, ${bindingPrefix}m) = #{Nom}::bytes::complete::tag::<_, &str, #{Nom}::error::Error<&str>>(${segment.content.dq()})(input_string).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let (input_string, ${bindingPrefix}m) = #{Nom}::bytes::complete::tag::<_, &str, #{Nom}::error::Error<&str>>(${segment.content.dq()})(input_string).unwrap(); | |
let (${inputStringPrefix}input_string, ${bindingPrefix}m) = #{Nom}::bytes::complete::tag::<_, &str, #{Nom}::error::Error<&str>>(${segment.content.dq()})(input_string).unwrap(); |
I think clippy is failing in CI because of this.
Thanks for the benchmarking! Would you mind linking a pastebin to this PR for historical purposes? Thanks! |
This is the benchmark test: https://pastebin.com/raw/v9r3qwqS I've changed the code, the only two things remaining to see are:
Example output:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks much better; only minor things left now.
with ends_with, some / at the end won't be fine
This should be correct, trailing slashes at the end do have meaning. So if the URI pattern ends with /
requests must end with /
; if it doesn't, requests must not end with /
. This might seem very strict but if the URI pattern is /foo/{label}
then the request /foo/
is binding ""
to label
. Same goes for empty path segments in the middle of the URI, which we were previously accepting in the regex approach, but with this PR we will be strict and interpret them. See this issue smithy-lang/smithy#1024.
a max of 21 labels / strings can be parsed now (agreed it is fine here)
This is fine IMO, I don't foresee users will need so many labels. Leave a comment somewhere in the source code documenting this limitation, lest we forget or decide to tackle it in the future.
...n/software/amazon/smithy/rust/codegen/server/smithy/protocols/ServerHttpProtocolGenerator.kt
Outdated
Show resolved
Hide resolved
"" | ||
val labeledNames = segments | ||
.mapIndexed { index, segment -> | ||
if (segment.isLabel) { "m$index" } else { "_" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider safeName()
from RustWriter.kt
for this. Although it would require to rewrite the block of segments.forEachIndexed
at the end, since the code relies on these indices. An idea could be to zip labeledNames
with the corresponding path bindings, but I don't know if it'd end up any cleaner, so feel free to ignore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
safeName(prefix="var")
basically returns var_$i
with i
always increasing. It'd still have to leave _
at the beginning for those labels/constants we don't use and keep the two problems on the naming (like camel case) and on the possible characters in the variable names if we decide to use any other prefix. It seems to me like it won't make it necessarily simpler, but I can add that too.
...n/software/amazon/smithy/rust/codegen/server/smithy/protocols/ServerHttpProtocolGenerator.kt
Show resolved
Hide resolved
...n/software/amazon/smithy/rust/codegen/server/smithy/protocols/ServerHttpProtocolGenerator.kt
Outdated
Show resolved
Hide resolved
...n/software/amazon/smithy/rust/codegen/server/smithy/protocols/ServerHttpProtocolGenerator.kt
Outdated
Show resolved
Hide resolved
...n/software/amazon/smithy/rust/codegen/server/smithy/protocols/ServerHttpProtocolGenerator.kt
Outdated
Show resolved
Hide resolved
segments | ||
.forEachIndexed { index, segment -> | ||
val binding = pathBindings.find { it.memberName == segment.content } | ||
if (binding != null && segment.isLabel) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If segment.isLabel
is true
, then binding
must be non-null, right? Because pathBindings
contains all the @httpLabel
bindings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In some cases, such as /foo/{foo}
, foo
is found as binding too because I look for the name
...n/software/amazon/smithy/rust/codegen/server/smithy/protocols/ServerHttpProtocolGenerator.kt
Outdated
Show resolved
Hide resolved
Move to `nom` to implement httpLabel instead of using regexes. Issue: smithy-lang#938 Signed-off-by: Daniele Ahmed <ahmeddan@amazon.com>
1120c83
to
5bec90c
Compare
In #996 [0], we realized that we were ignoring empty URI path segments when doing label extraction. It turns out we are also ignoring them when doing routing, as the TypeScript sSDK currently does [1], from which the initial implementation was copied. However, a discussion with the Smithy team in smithy-lang/smithy#1024 [2] revealed that we _must not_ ignore empty URI path segments when routing or doing label extraction, since empty strings (`""`) should be assigned to the labels in those segments. This commit fixes the behavior so that we don't ignore empty path segments _when doing routing_. #996 will take care of fixing the behavior when doing label extraction. [0]: #996 (comment) [1]: https://github.com/awslabs/smithy-typescript/blob/d263078b81485a6a2013d243639c0c680343ff47/smithy-typescript-ssdk-libs/server-common/src/httpbinding/mux.ts#L78 [2]: smithy-lang/smithy#1024
I saw the linked PR #1029 and I have changed the code here accordingly. Empty labels are fine now, such as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
…ng (#1029) In #996 [0], we realized that we were ignoring empty URI path segments when doing label extraction. It turns out we are also ignoring them when doing routing, as the TypeScript sSDK currently does [1], from which the initial implementation was copied. However, a discussion with the Smithy team in smithy-lang/smithy#1024 [2] revealed that we _must not_ ignore empty URI path segments when routing or doing label extraction, since empty strings (`""`) should be assigned to the labels in those segments. This commit fixes the behavior so that we don't ignore empty path segments _when doing routing_. #996 will take care of fixing the behavior when doing label extraction. [0]: #996 (comment) [1]: https://github.com/awslabs/smithy-typescript/blob/d263078b81485a6a2013d243639c0c680343ff47/smithy-typescript-ssdk-libs/server-common/src/httpbinding/mux.ts#L78 [2]: smithy-lang/smithy#1024
… when routing (#1029) In #996 [0], we realized that we were ignoring empty URI path segments when doing label extraction. It turns out we are also ignoring them when doing routing, as the TypeScript sSDK currently does [1], from which the initial implementation was copied. However, a discussion with the Smithy team in smithy-lang/smithy#1024 [2] revealed that we _must not_ ignore empty URI path segments when routing or doing label extraction, since empty strings (`""`) should be assigned to the labels in those segments. This commit fixes the behavior so that we don't ignore empty path segments _when doing routing_. #996 will take care of fixing the behavior when doing label extraction. [0]: smithy-lang/smithy-rs#996 (comment) [1]: https://github.com/awslabs/smithy-typescript/blob/d263078b81485a6a2013d243639c0c680343ff47/smithy-typescript-ssdk-libs/server-common/src/httpbinding/mux.ts#L78 [2]: smithy-lang/smithy#1024
… when routing (#1029) In #996 [0], we realized that we were ignoring empty URI path segments when doing label extraction. It turns out we are also ignoring them when doing routing, as the TypeScript sSDK currently does [1], from which the initial implementation was copied. However, a discussion with the Smithy team in smithy-lang/smithy#1024 [2] revealed that we _must not_ ignore empty URI path segments when routing or doing label extraction, since empty strings (`""`) should be assigned to the labels in those segments. This commit fixes the behavior so that we don't ignore empty path segments _when doing routing_. #996 will take care of fixing the behavior when doing label extraction. [0]: smithy-lang/smithy-rs#996 (comment) [1]: https://github.com/awslabs/smithy-typescript/blob/d263078b81485a6a2013d243639c0c680343ff47/smithy-typescript-ssdk-libs/server-common/src/httpbinding/mux.ts#L78 [2]: smithy-lang/smithy#1024
Move to
nom
to implement httpLabel instead of using regexes.Issue: #938
Signed-off-by: Daniele Ahmed ahmeddan@amazon.com
Motivation and Context
Description
See issue: #938
Testing
I run:
gradlew :codegen-server-test:assemble
and checked the generated output.Example output:
Checklist
CHANGELOG.next.toml
if I made changes to the smithy-rs codegen or runtime cratesCHANGELOG.next.toml
if I made changes to the AWS SDK, generated SDK code, or SDK runtime cratesBy submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.