Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: clarify the use of "/" as a prefix_rewrite #3160

Merged
merged 11 commits into from
Apr 30, 2018
Merged

docs: clarify the use of "/" as a prefix_rewrite #3160

merged 11 commits into from
Apr 30, 2018

Conversation

dio
Copy link
Member

@dio dio commented Apr 22, 2018

docs: clarify the use of "/" as a prefix_rewrite

This patch clarifies the use of "/" as a prefix_rewrite in route and redirect prefix rewriting. And also a note on the use of trailing slashes as match value.

Risk Level: Low

Testing: add more input samples to RedirectPrefixRewrite test.

Docs Changes:

  • Update route.proto doc regarding path_rewrite both for redirect and route.

Release Notes:
N/A

Fixes #2956

dio added 2 commits April 22, 2018 07:17
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
@mattklein123
Copy link
Member

@zuercher can you review this one?

@zuercher
Copy link
Member

It's unfortunate that we can't distinguish between prefix_rewrite = "" and unset because the real problem is that you can't rewrite a prefix to the empty string.

std::string Utility::normalizePath(absl::string_view path) {
std::vector<absl::string_view> segments = StringUtil::splitToken(path, "/", false);

// Removes dot and double-dot segments https://tools.ietf.org/html/rfc3986#section-5.2.4.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The algorithm described at that URL does more than simply delete ".." segments.

I think if you're going to do more than handle the case where the user wants to remove a prefix from the path (which just amounts to dealing with "//" at the start of the rewritten path, then the normalization should handle ".." correctly.

In either case, the docs for the prefix_rewrite field should be updated to describe what happens.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zuercher I think I will remove that partial support for now.

A side question/clarification,

It's unfortunate that we can't distinguish between prefix_rewrite = "" and unset because the real problem is that you can't rewrite a prefix to the empty string.

Do you think it is better to allow users to set prefix_rewrite as "" rather than "/"?

Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Copy link
Member

@zuercher zuercher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Can you also update api/envoy/api/v2/route/route.proto with this new behavior?

@@ -263,6 +263,13 @@ class Utility {
factoryForGrpcApiConfigSource(Grpc::AsyncClientManager& async_client_manager,
const envoy::api::v2::core::ApiConfigSource& api_config_source,
Stats::Scope& scope);

/**
* Normalize URI path. Currently it only handles to remove repeated slashes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: "Currently it only handles removing repeated slashes."

dio added 2 commits April 24, 2018 06:19
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
@dio
Copy link
Member Author

dio commented Apr 23, 2018

@zuercher I added an explanation on this as an example. If you think we need more please let me know. Thanks!

zuercher
zuercher previously approved these changes Apr 24, 2018
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from a functionality perspective, wondering if we can do better with perf?

@@ -228,5 +228,11 @@ Grpc::AsyncClientFactoryPtr Utility::factoryForGrpcApiConfigSource(
return async_client_manager.factoryForGrpcService(grpc_service, scope, false);
}

std::string Utility::normalizePath(absl::string_view path) {
std::vector<absl::string_view> segments = StringUtil::splitToken(path, "/", false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: const here and next line

std::string Utility::normalizePath(absl::string_view path) {
std::vector<absl::string_view> segments = StringUtil::splitToken(path, "/", false);
bool has_trailing_slash = path.back() == '/';
return absl::StrCat("/", absl::StrJoin(segments, "/"), has_trailing_slash ? "/" : "");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: I feel like this is a pretty heavy hammer from a perf perspective for solving a specific issue (duplicate '/' after rewrite). Could we detect the duplicate '/' and copy strings in a way in which this would perform better? cc @jmarantz who likes these types of things. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Probably most of the time the use-case is to have duplicates '/' at the beginning of the path.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually I have functional concerns about this PR first.

std::string Utility::normalizePath(absl::string_view path) {
std::vector<absl::string_view> segments = StringUtil::splitToken(path, "/", false);
bool has_trailing_slash = path.back() == '/';
return absl::StrCat("/", absl::StrJoin(segments, "/"), has_trailing_slash ? "/" : "");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually I have functional concerns about this PR first.

// Maintain the trailing slash.
EXPECT_EQ("/ok/a/b/c/", Utility::normalizePath("/ok/a/b/c/"));
EXPECT_EQ("/ok/a/b/c/", Utility::normalizePath("///ok/a/b/c///"));
EXPECT_EQ("/ok/a/b/c/", Utility::normalizePath("///ok////a////b////c///"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think normalization should be this aggressive, messing with the slashes in the middle of the path.

I think the beginning of the path is a special case, and "scheme://origin/" should be treated the same as "scheme://origin". But I don't think that URL normalizers usually mess with anything after the "/".

The motivation force in the bug the fixes was really the behavior of replacing prefix "/foo" with "/". WDYT of handling this at the parsing of that command, where a trailing slash is appended to src or dest if they don't have them? Then the swap can be done fairly cleanly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, that's basically my thinking as well.

Copy link
Member Author

@dio dio Apr 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for overthinking it. What do you think of handling this “/“ specifically: treating that as replacing with empty string? https://github.com/dio/envoy/blob/fix-2956/source/common/router/config_impl.cc#L375

Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
@dio dio changed the title config: normalize URL path after prefix rewriting router config: handle / in prefix rewrite as replace with empty string Apr 25, 2018
@dio dio changed the title router config: handle / in prefix rewrite as replace with empty string router config: handle / in prefix rewriting as replacing with empty string Apr 25, 2018
@dio dio changed the title router config: handle / in prefix rewriting as replacing with empty string router config: handle “/“ in prefix rewriting as replacing with empty string Apr 25, 2018
@jmarantz
Copy link
Contributor

In the description you describe a scenario:

- match: { prefix: "/ignore-this"}
  redirect: { prefix_rewrite: "/" }

The reason that produces unexpected results is that the match string doesn't end with a slash and the redirect does. You could solve this problem by either adding a slash to the match-string or removing the slash from the redirect string. Which you choose depends on what you want the result of a URL "/ignore-this-maybe/foo" to be. Do you want that to map to "/-maybe/foo"? If the answer is you'd prefer that not trigger the matching logic at all, then you want to make the match more specific by appending a slash.

@jmarantz
Copy link
Contributor

Another alternative is to just change doc (and not code) to suggest that if you want to match a whole segment, then include the trailing slash in the match string.

@dio
Copy link
Member Author

dio commented Apr 25, 2018

@jmarantz thanks for clearing that up.

Sorry for not catching that earlier! Will keep the code and convert this PR to just contain doc and additional test input samples for that RedirectPrefixRewrite case. Thanks!

Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
@dio dio changed the title router config: handle “/“ in prefix rewriting as replacing with empty string docs: clarify the use of "/" as a prefix_rewrite Apr 25, 2018
dio added 2 commits April 26, 2018 03:21
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
@dio
Copy link
Member Author

dio commented Apr 26, 2018

@jmarantz updated. If you have time please help to take a look. Thanks!

Side note: for now to accomplish that common use-case (when you want to rewrite e.g. /prefix, /prefix/ok, /prefix/ok/1/2 to /, /ok and /ok/1/2 respectively) we need to define two entries in the config.

// it rewrites :code:`/prefix` to :code:`/` and :code:`/prefix/foo` to :code:`/foo`.
// it rewrites :code:`/prefix` to :code:`/`, but :code:`/prefix/ok` to :code:`//ok`, which
// most of the time is not the intended result. However, it can be useful for the case of
// replacing substring of a path segment. For example to rewrite :code:`/prefixok` to :code:`/ok`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/replacing substring/replacing a substring/

// prefix: "/prefix"
// route:
// prefix_rewrite: "/"
//
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the examples below will be clearer if the two prefix-substitutions above use different match strings.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you clarify?

Copy link
Member Author

@dio dio Apr 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of this is to tell that to achieve such behavior (rewriting by replacing a segment of a URL, in this case: "prefix" and attach the rest to the root path) is by declaring two match rules (with and without trailing "/", in this case "/prefix" and "/prefix/"). The below example tells that if we remove the rule with "match" value with trailing slash, it gives a different result (rewrites "/prefix/ok" to "//ok").

// - match:
// prefix: "/prefix"
// redirect:
// prefix_rewrite: "/"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually don't have the whole context for this file, but I wonder if you could point to the first instance of this example rather than repeating it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me see

redirect->rewritePathHeader(headers);
EXPECT_EQ("http://redirect.lyft.com/", redirect->newPath(headers));
}
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worth some comments here to describe what how these two cases are different.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, will add some comments here.

//
// .. note::
//
// Take a careful attention to the use of trailing slash in match prefix value. A common
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you write something like the following, you can delete the examples.

"Pay careful attention to the use of trailing slashes in the RouteMatch's prefix value. Stripping a prefix from a path requires multiple Routes to handle all cases. For example, rewriting /prefix to / and /prefix/etc to /etc cannot be done in a single Route."

Copy link
Member Author

@dio dio Apr 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @zuercher, I have updated this PR as suggested.

Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
zuercher
zuercher previously approved these changes Apr 28, 2018
// :ref:`route's match <envoy_api_field_route.Route.match>` prefix value.
// Stripping a prefix from a path requires multiple Routes to handle all cases. For example,
// rewriting */prefix* to */* and */prefix/etc* to */etc*
// cannot be done in a single :ref:`Route <envoy_api_msg_route.Route>`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we give an explicit example of what the user would configure? I think that would be helpful.

Also, Q: I lost track of the conversation but isn't there a targeted fix possible here where we just adjust/normalize the slash where the prefix ends and the rest of the route begins?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we give an explicit example of what the user would configure? I think that would be helpful.

I added an example in my previous commit, https://github.com/dio/envoy/blob/d46c26e4ede389971fbfeca7bd2d2e48716ea2d4/api/envoy/api/v2/route/route.proto#L369-L393 if you think this is good enough I can add it back.

Copy link
Member Author

@dio dio Apr 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, Q: I lost track of the conversation but isn't there a targeted fix possible here where we just adjust/normalize the slash where the prefix ends and the rest of the route begins?

We have this case like the following:

{
    Http::TestHeaderMapImpl headers =
        genRedirectHeaders("redirect.lyft.com", "/ignore-this/use/", false, false);
    const DirectResponseEntry* redirect = config.route(headers, 0)->directResponseEntry();
    redirect->rewritePathHeader(headers);
    EXPECT_EQ("http://redirect.lyft.com/use/", redirect->newPath(headers));
}

Where the config is:

      - match: { prefix: "/ignore-this"}
        redirect: { prefix_rewrite: "/" }

We decided to accommodate this behavior. Hence no code change for this.

Copy link
Contributor

@jmarantz jmarantz Apr 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah my thinking here is that slashes have no special role in the URL after the first 3. E.g.
scheme://origin/foo/bar
and
scheme://origin/foo//bar
are both different valid URLs and origin servers can interpret them differently. Moreover, proxies may see arbitrary slashes in URL parameters. If they originate from a browser they'd be escaped, but not if they originate from some other client.

You could normalize the match strings, but you only lose generality doing so, and it's conceivable someone may want a URL rewriter that strips off a prefix that does not necessarily end with a slash.

Authoritative spec is https://www.w3.org/Addressing/URL/url-spec.txt which has this (IMO) somewhat ambiguous statement:

The path is interpreted in a manner dependent on the scheme being
used. Generally, the reserved slash "/" character (ASCII 2F hex)
denotes a level in a hierarchical structure, the higher level part
to the left of the slash.

So in typical use (e.g. mapping to a file-system) might correspond to a file-system where normalization makes sense, that's not required by the spec and in practice I think it's better not to mess with URLs too much.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what I'm suggesting is to add an explicit example above in the docs where you talk about how it cannot be done in a single route rule. Can you list out the two rules that cover the example case so the user better understands?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattklein123 OK. I have it here: a4c60d5.

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @dio! Very clear.

Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
@zuercher zuercher merged commit d20c12d into envoyproxy:master Apr 30, 2018
ramaraochavali pushed a commit to ramaraochavali/envoy that referenced this pull request May 3, 2018
docs: clarify the use of "/" as a prefix_rewrite

This patch clarifies the use of "/" as a prefix_rewrite in route and redirect prefix rewriting. And also a note on the use of trailing slashes as match value.

Risk Level: Low
Testing: add more input samples to RedirectPrefixRewrite test.
Docs Changes: Update route.proto doc regarding path_rewrite both for redirect and route.
Release Notes: N/A

Fixes envoyproxy#2956

Signed-off-by: Dhi Aurrahman <dio@rockybars.com>

Signed-off-by: Rama <rama.rao@salesforce.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

URL normalization
4 participants