Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support stricter encoding of URI variables in UriComponents [SPR-17039] #21577

Closed
spring-projects-issues opened this issue Jul 13, 2018 · 20 comments
Assignees
Labels
in: web Issues in web modules (web, webmvc, webflux, websocket) type: enhancement A general enhancement
Milestone

Comments

@spring-projects-issues
Copy link
Collaborator

spring-projects-issues commented Jul 13, 2018

Rossen Stoyanchev opened SPR-17039 and commented

Historically UriComponents has always encoded only characters that are illegal in a given part of the URI (e.g. "/" is illegal in a path segment), and that does not include characters that are legal but have some other reserved meaning (e.g. ";" in a path segment, or also "+" in a query param).

UriComponents has also always relied on expanding URI variables first, and then encoding the expanded String, which makes it impossible to apply stricter encoding to URI variable values which is usually what's expected intuitively, because once expanded it's impossible to tell the values apart from the rest of the template. Typically the expectation is that expanded values will have by fully encoded.

While the RestTemplate and WebClient can be configured with a UriBuilderFactory that supports different encoding mode strategy, currently there is really no answer when using UriComponents directly.


Affects: 5.0.7

Issue Links:

1 votes, 9 watchers

@spring-projects-issues
Copy link
Collaborator Author

Rossen Stoyanchev commented

This is now ready, see the updated "URI Encoding" in the docs. The short version is, use the new UriComponentsBuilder#encode method and not the existing one in UriComponents, i.e. invoke encode before and not after expanding URI variables.

Please give this a try with 5.0.8 or 5.1 snapshots to confirm how it works in your application.

@spring-projects-issues
Copy link
Collaborator Author

Christophe Levesque commented

Thanks Rossen Stoyanchev! The only downside is that it requires that extra UriComponentsBuilder#encode call.

UriComponentBuilder.fromHttpUrl(url).queryParam("foo", foo).toUriString(); // <= this would still not work, needs to add new encode() after toUriString

Is there a way to change the toUriString method in a way that would have the previous code work as is?

PS: Also, unrelated request: can there be a toUri() shorthand method in UriComponentsBuilder the same way there is a toUriString()? :)

@spring-projects-issues
Copy link
Collaborator Author

Rossen Stoyanchev commented

The key to understand this, is that different degrees of encoding are applied to the URI template vs URI variables. In other words given:

http://example.com/a/{b}/c?q={q}&p={p}

The URI template is everything except for the URI variable placeholders. However the code snippet you showed only builds a URI literal without any variables, so the level encoding is the same, only illegal characters, no matter which method is used.

So it would also have to be something like:

.queryParam("foo", "{foo}").buildAndExpand(foo)

By the time .toUriString() is called, the expand would have happened and at that point it's too late to encode URI variables more strictly. Unfortunately we cannot switch the default mode of encoding in UriComponentsBuilder at this stage since that's the only way toUriString() could work without a call to encode().

That said UriComponentsBuilder does have a buildAndExpand shortcut to URI:

URI uri = UriComponentBuilder.fromHttpUrl(url).queryParam("foo", "{foo}").encode().buildAndExpand("a+b");

@spring-projects-issues
Copy link
Collaborator Author

Rossen Stoyanchev commented

Come to think of it, this works and it's almost identical length as what you had:

URI uri = UriComponentsBuilder
        .fromHttpUrl(url).queryParam("foo", "{foo}").build(foo);

Or include it in the URI template:

URI uri = UriComponentsBuilder
        .fromHttpUrl(url + "?foo={foo}").build(foo);

Explanation: the build(Object...) and build(Map<String, Object>) methods had to be implemented as part of UriBuilder (new in 5.0) but that contract is more likely to be used through DefaultUriBuilderFactory, if at all, and is overall quite new. So I've switched those methods internally to do encode().buildAndExpand().toUri().

For toUriString() a switch from build() + encode() to encode() + build() would make no difference, because no URI variables are expanded. We could add a toUri() as well but that would also have no effect on encoding, i.e. same as toUriString().

@spring-projects-issues
Copy link
Collaborator Author

Rossen Stoyanchev commented

Resolving but now is a good time to try this.

@spring-projects-issues
Copy link
Collaborator Author

spring-projects-issues commented Nov 8, 2018

Michal Domagala commented

I use autoconfigured WebTestClient in my integration test. I was satisfied with EncodingMode.URI_COMPONENT because I could easy test trimming spaces in request argument - I can just add a space to my request .queryParam("foo", " bar")) and verify space is trimmed on server side.

Could you point me how to elegant undo #21577 for autoconfigured WebTestClient? The only way I found is drop autoconfigured one and create custom client.

@spring-projects-issues
Copy link
Collaborator Author

Rossen Stoyanchev commented

Michal Domagala, I've create a ticket in Boot. I've also included at least one example there of how it can be done. There may be better ways though so watch for updates on the ticket.

@spring-projects-issues
Copy link
Collaborator Author

Leonard Brünings commented

Rossen Stoyanchev please advice on how to solve it with generic UriTemplate

    // templates are populated via Spring Boot Configuration Properties
    private Map<String, UriTemplate> templates = new HashMap<>();

    public URI getLink(String linkType, Map<String, String> templateParams) {
        return templates.get(linkType).expand(templateParams);
    }
conf:
  links:
    configSite: "https://example.com/config?email={email}"

At this point we don't know what variable we have, and if they are query or path variables.

URI redirectUri = getLink("configSite", Collections.singletonMap("email", "service+bar@gmail.com"));
// render     https://example.com/config?email=service+bar@gmail.com
// instead of https://example.com/config?email=service%2Bbar%40gmail.com

This will provide a valid URI, however it won't encode the + and @, and then the plus will get decoded to a space on the receiving site. The problem is that we can't even use URLEncoder.encode manually on the calling site, as this will cause double encoding.

@spring-projects-issues
Copy link
Collaborator Author

Leonard Brünings commented

I managed to get the desired result with:

public URI getLink(String linkType, Map<String, String> templateParams) {
    return UriComponentsBuilder.fromUriString(templates.get(linkType).toString())
            .encode()
            .buildAndExpand(templateParams).toUri();
}

However, I must say this is quite ugly. Is there any way we could get UriTemplate.encode().expand()?

@spring-projects-issues
Copy link
Collaborator Author

Rossen Stoyanchev commented

Not really, UriTemplate uses UriComponentsBuilder internally, and provides just one way of doing it. You could shorten your example to:

public URI getLink(String linkType, Map<String, String> templateParams) {
        return UriComponentsBuilder.fromUriString(templates.get(linkType).toString())
                .build(templateParams);
    }

@spring-projects-issues spring-projects-issues added type: enhancement A general enhancement in: web Issues in web modules (web, webmvc, webflux, websocket) labels Jan 11, 2019
@spring-projects-issues spring-projects-issues added this to the 5.1 RC1 milestone Jan 11, 2019
@HermanBovens
Copy link

HermanBovens commented Dec 18, 2019

I ended up here after doing quite a bit of searching when investigating a bug that only appeared when users had entered a string with a + in it, which was saved into the DB and later used as a URI parameter. Although in the end it's possible to have this encoded properly, I don't agree that the current solution is developer friendly. At the very least, the javadoc is lacking: javadoc for UriComponentsBuilder#queryParam does not say that certain legal but special characters, such as +, will not be encoded.

I believe that a lot of the confusion around this issue is related to the "String obsession" anti-pattern. If we have separate types for URI components with different levels of encoding applied, that might help to make conversion from one form to another more explicit.

@rstoyanchev
Copy link
Contributor

UriComponentsBuilder does have different URI components. The issue here is that there are legitimate reasons to want to encode + or to leave it as is, and the fact that there was a change of behavior after a long time.

We did revamp the section on URI encoding but adding something specifically about the treatment of + in the Javadoc of UriComponentsBuilder is a good idea. I will do so.

@rstoyanchev
Copy link
Contributor

adding something specifically about the treatment of + in the Javadoc of UriComponentsBuilder is a good idea. I will do so

261956f

@HermanBovens
Copy link

HermanBovens commented Dec 21, 2019 via email

gonzalad added a commit to gonzalad/cas that referenced this issue Mar 27, 2020
The + character is not correcly encoded by Cas when
using delegated authenication.

This leads to the following error when CAS redirects to a delegated
authentication server:

```
ERROR [org.apereo.cas.support.oauth.services.OAuth20AuthenticationServiceSelectionStrategy] -
<Illegal character in query at index xxx:
https://casserver.fr:443/cas/oauth2.0/callbackAuthorize?client_id=myapp&redirect_uri=http%3A%2F%2Fmyapp.fr%3A4200%2Fauth%2Fauth-callback&response_type=id_token token&client_name=CasOAuthClient>
```

This error originates from UriComponentsBuilder usage.

See spring-projects/spring-framework#21577
and https://docs.spring.io/spring/docs/current/spring-framework-reference/web.html#web-uri-encoding
@ravenblackdusk
Copy link

Did I understand this properly? Is the simplest way to encode "=" in query params something like this:

UriComponentsBuilder.fromHttpUrl("http://localhost").queryParam("foo", "{1}").queryParam("bar", "{2}").build(mapOf(
        "1" to foo,
        "2" to bar
))

@rstoyanchev
Copy link
Contributor

rstoyanchev commented Jul 13, 2020

No, this works too:

UriComponentsBuilder.fromHttpUrl("http://localhost").queryParam("foo", "=").build().encode().toUri()

@HermanBovens
Copy link

What if you need to add a parameter that can have multiple values, for example using a MultiValueMap<String, String>?
Will this properly encode everything (including + of course) ?


MultiValueMap<String, String> map = ...

...
webClient.get(builder -> builder.path(...)
                                .queryParams(map)
                                .build())

If not then what is the suggested way to get this functionality?

@HermanBovens
Copy link

HermanBovens commented Jul 17, 2020

It seems the above does not fully encode, it's the same as using .queryParam(...) without template variable.

I continue to find bugs as a result of the current API, and expect to be doing so for a long time.
I've consulted RFC 3986, and although I did not see a section that mandates such a confusing API, I did find several passages that indicate the opposite.

In Section 2.2 "Reserved Characters":

   If data for a URI component would conflict with a reserved
   character's purpose as a delimiter, then the conflicting data must be
   percent-encoded before the URI is formed.

--> I think the parameters after the first one in the queryParam method qualify as data for a URI component. Apparently it was decided that only template variable contents qualify, but the spec certainly doesn't dictate that.

  Thus, characters in the reserved
   set are protected from normalization and are therefore safe to be
   used by scheme-specific and producer-specific algorithms for
   delimiting data subcomponents within a URI.

--> So yes characters like '+' should be preserved when used as a delimiter, not otherwise.

Section 2.4 "When to Encode or Decode":

   Under normal circumstances, the only time when octets within a URI
   are percent-encoded is during the process of producing the URI from
   its component parts.  This is when an implementation determines which
   of the reserved characters are to be used as subcomponent delimiters
   and which can be safely used as data.

--> This decision is important, and always treating reserved characters in query parameter data as delimiters (except when used with template variables) does not seem right, especially when there is no other option than to use template variables.

   When a URI is dereferenced, the components and subcomponents
   significant to the scheme-specific dereferencing process (if any)
   must be parsed and separated before the percent-encoded octets within
   those components can be safely decoded, as otherwise the data may be
   mistaken for component delimiters.

--> This is the reverse process and warns again about the possibility to mix up data and delimiters.

So, it's all about API design, the spec doesn't say + characters should not be encoded, on the contrary.

As I see it, there are only 2 kind of Strings that could be supplied as values for the queryParam() method:

  1. Plain Strings that need to be fully encoded because they must be reconstructed when dereferencing the URI and may contain illegal or reserved characters (e.g. user input).

  2. Strings that contain delimiters and percent-encoded data. For these, no encoding should be done at all (otherwise it would be double-encoded).

A query parameter string that would consist of plain strings (non-encoded user input) which are joined with '+' characters to be transmitted as delimiters, should not exist. It could result in a String that contains some '+' characters that need to be percent-encoded and others that do not. However, it is precisely this kind of String that is assumed to be passed into queryParam(), as illegal characters will be encoded but '+' will not.

Since parameters values to queryParam are of type Object, maybe a type OpaqueString (constructable using a static factory method opaque(String)) could be provided, whose contents are considered to be of the 1ste type and need full encoding if encoding is enabled at all. This would also solve the case where the parameter contains { and } characters which must be interpreted as data and not template variable pre- and postfixes. When well-documented, I think this is preferable to template variables whose values are specified at the end of the build chain and must be in the exact order.

@ravenblackdusk
Copy link

No, this works too:

UriComponentsBuilder.fromHttpUrl("http://localhost").queryParam("foo", "=").build().encode().toUri()

@rstoyanchev this works for '=' but not for '/' or '+'. so is the following the simplest way to encode '/', '+' and similar characters?

UriComponentsBuilder.fromHttpUrl("http://localhost").queryParam("foo", "{1}").queryParam("bar", "{2}").build(mapOf(
        "1" to foo,
        "2" to bar
))

@HermanBovens
Copy link

@ravenblackdusk It seems so. Maybe the simplest solution would be to add a rawQueryParam() method where the parameters are assumed to not contain any template variables and must be fully encoded (including special characters, including +).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in: web Issues in web modules (web, webmvc, webflux, websocket) type: enhancement A general enhancement
Projects
None yet
Development

No branches or pull requests

4 participants