Skip to content

UriTemplate takes wrong approach at encoding substituted template variables [SPR-8662] #13304

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
spring-projects-issues opened this issue Aug 31, 2011 · 11 comments
Assignees
Labels
in: web Issues in web modules (web, webmvc, webflux, websocket) type: enhancement A general enhancement
Milestone

Comments

@spring-projects-issues
Copy link
Collaborator

spring-projects-issues commented Aug 31, 2011

Hannes Schmidt opened SPR-8662 and commented

The basic problem is that UriTemplate first substitutes the template variables into a single string and then encodes that string as a whole using URIUtils. This simply can't work reliably because the substitution could produce a string that is impossible to parse using a regular expression for URLs. For example, the following template

http://foo.com/things/{thingId}

with the following value for thingId

a/b

produces the following string after substitution

http://foo.com/things/a/b

When URIUtil does not encode the / between a and b because it takes the slash literally. This is what I would expect in this case:

http://foo.com/things/a%2Fb

The reason we have URL-encoding is so we can embed any string into a URL without breaking it. But we need to encode the embedded string BEFORE we embed it. We can't expect URIUtils to be able to fix a URL that we broke by embedding an unencoded string.

Now, depending on where we embed a string into a URL certain characters loose their special meaning. The slash, for example, has no special meaning in the query part. If we want to produce URLs without redundantly encoded parts, we need to take the context into account. Nevertheless, it is always safe to encode all special characters in each embedded string and that would be a simple fix.


Affects: 3.0.5

Issue Links:

@spring-projects-issues
Copy link
Collaborator Author

Rossen Stoyanchev commented

Hannes, how would you like the UriTemplate to do encoding?

@spring-projects-issues
Copy link
Collaborator Author

Rossen Stoyanchev commented

Something that may be relevant:
http://tools.ietf.org/html/draft-gregorio-uritemplate-06#section-3.2.6

@spring-projects-issues
Copy link
Collaborator Author

spring-projects-issues commented Sep 1, 2011

Hannes Schmidt commented

I work around this issue by subclassing RestTemplate, overloading the execute() methods to have it use a custom subclass of UriTemplate which overrides expand(Object...) as follows:

@Override
public URI expand( final Object... uriVariableValues )
{
    Assert.notNull( uriVariableValues, "'uriVariableValues' must not be null" );
    if( uriVariableValues.length != getVariableNames().size() ) {
        throw new IllegalArgumentException(
            "Invalid amount of variables values in [" + this.uriTemplate + "]: expected " +
                getVariableNames().size() + "; got " + uriVariableValues.length );
    }
    final Matcher matcher = NAMES_PATTERN.matcher( this.uriTemplate );
    final StringBuffer buffer = new StringBuffer();
    int i = 0;
    while( matcher.find() ) {
        final Object uriVariable = uriVariableValues[ i++ ];
        final String replacement = Matcher.quoteReplacement( uriVariable != null ? uriVariable.toString() : "" );
        try {
            matcher.appendReplacement( buffer, URLEncoder.encode( replacement, "UTF-8" ) );
        } catch( final UnsupportedEncodingException e ) {
            throw new RuntimeException( e );
        }
    }
    matcher.appendTail( buffer );
    try {
        return new URI( buffer.toString() );
    } catch( final URISyntaxException e ) {
        throw new RuntimeException( e );
    }
}

This produces URLs that unnecessarily encode all special characters independent of where in the URL the replacement is embedded. A better solution would be to parse the template (with the {...} variable placeholders still in place) into the various components (scheme,user,host,path,query etc.). Then we'd substitute the variable placeholders in each component with the encoded variable value, but this time only encoding characters with special meaning to that component. For example, values for placeholders in the path component would have to be encoded by escaping slash, question mark, ampersand and hash, but not colon. Values for placeholders in the query component would have to be encoded by escaping question mark, ampersand and hash, but not colon or slash.

My code also uses the JDK's URLEncoder whereas it should probably use functionality from UriUtils.

Regarding the URI Template draft: I wasn't aware that it existed. It seems pretty complex and I'm not sure UriTemplate should attempt to implement it. Nevertheless, an implementation of this draft would still be subject to the requirement that each embedded value must be encoded before it is embedded.

@spring-projects-issues
Copy link
Collaborator Author

Rossen Stoyanchev commented

So in your approach the literal parts of the URL are not encoded.

For example:

http://foo.com/all things/{thingId}

where thingId is "a/b", would remain partially encoded:

http://foo.com/all things/a%2Fb

Does that not come up as an issue for you, or do you work around it by varying the template variables?

A better solution would be to parse the template (with the {...} variable placeholders still in place) into the various components (scheme,user,host,path,query etc.)

A variable placeholder may actually contain a character necessary to parse the template (like the path or the '?' separating the query) thus preventing it from being parsed correctly.

This is where the expression types in the URI template draft are relevant by indicating what part of the URI a variable represents and whether to even encode it:

http://foo.com/{/prefix}/{/thingId}{?param}

@spring-projects-issues
Copy link
Collaborator Author

spring-projects-issues commented Sep 2, 2011

Hannes Schmidt commented

Good discussion.

http://foo.com/all things/{thingId}

Does that not come up as an issue for you, or do you work around it by varying the template variables?

No, it's precisely what I want. I don't have a single case where I'd like to maintain special characters in embedded variables. I do agree that the current behavior represents a valid use case. I just think it's a less common one. Here's why:

The current behavior produces the URL

http://foo.com/things/a/b

when used with the template

http://foo.com/things/{thingId}

When I use that same template to annotate my Spring controllers, that URL won't even match because {thingId} only consumes the string 'a', leaving a trailing '/b'. In other words, the current behavior of the Spring REST client support is inconsistent with Spring's own annotation-based REST server-side support.

Secondly, let's take a look at other technologies where this problem arises. The UNIX shell lets us choose which behavior we want.

file="My file"
cat $file
cat "$file"

The simpler syntax (without quotes) produces

ls My file

the one with quotes produces

ls "My file"

The problem with the shell approach is that the simpler syntax represents the less common use case. We'd almost always want the space in the file name to be preserved rather than break the file name up into two file names. If you look at shell scripts in the wild you will find that the predominant substitution syntax is "$foo". Forgetting the quotes can break the script or pose a security risk. Had the designers of the UNIX shell made the simple syntax represent the common use case, the world today would be a better place. Just kidding.

Now, here's an example where someone took the right approach: SQL statement substitution. Most SQL driver technology I know (JDBC, Perl's DBI, Ruby's DBI ...) escape substitutions in prepared statements:

SELECT * FROM people WHERE nickname = ?

with

Jeff's mom

as the value to be substituted becomes

SELECT * FROM people WHERE nickname = 'Jeff\'s mom'

The default is to escape the substituted value before embedding it. This choice was made for security reasons, i.e. to prevent SQL injections. The current behavior of UriTemplate creates systems that are sensitive to whether substituted variables contain special characters or not. Programmers might not be aware of this issue until one of their users uses a slash in thingId. The safe choice is to make escape-before-embedding the default because it is more the predictable behavior.

This is where the expression types in the URI template draft are relevant by indicating what part of the URI a variable represents and whether to even encode it:

That may be so, but (correct me if I am wrong) neither RestTemplate nor UriTemplate support that draft or even come close to it (even though UriTemplate refers to it in its JavaDoc header). In other words, I don't understand what the draft has to do with the current implementation. I can't help thinking that the current implementation is counter-intuitive and inconsistent with other areas of Spring which is why I created this issue.

@spring-projects-issues
Copy link
Collaborator Author

Hannes Schmidt commented

Let me also give some background information. We are currently implementing a REST service that processes images. The images are identified and accessed by their URI. Our customers need to submit image URIs to our REST service. We also want to enable them to retrieve information about images. We therefore need to have a GET request for individual images. So if the image is at http://customer.com/image.jpg the client needs to do

GET http://ourdomain.com/api/images/http%3A%2F%2Fcustomer.com%2Fimage.jpg

We do not want to introduce separate unique identifiers for images as that would complicate things unnecessarily. URIs are unique and immutable by definition and can thus serve as identifiers.

However, we can only get this to work with two tweaks, one server-side and one client-side. The client-side workaround is described above. The server-side workaround involves tweaking Spring MVC to 1) not decode the entire URL by setting DefaultAnnotationHandlerMapping.urlDecode=false, DefaultAnnotationHandlerMapping.useDefaultSuffixPattern=false and 2) to install an interceptor that decodes the path template variables individually before they are passed on to the controllers. This is exactly the inverse of what our client-side workaround does. RestTemplate's default behavior is to embed and then encode the resulting URL as a whole. The Spring MVC stuff decodes the request URL as a whole, then extracts path variables. Both, the client- and server-side are taking the wrong approach in my opinion.

I haven't opened an issue against Spring MVC because our workaround is pretty clean on that side. The client-side workaround involves some heavy subclassing and duplication of Spring code and is therefore super-brittle. I'd be happy with a little flag that lets users decide which approach they want, the legacy embded-then-encode or the safer encode-then-embed.

@spring-projects-issues
Copy link
Collaborator Author

spring-projects-issues commented Sep 14, 2011

Rossen Stoyanchev commented

Hannes, first of all thank you for you well reasoned feedback and suggestions. Just wanted to let you know that some substantive improvements are on their way that will accommodate your example. This done as part of work for #10641 but will provide not only the URI building capability but also general improvements to the way URI templates are processed. I will update this ticket again in the near future.

@spring-projects-issues
Copy link
Collaborator Author

Rossen Stoyanchev commented

UriTemplate is now backed by UriComponentsBuilder, which breaks down the template into individual URI components and then expands and encodes each component individually.

@spring-projects-issues
Copy link
Collaborator Author

Emerson Farrugia commented

I'm facing Hannes' problem. Could someone shed some light on how UriComponents helps solve it?

String uri = "http://foo/{bar}";
UriComponents uriComponents = UriComponentsBuilder.fromUriString(uri).build();
System.out.println(uriComponents.expand("user/name").encode().toUriString());

produces

http://foo/user/name

instead of

http://foo/user%2Fname

Cheers, Emerson

@spring-projects-issues
Copy link
Collaborator Author

Rossen Stoyanchev commented

This may be a bug. It seems to be related to the expanding a URI variable in a full path. When the path is specified in segments it works:

String uri = "http://foo";
UriComponents uriComponents = UriComponentsBuilder.fromUriString(uri).pathSegment("{bar}").build();
System.out.println(uriComponents.expand("user/name").encode().toUriString());

Could you open a separate ticket?

@spring-projects-issues
Copy link
Collaborator Author

spring-projects-issues commented Feb 24, 2015

Rossen Stoyanchev commented

Actually it's not a bug. The fromUriString method parses a URI string as a "full path" and applies path encoding. Currently you'd have to have append individual path segments to get path segment encoding. See #17347 for more coming on that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in: web Issues in web modules (web, webmvc, webflux, websocket) type: enhancement A general enhancement
Projects
None yet
Development

No branches or pull requests

2 participants