-
Notifications
You must be signed in to change notification settings - Fork 38.5k
UriTemplate takes wrong approach at encoding substituted template variables [SPR-8662] #13304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Rossen Stoyanchev commented Hannes, how would you like the UriTemplate to do encoding? |
Rossen Stoyanchev commented Something that may be relevant: |
Hannes Schmidt commented I work around this issue by subclassing RestTemplate, overloading the execute() methods to have it use a custom subclass of UriTemplate which overrides expand(Object...) as follows:
This produces URLs that unnecessarily encode all special characters independent of where in the URL the replacement is embedded. A better solution would be to parse the template (with the {...} variable placeholders still in place) into the various components (scheme,user,host,path,query etc.). Then we'd substitute the variable placeholders in each component with the encoded variable value, but this time only encoding characters with special meaning to that component. For example, values for placeholders in the path component would have to be encoded by escaping slash, question mark, ampersand and hash, but not colon. Values for placeholders in the query component would have to be encoded by escaping question mark, ampersand and hash, but not colon or slash. My code also uses the JDK's URLEncoder whereas it should probably use functionality from UriUtils. Regarding the URI Template draft: I wasn't aware that it existed. It seems pretty complex and I'm not sure UriTemplate should attempt to implement it. Nevertheless, an implementation of this draft would still be subject to the requirement that each embedded value must be encoded before it is embedded. |
Rossen Stoyanchev commented So in your approach the literal parts of the URL are not encoded. For example:
where thingId is "a/b", would remain partially encoded:
Does that not come up as an issue for you, or do you work around it by varying the template variables?
A variable placeholder may actually contain a character necessary to parse the template (like the path or the '?' separating the query) thus preventing it from being parsed correctly. This is where the expression types in the URI template draft are relevant by indicating what part of the URI a variable represents and whether to even encode it:
|
Hannes Schmidt commented Good discussion.
No, it's precisely what I want. I don't have a single case where I'd like to maintain special characters in embedded variables. I do agree that the current behavior represents a valid use case. I just think it's a less common one. Here's why: The current behavior produces the URL
when used with the template
When I use that same template to annotate my Spring controllers, that URL won't even match because {thingId} only consumes the string 'a', leaving a trailing '/b'. In other words, the current behavior of the Spring REST client support is inconsistent with Spring's own annotation-based REST server-side support. Secondly, let's take a look at other technologies where this problem arises. The UNIX shell lets us choose which behavior we want.
The simpler syntax (without quotes) produces
the one with quotes produces
The problem with the shell approach is that the simpler syntax represents the less common use case. We'd almost always want the space in the file name to be preserved rather than break the file name up into two file names. If you look at shell scripts in the wild you will find that the predominant substitution syntax is "$foo". Forgetting the quotes can break the script or pose a security risk. Had the designers of the UNIX shell made the simple syntax represent the common use case, the world today would be a better place. Just kidding. Now, here's an example where someone took the right approach: SQL statement substitution. Most SQL driver technology I know (JDBC, Perl's DBI, Ruby's DBI ...) escape substitutions in prepared statements:
with
as the value to be substituted becomes
The default is to escape the substituted value before embedding it. This choice was made for security reasons, i.e. to prevent SQL injections. The current behavior of UriTemplate creates systems that are sensitive to whether substituted variables contain special characters or not. Programmers might not be aware of this issue until one of their users uses a slash in thingId. The safe choice is to make escape-before-embedding the default because it is more the predictable behavior.
That may be so, but (correct me if I am wrong) neither RestTemplate nor UriTemplate support that draft or even come close to it (even though UriTemplate refers to it in its JavaDoc header). In other words, I don't understand what the draft has to do with the current implementation. I can't help thinking that the current implementation is counter-intuitive and inconsistent with other areas of Spring which is why I created this issue. |
Hannes Schmidt commented Let me also give some background information. We are currently implementing a REST service that processes images. The images are identified and accessed by their URI. Our customers need to submit image URIs to our REST service. We also want to enable them to retrieve information about images. We therefore need to have a GET request for individual images. So if the image is at http://customer.com/image.jpg the client needs to do
We do not want to introduce separate unique identifiers for images as that would complicate things unnecessarily. URIs are unique and immutable by definition and can thus serve as identifiers. However, we can only get this to work with two tweaks, one server-side and one client-side. The client-side workaround is described above. The server-side workaround involves tweaking Spring MVC to 1) not decode the entire URL by setting DefaultAnnotationHandlerMapping.urlDecode=false, DefaultAnnotationHandlerMapping.useDefaultSuffixPattern=false and 2) to install an interceptor that decodes the path template variables individually before they are passed on to the controllers. This is exactly the inverse of what our client-side workaround does. RestTemplate's default behavior is to embed and then encode the resulting URL as a whole. The Spring MVC stuff decodes the request URL as a whole, then extracts path variables. Both, the client- and server-side are taking the wrong approach in my opinion. I haven't opened an issue against Spring MVC because our workaround is pretty clean on that side. The client-side workaround involves some heavy subclassing and duplication of Spring code and is therefore super-brittle. I'd be happy with a little flag that lets users decide which approach they want, the legacy embded-then-encode or the safer encode-then-embed. |
Rossen Stoyanchev commented Hannes, first of all thank you for you well reasoned feedback and suggestions. Just wanted to let you know that some substantive improvements are on their way that will accommodate your example. This done as part of work for #10641 but will provide not only the URI building capability but also general improvements to the way URI templates are processed. I will update this ticket again in the near future. |
Rossen Stoyanchev commented UriTemplate is now backed by UriComponentsBuilder, which breaks down the template into individual URI components and then expands and encodes each component individually. |
Emerson Farrugia commented I'm facing Hannes' problem. Could someone shed some light on how UriComponents helps solve it?
produces
instead of
Cheers, Emerson |
Rossen Stoyanchev commented This may be a bug. It seems to be related to the expanding a URI variable in a full path. When the path is specified in segments it works: String uri = "http://foo";
UriComponents uriComponents = UriComponentsBuilder.fromUriString(uri).pathSegment("{bar}").build();
System.out.println(uriComponents.expand("user/name").encode().toUriString()); Could you open a separate ticket? |
Rossen Stoyanchev commented Actually it's not a bug. The |
Hannes Schmidt opened SPR-8662 and commented
The basic problem is that UriTemplate first substitutes the template variables into a single string and then encodes that string as a whole using URIUtils. This simply can't work reliably because the substitution could produce a string that is impossible to parse using a regular expression for URLs. For example, the following template
with the following value for thingId
produces the following string after substitution
When URIUtil does not encode the / between a and b because it takes the slash literally. This is what I would expect in this case:
The reason we have URL-encoding is so we can embed any string into a URL without breaking it. But we need to encode the embedded string BEFORE we embed it. We can't expect URIUtils to be able to fix a URL that we broke by embedding an unencoded string.
Now, depending on where we embed a string into a URL certain characters loose their special meaning. The slash, for example, has no special meaning in the query part. If we want to produce URLs without redundantly encoded parts, we need to take the context into account. Nevertheless, it is always safe to encode all special characters in each embedded string and that would be a simple fix.
Affects: 3.0.5
Issue Links:
The text was updated successfully, but these errors were encountered: