RestTemplate should encode the url variables [SPR-5516] #10187

Closed
spring-projects-issues opened this issue Feb 23, 2009 · 13 comments
Labels: in: web (issues in web modules: web, webmvc, webflux, websocket), type: bug (a general bug)

@spring-projects-issues

spring-projects-issues commented Feb 23, 2009

Tareq Abedrabbo opened SPR-5516 and commented

The RestTemplate does not encode the url variables. The following line:

template.postForLocation("http://twitter.com/statuses/update.xml?status={status}", "", "Ho Ho");

Results in an exception:

Exception in thread "main" java.lang.IllegalArgumentException
	at java.net.URI.create(URI.java:842)
	at org.springframework.web.util.UriTemplate.expand(UriTemplate.java:140)
	at org.springframework.web.client.core.RestTemplate.execute(RestTemplate.java:266)
	at org.springframework.web.client.core.RestTemplate.postForLocation(RestTemplate.java:203)
	at test.RestClientTest.main(RestClientTest.java:30)
Caused by: java.net.URISyntaxException: Illegal character in query at index 48: http://twitter.com/statuses/update.xml?status=Ho Ho
	at java.net.URI$Parser.fail(URI.java:2809)
	at java.net.URI$Parser.checkChars(URI.java:2982)
	at java.net.URI$Parser.parseHierarchical(URI.java:3072)
	at java.net.URI$Parser.parse(URI.java:3014)
	at java.net.URI.<init>(URI.java:578)
	at java.net.URI.create(URI.java:840)
	... 4 more
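
The failure above can be reproduced with java.net.URI alone. The following is a minimal sketch, not the RestTemplate implementation: the class and method names (QuotingDemo, rejectedByCreate, quoted) are made up for illustration. It shows that URI.create rejects the expanded string because of the space, while the five-argument URI constructor quotes the space as %20.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; only java.net.URI is real API here.
final class QuotingDemo {

    // Returns true when URI.create rejects the naively expanded template.
    static boolean rejectedByCreate(String status) {
        try {
            URI.create("http://twitter.com/statuses/update.xml?status=" + status);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    // Builds the URI from separate components. The multi-argument
    // constructors quote illegal characters such as the space.
    static String quoted(String status) {
        try {
            URI uri = new URI("http", "twitter.com", "/statuses/update.xml",
                    "status=" + status, null);
            return uri.toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectedByCreate("Ho Ho"));
        System.out.println(quoted("Ho Ho"));
    }
}
```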

Affects: 3.0 M2

0 votes, 5 watchers

@spring-projects-issues

Tareq Abedrabbo commented

Fix confirmed. Thanks!

@spring-projects-issues

Chantal Ackermann commented

Hi there!

I think this URL encoding does not always work correctly. I am using Spring 3.0RC1.
I am running into the following issue with it:

My URL is partially encrypted. The encrypted part, not URL encoded, looks like that:

/jvg+37xRX4tKphEsdgtSMg==

When I add it to the path like that (as part of the URL, or as entry in the variables map, doesn't matter) and post it using restTemplate.getForObject(), the server access log shows:
/jvg+37xRX4tKphEsdgtSMg==

However, I do want to have it URL encoded. So, I encoded it myself using java.net.URLEncoder.encode(str, "utf-8"). This gives me:
/jvg%2B37xRX4tKphEsdgtSMg%3D%3D

BUT - if I send this as part of the URL using the RestTemplate, then the access log shows:
/jvg%252B37xRX4tKphEsdgtSMg%253D%253D
Which means that the percent signs have been encoded twice.
And the application correctly throws a decryption failed error.

I cross-checked the URL from a browser manually, and then it works. It is already working in the development application, which does not use RestTemplate but its own HttpClient-based implementation. I ran into this because I want to use the RestTemplate for unit testing, and I actually would like to switch the application to using that template as well. But it seems that I cannot prevent the RestTemplate from messing up this URL?
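
The double encoding described here can be reproduced outside Spring. This is a sketch under the assumption that the path ends up in one of the multi-argument java.net.URI constructors, which always quote the percent character. The class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

// Hypothetical demo class, not part of Spring.
final class DoubleEncodingDemo {

    // HTML-form style encoding, as done manually in the comment above.
    static String formEncode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Raw path produced when the segment is passed to a multi-argument
    // URI constructor, which quotes '%' but leaves '+' and '=' alone.
    static String rawPath(String segment) {
        try {
            return new URI("http", "example.com", "/" + segment, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String token = "jvg+37xRX4tKphEsdgtSMg==";
        System.out.println(rawPath(token));             // unencoded input
        System.out.println(rawPath(formEncode(token))); // pre-encoded input
    }
}
```

With the unencoded token the path is unchanged; with the pre-encoded token every % becomes %25, which matches the access log entries quoted above.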

@spring-projects-issues

Arjen Poutsma commented

After RC1, I introduced method variants on RestTemplate which take a java.net.URI as argument (and hence no URI templates). When using these variants, no URL encoding is performed.

You can get a recent snapshot (or wait till RC2), and see if that works better.

@spring-projects-issues

Chantal Ackermann commented

Hi Arjen,

thank you for answering this quickly. I had a look at the current RestTemplate at:

https://fisheye.springsource.org/browse/spring-framework/trunk/
org.springframework.web/src/main/java/org/springframework/web/client/RestTemplate.java?r=HEAD

(mind the gap)

I am quite convinced that these new methods will solve my problem. However, it does feel like a work-around, doesn't it? Because the problem with the double encoding when using the "old" methods still remains?

So, I went back to looking at the source code, and there are some things I would like to bring up. (I don't want to blow up this issue, but this framework is too good not to try making it perfect - if I am mistaken, then all the better.)

I had a look at UriTemplate, especially encodeURI(String) from RC1, and at java.net.URL from 1.6.0.

a) UriTemplate.encodeURI(String) simply splits the String into the different URI parts, and then uses the java.net.URI() constructors with 3, 4 or 5 arguments. All of the URI() constructors just call a special toString() method that concatenates all of the provided parts back together, and then uses the inner class Parser to split it all again.
Splitting and concatenating and splitting again - why is that done, does it prevent errors? For me, this looks a bit like code duplication, and in a place where people would not expect it. Spring does not want to define its own rules on URI patterns?

b) Neither RestTemplate nor UriTemplate (nor any other Spring class in the game) does apply java.net.URLEncoder.encode() onto the input url string or the variables. They seem to rely on java.net.URI() to encode the URL, is this impression correct? But: URI$Parser does not do an URL encoding over the complete URL. It just scans through the URL by char and escapes any char that is not in the correct range. This applies to '%' but not to '+' and '='. And therefore, as an URL encoded URL can contain '%' those get escaped - a second time. But a URL that is not URL encoded and contains '+' and alike won't be encoded by that scan.

c) When splitting UriTemplate.encodeURI(String) checks first for the (first) occurrence of a colon. An URI, however, can have two colons, and if only one is present, it could occur at two different locations - couldn't it (scheme <> port)? Does the splitting in UriTemplate.encodeURI() still work for inputs like "localhost:8080/myServlet"? (Sorry that I haven't checked that myself, yet.)

Sorry for the longish comment, but I found it worth asking...
Chantal

@spring-projects-issues

Chantal Ackermann commented

the underline is triggered by the plus sign (followed by quotes?) : +++

@spring-projects-issues

Arjen Poutsma commented

a) UriTemplate.encodeURI(String) simply splits the String into the different URI parts, and then uses the java.net.URI() constructors with 3, 4 or 5 arguments. All of the URI() constructors just call a special toString() method that concatenates all of the provided parts back together, and then uses the inner class Parser to split it all again.
Splitting and concatenating and splitting again - why is that done, does it prevent errors? For me, this looks a bit like code duplication, and in a place where people would not expect it. Spring does not want to define its own rules on URI patterns?

We do not assume our own rules on URIs, we conform to RFC 2396 and rely on java.net.URI to do the heavy lifting. If you read the javadoc of URI, you will see a section on Escaped octets, quotation, encoding, and decoding. Furthermore, in the URI constructors description, you will see that the scheme-specific part (path, query string) is quoted (i.e. encoded). So in fact, that constructor does a bit more than concatenating and splitting: it encodes (the significant parts) too.

b) Neither RestTemplate nor UriTemplate (nor any other Spring class in the game) does apply java.net.URLEncoder.encode() onto the input url string or the variables. They seem to rely on java.net.URI() to encode the URL, is this impression correct? But: URI$Parser does not do an URL encoding over the complete URL. It just scans through the URL by char and escapes any char that is not in the correct range. This applies to '%' but not to '+' and '='. And therefore, as an URL encoded URL can contain '%' those get escaped - a second time. But a URL that is not URL encoded and contains '+' and alike won't be encoded by that scan.

Indeed, we rely on URI to do our encoding, and not URLEncoder. URLEncoder should have been named HtmlFormEncoder, since it encodes URLs based on the HTML spec, while we need RFC 2396 compliance. And that's why we use the URI constructor, as described above.

The problem with double-encodings is due to the RestTemplate API, and not easily solved. We assume that passed Strings are not encoded (yet), and that's why we encode them. It would be nice if we could detect whether the given string is already encoded, but that's impossible, because people actually want to pass double-encoded URLs (I've had requests for this to interact with weird web servers). With the URI-based methods, we can assume that they are encoded already, since it is impossible to construct a non-encoded URI object. That's why I added those methods.

Alternatively, I could have added a boolean "encoded" parameter to every method, but that would double the amount of methods, which is not an option.

Switching to URLEncoder will not solve this either, since

System.out.println(URLEncoder.encode("http://example.com/hotel%20list"));

happily prints out

http%3A%2F%2Fexample.com%2Fhotel%2520list

which is even worse.

c) When splitting UriTemplate.encodeURI(String) checks first for the (first) occurrence of a colon. An URI, however, can have two colons, and if only one is present, it could occur at two different locations - couldn't it (scheme <> port)? Does the splitting in UriTemplate.encodeURI() still work for inputs like "localhost:8080/myServlet"? (Sorry that I haven't checked that myself, yet.)

localhost:8080/myServlet is not a valid URL, according to RFC 2396, and we don't support that. The following code snippet shows this:

URI u = new URI("localhost:8080/myServlet");
System.out.println(u.getScheme());

prints out

localhost

Sorry for the longish comment, but I found it worth asking...

No problem. I will add some more javadoc to RestTemplate that elaborates on encoded URLs.

@spring-projects-issues

Chantal Ackermann commented

Hi Arjen,

thanks a lot for your time and consideration. Please don't be offended.

Indeed, we rely on URI to do our encoding, and not URLEncoder. URLEncoder should have been named HtmlFormEncoder, since it encodes URLs based on the HTML spec, while we need RFC 2396 compliance. And that's why we use the URI constructor, as described above.

As I understand it, URI takes care that only those characters that are allowed as part of an URI are used. All others are escaped. The problem is with URIComponents that make use of special characters that are allowed by the URI specification but have a special meaning as part of a URL. Symbols like + and =, for example. You want URLEncoder for that, and therefore the name seems fitting. It encodes + and =, while URI does not.

Maybe I misread this whole issue. I thought it was about exactly this: preserving the meaning of the variables passed as arguments to the RestTemplate methods: "RestTemplate should encode the url variables". This issue is not about the URI as such? Escaping url variables (= URI components) does include escaping the special characters that structure a URL, like + and =, doesn't it?

I'll paste some more test results to help clarify my point. Here the outcome of the tests when using Spring MVC DispatcherServlet (RC1) on the server side and RestTemplate on the client side:

  1. No explicit URL encoding:
    a) encrypted variable: jvg+37xRX4tKphEsdgtSMg==
    b) server access log (=URI encoded by RestTemplate): jvg+37xRX4tKphEsdgtSMg==
    c) servlet log: [de.rios.web.feedback.FeedbackController] Failed to decrypt path variable (url input): jvg 37xRX4tKphEsdgtSMg==
    javax.crypto.BadPaddingException: Given final block not properly padded
    => The + sign is converted to space, the = signs are still there.

So, the RestTemplate/URI does not escape the + and = because they are allowed as part of an URI.
(On the server side, the + gets converted to a space which breaks the meaning of the encrypted String. Why? Shouldn't the behaviour of DispatcherServlet conform to the same rules as does RestTemplate?)

  2. Explicit URL encoding on server and client side:
    a) encrypted variable: jvg+37xRX4tKphEsdgtSMg==
    b) explicitly URL encoded: jvg%2B37xRX4tKphEsdgtSMg%3D%3D
    c) server access log (=URI encoded by RestTemplate): jvg%252B37xRX4tKphEsdgtSMg%253D%253D
    d) servlet explicitly URL decodes this path variable (and MVC decodes it implicitly) and the string can be successfully decrypted

This test passed:

  • Those characters that are not allowed by URI spec are encoded twice which makes sense.
  • However, RestTemplate is not URL encoding/decoding the path variables (=escaping characters with special meanings in URLs).

I'm sorry to have elaborated on that even though it might not have been the goal of this issue.
(Maybe my comments help others, at least...)

Regards!
Chantal

@spring-projects-issues

Arjen Poutsma commented

thanks a lot for your time and consideration. Please don't be offended.

No problem. It takes quite a lot more to offend me ;).

Let me take a different stab at this. The RestTemplate takes URLs as arguments. URLs conform to RFC 2396, and this RFC does not consider + and = to be special characters. In effect, RestTemplate works like a browser. If you enter http://example.com/jvg+37xRX4tKphEsdgtSMg== in your browser, you will see that it does not escape anything, because it does not have to. This is different than for instance http://example.com/foo bar, which will be escaped into http://example.com/foo%20bar.

URLEncoder is meant to encode data (not just URLs) used in HTML forms. The major difference (in this case) between HTML forms and URLs is that HTML forms encode spaces as +, while the RFC uses %20. If you read the Javadoc for URLEncoder and URLDecoder, and also read the relevant section of the HTML spec, you will not find a single specific reference to URLs, except for a reference to the way some characters are encoded (and then only after spaces have been replaced with +'s). That's why I said it should have been named HtmlFormEncoder, it would have saved everybody a lot of confusion.

Once again, you can see the difference in your browser. Go to Google, type in the query "foo bar" and submit. You will see that this will result in a GET request to http://www.google.com/search?q=foo+bar. That is the HTML spec in action: your browser takes the "foo bar" query, replaces the space with a +, and does the request. Google will look at the query part (q=foo+bar) and decodes this back again. If you submit "foo+bar", your browser will encode the + and do a request on http://www.google.nl/search?q=foo%2Bbar.
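
The difference between the two encodings can be seen side by side with the JDK classes mentioned above. This is an illustrative sketch only. The class name and helper methods are hypothetical, and the URI helper uses the four-argument java.net.URI constructor as a stand-in for RFC 2396 style quoting.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

// Hypothetical demo class comparing HTML form encoding with URI quoting.
final class FormVersusUriDemo {

    // HTML form encoding: space becomes '+', a literal '+' becomes %2B.
    static String formEncode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // URI quoting of a path: space becomes %20, '+' and '=' stay as they are.
    static String uriPath(String path) {
        try {
            return new URI("http", "example.com", path, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(formEncode("foo bar"));
        System.out.println(formEncode("foo+bar"));
        System.out.println(uriPath("/foo bar"));
        System.out.println(uriPath("/foo+bar="));
    }
}
```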

In summary, the key difference here is between HTTP and HTML. RestTemplate is a HTTP client, but does not do HTML. As such, it does not treat +'s or ='s as special characters. If you do want special treatment (effectively mimicking a HTML form submittal), you will have to do the encoding manually, and submit the subsequent URL. Just like your browser does when it submits the Google form.

If anything, the issue you are having seems to be on the server side (in DispatcherServlet), where we seem to do too aggressive decoding by using URLDecoder for any request URL (see UrlPathHelper#decodeRequestString). Hence jvg+37xRX4tKphEsdgtSMg== is decoded into jvg 37xRX4tKphEsdgtSMg==. I will do some further investigation on this, and create a new JIRA for that if necessary.
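
The effect of form-style decoding on a path can be shown with the JDK alone. The sketch below is not the UrlPathHelper code. The class and method names are hypothetical, and URI.getPath() is used only as one example of a decoder that unescapes percent sequences without touching the plus sign.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;

// Hypothetical demo class contrasting two ways of decoding a request path.
final class PlusDecodingDemo {

    // Form-style decoding: every '+' is turned into a space.
    static String formDecode(String rawPath) {
        try {
            return URLDecoder.decode(rawPath, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // URI-style decoding: only percent escapes are decoded, '+' is kept.
    static String uriDecode(String rawPath) {
        return URI.create("http://example.com" + rawPath).getPath();
    }

    public static void main(String[] args) {
        String rawPath = "/jvg+37xRX4tKphEsdgtSMg==";
        System.out.println(formDecode(rawPath));
        System.out.println(uriDecode(rawPath));
        System.out.println(uriDecode("/jvg%2B37xRX4tKphEsdgtSMg%3D%3D"));
    }
}
```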

@spring-projects-issues

Chantal Ackermann commented

Ok, I am getting your point. Thanks for that helpful explanation.

I've tried around with the plus sign in my browsers. It is never escaped as a literal plus sign.

If you enter http://example.com/jvg+37xRX4tKphEsdgtSMg== in your browser, you will see that it does not escape anything, because it does not have to.

Yuk. I'm not sure about that. I am not escaping spaces with + or %20 by default. It's not something the average browser user would do, wouldn't they? Wouldn't they rather mean a plus sign if they put it in....?
Thinking about it - no, you could never know. The URL might have been copied from somewhere. So, it's ALWAYS interpreted as a space. But how then input a literal plus - as %2B? But the clients do not scan automatically for %2B, yet. They do have to know that it might come along.

If you submit "foo+bar", your browser will encode the + and do a request on http://www.google.nl/search?q=foo%2Bbar.

No, I've tried that. I am not able to trigger that conversion, neither on Firefox nor on Safari. (There are possibly a lot of combinations I haven't tried, though.)

If anything, the issue you are having seems to be on the server side (in DispatcherServlet), where we seem to do too aggressive decoding by using URLDecoder for any request URL (see UrlPathHelper#decodeRequestString). Hence jvg+37xRX4tKphEsdgtSMg== is decoded into jvg 37xRX4tKphEsdgtSMg==. I will do some further investigation on this, and create a new JIRA for that if necessary.

Is it really an error on the server side? I wondered about that when I described the test. But after your last comment I am not sure anymore.

This makes me look forward to a Spring based solution that takes care of these conversion problems... But as it requires a solution that works for different types of web server / client combinations (some of them not using spring), and frontends that dynamically construct URLs in different layers including JavaScript, this certainly is not an easy task.

@spring-projects-issues

spring-projects-issues commented Nov 2, 2009

Arjen Poutsma commented

I've confirmed the bug in Spring MVC, and created #10957 to fix it. It seems that the server-side components are too aggressive when decoding request URIs, which is probably the cause of your problems. Specifically, the UrlPathHelper should not decode + into space.

As for your other comments: this is the behavior I see when typing URLs in Safari (following links is not good enough, because HTML interpretation might get in the way):
http://example.com/foo bar becomes http://example.com/foo%20bar, i.e. Safari (not the person typing it) encodes the space into %20.
http://example.com/foo+bar= remains http://example.com/foo+bar=, i.e. + and = are legal URL characters, and don't need to be encoded.

RestTemplate executes the same behavior in these two cases, and I see no reason to change it.

When using Safari as a HTML interpreter (using Google search as an example), I see the following behavior:
Entering "foo bar" in the search box in http://google.com results in a request for http://www.google.com/search?q=foo+bar (there are some other query parameters, but this is the main part), i.e. Safari forms a URL with a query component, and encodes the space in this query component as a +.
Entering "foo+bar" in the search box results in a request for http://www.google.com/search?q=foo%2Bbar, i.e. Safari forms a URL with a query component, and encodes the + in this query component as %2B.

@spring-projects-issues

Chantal Ackermann commented

Hi Arjen,

thanks a lot for clearing things up!

I'm sorry. I reproduced the test with the Google URL incorrectly. I used the URL address field of the browser directly instead of using the input field of Google's web page...
When using Google's input field it works exactly as you describe it.

Thanks again, great work!
Chantal

@spring-projects-issues

Matthijs Bierman commented

How then would you send a '+' in your URL template?

The encoding is not done properly if '+' occurs in a query parameter. Say you want to do a Google search for "+obama +president":

http://www.google.com/search?q=%2Bobama+%2Bpresident

You would pass the URL http://www.google.com/search?q={query} and pass the parameter in a map:

Map<String,String> params = new HashMap<String,String>();
params.put("query","+obama +president");

This does not work as expected. The '+' is not encoded, and therefore treated as a space. Manually encoding the parameter results in double encoding.

After a little more research, the problem appears to be in UriUtils, where the '+' sign is being cleared from the BitSet of characters to be encoded for the QUERY_PARAM BitSet.

QUERY_PARAM.clear('+');

@spring-projects-issues

Arjen Poutsma commented

URI encoding is a difficult process. UriUtils tries to guess the various URI components (path, query, etc) by using a regular expression. This guessing is by no means foolproof, but it happens to work in most cases, and it's the best we can do.

The way to work around the problem you encountered is to construct a java.net.URI object with the desired URL, like so:

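RestTemplate restTemplate = new RestTemplate();
// Encode each literal '+' as %2B yourself; the remaining '+' stands for the space.
URI u = new URI("http://www.google.com/search?q=%2Bobama+%2Bpresident");
// A URI argument is sent as is, so nothing is encoded a second time.
String result = restTemplate.getForObject(u, String.class);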

When a URI is passed to the RestTemplate, it is not encoded, but treated as is.
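
When the query value is only known at runtime, the form encoding can be applied to the value alone before the URI is built. The following is a minimal sketch using only JDK classes. The class name PlusInQueryDemo and the method searchUri are hypothetical, and it assumes the target server expects HTML-form style encoding in the query string.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;

// Hypothetical helper; the resulting URI could be passed to the
// URI-based RestTemplate methods described above.
final class PlusInQueryDemo {

    // Form-encodes only the query value, then builds the final URI.
    static URI searchUri(String query) {
        try {
            String encoded = URLEncoder.encode(query, "UTF-8");
            return URI.create("http://www.google.com/search?q=" + encoded);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(searchUri("+obama +president"));
    }
}
```

Encoding only the value, rather than the whole URL, keeps the scheme, host and path untouched and avoids the http%3A%2F%2F result shown earlier in the thread.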

Read http://blogs.msdn.com/b/oldnewthing/archive/2010/03/31/9987779.aspx for more information on URI encodings.
