RestTemplate should encode the url variables [SPR-5516] #10187
Comments
Tareq Abedrabbo commented Fix confirmed. Thanks! |
Chantal Ackermann commented Hi there! I think this URL encoding does not always work correctly. I am using Spring 3.0 RC1. My URL is partially encrypted. The encrypted part, not URL-encoded, looks like this: /jvg+37xRX4tKphEsdgtSMg== When I add it to the path (as part of the URL or as an entry in the variables map, it doesn't matter) and post it using restTemplate.getForObject(), the server access log shows: However, I do want to have it URL-encoded. So I encoded it myself using java.net.URLEncoder.encode(str, "utf-8"). This gives me: BUT if I send this as part of the URL using the RestTemplate, then the access log shows: I cross-checked the URL manually from a browser, and there it works. It is already working in the development application, which does not use RestTemplate but its own HttpClient-based implementation. I ran into this because I want to use the RestTemplate for unit testing, and I would actually like to switch the application to using that template as well. But it seems that I cannot prevent the RestTemplate from messing up this URL? |
Arjen Poutsma commented After RC1, I introduced method variants on RestTemplate which take a java.net.URI as argument (and hence no URI templates). When using these variants, no URL encoding is performed. You can get a recent snapshot (or wait till RC2), and see if that works better. |
Chantal Ackermann commented Hi Arjen, thank you for answering this quickly. I had a look at the current RestTemplate at: https://fisheye.springsource.org/browse/spring-framework/trunk/ (mind the gap) I am quite convinced that these new methods will solve my problem. However, it does feel like a workaround, doesn't it? The problem with double encoding when using the "old" methods still remains. So I went back to the source code, and there are some things I would like to bring up. (I don't want to blow up this issue, but this framework is too good not to try making it perfect. If I am mistaken, then all the better.) I had a look at UriTemplate, especially encodeURI(String) from RC1, and at java.net.URL from 1.6.0.
a) UriTemplate.encodeURI(String) simply splits the String into the different URI parts, and then uses the java.net.URI() constructors with 3, 4 or 5 arguments. All of the URI() constructors just call a special toString() method that concatenates all of the provided parts back together, and then use the inner class Parser to split it all again.
b) Neither RestTemplate nor UriTemplate (nor any other Spring class involved) applies java.net.URLEncoder.encode() to the input URL string or the variables. They seem to rely on java.net.URI() to encode the URL; is this impression correct? But URI$Parser does not URL-encode the complete URL. It just scans through the URL character by character and escapes any character that is not in the allowed range. This applies to '%' but not to '+' and '='. Therefore, since a URL-encoded URL can contain '%', those characters get escaped a second time. But a URL that is not URL-encoded and contains '+' and the like won't be encoded by that scan.
c) When splitting, UriTemplate.encodeURI(String) first checks for the (first) occurrence of a colon. A URI, however, can have two colons, and if only one is present, it could occur at two different locations, couldn't it (scheme <> port)?
Does the splitting in UriTemplate.encodeURI() still work for inputs like "localhost:8080/myServlet"? (Sorry that I haven't checked that myself yet.) Sorry for the longish comment, but I found it worth asking... |
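On question (c), it may help to see how java.net.URI itself treats such an input. The sketch below does not reproduce Spring's UriTemplate.encodeURI() splitting; it only exercises the JDK class, and the class name ColonParsingDemo is hypothetical. It suggests that a scheme-less "localhost:8080/myServlet" is read as an opaque URI whose scheme is "localhost".

```java
import java.net.URI;

// Hypothetical demo class; it uses java.net.URI only, not Spring's UriTemplate.
class ColonParsingDemo {

    // Summarizes how java.net.URI splits the given input string.
    static String describe(String input) {
        URI uri = URI.create(input);
        return "scheme=" + uri.getScheme()
                + " host=" + uri.getHost()
                + " port=" + uri.getPort()
                + " opaque=" + uri.isOpaque();
    }

    public static void main(String[] args) {
        // The text before the first colon is taken as the scheme.
        System.out.println(describe("localhost:8080/myServlet"));
        // With an explicit scheme, host and port are recognized as expected.
        System.out.println(describe("http://localhost:8080/myServlet"));
    }
}
```

Under these assumptions, the first colon decides what counts as the scheme, so an explicit "http://" prefix is the safer input.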
Chantal Ackermann commented the underline is triggered by the plus sign (followed by quotes?) : +++ |
Arjen Poutsma commented
We do not assume our own rules on URIs, we conform to RFC 2396 and rely on java.net.URI to do the heavy lifting. If you read the javadoc of URI, you will see a section on Escaped octets, quotation, encoding, and decoding. Furthermore, in the URI constructors description, you will see that the scheme-specific part (path, query string) is quoted (i.e. encoded). So in fact, that constructor does a bit more than concatenating and splitting: it encodes (the significant parts) too.
Indeed, we rely on URI to do our encoding, and not URLEncoder. URLEncoder should have been named HtmlFormEncoder, since it encodes URLs based on the HTML spec, while we need RFC 2396 compliance. And that's why we use the URI constructor, as described above. The problem with double encoding is due to the RestTemplate API, and is not easily solved. We assume that passed Strings are not encoded (yet), and that's why we encode them. It would be nice if we could detect whether the given string is already encoded, but that's impossible, because people actually want to pass double-encoded URLs (I've had requests for this to interact with weird web servers). With the URI-based methods, we can assume that they are encoded already, since it is impossible to construct a non-encoded URI object. That's why I added those methods. Alternatively, I could have added a boolean "encoded" parameter to every method, but that would double the number of methods, which is not an option. Switching to URLEncoder will not solve this either, since
happily prints out
which is even worse.
prints out
No problem. I will add some more javadoc to RestTemplate that elaborates on encoded URLs. |
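The difference described in this comment can be checked with the JDK alone. The following is a minimal sketch, not the snippet from the original comment (which is not preserved here); the class name UriQuotingDemo and the host example.com are assumptions for illustration. It compares the multi-argument java.net.URI constructor with java.net.URLEncoder on the same inputs.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

// Hypothetical demo class comparing RFC 2396 quoting with HTML form encoding.
class UriQuotingDemo {

    // Builds http://example.com<path> via the multi-argument URI constructor,
    // which quotes characters that are illegal in a path and always quotes '%'.
    static String viaUriConstructor(String path) {
        try {
            return new URI("http", "example.com", path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Applies HTML form encoding (application/x-www-form-urlencoded).
    static String viaUrlEncoder(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // '+' and '=' are legal path characters, so they are left alone.
        System.out.println(viaUriConstructor("/jvg+37xRX4tKphEsdgtSMg=="));
        // A space is not legal, so it becomes %20.
        System.out.println(viaUriConstructor("/foo bar"));
        // An already-encoded value is encoded again: %2B becomes %252B.
        System.out.println(viaUriConstructor("/a%2Bb"));
        // URLEncoder also escapes '/', '+' and '='.
        System.out.println(viaUrlEncoder("/jvg+37xRX4tKphEsdgtSMg=="));
    }
}
```

The third call shows the double encoding discussed in this thread: the constructor cannot tell an intentional "%2B" from a literal percent sign, so it quotes the '%'.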
Chantal Ackermann commented Hi Arjen, thanks a lot for your time and consideration. Please don't be offended.
As I understand it, URI makes sure that only those characters that are allowed as part of a URI are used. All others are escaped. The problem is with URI components that make use of special characters that are allowed by the URI specification but have a special meaning as part of a URL. Symbols like + and =, for example. You want URLEncoder for that, and therefore the name seems fitting. It encodes + and =, while URI does not. Maybe I misread this whole issue. I thought it was about exactly this: preserving the meaning of the variables passed as arguments to the RestTemplate methods: "RestTemplate should encode the url variables". This issue is not about the URI as such? Escaping URL variables (= URI components) does include escaping the special characters that structure a URL, like + and =, doesn't it? I'll paste some more test results to help clarify my point. Here is the outcome of the tests when using Spring MVC DispatcherServlet (RC1) on the server side and RestTemplate on the client side:
So, the RestTemplate/URI does not escape the + and = because they are allowed as part of a URI.
This test passed:
I'm sorry to have elaborated on that even though it might not have been the goal of this issue. Regards! |
Arjen Poutsma commented
No problem. It takes quite a lot more to offend me ;). Let me take a different stab at this. The RestTemplate takes URLs as arguments. URLs conform to RFC 2396, and this RFC does not consider + and = to be special characters. In effect, RestTemplate works like a browser. If you enter http://example.com/jvg+37xRX4tKphEsdgtSMg== in your browser, you will see that it does not escape anything, because it does not have to. This is different from, for instance, http://example.com/foo bar, which will be escaped into http://example.com/foo%20bar. URLEncoder is meant to encode data (not just URLs) used in HTML forms. The major difference (in this case) between HTML forms and URLs is that HTML forms encode spaces as +, while the RFC uses %20. If you read the Javadoc for URLEncoder and URLDecoder, and also read the relevant section of the HTML spec, you will not find a single specific reference to URLs, except for a reference to the way some characters are encoded (and then only after spaces have been replaced with +'s). That's why I said it should have been named HtmlFormEncoder; it would have saved everybody a lot of confusion. Once again, you can see the difference in your browser. Go to Google, type in the query "foo bar" and submit. You will see that this results in a GET request to http://www.google.com/search?q=foo+bar. That is the HTML spec in action: your browser takes the "foo bar" query, replaces the space with a +, and does the request. Google will look at the query part (q=foo+bar) and decode it back again. If you submit "foo+bar", your browser will encode the + and do a request on http://www.google.nl/search?q=foo%2Bbar. In summary, the key difference here is between HTTP and HTML. RestTemplate is an HTTP client, but does not do HTML. As such, it does not treat +'s or ='s as special characters. If you do want special treatment (effectively mimicking an HTML form submission), you will have to do the encoding manually, and submit the resulting URL. 
Just like your browser does when it submits the Google form. If anything, the issue you are having seems to be on the server side (in DispatcherServlet), where we seem to decode too aggressively by using URLDecoder for any request URL (see UrlPathHelper#decodeRequestString). Hence |
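The form-encoding behavior described above can be reproduced with java.net.URLEncoder and java.net.URLDecoder. This is a minimal sketch under the assumption that UTF-8 is the charset in use; the class name FormEncodingDemo is hypothetical, and the sketch does not reproduce what UrlPathHelper actually does. It only shows why applying form decoding to a request path turns a literal + into a space.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class for HTML form encoding and decoding.
class FormEncodingDemo {

    // Encodes a value the way an HTML form submission would.
    static String formEncode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Decodes a form-encoded value; '+' is interpreted as a space.
    static String formDecode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(formEncode("foo bar")); // foo+bar
        System.out.println(formEncode("foo+bar")); // foo%2Bbar
        // Form decoding applied to a path replaces the literal '+' with a space.
        System.out.println(formDecode("/jvg+37xRX4tKphEsdgtSMg=="));
    }
}
```

The last call illustrates the kind of over-eager decoding suspected on the server side: the path from the earlier comment no longer contains its plus sign after form decoding.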
Chantal Ackermann commented Ok, I am getting your point. Thanks for that helpful explanation. I've tried around with the plus sign in my browsers. It is never escaped as a literal plus sign.
Yuk. I'm not sure about that. I am not escaping spaces with + or %20 by default. It's not something the average browser user would do, is it? Wouldn't they rather mean a plus sign if they put it in....?
No, I've tried that. I am not able to trigger that conversion, neither on Firefox nor on Safari. (There are possibly a lot of combinations I haven't tried, though.)
Is it really an error on the server side? I wondered about that when I described the test. But after your last comment I am not sure anymore. This makes me look forward to a Spring based solution that takes care of these conversion problems... But as it requires a solution that works for different types of web server / client combinations (some of them not using spring), and frontends that dynamically construct URLs in different layers including JavaScript, this certainly is not an easy task. |
Arjen Poutsma commented I've confirmed the bug in Spring MVC, and created #10957 to fix it. It seems that the server-side components are too aggressive when decoding request URIs, which is probably the cause of your problems. Specifically, the UrlPathHelper should not decode + into space. As for your other comments: this is the behavior I see when typing URLs in Safari (following links is not good enough, because HTML interpretation might get in the way): RestTemplate exhibits the same behavior in these two cases, and I see no reason to change it. When using Safari as an HTML interpreter (using Google search as an example), I see the following behavior: |
Chantal Ackermann commented Hi Arjen, thanks a lot for clearing things up! I'm sorry. I reproduced the test with the Google URL incorrectly. I used the URL address field of the browser directly instead of using the input field of Google's web page... Thanks again, great work! |
Matthijs Bierman commented How then would you send a '+' in your URL template? The encoding is not done properly if '+' occurs in a query parameter. Say you want to do a Google search for "+obama +president": You would pass the URL http://www.google.com/search?q={query} and pass the parameter in a map: Map<String,String> params = new HashMap<String,String>();
params.put("query","+obama +president"); This does not work as expected. The '+' is not encoded, and therefore treated as a space. Manually encoding the parameter results in double encoding. After a little more research, the problem appears to be in UriUtils, where the '+' sign is being cleared from the BitSet of characters to be encoded for the QUERY_PARAM BitSet. QUERY_PARAM.clear('+'); |
Arjen Poutsma commented URI encoding is a difficult process. UriUtils tries to guess the various URI components (path, query, etc.) by using a regular expression. This guessing is by no means foolproof, but it happens to work in most cases, and it's the best we can do. The way to work around the problem you encountered is to construct a java.net.URI object with the desired URL, like so:
When a URI is passed to the RestTemplate, it is not encoded, but treated as is. Read http://blogs.msdn.com/b/oldnewthing/archive/2010/03/31/9987779.aspx for more information on URI encodings. |
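The snippet from the original comment is not preserved here. As a rough sketch of the workaround it describes, the code below form-encodes the query value manually and builds a java.net.URI from the finished string. The class name PreEncodedUriDemo and the method searchUri are hypothetical, and the choice of URLEncoder for the query value is an assumption rather than the author's stated method.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;

// Hypothetical demo class: encode the query value yourself, then hand over a URI.
class PreEncodedUriDemo {

    // Builds a search URI whose query value is already form-encoded.
    static URI searchUri(String query) {
        try {
            String encoded = URLEncoder.encode(query, "UTF-8");
            // URI.create parses the string as-is and does not encode it again.
            return URI.create("http://www.google.com/search?q=" + encoded);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = searchUri("+obama +president");
        // The literal '+' is sent as %2B; the space is sent as '+'.
        System.out.println(uri);
        // A URI like this could then be passed to one of the URI-based
        // RestTemplate methods instead of a String template.
    }
}
```

Because the single-argument parsing path leaves existing percent escapes untouched, the "%2B" survives intact, which is what avoids the double encoding seen with String templates.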
Tareq Abedrabbo opened SPR-5516 and commented
The RestTemplate does not encode the url variables. The following line:
Results in an exception:
Affects: 3.0 M2
Issue Links:
0 votes, 5 watchers