-
Notifications
You must be signed in to change notification settings - Fork 179
URI encoding query strings to data servers #1144
Comments
This has begun to occur because Tomcat changed a default about |
Is it for certain that doing the proper URL encoding will break TDS running on older Tomcat versions? It seems like having the URL encoded should always be ok. |
As for older servers, we would need to stand up such a server. Not sure how far back |
HI folks, I have spent quite a bit of time looking into this issue in Hyrax and I would like to share my findings. The change in Tomcat was not actually a change in "what characters are legal". These characters were never legal unless URI encoded. Unfortunately, Tomcat (and pretty much every other web server) was lenient w.r.t. issue. And, over time, clients became non-conformant. It is the client application's responsibility to ensure that the URL content is encoded correctly before request transmission. Not doing this is now considered a significant security issue and all major web servers are moving to block unencoded URL content. My first inclination was also to reconfigure Tomcat to allow the verboten characters. But after discussing this with James and several others we came to the conclusion that suggesting to sys admins that they lower the security barriers in order to accept unencoded content from non-conformant clients doesn't seem to be an advisable path forward. (And imagine the fallout if you make this suggestion, they do it, and then they get compromised. Ouch.) And I think it is not necessary. Older servers should accept the encoded content just fine as this has always been a part of the specification that clients transmit correctly encoded URL content. Tomcat et al. have been lenient, but they do also accept correct inputs. For example, our test server, test.opendap.org, is running an older Hyrax (1.14.0) and Tomcat-7.0.90. Both encodings work:
(Seems like either GitHub's link builder or my browser was encoding all of the URLs in my examples so I buried them in code blocks to stop it. hah.) However, this test mule is running Hyrax on top of Tomcat-8.5.33 and only the encoded URL will prevail:
One thing that I found a bit vexing is that Tomcat only returns HTTP headers when the encoding is rejected:
Which can make understanding the problem on the client side difficult. I think Robert is dead on with this:
I hope you will consider revisiting this before choosing not to make the change. Sincerely, |
Certainly we need to fix the server to be able to handle url encoded requests, and we need netCDF-Java to make encoded requests by default (perhaps provide a signature to allow non-encoded requests). Dennis - can you investigate this on 5.0 first? Ideally, this should be done on the 4.6.x branch as well if possible (will make a 4.6.12 release as soon it is in). |
ok |
Is there anything to fix in TDS? If you enter a DAP URL into a browser, the browser encodes the URL before making the request. I haven't yet had trouble mucking about with DAP URLs in a browser. So, I suspect TDS handles encoded DAP URLs. |
From Dennis comment above:
So I was basing my comment of fixing the server off of that. |
Did I misinterpret what failure we were seeing? |
The problem that Robert originally brought is that the java-netcdf library is no longer encoding DAP request URLs where previously it did. (And it should still.) Dennis stated that doing so would confound the TDS. I have not tested the TDS. But I can say that in Hyrax I have had to make 0 changes to accommodate the encoded URLs - Not in the OLFS (java servlet in Tomcat) or in our c++ BES code. For what it's worth, I checked my code in a debugger and it's clear that Tomcat is doing the decoding. For example |
When I first came to Unidata, the fact that the DAP2 protocol used In any case, it will take me a while to reconstruct what was the problem |
A few points to clarify how and why OPeNDAP's C++ server-side software uses the %hh encoding, or more to the point, takes advantage of its presence. We don't use the encoding in any way that is incompatible with the web, so there's no reason that encoding a URL will break any OPeNDAP software (assuming that the software is correctly written, but that's a given). In our C++ server software (code that is run long after the information has been read off the wire and passed down through several layers), the 'Web URL encoding' is removed. Dennis, I think you may be remembering that the software does this in two stages: first all but the spaces are decoded, then the constraint expression is parsed, and then spaces are treated. So, in that very limited sense, our software uses the encoding. But this happens far down inside the server. Properly encoding a URL in the client will in no way affect this. As for the TDS code, it's hard to imagine that correctly encoding URLs would break it. Certainly, web browsers feed the server URLs and they encode the characters. |
@dmh - any updates? I'll be making a PR that make sure the urls sent by the OPeNDAP Dataset Access Form are encoded properly, as the requests made by the The bigger problem lies in the I noticed in 4.6 and 5.0 that there is a comment in the * Note that the term legalurl in the following code means that the url has
* reserved characters within identifiers in escaped form. This is
* particularly an issue for queries. Especially: ?x[0:5] is legal and the
* square brackets need not be encoded. We need to start encoding those square brackets. Once this is fixed, I'll cut a 4.6.12 release and a 5.0.0-beta6 release. |
I knew I forgot something :-) |
SO I added encoding in the branch fixurlencode.tmp |
Well, heck - it looks like they return a Request Headers2 =
Status = 404 HTTP/1.1 404 Not Found
Status Line = HTTP/1.1 404 Not Found
Response Headers =
Date: Fri, 02 Nov 2018 22:18:27 GMT
Server: Ingrid 0.9
Mime-Version: 1.0
<snip>
ResponseBody---------------
Error {
code = 3;
message ="The identifier `olr%5B0:1:0%5D%5B0:1:72%5D%5B0:1:143%5D' is not in the dataset.";
}; We could start getting all kinds of hack-y here, and check the headers for Does anyone have a connection to the developers of Ingrid? This is the first I've personally encountered it. @ethanrd? |
DO we have any idea what kind of server/servlet setup they are using? Does not look |
According to http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/.CPC/.GLOBAL/.daily/dods.ver:
I can't seem to find anything that looks like what we are seeing here. The closest would be https://www.ingrid-oss.eu/latest/index.html, but I'm pretty sure that's not right (they are at version 4.5 of their software stack); that looks very |
Mybe @rabernat knows someone with more information about INGRID? |
Benno Blumenthal was the original author of the Ingrid data system. I believe he, for personal reasons, stopped working on Ingrid several years ago. I don't know who might have taken over. |
@DennisHeimbigner Since the URL you gave works in the browser, I would check the tests again. (FYI the link text is not encoded but the link URL is already encoded. So following the link ends up with the browser double encoding the URL.) |
Sorry, I do not understand your comment. Who is double encoding? ncdump? the web browser? |
@ethanrd - I think I might have double encoded using toolsUI (looks like it might be encoding in the UrlDump tab), but I don't think our tests are: thredds/opendap/src/test/java/ucar/nc2/dods/TestUserProblems.java Lines 37 to 56 in 11b9d63
|
Currently, I am encoding down in HTTPMethod.java, so |
@DennisHeimbigner Sorry, I was waiting for a plane and writing on my phone for that last, very terse message. I just meant that, since browsers encode URLs, if the Ingrid URL with square brackets works in a browser, the server probably is not the problem. |
I really think this is a server problem. Using curl without encoding works: curl -G -v http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/.CPC/.GLOBAL/.daily/dods.dods?olr[0:1:0][0:1:2][0:1:3]
* Trying 129.236.110.35...
* Connected to iridl.ldeo.columbia.edu (129.236.110.35) port 80 (#0)
> GET /SOURCES/.NOAA/.NCEP/.CPC/.GLOBAL/.daily/dods.dods?olr[0:1:0][0:1:2][0:1:3] HTTP/1.1
> Host: iridl.ldeo.columbia.edu
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 200 OK
< XDODS-Server: Ingrid/2.15
< Date: Mon, 05 Nov 2018 16:28:53 GMT
< Server: Ingrid 0.9
< Mime-Version: 1.0
< Access-Control-Allow-Origin: *
< Cache-Control: public
< Content-Description: dods_data
< Content-Type: application/octet-stream
< Last-Modified: Mon, 30 Apr 2018 07:04:57 GMT
< Content-Length: 321
< X-Cache: MISS from gfs2geo2.ldeo.columbia.edu
< X-Cache-Lookup: MISS from gfs2geo2.ldeo.columbia.edu:3128
< X-Cache: MISS from iridls1.ldeo.columbia.edu
< X-Cache-Lookup: MISS from iridls1.ldeo.columbia.edu:80
< Via: 1.0 gfs2geo2.ldeo.columbia.edu:3128 (squid/2.7.STABLE9), 1.0 iridls1.ldeo.columbia.edu:80 (squid/2.7.STABLE9)
<
Dataset {
Grid {
ARRAY:
Int16 olr[time = 1][lat = 3][lon = 4];
MAPS:
Float32 time[time = 1];
Float32 lat[lat = 3];
Float32 lon[lon = 4];
} olr;
} daily;
Data:
* Connection #0 to host iridl.ldeo.columbia.edu left intact
� � �C� � �B B B � � @ @ @ but with encoding fails: curl -G -v http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/.CPC/.GLOBAL/.d
aily/dods.dods --data-urlencode "olr[0:1:0][0:1:2][0:1:3]"
* Trying 129.236.110.35...
* Connected to iridl.ldeo.columbia.edu (129.236.110.35) port 80 (#0)
> GET /SOURCES/.NOAA/.NCEP/.CPC/.GLOBAL/.daily/dods.dods?olr%5B0%3A1%3A0%5D%5B0%3A1%3A2%5D%5B0%3A1%3A3%5D HTTP/1.1
> Host: iridl.ldeo.columbia.edu
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 404 Not Found
< Date: Mon, 05 Nov 2018 16:30:44 GMT
< Server: Ingrid 0.9
< Mime-Version: 1.0
< Access-Control-Allow-Origin: *
< Cache-Control: public
< Content-Type: text/plain
< XDODS-Server: Ingrid/2.15
< Content-Description: dods_error
< X-Cache: MISS from gfs2geo2.ldeo.columbia.edu
< X-Cache-Lookup: MISS from gfs2geo2.ldeo.columbia.edu:3128
< X-Cache: MISS from iridls1.ldeo.columbia.edu
< X-Cache-Lookup: MISS from iridls1.ldeo.columbia.edu:80
< Transfer-Encoding: chunked
< Via: 1.0 gfs2geo2.ldeo.columbia.edu:3128 (squid/2.7.STABLE9), 1.0 iridls1.ldeo.columbia.edu:80 (squid/2.7.STABLE9)
< Connection: close
<
Error {
code = 3;
message ="The identifier `olr%5B0%3A1%3A0%5D%5B0%3A1%3A2%5D%5B0%3A1%3A3%5D' is not in the dataset.";
};
* Closing connection 0 |
I'll try pinging some people from the IRI Ingrid world to see if they have any insights on this issue: |
@lesserwhirls Yes, similar to the results when pasting the URL into a browser. I believe both the browser and curl are encoding the URL behind the scenes. So, for your first request with the unencoded URL [1], the server gets the encode URL [2] which it unencodes back to the unencoded URL [1]. So the query part the server interprets is a valid DAP constraint expression and it succeeds. For your second request with the encoded URL [2], the server gets a double encoded URL [3] which the server unencodes back to the encoded URL [2]. The server tries to interpret that as a DAP constraint and fails. [1] Unencoded URL: |
Which issue?
Ángel
… On Nov 5, 2018, at 13:03, Ryan Abernathey ***@***.***> wrote:
I'll try pinging some people from the IRI Ingrid world to see if they have any insights on this issue:
@1jdelcorral, @remicousin <https://github.com/remicousin>, @agmunozs <https://github.com/agmunozs>, @ikhomyakov <https://github.com/ikhomyakov>
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub <#1144 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AaOZhkuX5DljH4nlpiYEsQ6w-2CLwlVeks5usH1wgaJpZM4WfnnD>.
|
@agmunozs We noticed after enforcing URL encoding by netCDF-java (which sits on top of the Apache |
Ethan- |
@DennisHeimbigner Yes, I talked to Sean about this offline and he convinced met that Curl and browsers don't encode URLs (or at least different browsers do it differently). It makes sense to me that libcurl would leave that to the caller. But I'm surprised that browsers don't encode URLs. From previous run-ins with this issue, I believe browsers used to encode URLs. |
@DennisHeimbigner - let's take your fix as is, broken tests and all. Unless you are super bored, I can add some code to look for a Java system property (something like |
re: Issue #1144 For DAP2 and DAP4 requests, encode at least the query part because the square brackets are no longer being accepted by the default Tomcat setup. Turns out that some servers (e.g. http://iridl.ldeo.columbia.edu) do not handle this correctly so some provision will have to be made for this at some point. In any case, the change is a one-liner in HTTPMethod.java, line 213: - this.methodurl = HTTPUtil.parseToURI(url); /// validate + this.methodurl = new URI(Escape.escapeURL(url)); /// validate Misc. Unrelated changes: 1. Modify DatasetUrl.java to add a heuristic for choosing the protocol by searching the path for the string "/thredds/XXX/" where XXX is some protocol like dodsC or dap4.
Ok, netCDF-Java will now, by default, encode URLs. This has been address in the 4.6 and 5.0 branches. I will be cutting a We have found some servers that do not handle this (e.g. |
@lesserwhirls, Any chance the 5.0b6 will be posted Thursday? I'm on the verge of posting a Panoply bugfix release. |
@msdsoftware - there are a few 3rd party deps that need to be updated, but I will work hard to get 5.0b6 up today to make sure it's ready for you. |
I forget @msdsoftware - are you using |
@lesserwhirls, Yes, netCDFAll.jar is the only jar I actually use. Building via gradle would have been no problem, but thanks for providing a compiled binary. |
Great - let me know how it goes! |
A developer at OPENDAP was testing out my Panoply app and reported a problem with accessing data on his testbed server. The same issue arises when using IDV. It seems that when a variable in the remote dataset has been chosen and a subset of that variable is selected, the constraint query string sent to the server is not URI-encoded. Apparently due to security concerns some server software (e.g. Tomcat) is tightening up requirements and enforcing URI encoding rules on the incoming HTTP request.
Here's part of the stack trace from IDV 5.5:
A stack trace in Panoply using a current netCDF-Java 5.0 snapshot reveals a bit more depth:
The problem is that the query string in these requests look like
T250[0:1:0][0:1:360][0:1:575]
, but the :, [ and ] are reserved characters and need to be "% encoded".For the purposes of Panoply, I found I could hack
DODSNetcdfFile.readDataDDSfromServer
to alter the CE (constraint expression) viaCE = java.net.URLEncoder(CE,"UTF-8");
before making the call tododsConnection.getData(CE,null)
at line 1436. The request went through without complaint and the data came back.However, I'm wondering if this is the best place for that fix. Perhaps encoding the CE should be done within
DConnect2
, perhaps within itsgetCompleteCE(String)
method?BTW: The OPENDAP developers found that a 4-year-old version of Panoply based on netCDF-Java 4.3.22 properly encoded the URI when communicating with their server. Further testing on my part found found that this problem arose when my code started using NJ 4.5.x.
The text was updated successfully, but these errors were encountered: