Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected encoding in request.getPathInfo() with Jetty 12 beta 0 #9444

Closed
Artur- opened this issue Feb 28, 2023 · 30 comments · Fixed by #9455
Closed

Unexpected encoding in request.getPathInfo() with Jetty 12 beta 0 #9444

Artur- opened this issue Feb 28, 2023 · 30 comments · Fixed by #9455
Assignees
Labels
Bug For general bugs on Jetty side Specification For all industry Specifications (IETF / Servlet / etc)

Comments

@Artur-
Copy link
Contributor

Artur- commented Feb 28, 2023

Jetty version(s)
12.0.0.beta0

Java version/vendor
openjdk version "19.0.1" 2022-10-18

OS type/version
mac os

Description
When requesting a URL like

http://localhost:8888/view/VAADIN/dynamic/resource/5/4a2dc788-7f04-43e4-8471-70d713d6d001/file%20name

with a servlet mapping of /view/* then request.getPathInfo() in

Jetty < 12 returns "/VAADIN/dynamic/resource/5/4a2dc788-7f04-43e4-8471-70d713d6d001/file name" while
Jetty 12 returns "/VAADIN/dynamic/resource/5/4a2dc788-7f04-43e4-8471-70d713d6d001/file%20name"

Is this an intentional change in some spec or accidental?

This breaks handling of resources like "file name" in Vaadin when run in 12 beta0 as Vaadin assumes the path info is decoded and then encodes it to match with a stored, encoded value.

@Artur- Artur- added the Bug For general bugs on Jetty side label Feb 28, 2023
@joakime
Copy link
Contributor

joakime commented Feb 28, 2023

Which ee level are you using when this happens?

@Artur-
Copy link
Contributor Author

Artur- commented Feb 28, 2023

jetty-ee10-maven-plugin

@joakime
Copy link
Contributor

joakime commented Feb 28, 2023

I suspect this is some fallout from the discussion at jakartaee/servlet#18 with Servlet 6.x

@joakime joakime added the Specification For all industry Specifications (IETF / Servlet / etc) label Feb 28, 2023
@Artur-
Copy link
Contributor Author

Artur- commented Feb 28, 2023

So jakartaee/servlet@e1292f9 changes the text for getPathInfo from

a <code>String</code>, decoded by the web container

to

a <code>String</code> specifying extra path information that comes after the servlet path but before the query string in the request URL

which seems to indicate that this has changed. But then there is

This method will not return any encoded characters unless the container is configured specifically to allow them.

which I am not sure what it tries to say

@joakime
Copy link
Contributor

joakime commented Feb 28, 2023

This section of the spec is a bit confusing. Bear with us.

Until we get a definitive answer ...

If you write Vaadin for multiple containers, be aware that this part of the spec says that the container can decide what to return here (regarding what's encoded or not).

@Artur-
Copy link
Contributor Author

Artur- commented Feb 28, 2023

the container can decide what to return here (regarding what's encoded or not).

That sounds horrible

I tested with https://github.com/Artur-/servlet-demo on Jetty 12 beta 0 and Tomcat 10.1.1 to request "hello world"

Server name localhost
Server info jetty/12.0.0.beta0

Request URI: /hello%20world
Request URL: http://localhost:8080/hello%20world
Request path info: /hello%20world
Request path translated: null
Request context path: 
Server name localhost
Server info Apache Tomcat/10.1.1

Request URI: /servlet-demo/hello%20world
Request URL: http://localhost:8080/servlet-demo/hello%20world
Request path info: /hello world
Request path translated: /.../apache-tomcat-10.1.1/webapps/servlet-demo/hello world
Request context path: /servlet-demo

@gregw
Copy link
Contributor

gregw commented Feb 28, 2023

The ability to return encoded characters is intended for characters like %2f, which will normally be rejected.

I think we should be decoding %20 as it is not ambiguous to do so.

@gregw
Copy link
Contributor

gregw commented Feb 28, 2023

Ah we have messed up on %20 at least. It is meant to be decoded in getCanonicalPath, but sometimes it is not... because we are getting confused that space is not a safe character for a URI itself. However, it is perfectly safe to decode as it can be unambiguously re-encoded.

Will need to review (again) before beta1

@gregw
Copy link
Contributor

gregw commented Feb 28, 2023

So our concept of a HttpURI.getCanonicalPath is a to provide a uniquely encoded path that can be passed to the JVM URI class in order to locate resources in the FileSystem. This is fine for encoding characters like '' and ';' which are strange to be in a URI in the first place, but it is a bit more problematic for space, which is commonly encoded in a URI.

In our core request class, we just provide a HttpURI, so our handlers can decide if they want the decoded path (unsafe and ambiguous) or the canonical path (safe and non-ambiguous). We obviously use the later for our own handlers (e.g. ResourceHandler).

The issue is when we come to the servlet API, the getPathInfo method is meant to be fully decoded, except that servlet 6.0 provided some exceptions for ambiguous characters like \. However, it also allows for a container to just reject requests with such ambiguous characters. But space is not ambiguous, it is just not acceptable for the JVM URI class which will ultimately be used by the DefaultServlet to serve resources.

In our ee9 servlets, I believe we are further decoding the canonical URI (as we always have done) to the ambiguous form and trusting the URI violations reject bad requests.

In ee10, we are just using the canonicalPath. I think we need to add a UriUtil.nonAmbiguousDecode method that will decode %20, but leave %2F encoded. We can use that for ee10 getPathInfo so it will only return encoded characters in the unsafe modes if the violations are set to allow those requests through.

@sbordet sbordet linked a pull request Mar 1, 2023 that will close this issue
gregw added a commit that referenced this issue Mar 3, 2023
Added URIUtil.decodeSafePath for EE10, to allow for %2F and %25 to remain encoded in the servlet API.
Fixed async dispatch to also safeDecode
Updated tests to expect decoded space
Apply suggestions from code review

Co-authored-by: Simone Bordet <simone.bordet@gmail.com>
@gregw gregw closed this as completed Mar 3, 2023
@gregw
Copy link
Contributor

gregw commented Mar 3, 2023

@Artur- the fix is merged in head now. It would be great if you could test it before we do a beta1 release in the next few weeks.

@Artur-
Copy link
Contributor Author

Artur- commented Mar 3, 2023

Is there a public snapshot I can test?

@gregw
Copy link
Contributor

gregw commented Mar 3, 2023

It is currently building after the merge, but should be available from https://jenkins.webtide.net/blue/organizations/jenkins/jetty.project/detail/jetty-12.0.x/1083/artifacts

@joakime
Copy link
Contributor

joakime commented Mar 3, 2023

We have nightly snapshot artifacts present at https://oss.sonatype.org/content/repositories/jetty-snapshots/

But that only occurs once a day, at midnight.

@joakime
Copy link
Contributor

joakime commented Mar 3, 2023

Reopening, as this is now unsafe.

Lets take a look at these scenarios with the current implementation from PR #9455

url-pattern request path servlet-path path-info
/foo/* /foo/bar%2Fzed /foo /bar%2Fzed
/foo/* /foo/bar%252Fzed /foo /bar%2Fzed

@joakime joakime reopened this Mar 3, 2023
@joakime
Copy link
Contributor

joakime commented Mar 3, 2023

I'm introducing some test cases right now (which caught this case)

joakime added a commit that referenced this issue Mar 3, 2023
 * will use replacement characters instead of throwing an error.
gregw added a commit that referenced this issue Mar 7, 2023
getServletPath and getPathInfo will never return an encoded path segment. Instead, they will throw an IllegalArgumentException if they are called when there is a URI with violations.
gregw added a commit that referenced this issue Mar 7, 2023
getServletPath and getPathInfo will never return an encoded path segment. Instead, they will throw an IllegalArgumentException if they are called when there is a URI with violations.
gregw added a commit that referenced this issue Mar 7, 2023
getServletPath and getPathInfo will never return an encoded path segment. Instead, they will throw an IllegalArgumentException if they are called when there is a URI with violations.
gregw added a commit that referenced this issue Mar 7, 2023
getServletPath and getPathInfo will never return an encoded path segment. Instead, they will throw an IllegalArgumentException if they are called when there is a URI with violations.
gregw added a commit that referenced this issue Mar 8, 2023
getServletPath and getPathInfo will never return an encoded path segment. Instead, they will throw an IllegalArgumentException if they are called when there is a URI with violations.
gregw added a commit that referenced this issue Mar 8, 2023
getServletPath and getPathInfo will never return an encoded path segment. Instead, they will throw an IllegalArgumentException if they are called when there is a URI with violations.
gregw added a commit that referenced this issue Mar 8, 2023
gregw added a commit that referenced this issue Mar 9, 2023
Fixed ee9 tests
gregw added a commit that referenced this issue Mar 9, 2023
Optionally throw on decode of ambiguous URIs
gregw added a commit that referenced this issue Mar 10, 2023
Optionally throw on decode of ambiguous URIs
gregw added a commit that referenced this issue Mar 10, 2023
updates from review. Extra testing.
gregw added a commit that referenced this issue Mar 10, 2023
updates from review.
@Artur-
Copy link
Contributor Author

Artur- commented Mar 13, 2023

I re-ran our tests today with the lastest Jetty 12 snapshot and it seems all tests pass except one that hits HTTP ERROR 400 Ambiguous URI path encoding.

Looks like the test registers a route using the path

special åäö $%20'´`

which then gets encoded into

special%20%C3%A5%C3%A4%C3%B6%20$%2520'%C2%B4%60

and when the URL (http://localhost:8888/context-path/special%20%C3%A5%C3%A4%C3%B6%20$%2520'%C2%B4%60) is opened, the response is

HTTP ERROR 400 Ambiguous URI path encoding
URI:	/badURI
STATUS:	400
MESSAGE:	Ambiguous URI path encoding
CAUSED BY:	org.eclipse.jetty.http.BadMessageException: 400: Ambiguous URI path encoding
Caused by:
org.eclipse.jetty.http.BadMessageException: 400: Ambiguous URI path encoding
	at org.eclipse.jetty.server.internal.HttpConnection$HttpStreamOverHTTP1.headerComplete(HttpConnection.java:1223)
	at org.eclipse.jetty.server.internal.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:1008)
	at org.eclipse.jetty.http.HttpParser.parseFields(HttpParser.java:1245)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:1536)
	at org.eclipse.jetty.server.internal.HttpConnection.parseRequestBuffer(HttpConnection.java:626)

@joakime
Copy link
Contributor

joakime commented Mar 13, 2023

What is /badURI and what are its url-patterns ?

@joakime
Copy link
Contributor

joakime commented Mar 13, 2023

note that the 12.0.0.beta0 doesn't have the latest decoding fixes on jetty-12.0.x/HEAD

@Artur-
Copy link
Contributor Author

Artur- commented Mar 13, 2023

It is run using

[INFO] jetty-12.0.0-SNAPSHOT; built: 2023-03-13T00:09:34.359Z; git: bd0186c2f78fb7c87c7bfadf9b0a970657d071f3; jvm 19.0.2

as

            <plugin>
                <groupId>org.eclipse.jetty.ee10</groupId>
                <artifactId>jetty-ee10-maven-plugin</artifactId>
                <configuration>
                    <webApp>
                        <!-- use a non-root context -->
                        <contextPath>/context-path</contextPath>
                    </webApp>
                </configuration>
            </plugin>

together with

@WebServlet(urlPatterns = { "/*" })
public class DefaultMappingServlet extends VaadinServlet {
}

badURI seems to be a Jetty thing https://github.com/eclipse/jetty.project/blob/bd0186c2f78fb7c87c7bfadf9b0a970657d071f3/jetty-core/jetty-server/src/main/java/org/eclipse/jetty/server/internal/HttpConnection.java#L1077

gregw added a commit that referenced this issue Apr 10, 2023
gregw added a commit that referenced this issue Apr 11, 2023
@Artur-
Copy link
Contributor Author

Artur- commented Jul 6, 2023

Was there any conclusion for this? It seems like 12 beta3 works like Jetty 11 except it disallows using an encoded %20, i.e. %2520 in the URL path, so I deleted this case from our tests

@gregw
Copy link
Contributor

gregw commented Jul 7, 2023

@artur, By default we are not allowing %25 in a URI, as it can cause problems if that decoded string is passed to other APIs that do not know that the % is already decoded and thus try to decode it again.

However, we have a violation mode AMBIGUOUS_PATH_ENCODING, that can be allowed. So a server can be configured to allows %25.
This can be used by setting the URLCompliance mode to the string "RFC3986,AMBIGUOUS_PATH_ENCODING". We should probably javadoc the ramifications of using this violation more (i.e. the deployer needs to check that the path is not given to any API that will decode the %.

gregw added a commit that referenced this issue Jul 7, 2023
Improve javadoc of Violations for #9444
@gregw
Copy link
Contributor

gregw commented Jul 7, 2023

@Artur- See #10079 which improves the javadoc of the Violations. We will also improve the actual external documentation soon (@sbordet tell me when the structure is there for the new doco and I'll write about URI Violations).

gregw added a commit that referenced this issue Jul 10, 2023
Improve javadoc of Violations for #9444
@Artur-
Copy link
Contributor Author

Artur- commented Aug 14, 2023

Do I understand correctly that the consequence of this is that you cannot (by default) load a file with a name that contains %. So for instance if I have src/main/webapp/all done 100%.png

then a request to http://localhost:8080/all done 100%.png is translated by Chrome into http://localhost:8888/all%20done%20100%.png and then Jetty responds with

org.eclipse.jetty.http.BadMessageException: 400: Bad Request
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:1654)
	at org.eclipse.jetty.server.internal.HttpConnection.parseRequestBuffer(HttpConnection.java:626)
	at org.eclipse.jetty.server.internal.HttpConnection.onFillable(HttpConnection.java:460)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:322)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100)
	at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:478)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:441)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:293)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.run(AdaptiveExecutionStrategy.java:201)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:410)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:969)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1194)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1149)
	at java.base/java.lang.Thread.run(Thread.java:1623)
Caused by: java.lang.NumberFormatException: !hex .
	at org.eclipse.jetty.util.TypeUtil.convertHexDigit(TypeUtil.java:416)
	at org.eclipse.jetty.http.HttpURI$Mutable.parse(HttpURI.java:1273)
	at org.eclipse.jetty.http.HttpURI$Mutable.pathQuery(HttpURI.java:935)
	at org.eclipse.jetty.http.HttpURI.build(HttpURI.java:120)
	at org.eclipse.jetty.server.internal.HttpConnection$HttpStreamOverHTTP1.<init>(HttpConnection.java:1145)
	at org.eclipse.jetty.server.internal.HttpConnection.newHttpStream(HttpConnection.java:186)
	at org.eclipse.jetty.server.internal.HttpConnection$RequestHandler.startRequest(HttpConnection.java:996)
	at org.eclipse.jetty.http.HttpParser.parseLine(HttpParser.java:825)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:1538)

If I manually URI encode it and instead request http://localhost:8080/all%20done%20100%25.png then I get

org.eclipse.jetty.http.BadMessageException: 400: Ambiguous URI path encoding
	at org.eclipse.jetty.server.internal.HttpConnection$HttpStreamOverHTTP1.headerComplete(HttpConnection.java:1226)
	at org.eclipse.jetty.server.internal.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:1012)
	at org.eclipse.jetty.http.HttpParser.parseFields(HttpParser.java:1252)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:1545)
	at org.eclipse.jetty.server.internal.HttpConnection.parseRequestBuffer(HttpConnection.java:626)
	at org.eclipse.jetty.server.internal.HttpConnection.onFillable(HttpConnection.java:460)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:322)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100)
	at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:969)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1194)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1149)
	at java.base/java.lang.Thread.run(Thread.java:1623)

@joakime
Copy link
Contributor

joakime commented Aug 14, 2023

There are a lot of gotchas when relying on the Chrome location bar encode URLs properly.
For example:

  • An open Bug that shows that the non-reserved "<>^ {|} %+ do not encode in some situations (location bar, gurl/kurl, javascript, application launcher, and even the protocol handlers)
  • An open Bug that shows the location bar sometimes encodes with ISO-8859-1 and sometimes with UTF-8 (the rules on which is used is complicated), which means the char ë can wind up encoded as %EB and some other times as %C3%AB

You really shouldn't rely on that Chrome behavior.

Now, the other URL you posted with http://localhost:8080/all%20done%20100%25.png is triggering https://tools.ietf.org/html/rfc3986#section-3.3 rules on ambiguous path encoding. The %25 with no hex numbers after it causes that.

This rule can be configured, btw.
See the UriCompliance rules on set on the HttpConfiguration.setUriCompliance(UriCompliance) object if using jetty embedded.
Or the jetty.httpConfig.uriCompliance property if using the jetty-home setup (see the your ${jetty.base}/start.d/server.ini for help).

@gregw
Copy link
Contributor

gregw commented Aug 23, 2023

Closing this as I believe we are now doing as best we can. Please reopen if there are more problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug For general bugs on Jetty side Specification For all industry Specifications (IETF / Servlet / etc)
Projects
None yet
3 participants