From bd12ce5ac4639bf59acf8da2cc18375b7aab050b Mon Sep 17 00:00:00 2001 From: Greg Wilkins Date: Thu, 7 Oct 2021 17:27:47 +1100 Subject: [PATCH 01/42] Issue #18 URI path processing Issue #18 URI path processing. Moved the wiki text to a PR for better comment and suggestions. --- spec/src/main/asciidoc/servlet-spec-body.adoc | 76 +++++++++++++++++++ 1 file changed, 76 insertions(+) diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index 0b15645a9..a58d31079 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -1300,6 +1300,82 @@ the header value to an `int`, a `NumberFormatException` is thrown. If the `getDateHeader` method cannot translate the header to a `Date` object, an `IllegalArgumentException` is thrown. +=== Request URI Processing +The process described here adapts and extends the URI canonicalization process described in [RFC 3986](https://datatracker.ietf.org/doc/html/rfc3986) to create a standard Servlet URI path canonicalization process that ensures that URIs can be mapped to Servlets, Filters and security constraints in an unambiguous manner. It is also intended to provide information to reverse proxy implementations so they are aware of how requests they pass to servlet containers will be processed. + +Servlet containers may implement the standard Servlet URI path canonicalization in any manner they see fit as long as the end result is identical to the end result of the process described here. Servlet containers may provide container specific configuration options to vary the standard canonicalization process. Any such variations may have security implications and both Servlet container implementors and users are advised to be sure that they understand the implications of any such container specific canonicalization options. + +==== URI Transport +===== HTTP/1.1 +The URI is extracted from the `request-target` as defined by [RFC 7230](https://datatracker.ietf.org/doc/html/rfc7230#section-3.1.1). URIs in `origin-form` or `asterisk-form` are passed unchanged to stage 2. URIs in `absolute-form` have the protocol and authority removed to convert them to `origin-form` and are then passed to stage 2. URIs in `authority-form` are outside of the scope of this specification. + +===== HTTP/2 +The URI is the `:path` pseudo header as defined by [RFC 7540](https://datatracker.ietf.org/doc/html/rfc7540#section-8.1.2.3) and is passed unchanged to stage 2. + +===== Other protocols +Containers may support other protocols. Containers should extract an appropriate URI for the request from the protocol and pass it to stage 2. + +==== Separation of path and query +The URI is split by the first occurrence of any '?' character to path and query. The query is preserved for later handling and the following steps applied to the path. + +==== Discard fragment +A fragment in the path is indicated by the first occurrence of a `\#` character. Any `#` character and following fragment is removed from the path and discarded. + +==== Decoding of non-special characters +Characters other than `/`, `;` and `%` that are encoded in `%nn` form are decoded and the resulting octet sequences is treated as UTF-8 and converted to a character sequence. + +> Note that special characters cannot be part of a UTF-8 character sequence as all such sequences are comprised of negative octets. + +> Note this is not reserved characters as defined by RFC3986, as that does not include `%` and includes many characters we don't care about. Avoiding a second decoding is worthwhile. + +==== Collapse sequences of multiple `"/"` characters +> **WARNING** Swapping the order of stage 3 and stage 4 may be significant. Consider `"/aaa/bbb//../"`. + +> **TODO** Are we sure we don't want to do this in the other order? + +Any sequence of more than one `"/"` character in the URI must be replaced with a single `"/"`. + +==== Remove dot-segments+ +* A path not starting with "/" must be rejected with a 400 response. +* Sequences of the form `"/./"` must be replaced with `"/"`. +* Sequences of the form `"/" + segment + "/../"` must be replaced with `"/"`. +* If there is no preceding segment for a `".."` segment then return a 400 response. + +==== Removal of path parameters +A path segment containing the `";"` character is split at the first occurence of `";"`. The segment is replaced by the character sequence preceeding the `";"`. The characters following the `";"` are considered a path parameters and may be preserved by the container for later processing (eg `jsessionid`). + +> TODO How do we handle URIs like `/foo/;/bar`? I think as currently written we end up with `/foo//bar` ? + +==== Decoding of remaining `%nn` sequences +Any remaining `%nn` sequences in the path should be decoded. Some containers may be configured to leave some specific characters encoded (eg. the characters '/' and '%' may be left decoded by some container configuration). + +==== Mapping URI to context and resource +The decoded path is used to map the request to a context and resource within the context. This form of the URI path is used for all subsequent mapping (web applications, servlet, filters and security constraints). + +==== Rejecting Suspicious Sequences +If suspicious sequences are discovered during the prior steps, the request must be rejected with a 400 bad request using the error handling of the matched context. By default the set of suspicious sequences includes: + + * The encoded `"/"` character + * Any `"."` or `".."` segment that had a path parameter + * Any `"."` or `".."` segment with any encoded characters + * The `"\"` character encoded or not. + * Any control characters either encoded or not. + +A container or context may be configured to have a different set of rejected sequences. + +> TODO how do we define control characters? < 0x20? Is 0x7F (DEL) OK? + +> TODO should we also by default reject '\' and '%5c' ? + +> TODO should we also by default reject non visible and/or control characters ? + +> TODO if %2F is allowed, we may now have double '/', '/./' and '/../' segments in the URL, should stage 3 and 4 be re-run if this is allowed? + + + + + + === Request Path Elements The request path that leads to a servlet From fabaaddf01bdd4540a1f4d14ca61c3ad566d04be Mon Sep 17 00:00:00 2001 From: Greg Wilkins Date: Thu, 7 Oct 2021 17:39:16 +1100 Subject: [PATCH 02/42] Issue #18 URI path processing moved TODOs to PR comments --- spec/src/main/asciidoc/servlet-spec-body.adoc | 23 ------------------- 1 file changed, 23 deletions(-) diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index a58d31079..85ffced8f 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -1324,15 +1324,7 @@ A fragment in the path is indicated by the first occurrence of a `\#` character. ==== Decoding of non-special characters Characters other than `/`, `;` and `%` that are encoded in `%nn` form are decoded and the resulting octet sequences is treated as UTF-8 and converted to a character sequence. -> Note that special characters cannot be part of a UTF-8 character sequence as all such sequences are comprised of negative octets. - -> Note this is not reserved characters as defined by RFC3986, as that does not include `%` and includes many characters we don't care about. Avoiding a second decoding is worthwhile. - ==== Collapse sequences of multiple `"/"` characters -> **WARNING** Swapping the order of stage 3 and stage 4 may be significant. Consider `"/aaa/bbb//../"`. - -> **TODO** Are we sure we don't want to do this in the other order? - Any sequence of more than one `"/"` character in the URI must be replaced with a single `"/"`. ==== Remove dot-segments+ @@ -1344,8 +1336,6 @@ Any sequence of more than one `"/"` character in the URI must be replaced with a ==== Removal of path parameters A path segment containing the `";"` character is split at the first occurence of `";"`. The segment is replaced by the character sequence preceeding the `";"`. The characters following the `";"` are considered a path parameters and may be preserved by the container for later processing (eg `jsessionid`). -> TODO How do we handle URIs like `/foo/;/bar`? I think as currently written we end up with `/foo//bar` ? - ==== Decoding of remaining `%nn` sequences Any remaining `%nn` sequences in the path should be decoded. Some containers may be configured to leave some specific characters encoded (eg. the characters '/' and '%' may be left decoded by some container configuration). @@ -1363,19 +1353,6 @@ If suspicious sequences are discovered during the prior steps, the request must A container or context may be configured to have a different set of rejected sequences. -> TODO how do we define control characters? < 0x20? Is 0x7F (DEL) OK? - -> TODO should we also by default reject '\' and '%5c' ? - -> TODO should we also by default reject non visible and/or control characters ? - -> TODO if %2F is allowed, we may now have double '/', '/./' and '/../' segments in the URL, should stage 3 and 4 be re-run if this is allowed? - - - - - - === Request Path Elements The request path that leads to a servlet From c05f08b9adf24f71faad2949bec35147d0fae5ee Mon Sep 17 00:00:00 2001 From: Greg Wilkins Date: Thu, 7 Oct 2021 18:17:55 +1100 Subject: [PATCH 03/42] Issue #18 URI path processing Define URI transport as description list with HTTP/1.0 and HTTP/3 included. --- spec/src/main/asciidoc/servlet-spec-body.adoc | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index 85ffced8f..0c02926e2 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -1306,14 +1306,17 @@ The process described here adapts and extends the URI canonicalization process d Servlet containers may implement the standard Servlet URI path canonicalization in any manner they see fit as long as the end result is identical to the end result of the process described here. Servlet containers may provide container specific configuration options to vary the standard canonicalization process. Any such variations may have security implications and both Servlet container implementors and users are advised to be sure that they understand the implications of any such container specific canonicalization options. ==== URI Transport -===== HTTP/1.1 -The URI is extracted from the `request-target` as defined by [RFC 7230](https://datatracker.ietf.org/doc/html/rfc7230#section-3.1.1). URIs in `origin-form` or `asterisk-form` are passed unchanged to stage 2. URIs in `absolute-form` have the protocol and authority removed to convert them to `origin-form` and are then passed to stage 2. URIs in `authority-form` are outside of the scope of this specification. +HTTP/0.9:: The URI is the `document address` as defined by the W3C [Original HTTP](https://www.w3.org/Protocols/HTTP/AsImplemented.html) document. -===== HTTP/2 -The URI is the `:path` pseudo header as defined by [RFC 7540](https://datatracker.ietf.org/doc/html/rfc7540#section-8.1.2.3) and is passed unchanged to stage 2. +HTTP/1.0:: The URI is extracted from the `Request-URI` in the `Request-Line` as defined by [RFC 1945](https://datatracker.ietf.org/doc/html/rfc1945#section-5.1). URIs in `abs_path` form are passed unchanged to subsequent steps. URIs in `absoluteURI` have the protocol and authority removed to convert them to `origin-form` and are then passed the subsequent steps. -===== Other protocols -Containers may support other protocols. Containers should extract an appropriate URI for the request from the protocol and pass it to stage 2. +HTTP/1.1:: The URI is extracted from the `request-target` as defined by [RFC 7230](https://datatracker.ietf.org/doc/html/rfc7230#section-3.1.1). URIs in `origin-form` are passed unchanged to subsequent steps. URIs in `absolute-form` have the protocol and authority removed to convert them to `origin-form` and are then passed the subsequent steps. URIs in `authority-form` or `asterisk-form` are outside of the scope of this specification. + +HTTP/2:: The URI is the `:path` pseudo header as defined by [RFC 7540](https://datatracker.ietf.org/doc/html/rfc7540#section-8.1.2.3) and is passed unchanged to stage 2. + +HTTP/3:: The URI is the `:path` pseudo header as currently defined by the [draft RFC](https://datatracker.ietf.org/doc/html/draft-ietf-quic-http-34). + +Other protocols:: Containers may support other protocols. Containers should extract an appropriate URI for the request from the protocol and pass it to stage 2. ==== Separation of path and query The URI is split by the first occurrence of any '?' character to path and query. The query is preserved for later handling and the following steps applied to the path. From ad8f3d137882955ce1bcc047c9b54c5c7f6832f8 Mon Sep 17 00:00:00 2001 From: Greg Wilkins Date: Thu, 7 Oct 2021 18:31:39 +1100 Subject: [PATCH 04/42] Issue #18 URI path processing Move all rejections to the last step. --- spec/src/main/asciidoc/servlet-spec-body.adoc | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index 0c02926e2..cc3babcdc 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -1331,10 +1331,8 @@ Characters other than `/`, `;` and `%` that are encoded in `%nn` form are decode Any sequence of more than one `"/"` character in the URI must be replaced with a single `"/"`. ==== Remove dot-segments+ -* A path not starting with "/" must be rejected with a 400 response. * Sequences of the form `"/./"` must be replaced with `"/"`. * Sequences of the form `"/" + segment + "/../"` must be replaced with `"/"`. -* If there is no preceding segment for a `".."` segment then return a 400 response. ==== Removal of path parameters A path segment containing the `";"` character is split at the first occurence of `";"`. The segment is replaced by the character sequence preceeding the `";"`. The characters following the `";"` are considered a path parameters and may be preserved by the container for later processing (eg `jsessionid`). @@ -1346,11 +1344,14 @@ Any remaining `%nn` sequences in the path should be decoded. Some containers may The decoded path is used to map the request to a context and resource within the context. This form of the URI path is used for all subsequent mapping (web applications, servlet, filters and security constraints). ==== Rejecting Suspicious Sequences -If suspicious sequences are discovered during the prior steps, the request must be rejected with a 400 bad request using the error handling of the matched context. By default the set of suspicious sequences includes: +If suspicious sequences are discovered during the prior steps, the request must be rejected with a 400 bad request. If a context is matched the the error handling of the context may be used to generate the response. By default the set of suspicious sequences includes: + * Any path not starting with the `"/"` character + * Any path starting with an initial segment of `".."` * The encoded `"/"` character * Any `"."` or `".."` segment that had a path parameter * Any `"."` or `".."` segment with any encoded characters + * Any `".."` segment preceeded by an empty segment * The `"\"` character encoded or not. * Any control characters either encoded or not. From f0d56b01a169963753faf52a67e4f816c1d1a90e Mon Sep 17 00:00:00 2001 From: Greg Wilkins Date: Thu, 7 Oct 2021 18:32:35 +1100 Subject: [PATCH 05/42] Issue #18 URI path processing UTF-8 decoding --- spec/src/main/asciidoc/servlet-spec-body.adoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index cc3babcdc..d3d7caa3a 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -1324,8 +1324,8 @@ The URI is split by the first occurrence of any '?' character to path and query. ==== Discard fragment A fragment in the path is indicated by the first occurrence of a `\#` character. Any `#` character and following fragment is removed from the path and discarded. -==== Decoding of non-special characters -Characters other than `/`, `;` and `%` that are encoded in `%nn` form are decoded and the resulting octet sequences is treated as UTF-8 and converted to a character sequence. +==== Decoding of characters as UTF-8 +Characters other than `/`, `;` and `%` that are encoded in `%nn` form are decoded. The resulting octet sequences is treated as UTF-8 and converted to a character sequence. ==== Collapse sequences of multiple `"/"` characters Any sequence of more than one `"/"` character in the URI must be replaced with a single `"/"`. From bb45889d62974c88c7ad6d46e35201e689d3bdd9 Mon Sep 17 00:00:00 2001 From: Greg Wilkins Date: Fri, 8 Oct 2021 09:23:41 +1100 Subject: [PATCH 06/42] Issue #18 URI path processing Added example table. too long and needs review --- spec/src/main/asciidoc/servlet-spec-body.adoc | 64 ++++++++++++++++++- 1 file changed, 61 insertions(+), 3 deletions(-) diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index d3d7caa3a..ab324fe59 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -1357,6 +1357,65 @@ If suspicious sequences are discovered during the prior steps, the request must A container or context may be configured to have a different set of rejected sequences. +==== Example URIs + +. Example URIs +|=== +| Encoded URI path | Decoded Path | Rejected + +| /public/file.txt | /public/file.txt | +| /public//file.txt | /public/file.txt | +| /public/;/file.txt | /public/file.txt | +| /PUBLIC/file.txt | /PUBLIC/file.txt | +| /public%2Ffile.txt | /public%2Ffile.txt | 400 +| /public%5Cfile.txt | /public\file.txt | 400 +| /public%00/file.txt | /public␀/file.txt | 400? +| /public/./file.txt | /public/file.txt | +| /public/.;/file.txt | /public/file.txt | 400 +| /public/%2e/file.txt | /public/file.txt | 400 +| /public/%2e;/file.txt | /public/file.txt | 400 +| /../docroot/public/file.txt | /../docroot/public/file.txt | 400 +| /public/dir/../file.txt | /public/file.txt | +| /public//../file.txt | /public/file.txt | +| /public/dir/..;/file.txt | /public/file.txt | 400 +| /public/dir/%2e%2e/file.txt | /public/file.txt | 400 +| /public/dir/%2e%2e;/file.txt | /public/file.txt | 400 +| /WEB-INF/web.xml | /WEB-INF/web.xml | 404 or 403? +| /web-inf/web.xml | /web-inf/web.xml | 404 or 403? +| /WEB-IN~1.DIR/web.xml | ? | ? +| /WEB-INF;/web.xml | /WEB-INF/web.xml | 404 +| /WEB-INF%2Fweb.xml | /WEB-INF%2Fweb.xml| 400 +| /WEB-INF%5Cweb.xml | /WEB-INF\web.xml | 400 +| /WEB-INF%00/web.xml | /WEB-INF␀/web.xml | 400? +| /WEB-INF/./web.xml | /WEB-INF/web.xml | 404 +| /public/../WEB-INF/web.xml | /WEB-INF/web.xml | 404 or 403? +| /public/..;/WEB-INF/web.xml | | 404 +| /public/%2e%2e/WEB-INF/web.xml | /public/../WEB-INF/web.xml | +| /public/%2e%2e;/WEB-INF/web.xml | /public/../WEB-INF/web.xml | +| /secret/private.xml | | 403 +| /SeCreT/private.xml | /SeCreT/private.xml | +| /SECRET~1.DIR/private.xml | | +| /secret;/private.xml | | 403 +| /secret%2Fprivate.xml | /secret/private.xml | 403? [2] or 400? +| /secret%5Cprivate.xml | /secret\private.xml | 403? [2] +| /secret%00/private.xml | /secret␀/private.xml | 400? +| /./secret/private.xml | | 403 +| /.;/secret/private.xml | | 403 +| /%2e/secret/private.xml | /./secret/private.xml or throw? | 400? +| /%2e;/secret/private.xml | /./secret/private.xml or throw? | 400? +| /public/../secret/private.xml | | 403 +| /public/..;/secret/private.xml | | 403 +| /public/%2e%2e/secret/private.xml | /public/../secret/private.xml or throw?| 400? +| /public/%2e%2e;/secret/private.xml | /public/../secret/private.xml or throw?| 400? +| /dispatch/public/file.txt | /public/file.txt | +| /dispatch/public%2Ffile.txt | /public/file.txt | 400? +| /dispatch/public%5Cfile.txt | /public\file.txt | 400? +| /dispatch/public%252Ffile.txt | /public%2Ffile.txt | 400? +| /dispatch/WEB-INF/web.xml | /WEB-INF/web.xml | +| /dispatch/secret/private.xml | /secret/private/xml | +| /dispatch/%2E%2E/%2E%2E/etc/password | /../../etc/password | 400? + + === Request Path Elements The request path that leads to a servlet @@ -6373,11 +6432,10 @@ public @interface HttpMethodConstraint { |=== |Element |Description +|value |Default -|value -|The HTTP protocol method name -| + |`emptyRoleSemantic` From c46ed03c6b2066f6846594999a605dc1d4f4195c Mon Sep 17 00:00:00 2001 From: Greg Wilkins Date: Fri, 8 Oct 2021 11:21:40 +1100 Subject: [PATCH 07/42] Issue 225 do head content length (#405) Issue #225 doHead fixes: + fixed reset handling in `NoBodyResponse` and `NoBodyOutputStream` + better implementation of async IO methods in `NoBodyOutputStream` + replaced `doHead` implementation with direct call of `doGet` and legacy nobody mode. + added unit tests. Signed-off-by: Greg Wilkins --- .../jakarta/servlet/http/HttpServlet.java | 75 ++++- .../jakarta/servlet/MockServletConfig.java | 68 ++++ .../jakarta/servlet/MockServletContext.java | 305 ++++++++++++++++++ .../servlet/MockServletOutputStream.java | 78 +++++ .../jakarta/servlet/MockServletRequest.java | 229 +++++++++++++ .../jakarta/servlet/MockServletResponse.java | 126 ++++++++ .../jakarta/servlet/http/HttpServletTest.java | 216 +++++++++++++ .../servlet/http/MockHttpServletRequest.java | 189 +++++++++++ .../servlet/http/MockHttpServletResponse.java | 115 +++++++ spec/src/main/asciidoc/servlet-spec-body.adoc | 12 +- 10 files changed, 1404 insertions(+), 9 deletions(-) create mode 100644 api/src/test/java/jakarta/servlet/MockServletConfig.java create mode 100644 api/src/test/java/jakarta/servlet/MockServletContext.java create mode 100644 api/src/test/java/jakarta/servlet/MockServletOutputStream.java create mode 100644 api/src/test/java/jakarta/servlet/MockServletRequest.java create mode 100644 api/src/test/java/jakarta/servlet/MockServletResponse.java create mode 100644 api/src/test/java/jakarta/servlet/http/HttpServletTest.java create mode 100644 api/src/test/java/jakarta/servlet/http/MockHttpServletRequest.java create mode 100644 api/src/test/java/jakarta/servlet/http/MockHttpServletResponse.java diff --git a/api/src/main/java/jakarta/servlet/http/HttpServlet.java b/api/src/main/java/jakarta/servlet/http/HttpServlet.java index 8fcd11abd..6f34601e8 100644 --- a/api/src/main/java/jakarta/servlet/http/HttpServlet.java +++ b/api/src/main/java/jakarta/servlet/http/HttpServlet.java @@ -19,6 +19,7 @@ package jakarta.servlet.http; import jakarta.servlet.GenericServlet; +import jakarta.servlet.ServletConfig; import jakarta.servlet.ServletException; import jakarta.servlet.ServletOutputStream; import jakarta.servlet.ServletRequest; @@ -78,9 +79,21 @@ public abstract class HttpServlet extends GenericServlet { private static final String HEADER_IFMODSINCE = "If-Modified-Since"; private static final String HEADER_LASTMOD = "Last-Modified"; + /** + * The parameter obtained {@link ServletConfig#getInitParameter(String)} to determine if legacy processing of + * {@link #doHead(HttpServletRequest, HttpServletResponse)} is provided. + * + * @deprecated may be removed in future releases + * @since 6.0 + */ + @Deprecated + public static final String LEGACY_DO_HEAD = "jakarta.servlet.http.legacyDoHead"; + private static final String LSTRING_FILE = "jakarta.servlet.http.LocalStrings"; private static ResourceBundle lStrings = ResourceBundle.getBundle(LSTRING_FILE); + private boolean legacyHeadHandling; + /** * Does nothing, because this is an abstract class. * @@ -89,6 +102,12 @@ public abstract class HttpServlet extends GenericServlet { public HttpServlet() { } + @Override + public void init(ServletConfig config) throws ServletException { + super.init(config); + legacyHeadHandling = Boolean.parseBoolean(config.getInitParameter(LEGACY_DO_HEAD)); + } + /** * * Called by the server (via the service method) to allow a servlet to handle a GET request. @@ -178,6 +197,11 @@ protected long getLastModified(HttpServletRequest req) { * protects itself from being called multiple times for one HTTP HEAD request). * *

+ * The default implementation calls {@link #doGet(HttpServletRequest, HttpServletResponse)}. If the + * {@link ServletConfig} init parameter {@link #LEGACY_DO_HEAD} is set to "TRUE", then the response instance is wrapped + * so that the response body is discarded. + * + *

* If the HTTP HEAD request is incorrectly formatted, doHead returns an HTTP "Bad Request" message. * * @param req the request object that is passed to the servlet @@ -189,10 +213,13 @@ protected long getLastModified(HttpServletRequest req) { * @throws ServletException if the request for the HEAD could not be handled */ protected void doHead(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException { - NoBodyResponse response = new NoBodyResponse(resp); - - doGet(req, response); - response.setContentLength(); + if (legacyHeadHandling) { + NoBodyResponse response = new NoBodyResponse(resp); + doGet(req, response); + response.setContentLength(); + } else { + doGet(req, resp); + } } /** @@ -594,6 +621,7 @@ public void service(ServletRequest req, ServletResponse res) throws ServletExcep * Response object. */ // file private +@Deprecated class NoBodyResponse extends HttpServletResponseWrapper { private static final ResourceBundle lStrings = ResourceBundle.getBundle("jakarta.servlet.http.LocalStrings"); @@ -661,6 +689,29 @@ private void checkHeader(String name) { } } + @Override + public void reset() { + super.reset(); + noBody.reset(); + usingOutputStream = false; + writer = null; + didSetContentLength = false; + } + + @Override + public void resetBuffer() { + super.resetBuffer(); + if (writer != null) { + try { + NoBodyOutputStream.disableFlush.set(Boolean.TRUE); + writer.flush(); + } finally { + NoBodyOutputStream.disableFlush.remove(); + } + } + noBody.reset(); + } + @Override public ServletOutputStream getOutputStream() throws IOException { @@ -692,10 +743,12 @@ public PrintWriter getWriter() throws UnsupportedEncodingException { * Servlet output stream that gobbles up all its data. */ // file private +@Deprecated class NoBodyOutputStream extends ServletOutputStream { private static final String LSTRING_FILE = "jakarta.servlet.http.LocalStrings"; private static ResourceBundle lStrings = ResourceBundle.getBundle(LSTRING_FILE); + static ThreadLocal disableFlush = new ThreadLocal<>(); private int contentLength = 0; @@ -703,6 +756,10 @@ class NoBodyOutputStream extends ServletOutputStream { NoBodyOutputStream() { } + void reset() { + contentLength = 0; + } + // file private int getContentLength() { return contentLength; @@ -732,13 +789,19 @@ public void write(byte buf[], int offset, int len) throws IOException { contentLength += len; } + @Override + public void flush() throws IOException { + if (Boolean.TRUE.equals(disableFlush.get())) + super.flush(); + } + @Override public boolean isReady() { - return false; + throw new UnsupportedOperationException(); } @Override public void setWriteListener(WriteListener writeListener) { - + throw new UnsupportedOperationException(); } } diff --git a/api/src/test/java/jakarta/servlet/MockServletConfig.java b/api/src/test/java/jakarta/servlet/MockServletConfig.java new file mode 100644 index 000000000..9492b8078 --- /dev/null +++ b/api/src/test/java/jakarta/servlet/MockServletConfig.java @@ -0,0 +1,68 @@ +/* + * Copyright (c) 1997, 2020 Oracle and/or its affiliates and others. + * All rights reserved. + * Copyright 2004 The Apache Software Foundation + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package jakarta.servlet; + +import java.util.Collections; +import java.util.Enumeration; +import java.util.HashMap; +import java.util.Map; + +public class MockServletConfig implements ServletConfig { + private final String name; + private final ServletContext servletContext; + private final Map params; + + public MockServletConfig() { + this(null, null, null); + } + + public MockServletConfig(String name) { + this(name, null, null); + } + + public MockServletConfig(String name, ServletContext servletContext, Map params) { + this.name = name == null ? "Servlet" : name; + this.servletContext = servletContext == null ? new MockServletContext() : servletContext; + this.params = params == null ? new HashMap<>() : params; + } + + @Override + public String getServletName() { + return null; + } + + @Override + public ServletContext getServletContext() { + return servletContext; + } + + @Override + public String getInitParameter(String name) { + return params.get(name); + } + + @Override + public Enumeration getInitParameterNames() { + return Collections.enumeration(params.keySet()); + } + + public void setInitParameter(String name, String value) { + params.put(name, value); + } +} diff --git a/api/src/test/java/jakarta/servlet/MockServletContext.java b/api/src/test/java/jakarta/servlet/MockServletContext.java new file mode 100644 index 000000000..0c588c1ea --- /dev/null +++ b/api/src/test/java/jakarta/servlet/MockServletContext.java @@ -0,0 +1,305 @@ +/* + * Copyright (c) 1997, 2020 Oracle and/or its affiliates and others. + * All rights reserved. + * Copyright 2004 The Apache Software Foundation + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package jakarta.servlet; + +import jakarta.servlet.descriptor.JspConfigDescriptor; +import java.io.InputStream; +import java.net.MalformedURLException; +import java.net.URL; +import java.util.Enumeration; +import java.util.EventListener; +import java.util.Map; +import java.util.Set; + +public class MockServletContext implements ServletContext { + @Override + public String getContextPath() { + return null; + } + + @Override + public ServletContext getContext(String uripath) { + return null; + } + + @Override + public int getMajorVersion() { + return 0; + } + + @Override + public int getMinorVersion() { + return 0; + } + + @Override + public int getEffectiveMajorVersion() { + return 0; + } + + @Override + public int getEffectiveMinorVersion() { + return 0; + } + + @Override + public String getMimeType(String file) { + return null; + } + + @Override + public Set getResourcePaths(String path) { + return null; + } + + @Override + public URL getResource(String path) throws MalformedURLException { + return null; + } + + @Override + public InputStream getResourceAsStream(String path) { + return null; + } + + @Override + public RequestDispatcher getRequestDispatcher(String path) { + return null; + } + + @Override + public RequestDispatcher getNamedDispatcher(String name) { + return null; + } + + @Override + public void log(String msg) { + + } + + @Override + public void log(String message, Throwable throwable) { + + } + + @Override + public String getRealPath(String path) { + return null; + } + + @Override + public String getServerInfo() { + return null; + } + + @Override + public String getInitParameter(String name) { + return null; + } + + @Override + public Enumeration getInitParameterNames() { + return null; + } + + @Override + public boolean setInitParameter(String name, String value) { + return false; + } + + @Override + public Object getAttribute(String name) { + return null; + } + + @Override + public Enumeration getAttributeNames() { + return null; + } + + @Override + public void setAttribute(String name, Object object) { + + } + + @Override + public void removeAttribute(String name) { + + } + + @Override + public String getServletContextName() { + return null; + } + + @Override + public ServletRegistration.Dynamic addServlet(String servletName, String className) { + return null; + } + + @Override + public ServletRegistration.Dynamic addServlet(String servletName, Servlet servlet) { + return null; + } + + @Override + public ServletRegistration.Dynamic addServlet(String servletName, Class servletClass) { + return null; + } + + @Override + public ServletRegistration.Dynamic addJspFile(String servletName, String jspFile) { + return null; + } + + @Override + public T createServlet(Class clazz) throws ServletException { + return null; + } + + @Override + public ServletRegistration getServletRegistration(String servletName) { + return null; + } + + @Override + public Map getServletRegistrations() { + return null; + } + + @Override + public FilterRegistration.Dynamic addFilter(String filterName, String className) { + return null; + } + + @Override + public FilterRegistration.Dynamic addFilter(String filterName, Filter filter) { + return null; + } + + @Override + public FilterRegistration.Dynamic addFilter(String filterName, Class filterClass) { + return null; + } + + @Override + public T createFilter(Class clazz) throws ServletException { + return null; + } + + @Override + public FilterRegistration getFilterRegistration(String filterName) { + return null; + } + + @Override + public Map getFilterRegistrations() { + return null; + } + + @Override + public SessionCookieConfig getSessionCookieConfig() { + return null; + } + + @Override + public void setSessionTrackingModes(Set sessionTrackingModes) { + + } + + @Override + public Set getDefaultSessionTrackingModes() { + return null; + } + + @Override + public Set getEffectiveSessionTrackingModes() { + return null; + } + + @Override + public void addListener(String className) { + + } + + @Override + public void addListener(T t) { + + } + + @Override + public void addListener(Class listenerClass) { + + } + + @Override + public T createListener(Class clazz) throws ServletException { + return null; + } + + @Override + public JspConfigDescriptor getJspConfigDescriptor() { + return null; + } + + @Override + public ClassLoader getClassLoader() { + return null; + } + + @Override + public void declareRoles(String... roleNames) { + + } + + @Override + public String getVirtualServerName() { + return null; + } + + @Override + public int getSessionTimeout() { + return 0; + } + + @Override + public void setSessionTimeout(int sessionTimeout) { + + } + + @Override + public String getRequestCharacterEncoding() { + return null; + } + + @Override + public void setRequestCharacterEncoding(String encoding) { + + } + + @Override + public String getResponseCharacterEncoding() { + return null; + } + + @Override + public void setResponseCharacterEncoding(String encoding) { + + } +} diff --git a/api/src/test/java/jakarta/servlet/MockServletOutputStream.java b/api/src/test/java/jakarta/servlet/MockServletOutputStream.java new file mode 100644 index 000000000..0dc5cad57 --- /dev/null +++ b/api/src/test/java/jakarta/servlet/MockServletOutputStream.java @@ -0,0 +1,78 @@ +package jakarta.servlet; + +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.io.OutputStream; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; + +public class MockServletOutputStream extends ServletOutputStream { + private final OutputStream out; + + public MockServletOutputStream() { + this(null); + } + + public MockServletOutputStream(OutputStream out) { + this.out = out == null ? new ByteArrayOutputStream() : out; + } + + @Override + public void close() throws IOException { + out.close(); + } + + @Override + public void flush() throws IOException { + out.flush(); + } + + @Override + public boolean isReady() { + return false; + } + + public void reset() { + if (out instanceof ByteArrayOutputStream) + ((ByteArrayOutputStream) out).reset(); + } + + @Override + public void setWriteListener(WriteListener writeListener) { + throw new UnsupportedOperationException(); + } + + @Override + public void write(byte[] b) throws IOException { + out.write(b); + } + + @Override + public void write(byte[] b, int off, int len) throws IOException { + out.write(b, off, len); + } + + @Override + public void write(int b) throws IOException { + out.write(b); + } + + public String takeOutputAsString() { + return takeOutputAsString(StandardCharsets.ISO_8859_1); + } + + public String takeOutputAsString(Charset charset) { + byte[] bytes = takeOutput(); + return bytes == null ? null : new String(bytes, charset); + } + + public byte[] takeOutput() { + if (out instanceof ByteArrayOutputStream) { + ByteArrayOutputStream bout = (ByteArrayOutputStream) out; + byte[] bytes = bout.toByteArray(); + bout.reset(); + return bytes; + } + return null; + } +} diff --git a/api/src/test/java/jakarta/servlet/MockServletRequest.java b/api/src/test/java/jakarta/servlet/MockServletRequest.java new file mode 100644 index 000000000..fad571ecf --- /dev/null +++ b/api/src/test/java/jakarta/servlet/MockServletRequest.java @@ -0,0 +1,229 @@ +/* + * Copyright (c) 1997, 2020 Oracle and/or its affiliates and others. + * All rights reserved. + * Copyright 2004 The Apache Software Foundation + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package jakarta.servlet; + +import java.io.BufferedReader; +import java.io.IOException; +import java.io.UnsupportedEncodingException; +import java.util.Enumeration; +import java.util.Locale; +import java.util.Map; + +public class MockServletRequest implements ServletRequest { + private final ServletContext context; + + public MockServletRequest(ServletContext context) { + this.context = context; + } + + @Override + public Object getAttribute(String name) { + return null; + } + + @Override + public Enumeration getAttributeNames() { + return null; + } + + @Override + public String getCharacterEncoding() { + return null; + } + + @Override + public void setCharacterEncoding(String env) throws UnsupportedEncodingException { + + } + + @Override + public int getContentLength() { + return 0; + } + + @Override + public long getContentLengthLong() { + return 0; + } + + @Override + public String getContentType() { + return null; + } + + @Override + public ServletInputStream getInputStream() throws IOException { + return null; + } + + @Override + public String getParameter(String name) { + return null; + } + + @Override + public Enumeration getParameterNames() { + return null; + } + + @Override + public String[] getParameterValues(String name) { + return new String[0]; + } + + @Override + public Map getParameterMap() { + return null; + } + + @Override + public String getProtocol() { + return "HTTP/1.0"; + } + + @Override + public String getScheme() { + return "http"; + } + + @Override + public String getServerName() { + return "host"; + } + + @Override + public int getServerPort() { + return 80; + } + + @Override + public BufferedReader getReader() throws IOException { + return null; + } + + @Override + public String getRemoteAddr() { + return null; + } + + @Override + public String getRemoteHost() { + return null; + } + + @Override + public void setAttribute(String name, Object o) { + + } + + @Override + public void removeAttribute(String name) { + + } + + @Override + public Locale getLocale() { + return null; + } + + @Override + public Enumeration getLocales() { + return null; + } + + @Override + public boolean isSecure() { + return false; + } + + @Override + public RequestDispatcher getRequestDispatcher(String path) { + return null; + } + + @Override + public int getRemotePort() { + return 0; + } + + @Override + public String getLocalName() { + return null; + } + + @Override + public String getLocalAddr() { + return null; + } + + @Override + public int getLocalPort() { + return 8080; + } + + @Override + public ServletContext getServletContext() { + return context; + } + + @Override + public AsyncContext startAsync() throws IllegalStateException { + throw new UnsupportedOperationException(); + } + + @Override + public AsyncContext startAsync(ServletRequest servletRequest, ServletResponse servletResponse) throws IllegalStateException { + throw new UnsupportedOperationException(); + } + + @Override + public boolean isAsyncStarted() { + return false; + } + + @Override + public boolean isAsyncSupported() { + return false; + } + + @Override + public AsyncContext getAsyncContext() { + return null; + } + + @Override + public DispatcherType getDispatcherType() { + return DispatcherType.REQUEST; + } + + @Override + public String getRequestId() { + return null; + } + + @Override + public String getProtocolRequestId() { + return null; + } + + @Override + public ServletConnection getServletConnection() { + return null; + } +} diff --git a/api/src/test/java/jakarta/servlet/MockServletResponse.java b/api/src/test/java/jakarta/servlet/MockServletResponse.java new file mode 100644 index 000000000..5a3726c61 --- /dev/null +++ b/api/src/test/java/jakarta/servlet/MockServletResponse.java @@ -0,0 +1,126 @@ +package jakarta.servlet; + +import java.io.IOException; +import java.io.PrintWriter; +/* + * Copyright (c) 1997, 2020 Oracle and/or its affiliates and others. + * All rights reserved. + * Copyright 2004 The Apache Software Foundation + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +import java.nio.charset.StandardCharsets; +import java.util.Locale; + +public class MockServletResponse implements ServletResponse { + private ServletOutputStream servletOutputStream; + private PrintWriter printWriter; + private int bufferSize = 1024; + + @Override + public String getCharacterEncoding() { + return StandardCharsets.ISO_8859_1.name(); + } + + @Override + public String getContentType() { + return null; + } + + @Override + public ServletOutputStream getOutputStream() throws IOException { + if (printWriter != null) + throw new IllegalStateException(); + if (servletOutputStream == null) + servletOutputStream = new MockServletOutputStream(); + return servletOutputStream; + } + + @Override + public PrintWriter getWriter() throws IOException { + if (printWriter != null) + return printWriter; + if (servletOutputStream != null) + throw new IllegalStateException(); + printWriter = new PrintWriter(getOutputStream()); + return printWriter; + } + + @Override + public void setCharacterEncoding(String charset) { + + } + + @Override + public void setContentLength(int len) { + setContentLengthLong((long) len); + } + + @Override + public void setContentLengthLong(long len) { + + } + + @Override + public void setContentType(String type) { + + } + + @Override + public void setBufferSize(int size) { + bufferSize = size; + } + + @Override + public int getBufferSize() { + return bufferSize; + } + + @Override + public void flushBuffer() throws IOException { + + } + + @Override + public void resetBuffer() { + if (servletOutputStream instanceof MockServletOutputStream) + ((MockServletOutputStream) servletOutputStream).reset(); + } + + @Override + public boolean isCommitted() { + return false; + } + + @Override + public void reset() { + printWriter = null; + servletOutputStream = null; + } + + @Override + public void setLocale(Locale loc) { + + } + + @Override + public Locale getLocale() { + return null; + } + + public MockServletOutputStream getMockServletOutputStream() { + if (servletOutputStream instanceof MockServletOutputStream) + return (MockServletOutputStream) servletOutputStream; + return null; + } +} diff --git a/api/src/test/java/jakarta/servlet/http/HttpServletTest.java b/api/src/test/java/jakarta/servlet/http/HttpServletTest.java new file mode 100644 index 000000000..72628c7ba --- /dev/null +++ b/api/src/test/java/jakarta/servlet/http/HttpServletTest.java @@ -0,0 +1,216 @@ +/* + * Copyright (c) 1997, 2020 Oracle and/or its affiliates and others. + * All rights reserved. + * Copyright 2004 The Apache Software Foundation + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package jakarta.servlet.http; + +import static org.hamcrest.MatcherAssert.assertThat; +import static org.hamcrest.Matchers.anyOf; +import static org.hamcrest.Matchers.is; +import static org.hamcrest.Matchers.nullValue; + +import jakarta.servlet.MockServletConfig; +import jakarta.servlet.MockServletOutputStream; +import jakarta.servlet.ServletConfig; +import jakarta.servlet.ServletException; +import java.io.IOException; +import java.util.concurrent.atomic.AtomicBoolean; +import java.util.concurrent.atomic.AtomicLong; +import java.util.stream.Stream; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.params.ParameterizedTest; +import org.junit.jupiter.params.provider.Arguments; +import org.junit.jupiter.params.provider.MethodSource; + +public class HttpServletTest { + public interface Handler { + void handle(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException; + } + + @ParameterizedTest + @MethodSource("headTest") + public void testLegacyHead(String test, Handler doGet, boolean expectedFlushed, long expectedContentLength) + throws ServletException, IOException { + HttpServlet servlet = new HttpServlet() { + @Override + protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { + doGet.handle(request, response); + } + }; + + MockServletConfig servletConfig = new MockServletConfig(); + servletConfig.setInitParameter("jakarta.servlet.http.legacyDoHead", "true"); + servlet.init(servletConfig); + + MockHttpServletRequest request = new MockHttpServletRequest(servletConfig.getServletContext()) { + @Override + public String getMethod() { + return "HEAD"; + } + }; + + AtomicBoolean committed = new AtomicBoolean(); + AtomicLong contentLength = new AtomicLong(-1); + MockHttpServletResponse response = new MockHttpServletResponse() { + @Override + public void flushBuffer() throws IOException { + committed.set(true); + } + + @Override + public boolean isCommitted() { + return committed.get(); + } + + @Override + public void setContentLengthLong(long len) { + contentLength.set(len); + } + }; + + servlet.service(request, response); + MockServletOutputStream out = response.getMockServletOutputStream(); + String actual = out == null ? null : out.takeOutputAsString(); + + // Check if the output should have already been flushed + assertThat(test, committed.get(), is(expectedFlushed)); + assertThat(test, contentLength.get(), is(expectedContentLength)); + assertThat(test, actual, anyOf(is(""), nullValue())); + } + + public static Stream headTest() { + return Stream.of( + Arguments.of("Nothing output", + (Handler) (request, response) -> { + }, false, 0), + + Arguments.of("Output smaller than buffer", + (Handler) (request, response) -> { + response.setBufferSize(2048); + response.getOutputStream().print("Hello World"); + }, false, 11), + + Arguments.of("Write smaller than buffer", + (Handler) (request, response) -> { + response.setBufferSize(2048); + response.getWriter().print("Hello World"); + }, false, 11), + + Arguments.of("Output bigger than buffer", + (Handler) (request, response) -> { + response.setBufferSize(5); + response.getOutputStream().print("Hello World"); + }, + false, // this is a known deficiency: a GET would commit the response + 11 // this is a known deficiency: a GET would not know the content-length + ), + + Arguments.of("Write bigger than buffer", + (Handler) (request, response) -> { + response.setBufferSize(5); + response.getWriter().print("Hello World"); + }, + false, // this is a known deficiency: a GET would commit the response + 11 // this is a known deficiency: a GET would not know the content-length + ), + + Arguments.of("Outputs with Content-Length smaller than buffer", + (Handler) (request, response) -> { + response.setBufferSize(1024); + response.setContentLength(11); + response.getOutputStream().print("Hello World"); + }, + false, // this is a known deficiency: a GET would commit the response + 11), + + Arguments.of("Write with Content-Length smaller than buffer", + (Handler) (request, response) -> { + response.setBufferSize(1024); + response.setContentLength(11); + response.getWriter().print("Hello World"); + }, + false, // this is a known deficiency: a GET would commit the response + 11), + + Arguments.of("Output with resetBuffer", + (Handler) (request, response) -> { + response.setBufferSize(2048); + response.getOutputStream().print("THIS IS WRONG"); + response.resetBuffer(); + response.getOutputStream().print("Hello World"); + }, false, 11), + + Arguments.of("Write with resetBuffer", + (Handler) (request, response) -> { + response.setBufferSize(2048); + response.getWriter().print("THIS IS WRONG"); + response.resetBuffer(); + response.getWriter().print("Hello World"); + }, + false, + 11), + + Arguments.of("Output with reset", + (Handler) (request, response) -> { + response.setBufferSize(2048); + response.getWriter().print("THIS IS WRONG"); + response.reset(); + response.getOutputStream().print("Hello World"); + }, false, 11), + + Arguments.of("Write with reset", + (Handler) (request, response) -> { + response.setBufferSize(2048); + response.getOutputStream().print("THIS IS WRONG"); + response.reset(); + response.getWriter().print("Hello World"); + }, false, 11) + + ); + } + + @Test + public void testContainerHead() + throws ServletException, IOException { + HttpServlet servlet = new HttpServlet() { + @Override + protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { + response.setBufferSize(2048); + response.getOutputStream().print("Hello World"); + } + }; + + ServletConfig servletConfig = new MockServletConfig(); + servlet.init(servletConfig); + + MockHttpServletRequest request = new MockHttpServletRequest(servletConfig.getServletContext()) { + @Override + public String getMethod() { + return "HEAD"; + } + }; + + MockHttpServletResponse response = new MockHttpServletResponse(); + + servlet.service(request, response); + MockServletOutputStream out = response.getMockServletOutputStream(); + String actual = out == null ? null : out.takeOutputAsString(); + + // Check output makes it to container (which should then consume it) + assertThat(actual, is("Hello World")); + } +} diff --git a/api/src/test/java/jakarta/servlet/http/MockHttpServletRequest.java b/api/src/test/java/jakarta/servlet/http/MockHttpServletRequest.java new file mode 100644 index 000000000..e83d7afee --- /dev/null +++ b/api/src/test/java/jakarta/servlet/http/MockHttpServletRequest.java @@ -0,0 +1,189 @@ +/* + * Copyright (c) 1997, 2020 Oracle and/or its affiliates and others. + * All rights reserved. + * Copyright 2004 The Apache Software Foundation + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package jakarta.servlet.http; + +import jakarta.servlet.MockServletRequest; +import jakarta.servlet.ServletContext; +import jakarta.servlet.ServletException; +import java.io.IOException; +import java.security.Principal; +import java.util.Collection; +import java.util.Enumeration; + +public class MockHttpServletRequest extends MockServletRequest implements HttpServletRequest { + + public MockHttpServletRequest(ServletContext context) { + super(context); + } + + @Override + public String getAuthType() { + return null; + } + + @Override + public Cookie[] getCookies() { + return new Cookie[0]; + } + + @Override + public long getDateHeader(String name) { + return 0; + } + + @Override + public String getHeader(String name) { + return null; + } + + @Override + public Enumeration getHeaders(String name) { + return null; + } + + @Override + public Enumeration getHeaderNames() { + return null; + } + + @Override + public int getIntHeader(String name) { + return 0; + } + + @Override + public String getMethod() { + return "GET"; + } + + @Override + public String getPathInfo() { + return null; + } + + @Override + public String getPathTranslated() { + return null; + } + + @Override + public String getContextPath() { + return null; + } + + @Override + public String getQueryString() { + return null; + } + + @Override + public String getRemoteUser() { + return null; + } + + @Override + public boolean isUserInRole(String role) { + return false; + } + + @Override + public Principal getUserPrincipal() { + return null; + } + + @Override + public String getRequestedSessionId() { + return null; + } + + @Override + public String getRequestURI() { + return null; + } + + @Override + public StringBuffer getRequestURL() { + return null; + } + + @Override + public String getServletPath() { + return null; + } + + @Override + public HttpSession getSession(boolean create) { + return null; + } + + @Override + public HttpSession getSession() { + return null; + } + + @Override + public String changeSessionId() { + return null; + } + + @Override + public boolean isRequestedSessionIdValid() { + return false; + } + + @Override + public boolean isRequestedSessionIdFromCookie() { + return false; + } + + @Override + public boolean isRequestedSessionIdFromURL() { + return false; + } + + @Override + public boolean authenticate(HttpServletResponse response) throws IOException, ServletException { + return false; + } + + @Override + public void login(String username, String password) throws ServletException { + + } + + @Override + public void logout() throws ServletException { + + } + + @Override + public Collection getParts() throws IOException, ServletException { + return null; + } + + @Override + public Part getPart(String name) throws IOException, ServletException { + return null; + } + + @Override + public T upgrade(Class handlerClass) throws IOException, ServletException { + return null; + } +} diff --git a/api/src/test/java/jakarta/servlet/http/MockHttpServletResponse.java b/api/src/test/java/jakarta/servlet/http/MockHttpServletResponse.java new file mode 100644 index 000000000..854baf241 --- /dev/null +++ b/api/src/test/java/jakarta/servlet/http/MockHttpServletResponse.java @@ -0,0 +1,115 @@ +/* + * Copyright (c) 1997, 2020 Oracle and/or its affiliates and others. + * All rights reserved. + * Copyright 2004 The Apache Software Foundation + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package jakarta.servlet.http; + +import jakarta.servlet.MockServletResponse; +import java.io.IOException; +import java.util.Collection; + +public class MockHttpServletResponse extends MockServletResponse implements HttpServletResponse { + @Override + public void addCookie(Cookie cookie) { + + } + + @Override + public boolean containsHeader(String name) { + return false; + } + + @Override + public String encodeURL(String url) { + return null; + } + + @Override + public String encodeRedirectURL(String url) { + return null; + } + + @Override + public void sendError(int sc, String msg) throws IOException { + + } + + @Override + public void sendError(int sc) throws IOException { + + } + + @Override + public void sendRedirect(String location) throws IOException { + + } + + @Override + public void setDateHeader(String name, long date) { + + } + + @Override + public void addDateHeader(String name, long date) { + + } + + @Override + public void setHeader(String name, String value) { + + } + + @Override + public void addHeader(String name, String value) { + + } + + @Override + public void setIntHeader(String name, int value) { + + } + + @Override + public void addIntHeader(String name, int value) { + + } + + @Override + public void setStatus(int sc) { + + } + + @Override + public int getStatus() { + return 0; + } + + @Override + public String getHeader(String name) { + return null; + } + + @Override + public Collection getHeaders(String name) { + return null; + } + + @Override + public Collection getHeaderNames() { + return null; + } +} diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index 0b15645a9..e64eb4d25 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -332,13 +332,16 @@ an Application Developer is concerned with the `doGet` and `doPost` methods. The other methods are considered to be methods for use by programmers very familiar with HTTP programming. +==== HEAD Method +The `doHead` method in `HttpServlet`, by default, directly calls the `doGet` method and relies on the container to implement HEAD behaviour of the HTTP protocol. +However, if the "jakarta.servlet.http.legacyDoHead" `ServletConfig` init parameter is set to "TRUE" then the `doGet` method is called with the `ServletResponse` wrapped to provide only the headers produced by the `doGet` method. +The legacy mode is deprecated as it may not be accurate in producing the same heads as `GET` method would have returned and may be removed in a future release. + ==== Additional Methods The `doPut` and `doDelete` methods allow Servlet Developers to support HTTP/1.1 clients that employ these -features. The `doHead` method in `HttpServlet` is a specialized form of -the `doGet` method that returns only the headers produced by the `doGet` -method. The `doOptions` method responds with which HTTP methods are +features. The `doOptions` method responds with which HTTP methods are supported by the servlet. The `doTrace` method generates a response containing all instances of the headers sent in the `TRACE` request. @@ -8356,6 +8359,9 @@ Jakarta Servlet {spec-version} specification developed under the Jakarta EE Work === Changes Since Jakarta Servlet 5.0 +link:https://github.com/eclipse-ee4j/servlet-api/issues/225[Issue 225]:: +Deprecated wrapped response handling in the `doHead` method in favour of container implementation of `HEAD` method behavior. + link:https://github.com/eclipse-ee4j/servlet-api/issues/272[Issue 272]:: Remove the recommendation that Servlet containers should include an `X-Powered-By`header. From c4a20a7cd84877e8675a5f187e7cd27369dec5a5 Mon Sep 17 00:00:00 2001 From: Greg Wilkins Date: Fri, 8 Oct 2021 11:55:55 +1100 Subject: [PATCH 08/42] added change note --- spec/src/main/asciidoc/servlet-spec-body.adoc | 3 +++ 1 file changed, 3 insertions(+) diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index bc1a319b3..ee120bedb 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -8474,6 +8474,9 @@ Jakarta Servlet {spec-version} specification developed under the Jakarta EE Work === Changes Since Jakarta Servlet 5.0 +link:https://github.com/eclipse-ee4j/servlet-api/issues/18[Issue 18]:: +Clarify the decoding and normalization of URI paths. + link:https://github.com/eclipse-ee4j/servlet-api/issues/225[Issue 225]:: Deprecated wrapped response handling in the `doHead` method in favour of container implementation of `HEAD` method behavior. From fc4f32df339259529aed1309f8133c0def8f919e Mon Sep 17 00:00:00 2001 From: Greg Wilkins Date: Fri, 8 Oct 2021 12:05:06 +1100 Subject: [PATCH 09/42] more URI example updates --- spec/src/main/asciidoc/servlet-spec-body.adoc | 43 ++++++++++--------- 1 file changed, 23 insertions(+), 20 deletions(-) diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index ee120bedb..7d170c6a8 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -1354,13 +1354,16 @@ If suspicious sequences are discovered during the prior steps, the request must * The encoded `"/"` character * Any `"."` or `".."` segment that had a path parameter * Any `"."` or `".."` segment with any encoded characters - * Any `".."` segment preceeded by an empty segment + * Any `".."` segment preceded by an empty segment * The `"\"` character encoded or not. * Any control characters either encoded or not. A container or context may be configured to have a different set of rejected sequences. ==== Example URIs +> TODO more work needed on this section. Perhaps some parts need to be moved to other sections. + +Consider the following URIs with a forbidden security constraint at `/secret/*` and a servlet registered at /dispatch/* that forwards request to the string return from pathInfo() . Example URIs |=== @@ -1379,7 +1382,8 @@ A container or context may be configured to have a different set of rejected seq | /public/%2e;/file.txt | /public/file.txt | 400 | /../docroot/public/file.txt | /../docroot/public/file.txt | 400 | /public/dir/../file.txt | /public/file.txt | -| /public//../file.txt | /public/file.txt | +| /public//../file.txt | /file.txt | 400? +| /public/;param/../file.txt | /file.txt | 400? | /public/dir/..;/file.txt | /public/file.txt | 400 | /public/dir/%2e%2e/file.txt | /public/file.txt | 400 | /public/dir/%2e%2e;/file.txt | /public/file.txt | 400 @@ -1392,32 +1396,31 @@ A container or context may be configured to have a different set of rejected seq | /WEB-INF%00/web.xml | /WEB-INF␀/web.xml | 400? | /WEB-INF/./web.xml | /WEB-INF/web.xml | 404 | /public/../WEB-INF/web.xml | /WEB-INF/web.xml | 404 or 403? -| /public/..;/WEB-INF/web.xml | | 404 -| /public/%2e%2e/WEB-INF/web.xml | /public/../WEB-INF/web.xml | -| /public/%2e%2e;/WEB-INF/web.xml | /public/../WEB-INF/web.xml | -| /secret/private.xml | | 403 -| /SeCreT/private.xml | /SeCreT/private.xml | -| /SECRET~1.DIR/private.xml | | +| /public/..;/WEB-INF/web.xml | /WEB-INF/web.xml | 404 or 403? +| /public/%2e%2e/WEB-INF/web.xml | /WEB-INF/web.xml | 404 or 403? +| /public/%2e%2e;/WEB-INF/web.xml | /WEB-INF/web.xml | 404 or 403? +| /secret/private.xml | /secret/private.xml | 403 +| /SeCreT/private.xml | /SeCreT/private.xml | TODO move to another section an 404 due to non-canonical +| /SECRET~1.DIR/private.xml | | TODO move to another section an 404 due to non-canonical | /secret;/private.xml | | 403 -| /secret%2Fprivate.xml | /secret/private.xml | 403? [2] or 400? -| /secret%5Cprivate.xml | /secret\private.xml | 403? [2] +| /secret%2Fprivate.xml | /secret%2Fprivate.xml | 400 +| /secret%5Cprivate.xml | /secret\private.xml | 400 | /secret%00/private.xml | /secret␀/private.xml | 400? -| /./secret/private.xml | | 403 -| /.;/secret/private.xml | | 403 -| /%2e/secret/private.xml | /./secret/private.xml or throw? | 400? -| /%2e;/secret/private.xml | /./secret/private.xml or throw? | 400? -| /public/../secret/private.xml | | 403 -| /public/..;/secret/private.xml | | 403 -| /public/%2e%2e/secret/private.xml | /public/../secret/private.xml or throw?| 400? -| /public/%2e%2e;/secret/private.xml | /public/../secret/private.xml or throw?| 400? +| /./secret/private.xml | /secret/private.xml| 403 +| /.;/secret/private.xml | /secret/private.xml| 403 +| /%2e/secret/private.xml | /secret/private.xml | 400 +| /%2e;/secret/private.xml | /secret/private.xml | 400 +| /public/../secret/private.xml | /secret/private.xml | 403 +| /public/..;/secret/private.xml | /secret/private.xml | 403 +| /public/%2e%2e/secret/private.xml | /secret/private.xml | 400 +| /public/%2e%2e;/secret/private.xml | /secret/private.xml | 400 | /dispatch/public/file.txt | /public/file.txt | | /dispatch/public%2Ffile.txt | /public/file.txt | 400? | /dispatch/public%5Cfile.txt | /public\file.txt | 400? | /dispatch/public%252Ffile.txt | /public%2Ffile.txt | 400? | /dispatch/WEB-INF/web.xml | /WEB-INF/web.xml | | /dispatch/secret/private.xml | /secret/private/xml | -| /dispatch/%2E%2E/%2E%2E/etc/password | /../../etc/password | 400? - +| /dispatch/%2E%2E/%2E%2E/etc/password | /../../etc/password | 400? === Request Path Elements From ec63dfcf185129e12828aa4f694ff4b0a69903a4 Mon Sep 17 00:00:00 2001 From: Greg Wilkins Date: Fri, 8 Oct 2021 12:51:40 +1100 Subject: [PATCH 10/42] alternate segment based algorithm lots of format cleanups --- spec/src/main/asciidoc/servlet-spec-body.adoc | 88 +++++++++++-------- 1 file changed, 51 insertions(+), 37 deletions(-) diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index 7d170c6a8..d45aa525f 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -1303,62 +1303,76 @@ the header value to an `int`, a `NumberFormatException` is thrown. If the `getDateHeader` method cannot translate the header to a `Date` object, an `IllegalArgumentException` is thrown. -=== Request URI Processing +=== Request URI Path Processing +The path portion of the URI of an HTTP identifies the resource to be processed. As URI paths may have various non-canonical forms, it is important that all containers process URI paths in the same way so that matching to security constraints and resources is identical. + The process described here adapts and extends the URI canonicalization process described in [RFC 3986](https://datatracker.ietf.org/doc/html/rfc3986) to create a standard Servlet URI path canonicalization process that ensures that URIs can be mapped to Servlets, Filters and security constraints in an unambiguous manner. It is also intended to provide information to reverse proxy implementations so they are aware of how requests they pass to servlet containers will be processed. -Servlet containers may implement the standard Servlet URI path canonicalization in any manner they see fit as long as the end result is identical to the end result of the process described here. Servlet containers may provide container specific configuration options to vary the standard canonicalization process. Any such variations may have security implications and both Servlet container implementors and users are advised to be sure that they understand the implications of any such container specific canonicalization options. +==== Obtaining the URI Path +HTTP/0.9:: The URI path is the `document address` as defined by the W3C [Original HTTP](https://www.w3.org/Protocols/HTTP/AsImplemented.html) document. -==== URI Transport -HTTP/0.9:: The URI is the `document address` as defined by the W3C [Original HTTP](https://www.w3.org/Protocols/HTTP/AsImplemented.html) document. +HTTP/1.0:: The URI path is extracted from the `Request-URI` in the `Request-Line` as defined by [RFC 1945](https://datatracker.ietf.org/doc/html/rfc1945#section-5.1). URIs in `abs_path` form are the URI path. URIs in `absoluteURI` have the protocol and authority removed to convert them to `origin-form` and thus obtain the URI path. -HTTP/1.0:: The URI is extracted from the `Request-URI` in the `Request-Line` as defined by [RFC 1945](https://datatracker.ietf.org/doc/html/rfc1945#section-5.1). URIs in `abs_path` form are passed unchanged to subsequent steps. URIs in `absoluteURI` have the protocol and authority removed to convert them to `origin-form` and are then passed the subsequent steps. +HTTP/1.1:: The URI path is extracted from the `request-target` as defined by [RFC 7230](https://datatracker.ietf.org/doc/html/rfc7230#section-3.1.1). URIs in `origin-form` are the URI path. URIs in `absolute-form` have the protocol and authority removed to convert them to `origin-form` and thus obtain the URI path. URIs in `authority-form` or `asterisk-form` are outside of the scope of this specification. -HTTP/1.1:: The URI is extracted from the `request-target` as defined by [RFC 7230](https://datatracker.ietf.org/doc/html/rfc7230#section-3.1.1). URIs in `origin-form` are passed unchanged to subsequent steps. URIs in `absolute-form` have the protocol and authority removed to convert them to `origin-form` and are then passed the subsequent steps. URIs in `authority-form` or `asterisk-form` are outside of the scope of this specification. +HTTP/2:: The URI path is the `:path` pseudo header as defined by [RFC 7540](https://datatracker.ietf.org/doc/html/rfc7540#section-8.1.2.3) and is passed unchanged to stage 2. -HTTP/2:: The URI is the `:path` pseudo header as defined by [RFC 7540](https://datatracker.ietf.org/doc/html/rfc7540#section-8.1.2.3) and is passed unchanged to stage 2. +HTTP/3:: The URI path is the `:path` pseudo header as currently defined by the [draft RFC](https://datatracker.ietf.org/doc/html/draft-ietf-quic-http-34). -HTTP/3:: The URI is the `:path` pseudo header as currently defined by the [draft RFC](https://datatracker.ietf.org/doc/html/draft-ietf-quic-http-34). +Other protocols:: Containers may support other protocols. Containers should extract an appropriate URI path for the request from the protocol and pass it to stage 2. -Other protocols:: Containers may support other protocols. Containers should extract an appropriate URI for the request from the protocol and pass it to stage 2. +==== URI Path Canonicalization -==== Separation of path and query -The URI is split by the first occurrence of any '?' character to path and query. The query is preserved for later handling and the following steps applied to the path. +Servlet containers may implement the standard Servlet URI path canonicalization in any manner they see fit as long as the end result is identical to the end result of the process described here. Servlet containers may provide container specific configuration options to vary the standard canonicalization process. Any such variations may have security implications and both Servlet container implementors and users are advised to be sure that they understand the implications of any such container specific canonicalization options. -==== Discard fragment -A fragment in the path is indicated by the first occurrence of a `\#` character. Any `#` character and following fragment is removed from the path and discarded. +. **Separation of path and query.** ++ +The URI is split by the first occurrence of any '"?"' character to path and query. The query is preserved for later handling and the following steps applied to the path. -==== Decoding of characters as UTF-8 -Characters other than `/`, `;` and `%` that are encoded in `%nn` form are decoded. The resulting octet sequences is treated as UTF-8 and converted to a character sequence. +. **Discard fragment.** ++ +The path is split by the first occurrence of any `"\#"` character. The `"#"` and following fragment are discarded and the path is replaced with the character sequence preceding the `"#"` character. -==== Collapse sequences of multiple `"/"` characters -Any sequence of more than one `"/"` character in the URI must be replaced with a single `"/"`. +. **Split path into segments.** ++ +The path is split into multiple segments divided by the `"/"` character. A leading `"/"` does not indicate an initial empty segment. -==== Remove dot-segments+ -* Sequences of the form `"/./"` must be replaced with `"/"`. -* Sequences of the form `"/" + segment + "/../"` must be replaced with `"/"`. +. **Remove path parameters.** ++ +Any segment containing the `";"` character is split at the first occurrence of `";"`. The segment is replaced by the character sequence preceding the `";"`. The characters following the `";"` are considered path parameters and may be preserved by the container for later decoding and/or processing (eg `jsessionid`). -==== Removal of path parameters -A path segment containing the `";"` character is split at the first occurence of `";"`. The segment is replaced by the character sequence preceeding the `";"`. The characters following the `";"` are considered a path parameters and may be preserved by the container for later processing (eg `jsessionid`). +. **Decode characters.** ++ +Characters that are encoded in `%nn` form are decoded in each segment. The resulting octet sequences is treated as UTF-8 and converted to a character sequence that replaces the segment. -==== Decoding of remaining `%nn` sequences -Any remaining `%nn` sequences in the path should be decoded. Some containers may be configured to leave some specific characters encoded (eg. the characters '/' and '%' may be left decoded by some container configuration). +. **Remove Empty Segments.** ++ +Empty segments are removed. Containers may be configured to retain empty segments. -==== Mapping URI to context and resource -The decoded path is used to map the request to a context and resource within the context. This form of the URI path is used for all subsequent mapping (web applications, servlet, filters and security constraints). +. **Remove dot-segments.** ++ +All segments that are exactly `"."` are removed from the segment series. Segments that are exactly `".."` and that are preceded by a non `".."` segment are removed together with the preceding segment. This normalization differs from RFC3986 in that segments with `%` encoded dots and/or parameters may be treated as dot segments. -==== Rejecting Suspicious Sequences -If suspicious sequences are discovered during the prior steps, the request must be rejected with a 400 bad request. If a context is matched the the error handling of the context may be used to generate the response. By default the set of suspicious sequences includes: +. **Concatenate segments.** ++ +The segments are concatenated into a single path string with each segment preceded by the `"/"` character. If a segment contains the `"/"` or `"%"` characters, and the request will not be rejected below, then the container may configured re-encode those characters to the `%nn` form. If any characters are re-encoded, then the `"%"` must also be re-encoded. - * Any path not starting with the `"/"` character - * Any path starting with an initial segment of `".."` - * The encoded `"/"` character - * Any `"."` or `".."` segment that had a path parameter - * Any `"."` or `".."` segment with any encoded characters - * Any `".."` segment preceded by an empty segment - * The `"\"` character encoded or not. - * Any control characters either encoded or not. +. **Mapping URI to context and resource.** ++ +The decoded path is used to map the request to a context and resource within the context. This form of the URI path is used for all subsequent mapping (web applications, servlet, filters and security constraints). -A container or context may be configured to have a different set of rejected sequences. +. **Rejecting Suspicious Sequences.** ++ +If suspicious sequences are discovered during the prior processing steps, the request must be rejected with a 400 bad request rather than dispatched to the target servlet. If a context is matched then the error handling of the context may be used to generate the 400 response. By default, the set of suspicious sequences below, but may be configured differently by a container: + + * Any path not starting with the `"/"` character + * Any path starting with an initial segment of `".."` + * The encoded `"/"` character + * Any `"."` or `".."` segment that had a path parameter + * Any `"."` or `".."` segment with any encoded characters + * Any `".."` segment preceded by an empty segment + * The `"\"` character encoded or not. + * Any control characters either encoded or not. ==== Example URIs > TODO more work needed on this section. Perhaps some parts need to be moved to other sections. From ee3b679272b06f66da510aa27eefa859c783f7ce Mon Sep 17 00:00:00 2001 From: Greg Wilkins Date: Fri, 8 Oct 2021 14:28:08 +1100 Subject: [PATCH 11/42] First attempt at generating URI example table --- api/pom.xml | 8 + .../servlet/http/CanonicalUriPathTest.java | 200 ++++++++++++++++++ spec/src/main/asciidoc/servlet-spec-body.adoc | 110 +++++----- 3 files changed, 264 insertions(+), 54 deletions(-) create mode 100644 api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java diff --git a/api/pom.xml b/api/pom.xml index 8e2dd6922..5aebeaacc 100644 --- a/api/pom.xml +++ b/api/pom.xml @@ -372,6 +372,14 @@ Use is subject to + + org.apache.maven.plugins + maven-compiler-plugin + + 10 + 10 + + diff --git a/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java new file mode 100644 index 000000000..264923438 --- /dev/null +++ b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java @@ -0,0 +1,200 @@ +package jakarta.servlet.http; + +import java.io.ByteArrayOutputStream; +import java.nio.charset.StandardCharsets; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.List; +import java.util.ListIterator; +import org.junit.jupiter.params.ParameterizedTest; +import org.junit.jupiter.params.provider.ValueSource; + +public class CanonicalUriPathTest { + + public static String canonicalUriPath(String uriPath) { + if (uriPath == null) + throw new IllegalArgumentException("null path"); + + String path = uriPath; + + // Separation of path and query. + if (path.contains("?")) + path = path.substring(0, path.indexOf('?')); + + // Discard fragment. + if (path.contains("#")) + path = path.substring(0, path.indexOf('#')); + + // Remember start/end conditions + boolean startsWithSlash = path.startsWith("/"); + boolean endsWithSlash = path.startsWith("/"); + boolean dotSegmentWithParam = false; + boolean encodedDotSegment = false; + + // Split path into segments. + List segments = new ArrayList<>(Arrays.asList(path.substring(1).split("/"))); + + // Remove path parameters. + for (ListIterator s = segments.listIterator(); s.hasNext();) { + String segment = s.next(); + if (segment.contains(";")) { + segment = segment.substring(0, segment.indexOf(';')); + s.set(segment); + + dotSegmentWithParam |= (".".equals(segment) || "..".equals(segment)); + } + } + + // Decode characters + for (ListIterator s = segments.listIterator(); s.hasNext();) { + String segment = s.next(); + if (segment.contains("%")) { + StringBuilder buf = new StringBuilder(); + ByteArrayOutputStream utf8 = new ByteArrayOutputStream(); + for (int i = 0; i < segment.length(); i++) { + char c = segment.charAt(i); + if (c == '%') { + utf8.write(Integer.parseInt(segment.substring(i + 1, i + 3), 16)); + i += 2; + } else { + if (utf8.size() > 0) { + buf.append(utf8.toString(StandardCharsets.UTF_8)); + utf8.reset(); + } + + buf.append(c); + } + } + segment = buf.toString(); + s.set(segment); + encodedDotSegment |= (".".equals(segment) || "..".equals(segment)); + } + } + + // Remove Empty Segments + segments.removeIf(s1 -> s1.length() == 0); + + // Remove dot-segments + int count = 0; + for (ListIterator s = segments.listIterator(); s.hasNext();) { + String segment = s.next(); + + if (segment.equals(".")) { + s.remove(); + } else if (segment.equals("..")) { + if (count > 0) { + s.remove(); + s.previous(); + s.remove(); + count--; + } + } else { + count++; + } + } + + // Concatenate segments + StringBuilder buf = new StringBuilder(); + + for (ListIterator s = segments.listIterator(); s.hasNext();) { + String segment = s.next(); + if (s.hasPrevious() || startsWithSlash) + buf.append("/"); + buf.append(segment); + } + if (endsWithSlash) + buf.append("/"); + path = buf.toString(); + + // Rejecting Suspicious Sequences + String reject = null; + // Any path not starting with the `"/"` character + if (!startsWithSlash) + reject = "must start with /"; + // Any path starting with an initial segment of `".."` + else if (segments.get(0).equals("..")) + reject = "leading dot-dot-segment"; + // The encoded `"/"` character + else if (uriPath.contains("%2f") || uriPath.contains("%2F")) + reject = "encoded /"; + // Any `"."` or `".."` segment that had a path parameter + else if (dotSegmentWithParam) + reject = "dot segment with parameter"; + // Any `"."` or `".."` segment with any encoded characters + else if (encodedDotSegment) + reject = "encoded dot segment"; + // Any `".."` segment preceded by an empty segment + // TODO + // The `"\"` character encoded or not. + // TODO + // Any control characters either encoded or not. + // TODO + + System.err.printf("| `%s` | `%s` | %s %n", + uriPath, + path, + reject == null ? "" : ("400 " + reject)); + + return path; + } + + @ParameterizedTest + @ValueSource(strings = { + "/public/file.txt", + "/public//file.txt", + "/public/;/file.txt", + "/PUBLIC/file.txt", + "/public%2Ffile.txt", + "/public%5Cfile.txt", + "/public%00/file.txt", + "/public/./file.txt", + "/public/.;/file.txt", + "/public/%2e/file.txt", + "/public/%2e;/file.txt", + "/../docroot/public/file.txt", + "/public/dir/../file.txt", + "/public//../file.txt", + "/public/;param/../file.txt", + "/public/dir/..;/file.txt", + "/public/dir/%2e%2e/file.txt", + "/public/dir/%2e%2e;/file.txt", + "/WEB-INF/web.xml", + "/web-inf/web.xml", + "/WEB-IN~1.DIR/web.xml", + "/WEB-INF;/web.xml", + "/WEB-INF%2Fweb.xml", + "/WEB-INF%5Cweb.xml", + "/WEB-INF%00/web.xml", + "/WEB-INF/./web.xml", + "/public/../WEB-INF/web.xml", + "/public/..;/WEB-INF/web.xml", + "/public/%2e%2e/WEB-INF/web.xml", + "/public/%2e%2e;/WEB-INF/web.xml", + "/secret/private.xml", + "/SeCreT/private.xml", + "/SECRET~1.DIR/private.xml", + "/secret;/private.xml", + "/secret%2Fprivate.xml", + "/secret%5Cprivate.xml", + "/secret%00/private.xml", + "/./secret/private.xml", + "/.;/secret/private.xml", + "/%2e/secret/private.xml", + "/%2e;/secret/private.xml", + "/public/../secret/private.xml", + "/public/..;/secret/private.xml", + "/public/%2e%2e/secret/private.xml", + "/public/%2e%2e;/secret/private.xml", + "/dispatch/public/file.txt", + "/dispatch/public%2Ffile.txt", + "/dispatch/public%5Cfile.txt", + "/dispatch/public%252Ffile.txt", + "/dispatch/WEB-INF/web.xml", + "/dispatch/secret/private.xml", + "/dispatch/%2E%2E/%2E%2E/etc/password", + }) + + public void testCanonicalUriPath(String path) { + canonicalUriPath(path); + } +} diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index d45aa525f..9ed49715a 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -1381,60 +1381,62 @@ Consider the following URIs with a forbidden security constraint at `/secret/*` . Example URIs |=== -| Encoded URI path | Decoded Path | Rejected - -| /public/file.txt | /public/file.txt | -| /public//file.txt | /public/file.txt | -| /public/;/file.txt | /public/file.txt | -| /PUBLIC/file.txt | /PUBLIC/file.txt | -| /public%2Ffile.txt | /public%2Ffile.txt | 400 -| /public%5Cfile.txt | /public\file.txt | 400 -| /public%00/file.txt | /public␀/file.txt | 400? -| /public/./file.txt | /public/file.txt | -| /public/.;/file.txt | /public/file.txt | 400 -| /public/%2e/file.txt | /public/file.txt | 400 -| /public/%2e;/file.txt | /public/file.txt | 400 -| /../docroot/public/file.txt | /../docroot/public/file.txt | 400 -| /public/dir/../file.txt | /public/file.txt | -| /public//../file.txt | /file.txt | 400? -| /public/;param/../file.txt | /file.txt | 400? -| /public/dir/..;/file.txt | /public/file.txt | 400 -| /public/dir/%2e%2e/file.txt | /public/file.txt | 400 -| /public/dir/%2e%2e;/file.txt | /public/file.txt | 400 -| /WEB-INF/web.xml | /WEB-INF/web.xml | 404 or 403? -| /web-inf/web.xml | /web-inf/web.xml | 404 or 403? -| /WEB-IN~1.DIR/web.xml | ? | ? -| /WEB-INF;/web.xml | /WEB-INF/web.xml | 404 -| /WEB-INF%2Fweb.xml | /WEB-INF%2Fweb.xml| 400 -| /WEB-INF%5Cweb.xml | /WEB-INF\web.xml | 400 -| /WEB-INF%00/web.xml | /WEB-INF␀/web.xml | 400? -| /WEB-INF/./web.xml | /WEB-INF/web.xml | 404 -| /public/../WEB-INF/web.xml | /WEB-INF/web.xml | 404 or 403? -| /public/..;/WEB-INF/web.xml | /WEB-INF/web.xml | 404 or 403? -| /public/%2e%2e/WEB-INF/web.xml | /WEB-INF/web.xml | 404 or 403? -| /public/%2e%2e;/WEB-INF/web.xml | /WEB-INF/web.xml | 404 or 403? -| /secret/private.xml | /secret/private.xml | 403 -| /SeCreT/private.xml | /SeCreT/private.xml | TODO move to another section an 404 due to non-canonical -| /SECRET~1.DIR/private.xml | | TODO move to another section an 404 due to non-canonical -| /secret;/private.xml | | 403 -| /secret%2Fprivate.xml | /secret%2Fprivate.xml | 400 -| /secret%5Cprivate.xml | /secret\private.xml | 400 -| /secret%00/private.xml | /secret␀/private.xml | 400? -| /./secret/private.xml | /secret/private.xml| 403 -| /.;/secret/private.xml | /secret/private.xml| 403 -| /%2e/secret/private.xml | /secret/private.xml | 400 -| /%2e;/secret/private.xml | /secret/private.xml | 400 -| /public/../secret/private.xml | /secret/private.xml | 403 -| /public/..;/secret/private.xml | /secret/private.xml | 403 -| /public/%2e%2e/secret/private.xml | /secret/private.xml | 400 -| /public/%2e%2e;/secret/private.xml | /secret/private.xml | 400 -| /dispatch/public/file.txt | /public/file.txt | -| /dispatch/public%2Ffile.txt | /public/file.txt | 400? -| /dispatch/public%5Cfile.txt | /public\file.txt | 400? -| /dispatch/public%252Ffile.txt | /public%2Ffile.txt | 400? -| /dispatch/WEB-INF/web.xml | /WEB-INF/web.xml | -| /dispatch/secret/private.xml | /secret/private/xml | -| /dispatch/%2E%2E/%2E%2E/etc/password | /../../etc/password | 400? +| Encoded URI path | Decoded Path | Rejected + +| `/public//../file.txt` | `/file.txt/` | +| `/public/dir/..;/file.txt` | `/public/file.txt/` | 400 dot segment with parameter +| `/public/dir/%2e%2e;/file.txt` | `/public/dir/file.txt/` | +| `/public/file.txt` | `/public/file.txt/` | +| `/public//file.txt` | `/public/file.txt/` | +| `/public/;/file.txt` | `/public/file.txt/` | +| `/PUBLIC/file.txt` | `/PUBLIC/file.txt/` | +| `/public%2Ffile.txt` | `/public/file.txt/` | 400 encoded / +| `/public%5Cfile.txt` | `/public\file.txt/` | +| `/public%00/file.txt` | `/public/file.txt/` | +| `/public/./file.txt` | `/public/file.txt/` | +| `/public/.;/file.txt` | `/public/file.txt/` | 400 dot segment with parameter +| `/public/%2e/file.txt` | `/public/file.txt/` | +| `/public/%2e;/file.txt` | `/public/file.txt/` | +| `/../docroot/public/file.txt` | `/../docroot/public/file.txt/` | 400 leading dot-dot-segment +| `/public/dir/../file.txt` | `/public/file.txt/` | +| `/public/;param/../file.txt` | `/file.txt/` | +| `/public/dir/%2e%2e/file.txt` | `/public/dir/file.txt/` | +| `/WEB-INF/web.xml` | `/WEB-INF/web.xml/` | +| `/web-inf/web.xml` | `/web-inf/web.xml/` | +| `/WEB-IN~1.DIR/web.xml` | `/WEB-IN~1.DIR/web.xml/` | +| `/WEB-INF;/web.xml` | `/WEB-INF/web.xml/` | +| `/WEB-INF%2Fweb.xml` | `/WEB-INF/web.xml/` | 400 encoded / +| `/WEB-INF%5Cweb.xml` | `/WEB-INF\web.xml/` | +| `/WEB-INF%00/web.xml` | `/WEB-INF/web.xml/` | +| `/WEB-INF/./web.xml` | `/WEB-INF/web.xml/` | +| `/public/../WEB-INF/web.xml` | `/WEB-INF/web.xml/` | +| `/public/..;/WEB-INF/web.xml` | `/WEB-INF/web.xml/` | 400 dot segment with parameter +| `/public/%2e%2e/WEB-INF/web.xml` | `/public/WEB-INF/web.xml/` | +| `/public/%2e%2e;/WEB-INF/web.xml` | `/public/WEB-INF/web.xml/` | +| `/secret/private.xml` | `/secret/private.xml/` | +| `/SeCreT/private.xml` | `/SeCreT/private.xml/` | +| `/SECRET~1.DIR/private.xml` | `/SECRET~1.DIR/private.xml/` | +| `/secret;/private.xml` | `/secret/private.xml/` | +| `/secret%2Fprivate.xml` | `/secret/private.xml/` | 400 encoded / +| `/secret%5Cprivate.xml` | `/secret\private.xml/` | +| `/secret%00/private.xml` | `/secret/private.xml/` | +| `/./secret/private.xml` | `/secret/private.xml/` | +| `/.;/secret/private.xml` | `/secret/private.xml/` | 400 dot segment with parameter +| `/%2e/secret/private.xml` | `/secret/private.xml/` | +| `/%2e;/secret/private.xml` | `/secret/private.xml/` | +| `/public/../secret/private.xml` | `/secret/private.xml/` | +| `/public/..;/secret/private.xml` | `/secret/private.xml/` | 400 dot segment with parameter +| `/public/%2e%2e/secret/private.xml` | `/public/secret/private.xml/` | +| `/public/%2e%2e;/secret/private.xml` | `/public/secret/private.xml/` | +| `/dispatch/public/file.txt` | `/dispatch/public/file.txt/` | +| `/dispatch/public%2Ffile.txt` | `/dispatch/public/file.txt/` | 400 encoded / +| `/dispatch/public%5Cfile.txt` | `/dispatch/public\file.txt/` | +| `/dispatch/public%252Ffile.txt` | `/dispatch/public%2Ffile.txt/` | +| `/dispatch/WEB-INF/web.xml` | `/dispatch/WEB-INF/web.xml/` | +| `/dispatch/secret/private.xml` | `/dispatch/secret/private.xml/` | +| `/dispatch/%2E%2E/%2E%2E/etc/password` | `/dispatch/etc/password/` | +|=== + === Request Path Elements From 77f602edcd2d5dacca750076e3d3094d50edfbcb Mon Sep 17 00:00:00 2001 From: Greg Wilkins Date: Fri, 8 Oct 2021 14:30:59 +1100 Subject: [PATCH 12/42] handle final slash --- spec/src/main/asciidoc/servlet-spec-body.adoc | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index 9ed49715a..038d5b068 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -1355,7 +1355,8 @@ All segments that are exactly `"."` are removed from the segment series. Segment . **Concatenate segments.** + -The segments are concatenated into a single path string with each segment preceded by the `"/"` character. If a segment contains the `"/"` or `"%"` characters, and the request will not be rejected below, then the container may configured re-encode those characters to the `%nn` form. If any characters are re-encoded, then the `"%"` must also be re-encoded. +The segments are concatenated into a single path string with each segment preceded by the `"/"` character. If the original path after stripping query and fragment ends with a `"/"` then append a final `"/"` character. + If a segment contains the `"/"` or `"%"` characters, and the request will not be rejected below, then the container may configured re-encode those characters to the `%nn` form. If any characters are re-encoded, then the `"%"` must also be re-encoded. . **Mapping URI to context and resource.** + From 639ad85789d602af7d08e1194919032da8999c27 Mon Sep 17 00:00:00 2001 From: Greg Wilkins Date: Fri, 8 Oct 2021 15:38:52 +1100 Subject: [PATCH 13/42] revert auto change to pom.xml --- api/pom.xml | 8 -------- 1 file changed, 8 deletions(-) diff --git a/api/pom.xml b/api/pom.xml index 5aebeaacc..8e2dd6922 100644 --- a/api/pom.xml +++ b/api/pom.xml @@ -372,14 +372,6 @@ Use is subject to - - org.apache.maven.plugins - maven-compiler-plugin - - 10 - 10 - - From 38f3cdd0cefb912a1ec4525a825ecdce33998d76 Mon Sep 17 00:00:00 2001 From: Greg Wilkins Date: Fri, 8 Oct 2021 19:03:21 +1100 Subject: [PATCH 14/42] More rejections in the uri test --- .../servlet/http/CanonicalUriPathTest.java | 29 +++++++++--- spec/src/main/asciidoc/servlet-spec-body.adoc | 45 ++++++++++--------- 2 files changed, 47 insertions(+), 27 deletions(-) diff --git a/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java index 264923438..6c130f985 100644 --- a/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java +++ b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java @@ -30,6 +30,7 @@ public static String canonicalUriPath(String uriPath) { boolean endsWithSlash = path.startsWith("/"); boolean dotSegmentWithParam = false; boolean encodedDotSegment = false; + boolean emptySegmentBeforeDotDot = false; // Split path into segments. List segments = new ArrayList<>(Arrays.asList(path.substring(1).split("/"))); @@ -58,13 +59,17 @@ public static String canonicalUriPath(String uriPath) { i += 2; } else { if (utf8.size() > 0) { - buf.append(utf8.toString(StandardCharsets.UTF_8)); + buf.append(new String(utf8.toByteArray(), StandardCharsets.UTF_8)); utf8.reset(); } buf.append(c); } } + if (utf8.size() > 0) { + buf.append(new String(utf8.toByteArray(), StandardCharsets.UTF_8)); + utf8.reset(); + } segment = buf.toString(); s.set(segment); encodedDotSegment |= (".".equals(segment) || "..".equals(segment)); @@ -84,9 +89,10 @@ public static String canonicalUriPath(String uriPath) { } else if (segment.equals("..")) { if (count > 0) { s.remove(); - s.previous(); + String previous = s.previous(); s.remove(); count--; + emptySegmentBeforeDotDot |= previous.length() == 0; } } else { count++; @@ -124,11 +130,20 @@ else if (dotSegmentWithParam) else if (encodedDotSegment) reject = "encoded dot segment"; // Any `".."` segment preceded by an empty segment - // TODO + else if (emptySegmentBeforeDotDot) + reject = "empty segment before dot dot"; // The `"\"` character encoded or not. - // TODO + else if (path.contains("\\")) + reject = "backslash character"; // Any control characters either encoded or not. - // TODO + else { + for (char c : path.toCharArray()) { + if (c < 0x20 || c == 0x7f) { + reject = "control character"; + break; + } + } + } System.err.printf("| `%s` | `%s` | %s %n", uriPath, @@ -156,7 +171,9 @@ else if (encodedDotSegment) "/public//../file.txt", "/public/;param/../file.txt", "/public/dir/..;/file.txt", - "/public/dir/%2e%2e/file.txt", + "/public/dir/%2e%2E/file.txt", + "/public/dir/.%2e/file.txt", + "/public/dir/%2E./file.txt", "/public/dir/%2e%2e;/file.txt", "/WEB-INF/web.xml", "/web-inf/web.xml", diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index 038d5b068..a5ba8e79a 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -1384,58 +1384,61 @@ Consider the following URIs with a forbidden security constraint at `/secret/*` |=== | Encoded URI path | Decoded Path | Rejected -| `/public//../file.txt` | `/file.txt/` | -| `/public/dir/..;/file.txt` | `/public/file.txt/` | 400 dot segment with parameter -| `/public/dir/%2e%2e;/file.txt` | `/public/dir/file.txt/` | +| `/public/.;/file.txt` | `/public/file.txt/` | 400 dot segment with parameter | `/public/file.txt` | `/public/file.txt/` | | `/public//file.txt` | `/public/file.txt/` | | `/public/;/file.txt` | `/public/file.txt/` | | `/PUBLIC/file.txt` | `/PUBLIC/file.txt/` | | `/public%2Ffile.txt` | `/public/file.txt/` | 400 encoded / -| `/public%5Cfile.txt` | `/public\file.txt/` | -| `/public%00/file.txt` | `/public/file.txt/` | +| `/public%5Cfile.txt` | `/public\file.txt/` | 400 backslash character +| `/public%00/file.txt` | `/public /file.txt/` | 400 control character | `/public/./file.txt` | `/public/file.txt/` | -| `/public/.;/file.txt` | `/public/file.txt/` | 400 dot segment with parameter -| `/public/%2e/file.txt` | `/public/file.txt/` | -| `/public/%2e;/file.txt` | `/public/file.txt/` | +| `/public/%2e/file.txt` | `/public/file.txt/` | 400 encoded dot segment +| `/public/%2e;/file.txt` | `/public/file.txt/` | 400 encoded dot segment | `/../docroot/public/file.txt` | `/../docroot/public/file.txt/` | 400 leading dot-dot-segment | `/public/dir/../file.txt` | `/public/file.txt/` | +| `/public//../file.txt` | `/file.txt/` | | `/public/;param/../file.txt` | `/file.txt/` | -| `/public/dir/%2e%2e/file.txt` | `/public/dir/file.txt/` | +| `/public/dir/..;/file.txt` | `/public/file.txt/` | 400 dot segment with parameter +| `/public/dir/%2e%2E/file.txt` | `/public/file.txt/` | 400 encoded dot segment +| `/public/dir/.%2e/file.txt` | `/public/file.txt/` | 400 encoded dot segment +| `/public/dir/%2E./file.txt` | `/public/file.txt/` | 400 encoded dot segment +| `/public/dir/%2e%2e;/file.txt` | `/public/file.txt/` | 400 encoded dot segment | `/WEB-INF/web.xml` | `/WEB-INF/web.xml/` | | `/web-inf/web.xml` | `/web-inf/web.xml/` | | `/WEB-IN~1.DIR/web.xml` | `/WEB-IN~1.DIR/web.xml/` | | `/WEB-INF;/web.xml` | `/WEB-INF/web.xml/` | | `/WEB-INF%2Fweb.xml` | `/WEB-INF/web.xml/` | 400 encoded / -| `/WEB-INF%5Cweb.xml` | `/WEB-INF\web.xml/` | -| `/WEB-INF%00/web.xml` | `/WEB-INF/web.xml/` | +| `/WEB-INF%5Cweb.xml` | `/WEB-INF\web.xml/` | 400 backslash character +| `/WEB-INF%00/web.xml` | `/WEB-INF /web.xml/` | 400 control character | `/WEB-INF/./web.xml` | `/WEB-INF/web.xml/` | | `/public/../WEB-INF/web.xml` | `/WEB-INF/web.xml/` | | `/public/..;/WEB-INF/web.xml` | `/WEB-INF/web.xml/` | 400 dot segment with parameter -| `/public/%2e%2e/WEB-INF/web.xml` | `/public/WEB-INF/web.xml/` | -| `/public/%2e%2e;/WEB-INF/web.xml` | `/public/WEB-INF/web.xml/` | +| `/public/%2e%2e/WEB-INF/web.xml` | `/WEB-INF/web.xml/` | 400 encoded dot segment +| `/public/%2e%2e;/WEB-INF/web.xml` | `/WEB-INF/web.xml/` | 400 encoded dot segment | `/secret/private.xml` | `/secret/private.xml/` | | `/SeCreT/private.xml` | `/SeCreT/private.xml/` | | `/SECRET~1.DIR/private.xml` | `/SECRET~1.DIR/private.xml/` | | `/secret;/private.xml` | `/secret/private.xml/` | | `/secret%2Fprivate.xml` | `/secret/private.xml/` | 400 encoded / -| `/secret%5Cprivate.xml` | `/secret\private.xml/` | -| `/secret%00/private.xml` | `/secret/private.xml/` | +| `/secret%5Cprivate.xml` | `/secret\private.xml/` | 400 backslash character +| `/secret%00/private.xml` | `/secret /private.xml/` | 400 control character | `/./secret/private.xml` | `/secret/private.xml/` | | `/.;/secret/private.xml` | `/secret/private.xml/` | 400 dot segment with parameter -| `/%2e/secret/private.xml` | `/secret/private.xml/` | -| `/%2e;/secret/private.xml` | `/secret/private.xml/` | +| `/%2e/secret/private.xml` | `/secret/private.xml/` | 400 encoded dot segment +| `/%2e;/secret/private.xml` | `/secret/private.xml/` | 400 encoded dot segment | `/public/../secret/private.xml` | `/secret/private.xml/` | | `/public/..;/secret/private.xml` | `/secret/private.xml/` | 400 dot segment with parameter -| `/public/%2e%2e/secret/private.xml` | `/public/secret/private.xml/` | -| `/public/%2e%2e;/secret/private.xml` | `/public/secret/private.xml/` | +| `/public/%2e%2e/secret/private.xml` | `/secret/private.xml/` | 400 encoded dot segment +| `/public/%2e%2e;/secret/private.xml` | `/secret/private.xml/` | 400 encoded dot segment | `/dispatch/public/file.txt` | `/dispatch/public/file.txt/` | | `/dispatch/public%2Ffile.txt` | `/dispatch/public/file.txt/` | 400 encoded / -| `/dispatch/public%5Cfile.txt` | `/dispatch/public\file.txt/` | +| `/dispatch/public%5Cfile.txt` | `/dispatch/public\file.txt/` | 400 backslash character | `/dispatch/public%252Ffile.txt` | `/dispatch/public%2Ffile.txt/` | | `/dispatch/WEB-INF/web.xml` | `/dispatch/WEB-INF/web.xml/` | | `/dispatch/secret/private.xml` | `/dispatch/secret/private.xml/` | -| `/dispatch/%2E%2E/%2E%2E/etc/password` | `/dispatch/etc/password/` | +| `/dispatch/%2E%2E/%2E%2E/etc/password` | `/../etc/password/` | 400 leading dot-dot-segment + |=== From 4c9d28ba50407f11c37cf886764aff0b87c6fc40 Mon Sep 17 00:00:00 2001 From: Greg Wilkins Date: Sat, 9 Oct 2021 08:19:26 +1100 Subject: [PATCH 15/42] Fixed trailing / issue --- .../servlet/http/CanonicalUriPathTest.java | 3 +- spec/src/main/asciidoc/servlet-spec-body.adoc | 106 +++++++++--------- 2 files changed, 53 insertions(+), 56 deletions(-) diff --git a/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java index 6c130f985..5dae7108f 100644 --- a/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java +++ b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java @@ -27,7 +27,7 @@ public static String canonicalUriPath(String uriPath) { // Remember start/end conditions boolean startsWithSlash = path.startsWith("/"); - boolean endsWithSlash = path.startsWith("/"); + boolean endsWithSlash = path.endsWith("/"); boolean dotSegmentWithParam = false; boolean encodedDotSegment = false; boolean emptySegmentBeforeDotDot = false; @@ -156,6 +156,7 @@ else if (path.contains("\\")) @ParameterizedTest @ValueSource(strings = { "/public/file.txt", + "/public/dir/", "/public//file.txt", "/public/;/file.txt", "/PUBLIC/file.txt", diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index a5ba8e79a..e9e5ddbf5 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -1384,61 +1384,57 @@ Consider the following URIs with a forbidden security constraint at `/secret/*` |=== | Encoded URI path | Decoded Path | Rejected -| `/public/.;/file.txt` | `/public/file.txt/` | 400 dot segment with parameter -| `/public/file.txt` | `/public/file.txt/` | -| `/public//file.txt` | `/public/file.txt/` | -| `/public/;/file.txt` | `/public/file.txt/` | -| `/PUBLIC/file.txt` | `/PUBLIC/file.txt/` | -| `/public%2Ffile.txt` | `/public/file.txt/` | 400 encoded / -| `/public%5Cfile.txt` | `/public\file.txt/` | 400 backslash character -| `/public%00/file.txt` | `/public /file.txt/` | 400 control character -| `/public/./file.txt` | `/public/file.txt/` | -| `/public/%2e/file.txt` | `/public/file.txt/` | 400 encoded dot segment -| `/public/%2e;/file.txt` | `/public/file.txt/` | 400 encoded dot segment -| `/../docroot/public/file.txt` | `/../docroot/public/file.txt/` | 400 leading dot-dot-segment -| `/public/dir/../file.txt` | `/public/file.txt/` | -| `/public//../file.txt` | `/file.txt/` | -| `/public/;param/../file.txt` | `/file.txt/` | -| `/public/dir/..;/file.txt` | `/public/file.txt/` | 400 dot segment with parameter -| `/public/dir/%2e%2E/file.txt` | `/public/file.txt/` | 400 encoded dot segment -| `/public/dir/.%2e/file.txt` | `/public/file.txt/` | 400 encoded dot segment -| `/public/dir/%2E./file.txt` | `/public/file.txt/` | 400 encoded dot segment -| `/public/dir/%2e%2e;/file.txt` | `/public/file.txt/` | 400 encoded dot segment -| `/WEB-INF/web.xml` | `/WEB-INF/web.xml/` | -| `/web-inf/web.xml` | `/web-inf/web.xml/` | -| `/WEB-IN~1.DIR/web.xml` | `/WEB-IN~1.DIR/web.xml/` | -| `/WEB-INF;/web.xml` | `/WEB-INF/web.xml/` | -| `/WEB-INF%2Fweb.xml` | `/WEB-INF/web.xml/` | 400 encoded / -| `/WEB-INF%5Cweb.xml` | `/WEB-INF\web.xml/` | 400 backslash character -| `/WEB-INF%00/web.xml` | `/WEB-INF /web.xml/` | 400 control character -| `/WEB-INF/./web.xml` | `/WEB-INF/web.xml/` | -| `/public/../WEB-INF/web.xml` | `/WEB-INF/web.xml/` | -| `/public/..;/WEB-INF/web.xml` | `/WEB-INF/web.xml/` | 400 dot segment with parameter -| `/public/%2e%2e/WEB-INF/web.xml` | `/WEB-INF/web.xml/` | 400 encoded dot segment -| `/public/%2e%2e;/WEB-INF/web.xml` | `/WEB-INF/web.xml/` | 400 encoded dot segment -| `/secret/private.xml` | `/secret/private.xml/` | -| `/SeCreT/private.xml` | `/SeCreT/private.xml/` | -| `/SECRET~1.DIR/private.xml` | `/SECRET~1.DIR/private.xml/` | -| `/secret;/private.xml` | `/secret/private.xml/` | -| `/secret%2Fprivate.xml` | `/secret/private.xml/` | 400 encoded / -| `/secret%5Cprivate.xml` | `/secret\private.xml/` | 400 backslash character -| `/secret%00/private.xml` | `/secret /private.xml/` | 400 control character -| `/./secret/private.xml` | `/secret/private.xml/` | -| `/.;/secret/private.xml` | `/secret/private.xml/` | 400 dot segment with parameter -| `/%2e/secret/private.xml` | `/secret/private.xml/` | 400 encoded dot segment -| `/%2e;/secret/private.xml` | `/secret/private.xml/` | 400 encoded dot segment -| `/public/../secret/private.xml` | `/secret/private.xml/` | -| `/public/..;/secret/private.xml` | `/secret/private.xml/` | 400 dot segment with parameter -| `/public/%2e%2e/secret/private.xml` | `/secret/private.xml/` | 400 encoded dot segment -| `/public/%2e%2e;/secret/private.xml` | `/secret/private.xml/` | 400 encoded dot segment -| `/dispatch/public/file.txt` | `/dispatch/public/file.txt/` | -| `/dispatch/public%2Ffile.txt` | `/dispatch/public/file.txt/` | 400 encoded / -| `/dispatch/public%5Cfile.txt` | `/dispatch/public\file.txt/` | 400 backslash character -| `/dispatch/public%252Ffile.txt` | `/dispatch/public%2Ffile.txt/` | -| `/dispatch/WEB-INF/web.xml` | `/dispatch/WEB-INF/web.xml/` | -| `/dispatch/secret/private.xml` | `/dispatch/secret/private.xml/` | -| `/dispatch/%2E%2E/%2E%2E/etc/password` | `/../etc/password/` | 400 leading dot-dot-segment - +| `/public/file.txt` | `/public/file.txt` | +| `/public/dir/` | `/public/dir/` | +| `/public//file.txt` | `/public/file.txt` | +| `/PUBLIC/file.txt` | `/PUBLIC/file.txt` | +| `/public%2Ffile.txt` | `/public/file.txt` | 400 encoded / +| `/public%5Cfile.txt` | `/public\file.txt` | 400 backslash character +| `/public%00/file.txt` | `/public /file.txt` | 400 control character +| `/public/.;/file.txt` | `/public/file.txt` | 400 dot segment with parameter +| `/public/%2e/file.txt` | `/public/file.txt` | 400 encoded dot segment +| `/public/%2e;/file.txt` | `/public/file.txt` | 400 encoded dot segment +| `/../docroot/public/file.txt` | `/../docroot/public/file.txt` | 400 leading dot-dot-segment +| `/public/dir/../file.txt` | `/public/file.txt` | +| `/public//../file.txt` | `/file.txt` | +| `/public/dir/..;/file.txt` | `/public/file.txt` | 400 dot segment with parameter +| `/public/dir/%2e%2E/file.txt` | `/public/file.txt` | 400 encoded dot segment +| `/public/dir/.%2e/file.txt` | `/public/file.txt` | 400 encoded dot segment +| `/public/dir/%2E./file.txt` | `/public/file.txt` | 400 encoded dot segment +| `/public/dir/%2e%2e;/file.txt` | `/public/file.txt` | 400 encoded dot segment +| `/WEB-INF/web.xml` | `/WEB-INF/web.xml` | +| `/web-inf/web.xml` | `/web-inf/web.xml` | +| `/WEB-IN~1.DIR/web.xml` | `/WEB-IN~1.DIR/web.xml` | +| `/WEB-INF;/web.xml` | `/WEB-INF/web.xml` | +| `/WEB-INF%2Fweb.xml` | `/WEB-INF/web.xml` | 400 encoded / +| `/WEB-INF%5Cweb.xml` | `/WEB-INF\web.xml` | 400 backslash character +| `/WEB-INF%00/web.xml` | `/WEB-INF /web.xml` | 400 control character +| `/WEB-INF/./web.xml` | `/WEB-INF/web.xml` | +| `/public/../WEB-INF/web.xml` | `/WEB-INF/web.xml` | +| `/public/..;/WEB-INF/web.xml` | `/WEB-INF/web.xml` | 400 dot segment with parameter +| `/public/%2e%2e/WEB-INF/web.xml` | `/WEB-INF/web.xml` | 400 encoded dot segment +| `/public/%2e%2e;/WEB-INF/web.xml` | `/WEB-INF/web.xml` | 400 encoded dot segment +| `/secret/private.xml` | `/secret/private.xml` | +| `/SeCreT/private.xml` | `/SeCreT/private.xml` | +| `/SECRET~1.DIR/private.xml` | `/SECRET~1.DIR/private.xml` | +| `/secret;/private.xml` | `/secret/private.xml` | +| `/secret%2Fprivate.xml` | `/secret/private.xml` | 400 encoded / +| `/secret%5Cprivate.xml` | `/secret\private.xml` | 400 backslash character +| `/secret%00/private.xml` | `/secret /private.xml` | 400 control character +| `/./secret/private.xml` | `/secret/private.xml` | +| `/.;/secret/private.xml` | `/secret/private.xml` | 400 dot segment with parameter +| `/%2e/secret/private.xml` | `/secret/private.xml` | 400 encoded dot segment +| `/%2e;/secret/private.xml` | `/secret/private.xml` | 400 encoded dot segment +| `/public/../secret/private.xml` | `/secret/private.xml` | +| `/public/..;/secret/private.xml` | `/secret/private.xml` | 400 dot segment with parameter +| `/public/%2e%2e/secret/private.xml` | `/secret/private.xml` | 400 encoded dot segment +| `/public/%2e%2e;/secret/private.xml` | `/secret/private.xml` | 400 encoded dot segment +| `/dispatch/public/file.txt` | `/dispatch/public/file.txt` | +| `/dispatch/public%2Ffile.txt` | `/dispatch/public/file.txt` | 400 encoded / +| `/dispatch/public%5Cfile.txt` | `/dispatch/public\file.txt` | 400 backslash character +| `/dispatch/public%252Ffile.txt` | `/dispatch/public%2Ffile.txt` | +| `/dispatch/WEB-INF/web.xml` | `/dispatch/WEB-INF/web.xml` | +| `/dispatch/secret/private.xml` | `/dispatch/secret/private.xml` | |=== From c8c86e66df4785f7d2c72fe7ea9482b845ab8ad1 Mon Sep 17 00:00:00 2001 From: Greg Wilkins Date: Sat, 9 Oct 2021 08:23:12 +1100 Subject: [PATCH 16/42] should re-encode / if not rejected --- spec/src/main/asciidoc/servlet-spec-body.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index e9e5ddbf5..738cf68e7 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -1356,7 +1356,7 @@ All segments that are exactly `"."` are removed from the segment series. Segment . **Concatenate segments.** + The segments are concatenated into a single path string with each segment preceded by the `"/"` character. If the original path after stripping query and fragment ends with a `"/"` then append a final `"/"` character. - If a segment contains the `"/"` or `"%"` characters, and the request will not be rejected below, then the container may configured re-encode those characters to the `%nn` form. If any characters are re-encoded, then the `"%"` must also be re-encoded. + If a segment contains the `"/"` or `"%"` characters, and the container is configured to not reject the request, then the container should re-encode those characters to the `%nn` form. If any characters are re-encoded, then the `"%"` must also be re-encoded. . **Mapping URI to context and resource.** + From d433fc8dfba3e271470a87c3fd5cb8b25a985ad2 Mon Sep 17 00:00:00 2001 From: Greg Wilkins Date: Sat, 9 Oct 2021 08:30:59 +1100 Subject: [PATCH 17/42] discard fragment before query --- .../java/jakarta/servlet/http/CanonicalUriPathTest.java | 8 ++++---- spec/src/main/asciidoc/servlet-spec-body.adoc | 8 ++++---- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java index 5dae7108f..09a98fc6c 100644 --- a/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java +++ b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java @@ -17,14 +17,14 @@ public static String canonicalUriPath(String uriPath) { String path = uriPath; - // Separation of path and query. - if (path.contains("?")) - path = path.substring(0, path.indexOf('?')); - // Discard fragment. if (path.contains("#")) path = path.substring(0, path.indexOf('#')); + // Separation of path and query. + if (path.contains("?")) + path = path.substring(0, path.indexOf('?')); + // Remember start/end conditions boolean startsWithSlash = path.startsWith("/"); boolean endsWithSlash = path.endsWith("/"); diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index 738cf68e7..14d08a261 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -1325,14 +1325,14 @@ Other protocols:: Containers may support other protocols. Containers should extr Servlet containers may implement the standard Servlet URI path canonicalization in any manner they see fit as long as the end result is identical to the end result of the process described here. Servlet containers may provide container specific configuration options to vary the standard canonicalization process. Any such variations may have security implications and both Servlet container implementors and users are advised to be sure that they understand the implications of any such container specific canonicalization options. -. **Separation of path and query.** -+ -The URI is split by the first occurrence of any '"?"' character to path and query. The query is preserved for later handling and the following steps applied to the path. - . **Discard fragment.** + The path is split by the first occurrence of any `"\#"` character. The `"#"` and following fragment are discarded and the path is replaced with the character sequence preceding the `"#"` character. +. **Separation of path and query.** ++ +The URI is split by the first occurrence of any '"?"' character to path and query. The query is preserved for later handling and the following steps applied to the path. + . **Split path into segments.** + The path is split into multiple segments divided by the `"/"` character. A leading `"/"` does not indicate an initial empty segment. From 223f75d73f43dc852fb49aa2c247b35b4e8ee749 Mon Sep 17 00:00:00 2001 From: Greg Wilkins Date: Sat, 9 Oct 2021 08:44:04 +1100 Subject: [PATCH 18/42] encoded ``%2e` is equivalent to `.` for normilization. --- spec/src/main/asciidoc/servlet-spec-body.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index 14d08a261..401d84230 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -1351,7 +1351,7 @@ Empty segments are removed. Containers may be configured to retain empty segmen . **Remove dot-segments.** + -All segments that are exactly `"."` are removed from the segment series. Segments that are exactly `".."` and that are preceded by a non `".."` segment are removed together with the preceding segment. This normalization differs from RFC3986 in that segments with `%` encoded dots and/or parameters may be treated as dot segments. +All segments that are exactly `"."` are removed from the segment series. Segments that are exactly `".."` and that are preceded by a non `".."` segment are removed together with the preceding segment. This normalization differs from RFC3986 in that segments with parameters may be treated as dot segments. . **Concatenate segments.** + From c304e6b1a81a1a04421c34b8ca0fc10be5043d9b Mon Sep 17 00:00:00 2001 From: Greg Wilkins Date: Sat, 9 Oct 2021 08:56:32 +1100 Subject: [PATCH 19/42] empty segment with parameters --- .../servlet/http/CanonicalUriPathTest.java | 3 +++ spec/src/main/asciidoc/servlet-spec-body.adoc | 21 +++++++++++-------- 2 files changed, 15 insertions(+), 9 deletions(-) diff --git a/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java index 09a98fc6c..243bd4b07 100644 --- a/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java +++ b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java @@ -132,6 +132,9 @@ else if (encodedDotSegment) // Any `".."` segment preceded by an empty segment else if (emptySegmentBeforeDotDot) reject = "empty segment before dot dot"; + // Any empty segment with parameters + else if (uriPath.contains("/;")) + reject = "empty segment with parameters"; // The `"\"` character encoded or not. else if (path.contains("\\")) reject = "backslash character"; diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index 401d84230..5b3cf00cf 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -1366,14 +1366,15 @@ The decoded path is used to map the request to a context and resource within the + If suspicious sequences are discovered during the prior processing steps, the request must be rejected with a 400 bad request rather than dispatched to the target servlet. If a context is matched then the error handling of the context may be used to generate the 400 response. By default, the set of suspicious sequences below, but may be configured differently by a container: - * Any path not starting with the `"/"` character - * Any path starting with an initial segment of `".."` - * The encoded `"/"` character - * Any `"."` or `".."` segment that had a path parameter - * Any `"."` or `".."` segment with any encoded characters - * Any `".."` segment preceded by an empty segment - * The `"\"` character encoded or not. - * Any control characters either encoded or not. + * Any path not starting with the `"/"` character (e.g. `path/info`) + * Any path starting with an initial segment of `".."` (e.g. `/../path/info`) + * The encoded `"/"` character (e.g. `/path%2Finfo`) + * Any `"."` or `".."` segment that had a path parameter (e.g. `/path/..;/info`) + * Any `"."` or `".."` segment with any encoded characters (e.g. `/path/%2e%2e/info`) + * Any `".."` segment preceded by an empty segment (e.g. `/path//../info`) + * Any empty segment with parameters (e.g. `/path/;param/info` ) + * The `"\"` character encoded or not. (e.g. `/path\info`) + * Any control characters either encoded or not. (e.g. `/path%00/info`) ==== Example URIs > TODO more work needed on this section. Perhaps some parts need to be moved to other sections. @@ -1387,16 +1388,18 @@ Consider the following URIs with a forbidden security constraint at `/secret/*` | `/public/file.txt` | `/public/file.txt` | | `/public/dir/` | `/public/dir/` | | `/public//file.txt` | `/public/file.txt` | +| `/public/;/file.txt` | `/public/file.txt` | 400 empty segment with parameters | `/PUBLIC/file.txt` | `/PUBLIC/file.txt` | | `/public%2Ffile.txt` | `/public/file.txt` | 400 encoded / | `/public%5Cfile.txt` | `/public\file.txt` | 400 backslash character | `/public%00/file.txt` | `/public /file.txt` | 400 control character -| `/public/.;/file.txt` | `/public/file.txt` | 400 dot segment with parameter +| `/public/./file.txt` | `/public/file.txt` | | `/public/%2e/file.txt` | `/public/file.txt` | 400 encoded dot segment | `/public/%2e;/file.txt` | `/public/file.txt` | 400 encoded dot segment | `/../docroot/public/file.txt` | `/../docroot/public/file.txt` | 400 leading dot-dot-segment | `/public/dir/../file.txt` | `/public/file.txt` | | `/public//../file.txt` | `/file.txt` | +| `/public/;param/../file.txt` | `/file.txt` | 400 empty segment with parameters | `/public/dir/..;/file.txt` | `/public/file.txt` | 400 dot segment with parameter | `/public/dir/%2e%2E/file.txt` | `/public/file.txt` | 400 encoded dot segment | `/public/dir/.%2e/file.txt` | `/public/file.txt` | 400 encoded dot segment From 049dc01cca2f2674e60213e706fee48f20a37272 Mon Sep 17 00:00:00 2001 From: Greg Wilkins Date: Sat, 9 Oct 2021 09:38:27 +1100 Subject: [PATCH 20/42] lame attempt at better verbage. --- .../jakarta/servlet/http/HttpServletRequest.java | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/api/src/main/java/jakarta/servlet/http/HttpServletRequest.java b/api/src/main/java/jakarta/servlet/http/HttpServletRequest.java index 0bdc775fb..441083b0b 100644 --- a/api/src/main/java/jakarta/servlet/http/HttpServletRequest.java +++ b/api/src/main/java/jakarta/servlet/http/HttpServletRequest.java @@ -412,14 +412,20 @@ default public PushBuilder newPushBuilder() { /** * Returns the part of this request's URL that calls the servlet. This path starts with a "/" character and includes - * either the servlet name or a path to the servlet, but does not include any extra path information or a query string. + * the path to the servlet, but does not include any extra path information or a query string. * *

* This method will return an empty string ("") if the servlet used to process this request was matched using the "/*" * pattern. * - * @return a String containing the name or path of the servlet being called, as specified in the request - * URL, decoded, or an empty string if the servlet used to process the request is matched using the "/*" pattern. + * @return a String containing the path of the servlet being called, as specified in the request + * URL, or an empty string if the servlet used to process the request is matched using the "/*" pattern. + * The path will be canonicalized as per section 3.5 of the specification. This method will not return any encoded + * characters unless the container is configured specifically to allow them. + * @throws IllegalArgumentException In standard configuration, this method will never throw. However, a container + * may be configured to not reject some suspicious sequences identified by 3.5.2, furthermore the container may be + * configured to allow such paths to only be accessed via safer methods like {@link #getRequestURI()} and to throw + * IllegalArgumentException if this method is called for such suspicious paths. */ public String getServletPath(); From 83817858cd1bd5f9d7c36f3b0876c5354beb0157 Mon Sep 17 00:00:00 2001 From: Greg Wilkins Date: Sat, 9 Oct 2021 10:13:17 +1100 Subject: [PATCH 21/42] format --- .../servlet/http/HttpServletRequest.java | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/api/src/main/java/jakarta/servlet/http/HttpServletRequest.java b/api/src/main/java/jakarta/servlet/http/HttpServletRequest.java index 441083b0b..8c0b17857 100644 --- a/api/src/main/java/jakarta/servlet/http/HttpServletRequest.java +++ b/api/src/main/java/jakarta/servlet/http/HttpServletRequest.java @@ -411,20 +411,20 @@ default public PushBuilder newPushBuilder() { public StringBuffer getRequestURL(); /** - * Returns the part of this request's URL that calls the servlet. This path starts with a "/" character and includes - * the path to the servlet, but does not include any extra path information or a query string. + * Returns the part of this request's URL that calls the servlet. This path starts with a "/" character and includes the + * path to the servlet, but does not include any extra path information or a query string. * *

* This method will return an empty string ("") if the servlet used to process this request was matched using the "/*" * pattern. * - * @return a String containing the path of the servlet being called, as specified in the request - * URL, or an empty string if the servlet used to process the request is matched using the "/*" pattern. - * The path will be canonicalized as per section 3.5 of the specification. This method will not return any encoded - * characters unless the container is configured specifically to allow them. - * @throws IllegalArgumentException In standard configuration, this method will never throw. However, a container - * may be configured to not reject some suspicious sequences identified by 3.5.2, furthermore the container may be - * configured to allow such paths to only be accessed via safer methods like {@link #getRequestURI()} and to throw + * @return a String containing the path of the servlet being called, as specified in the request URL, or an + * empty string if the servlet used to process the request is matched using the "/*" pattern. The path will be + * canonicalized as per section 3.5 of the specification. This method will not return any encoded characters unless the + * container is configured specifically to allow them. + * @throws IllegalArgumentException In standard configuration, this method will never throw. However, a container may be + * configured to not reject some suspicious sequences identified by 3.5.2, furthermore the container may be configured + * to allow such paths to only be accessed via safer methods like {@link #getRequestURI()} and to throw * IllegalArgumentException if this method is called for such suspicious paths. */ public String getServletPath(); From 35a24f2b99354b13b287e415add43f37eba82575 Mon Sep 17 00:00:00 2001 From: Greg Wilkins Date: Sun, 10 Oct 2021 16:50:26 +1100 Subject: [PATCH 22/42] + simplified code + simplified examples + more edge cases --- .../servlet/http/CanonicalUriPathTest.java | 292 ++++++++++-------- spec/src/main/asciidoc/servlet-spec-body.adoc | 118 +++---- 2 files changed, 223 insertions(+), 187 deletions(-) diff --git a/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java index 243bd4b07..2579e7e10 100644 --- a/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java +++ b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java @@ -4,18 +4,38 @@ import java.nio.charset.StandardCharsets; import java.util.ArrayList; import java.util.Arrays; +import java.util.Collections; import java.util.List; import java.util.ListIterator; +import java.util.Set; import org.junit.jupiter.params.ParameterizedTest; import org.junit.jupiter.params.provider.ValueSource; public class CanonicalUriPathTest { + private static final Set ENCODED_DOT_SEGMENT = Collections.unmodifiableSet(Set.of( + "%2e", + "%2E", + "%2e%2e", + "%2e%2E", + "%2E%2e", + "%2E%2E", + ".%2e", + ".%2E", + "%2e.", + "%2E.")); public static String canonicalUriPath(String uriPath) { if (uriPath == null) throw new IllegalArgumentException("null path"); + // Remember start/end conditions String path = uriPath; + boolean startsWithSlash = path.startsWith("/"); + boolean endsWithSlash = path.endsWith("/"); + boolean dotSegmentWithParam; + boolean encodedDotSegment; + boolean emptySegmentBeforeDotDot = false; + boolean decodeError = false; // Discard fragment. if (path.contains("#")) @@ -25,74 +45,37 @@ public static String canonicalUriPath(String uriPath) { if (path.contains("?")) path = path.substring(0, path.indexOf('?')); - // Remember start/end conditions - boolean startsWithSlash = path.startsWith("/"); - boolean endsWithSlash = path.endsWith("/"); - boolean dotSegmentWithParam = false; - boolean encodedDotSegment = false; - boolean emptySegmentBeforeDotDot = false; - // Split path into segments. - List segments = new ArrayList<>(Arrays.asList(path.substring(1).split("/"))); + List segments = new ArrayList<>(Arrays.asList(path.substring(startsWithSlash ? 1 : 0).split("/"))); // Remove path parameters. - for (ListIterator s = segments.listIterator(); s.hasNext();) { - String segment = s.next(); - if (segment.contains(";")) { - segment = segment.substring(0, segment.indexOf(';')); - s.set(segment); - - dotSegmentWithParam |= (".".equals(segment) || "..".equals(segment)); - } - } + dotSegmentWithParam = segments.stream().anyMatch(s -> s.startsWith(".;") || s.startsWith("..;")); + segments.replaceAll(s -> (s.contains(";")) ? s.substring(0, s.indexOf(';')) : s); // Decode characters - for (ListIterator s = segments.listIterator(); s.hasNext();) { - String segment = s.next(); - if (segment.contains("%")) { - StringBuilder buf = new StringBuilder(); - ByteArrayOutputStream utf8 = new ByteArrayOutputStream(); - for (int i = 0; i < segment.length(); i++) { - char c = segment.charAt(i); - if (c == '%') { - utf8.write(Integer.parseInt(segment.substring(i + 1, i + 3), 16)); - i += 2; - } else { - if (utf8.size() > 0) { - buf.append(new String(utf8.toByteArray(), StandardCharsets.UTF_8)); - utf8.reset(); - } - - buf.append(c); - } - } - if (utf8.size() > 0) { - buf.append(new String(utf8.toByteArray(), StandardCharsets.UTF_8)); - utf8.reset(); - } - segment = buf.toString(); - s.set(segment); - encodedDotSegment |= (".".equals(segment) || "..".equals(segment)); - } + encodedDotSegment = segments.stream().anyMatch(ENCODED_DOT_SEGMENT::contains); + try { + segments.replaceAll(CanonicalUriPathTest::decode); + } catch (Exception e) { + decodeError = true; } // Remove Empty Segments - segments.removeIf(s1 -> s1.length() == 0); + segments.removeIf(s -> s.length() == 0); // Remove dot-segments int count = 0; for (ListIterator s = segments.listIterator(); s.hasNext();) { String segment = s.next(); - if (segment.equals(".")) { s.remove(); } else if (segment.equals("..")) { if (count > 0) { s.remove(); - String previous = s.previous(); + String prev = s.previous(); s.remove(); count--; - emptySegmentBeforeDotDot |= previous.length() == 0; + emptySegmentBeforeDotDot |= prev.length() == 0; } } else { count++; @@ -101,118 +84,159 @@ public static String canonicalUriPath(String uriPath) { // Concatenate segments StringBuilder buf = new StringBuilder(); - - for (ListIterator s = segments.listIterator(); s.hasNext();) { - String segment = s.next(); - if (s.hasPrevious() || startsWithSlash) - buf.append("/"); - buf.append(segment); - } + segments.forEach(s -> buf.append("/").append(s)); if (endsWithSlash) buf.append("/"); path = buf.toString(); - // Rejecting Suspicious Sequences - String reject = null; + // Rejecting Errors and Suspicious Sequences + String reject = ""; + if (decodeError) + reject += "decode error; "; // Any path not starting with the `"/"` character if (!startsWithSlash) - reject = "must start with /"; + reject += "must start with /; "; // Any path starting with an initial segment of `".."` - else if (segments.get(0).equals("..")) - reject = "leading dot-dot-segment"; + if (!segments.isEmpty() && segments.get(0).equals("..")) + reject += "leading dot-dot-segment; "; // The encoded `"/"` character - else if (uriPath.contains("%2f") || uriPath.contains("%2F")) - reject = "encoded /"; + if (uriPath.contains("%2f") || uriPath.contains("%2F")) + reject += "encoded /; "; // Any `"."` or `".."` segment that had a path parameter - else if (dotSegmentWithParam) - reject = "dot segment with parameter"; + if (dotSegmentWithParam) + reject += "dot segment with parameter; "; // Any `"."` or `".."` segment with any encoded characters - else if (encodedDotSegment) - reject = "encoded dot segment"; + if (encodedDotSegment) + reject += "encoded dot segment; "; // Any `".."` segment preceded by an empty segment - else if (emptySegmentBeforeDotDot) - reject = "empty segment before dot dot"; + if (emptySegmentBeforeDotDot) + reject += "empty segment before dot dot; "; // Any empty segment with parameters - else if (uriPath.contains("/;")) - reject = "empty segment with parameters"; + if (uriPath.contains("/;")) + reject += "empty segment with parameters; "; // The `"\"` character encoded or not. - else if (path.contains("\\")) - reject = "backslash character"; + if (path.contains("\\")) + reject += "backslash character; "; // Any control characters either encoded or not. - else { - for (char c : path.toCharArray()) { - if (c < 0x20 || c == 0x7f) { - reject = "control character"; - break; - } + + for (char c : path.toCharArray()) { + if (c < 0x20 || c == 0x7f) { + reject = "control character; "; + break; } } System.err.printf("| `%s` | `%s` | %s %n", uriPath, path, - reject == null ? "" : ("400 " + reject)); + reject.length() == 0 ? "" : ("400 " + reject.substring(0, reject.length() - 2))); return path; } + private static String decode(String segment) { + if (segment.contains("%")) { + StringBuilder buf = new StringBuilder(); + ByteArrayOutputStream utf8 = new ByteArrayOutputStream(); + for (int i = 0; i < segment.length(); i++) { + char c = segment.charAt(i); + if (c == '%') { + int b = Integer.parseInt(segment.substring(i + 1, i + 3), 16); + if (b < 0) + throw new IllegalArgumentException("negative encoding"); + utf8.write(b); + i += 2; + } else { + if (utf8.size() > 0) { + buf.append(new String(utf8.toByteArray(), StandardCharsets.UTF_8)); + utf8.reset(); + } + + buf.append(c); + } + } + if (utf8.size() > 0) { + buf.append(new String(utf8.toByteArray(), StandardCharsets.UTF_8)); + utf8.reset(); + } + segment = buf.toString(); + } + return segment; + } + @ParameterizedTest @ValueSource(strings = { - "/public/file.txt", - "/public/dir/", - "/public//file.txt", - "/public/;/file.txt", - "/PUBLIC/file.txt", - "/public%2Ffile.txt", - "/public%5Cfile.txt", - "/public%00/file.txt", - "/public/./file.txt", - "/public/.;/file.txt", - "/public/%2e/file.txt", - "/public/%2e;/file.txt", - "/../docroot/public/file.txt", - "/public/dir/../file.txt", - "/public//../file.txt", - "/public/;param/../file.txt", - "/public/dir/..;/file.txt", - "/public/dir/%2e%2E/file.txt", - "/public/dir/.%2e/file.txt", - "/public/dir/%2E./file.txt", - "/public/dir/%2e%2e;/file.txt", - "/WEB-INF/web.xml", - "/web-inf/web.xml", - "/WEB-IN~1.DIR/web.xml", - "/WEB-INF;/web.xml", - "/WEB-INF%2Fweb.xml", - "/WEB-INF%5Cweb.xml", - "/WEB-INF%00/web.xml", - "/WEB-INF/./web.xml", - "/public/../WEB-INF/web.xml", - "/public/..;/WEB-INF/web.xml", - "/public/%2e%2e/WEB-INF/web.xml", - "/public/%2e%2e;/WEB-INF/web.xml", - "/secret/private.xml", - "/SeCreT/private.xml", - "/SECRET~1.DIR/private.xml", - "/secret;/private.xml", - "/secret%2Fprivate.xml", - "/secret%5Cprivate.xml", - "/secret%00/private.xml", - "/./secret/private.xml", - "/.;/secret/private.xml", - "/%2e/secret/private.xml", - "/%2e;/secret/private.xml", - "/public/../secret/private.xml", - "/public/..;/secret/private.xml", - "/public/%2e%2e/secret/private.xml", - "/public/%2e%2e;/secret/private.xml", - "/dispatch/public/file.txt", - "/dispatch/public%2Ffile.txt", - "/dispatch/public%5Cfile.txt", - "/dispatch/public%252Ffile.txt", - "/dispatch/WEB-INF/web.xml", - "/dispatch/secret/private.xml", - "/dispatch/%2E%2E/%2E%2E/etc/password", + "foo/bar", + "/foo/bar", + "/foo/bar/", + "/foo;/bar;", + "/foo;/bar;/;", + "/foo%00/bar/", + "/foo%7F/bar/", + "/foo%2Fbar", + "/foo\\bar", + "/foo%5Cbar", + "/foo;%2E/bar", + "/foo;%2F/bar", + + "/foo/./bar", + "/foo/././bar", + "/./foo/bar", + "/foo/%2e/bar", + "/foo/.;/bar", + "/foo/%2e;/bar", + "/foo/.%2Fbar", + "/foo/.%5Cbar", + "/foo/bar/.", + "/foo/bar/./", + "/foo/bar/.;", + "/foo/bar/./;", + "/foo/.bar", + + "/foo/../bar", + "/foo/../../bar", + "/../foo/bar", + "/foo/%2e%2E/bar", + "/foo/%2e%2e/%2E%2E/bar", + "/foo/..;/bar", + "/foo/%2e%2E;/bar", + "/foo/..%2Fbar", + "/foo/..%5Cbar", + "/foo/bar/..", + "/foo/bar/../", + "/foo/bar/..;", + "/foo/bar/../;", + "/foo/..bar", + + "/foo/.../bar", + + "/foo//bar", + "//foo//bar//", + "/;/foo;/;/bar/;/;", + "/foo//../bar", + "/foo/;/../bar", + + "/foo%E2%82%ACbar", + "/foo%20bar", + "/foo%-1bar", + "/foo%XX/bar", + "/foo%/bar", + "/foo/bar%0", + + "/", + "//", + "/;/", + "/.", + "/..", + "/./", + "/../", + + "foo/bar/", + "./foo/bar/", + "%2e/foo/bar/", + "../foo/bar/", + ".%2e/foo/bar/", + ";/foo/bar/", }) public void testCanonicalUriPath(String path) { diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index 5b3cf00cf..06ce1a81e 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -1385,59 +1385,71 @@ Consider the following URIs with a forbidden security constraint at `/secret/*` |=== | Encoded URI path | Decoded Path | Rejected -| `/public/file.txt` | `/public/file.txt` | -| `/public/dir/` | `/public/dir/` | -| `/public//file.txt` | `/public/file.txt` | -| `/public/;/file.txt` | `/public/file.txt` | 400 empty segment with parameters -| `/PUBLIC/file.txt` | `/PUBLIC/file.txt` | -| `/public%2Ffile.txt` | `/public/file.txt` | 400 encoded / -| `/public%5Cfile.txt` | `/public\file.txt` | 400 backslash character -| `/public%00/file.txt` | `/public /file.txt` | 400 control character -| `/public/./file.txt` | `/public/file.txt` | -| `/public/%2e/file.txt` | `/public/file.txt` | 400 encoded dot segment -| `/public/%2e;/file.txt` | `/public/file.txt` | 400 encoded dot segment -| `/../docroot/public/file.txt` | `/../docroot/public/file.txt` | 400 leading dot-dot-segment -| `/public/dir/../file.txt` | `/public/file.txt` | -| `/public//../file.txt` | `/file.txt` | -| `/public/;param/../file.txt` | `/file.txt` | 400 empty segment with parameters -| `/public/dir/..;/file.txt` | `/public/file.txt` | 400 dot segment with parameter -| `/public/dir/%2e%2E/file.txt` | `/public/file.txt` | 400 encoded dot segment -| `/public/dir/.%2e/file.txt` | `/public/file.txt` | 400 encoded dot segment -| `/public/dir/%2E./file.txt` | `/public/file.txt` | 400 encoded dot segment -| `/public/dir/%2e%2e;/file.txt` | `/public/file.txt` | 400 encoded dot segment -| `/WEB-INF/web.xml` | `/WEB-INF/web.xml` | -| `/web-inf/web.xml` | `/web-inf/web.xml` | -| `/WEB-IN~1.DIR/web.xml` | `/WEB-IN~1.DIR/web.xml` | -| `/WEB-INF;/web.xml` | `/WEB-INF/web.xml` | -| `/WEB-INF%2Fweb.xml` | `/WEB-INF/web.xml` | 400 encoded / -| `/WEB-INF%5Cweb.xml` | `/WEB-INF\web.xml` | 400 backslash character -| `/WEB-INF%00/web.xml` | `/WEB-INF /web.xml` | 400 control character -| `/WEB-INF/./web.xml` | `/WEB-INF/web.xml` | -| `/public/../WEB-INF/web.xml` | `/WEB-INF/web.xml` | -| `/public/..;/WEB-INF/web.xml` | `/WEB-INF/web.xml` | 400 dot segment with parameter -| `/public/%2e%2e/WEB-INF/web.xml` | `/WEB-INF/web.xml` | 400 encoded dot segment -| `/public/%2e%2e;/WEB-INF/web.xml` | `/WEB-INF/web.xml` | 400 encoded dot segment -| `/secret/private.xml` | `/secret/private.xml` | -| `/SeCreT/private.xml` | `/SeCreT/private.xml` | -| `/SECRET~1.DIR/private.xml` | `/SECRET~1.DIR/private.xml` | -| `/secret;/private.xml` | `/secret/private.xml` | -| `/secret%2Fprivate.xml` | `/secret/private.xml` | 400 encoded / -| `/secret%5Cprivate.xml` | `/secret\private.xml` | 400 backslash character -| `/secret%00/private.xml` | `/secret /private.xml` | 400 control character -| `/./secret/private.xml` | `/secret/private.xml` | -| `/.;/secret/private.xml` | `/secret/private.xml` | 400 dot segment with parameter -| `/%2e/secret/private.xml` | `/secret/private.xml` | 400 encoded dot segment -| `/%2e;/secret/private.xml` | `/secret/private.xml` | 400 encoded dot segment -| `/public/../secret/private.xml` | `/secret/private.xml` | -| `/public/..;/secret/private.xml` | `/secret/private.xml` | 400 dot segment with parameter -| `/public/%2e%2e/secret/private.xml` | `/secret/private.xml` | 400 encoded dot segment -| `/public/%2e%2e;/secret/private.xml` | `/secret/private.xml` | 400 encoded dot segment -| `/dispatch/public/file.txt` | `/dispatch/public/file.txt` | -| `/dispatch/public%2Ffile.txt` | `/dispatch/public/file.txt` | 400 encoded / -| `/dispatch/public%5Cfile.txt` | `/dispatch/public\file.txt` | 400 backslash character -| `/dispatch/public%252Ffile.txt` | `/dispatch/public%2Ffile.txt` | -| `/dispatch/WEB-INF/web.xml` | `/dispatch/WEB-INF/web.xml` | -| `/dispatch/secret/private.xml` | `/dispatch/secret/private.xml` | +| `foo/bar` | `/foo/bar` | 400 must start with / +| `/foo/bar` | `/foo/bar` | +| `/foo/bar/` | `/foo/bar/` | +| `/foo;/bar;` | `/foo/bar` | +| `/foo;/bar;/;` | `/foo/bar` | 400 empty segment with parameters +| `/foo%00/bar/` | `/foo/bar/` | 400 control character +| `/foo%7F/bar/` | `/foo/bar/` | 400 control character +| `/foo%2Fbar` | `/foo/bar` | 400 encoded / +| `/foo\bar` | `/foo\bar` | 400 backslash character +| `/foo%5Cbar` | `/foo\bar` | 400 backslash character +| `/foo;%2E/bar` | `/foo/bar` | +| `/foo;%2F/bar` | `/foo/bar` | 400 encoded / +| `/foo/./bar` | `/foo/bar` | +| `/foo/././bar` | `/foo/bar` | +| `/./foo/bar` | `/foo/bar` | +| `/foo/%2e/bar` | `/foo/bar` | 400 encoded dot segment +| `/foo/.;/bar` | `/foo/bar` | 400 dot segment with parameter +| `/foo/%2e;/bar` | `/foo/bar` | 400 encoded dot segment +| `/foo/.%2Fbar` | `/foo/./bar` | 400 encoded / +| `/foo/.%5Cbar` | `/foo/.\bar` | 400 backslash character +| `/foo/bar/.` | `/foo/bar` | +| `/foo/bar/./` | `/foo/bar/` | +| `/foo/bar/.;` | `/foo/bar` | 400 dot segment with parameter +| `/foo/bar/./;` | `/foo/bar` | 400 empty segment with parameters +| `/foo/.bar` | `/foo/.bar` | +| `/foo/../bar` | `/bar` | +| `/foo/../../bar` | `/../bar` | 400 leading dot-dot-segment +| `/../foo/bar` | `/../foo/bar` | 400 leading dot-dot-segment +| `/foo/%2e%2E/bar` | `/bar` | 400 encoded dot segment +| `/foo/%2e%2e/%2E%2E/bar` | `/../bar` | 400 leading dot-dot-segment; encoded dot segment +| `/foo/..;/bar` | `/bar` | 400 dot segment with parameter +| `/foo/%2e%2E;/bar` | `/bar` | 400 encoded dot segment +| `/foo/..%2Fbar` | `/foo/../bar` | 400 encoded / +| `/foo/..%5Cbar` | `/foo/..\bar` | 400 backslash character +| `/foo/bar/..` | `/foo` | +| `/foo/bar/../` | `/foo/` | +| `/foo/bar/..;` | `/foo` | 400 dot segment with parameter +| `/foo/bar/../;` | `/foo` | 400 empty segment with parameters +| `/foo/..bar` | `/foo/..bar` | +| `/foo/.../bar` | `/foo/.../bar` | +| `/foo//bar` | `/foo/bar` | +| `//foo//bar//` | `/foo/bar/` | +| `/;/foo;/;/bar/;/;` | `/foo/bar` | 400 empty segment with parameters +| `/foo//../bar` | `/bar` | +| `/foo/;/../bar` | `/bar` | 400 empty segment with parameters +| `/foo%E2%82%ACbar` | `/foo€bar` | +| `/foo%20bar` | `/foo bar` | +| `/foo%-1bar` | `/foo%-1bar` | 400 decode error +| `/foo%XX/bar` | `/foo%XX/bar` | 400 decode error +| `/foo%/bar` | `/foo%/bar` | 400 decode error +| `/foo/bar%0` | `/foo/bar%0` | 400 decode error +| `/` | `/` | +| `//` | `/` | +| `/;/` | `/` | 400 empty segment with parameters +| `/.` | `` | +| `/..` | `/..` | 400 leading dot-dot-segment +| `/./` | `/` | +| `/../` | `/../` | 400 leading dot-dot-segment +| `foo/bar/` | `/foo/bar/` | 400 must start with / +| `./foo/bar/` | `/foo/bar/` | 400 must start with / +| `%2e/foo/bar/` | `/foo/bar/` | 400 must start with /; encoded dot segment +| `../foo/bar/` | `/../foo/bar/` | 400 must start with /; leading dot-dot-segment +| `.%2e/foo/bar/` | `/../foo/bar/` | 400 must start with /; leading dot-dot-segment; encoded dot segment +| `;/foo/bar/` | `/foo/bar/` | 400 must start with / + |=== From 4bfdd07c6039f169de31a3468824eb63354d83cb Mon Sep 17 00:00:00 2001 From: Greg Wilkins Date: Sun, 10 Oct 2021 17:16:07 +1100 Subject: [PATCH 23/42] handle /. --- .../servlet/http/CanonicalUriPathTest.java | 18 ++++++++++++++++-- spec/src/main/asciidoc/servlet-spec-body.adoc | 16 +++++++++++++--- 2 files changed, 29 insertions(+), 5 deletions(-) diff --git a/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java index 2579e7e10..2081ab744 100644 --- a/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java +++ b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java @@ -34,6 +34,7 @@ public static String canonicalUriPath(String uriPath) { boolean endsWithSlash = path.endsWith("/"); boolean dotSegmentWithParam; boolean encodedDotSegment; + boolean emptySegmentWithParam; boolean emptySegmentBeforeDotDot = false; boolean decodeError = false; @@ -49,6 +50,7 @@ public static String canonicalUriPath(String uriPath) { List segments = new ArrayList<>(Arrays.asList(path.substring(startsWithSlash ? 1 : 0).split("/"))); // Remove path parameters. + emptySegmentWithParam = segments.stream().anyMatch(s -> s.startsWith(";")); dotSegmentWithParam = segments.stream().anyMatch(s -> s.startsWith(".;") || s.startsWith("..;")); segments.replaceAll(s -> (s.contains(";")) ? s.substring(0, s.indexOf(';')) : s); @@ -85,7 +87,7 @@ public static String canonicalUriPath(String uriPath) { // Concatenate segments StringBuilder buf = new StringBuilder(); segments.forEach(s -> buf.append("/").append(s)); - if (endsWithSlash) + if (endsWithSlash || segments.size() == 0) buf.append("/"); path = buf.toString(); @@ -112,7 +114,7 @@ public static String canonicalUriPath(String uriPath) { if (emptySegmentBeforeDotDot) reject += "empty segment before dot dot; "; // Any empty segment with parameters - if (uriPath.contains("/;")) + if (emptySegmentWithParam) reject += "empty segment with parameters; "; // The `"\"` character encoded or not. if (path.contains("\\")) @@ -168,6 +170,7 @@ private static String decode(String segment) { @ValueSource(strings = { "foo/bar", "/foo/bar", + "/foo/bar/", "/foo;/bar;", "/foo;/bar;/;", @@ -198,6 +201,7 @@ private static String decode(String segment) { "/../foo/bar", "/foo/%2e%2E/bar", "/foo/%2e%2e/%2E%2E/bar", + "/foo/./../bar", "/foo/..;/bar", "/foo/%2e%2E;/bar", "/foo/..%2Fbar", @@ -223,6 +227,16 @@ private static String decode(String segment) { "/foo%/bar", "/foo/bar%0", + "/foo/bar?q", + "/foo/bar#f", + "/foo/bar?q#f", + "/foo/bar/?q", + "/foo/bar/#f", + "/foo/bar/?q#f", + "/foo/bar;?q", + "/foo/bar;#f", + "/foo/bar;?q#f", + "/", "//", "/;/", diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index 06ce1a81e..f0c64e4ad 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -1355,7 +1355,7 @@ All segments that are exactly `"."` are removed from the segment series. Segment . **Concatenate segments.** + -The segments are concatenated into a single path string with each segment preceded by the `"/"` character. If the original path after stripping query and fragment ends with a `"/"` then append a final `"/"` character. +The segments are concatenated into a single path string with each segment preceded by the `"/"` character. If the original path after stripping query and fragment ends with a `"/"` or all segmentes have been removed, then append a final `"/"` character. If a segment contains the `"/"` or `"%"` characters, and the container is configured to not reject the request, then the container should re-encode those characters to the `%nn` form. If any characters are re-encoded, then the `"%"` must also be re-encoded. . **Mapping URI to context and resource.** @@ -1415,6 +1415,7 @@ Consider the following URIs with a forbidden security constraint at `/secret/*` | `/../foo/bar` | `/../foo/bar` | 400 leading dot-dot-segment | `/foo/%2e%2E/bar` | `/bar` | 400 encoded dot segment | `/foo/%2e%2e/%2E%2E/bar` | `/../bar` | 400 leading dot-dot-segment; encoded dot segment +| `/foo/./../bar` | `/bar` | | `/foo/..;/bar` | `/bar` | 400 dot segment with parameter | `/foo/%2e%2E;/bar` | `/bar` | 400 encoded dot segment | `/foo/..%2Fbar` | `/foo/../bar` | 400 encoded / @@ -1436,10 +1437,19 @@ Consider the following URIs with a forbidden security constraint at `/secret/*` | `/foo%XX/bar` | `/foo%XX/bar` | 400 decode error | `/foo%/bar` | `/foo%/bar` | 400 decode error | `/foo/bar%0` | `/foo/bar%0` | 400 decode error +| `/foo/bar?q` | `/foo/bar` | +| `/foo/bar#f` | `/foo/bar` | +| `/foo/bar?q#f` | `/foo/bar` | +| `/foo/bar/?q` | `/foo/bar` | +| `/foo/bar/#f` | `/foo/bar` | +| `/foo/bar/?q#f` | `/foo/bar` | +| `/foo/bar;?q` | `/foo/bar` | +| `/foo/bar;#f` | `/foo/bar` | +| `/foo/bar;?q#f` | `/foo/bar` | | `/` | `/` | | `//` | `/` | | `/;/` | `/` | 400 empty segment with parameters -| `/.` | `` | +| `/.` | `/` | | `/..` | `/..` | 400 leading dot-dot-segment | `/./` | `/` | | `/../` | `/../` | 400 leading dot-dot-segment @@ -1448,7 +1458,7 @@ Consider the following URIs with a forbidden security constraint at `/secret/*` | `%2e/foo/bar/` | `/foo/bar/` | 400 must start with /; encoded dot segment | `../foo/bar/` | `/../foo/bar/` | 400 must start with /; leading dot-dot-segment | `.%2e/foo/bar/` | `/../foo/bar/` | 400 must start with /; leading dot-dot-segment; encoded dot segment -| `;/foo/bar/` | `/foo/bar/` | 400 must start with / +| `;/foo/bar/` | `/foo/bar/` | 400 must start with /; empty segment with parameters |=== From d47553232cb947ee4d34f02b4db7a2eeba6c5ef5 Mon Sep 17 00:00:00 2001 From: Greg Wilkins Date: Sun, 10 Oct 2021 17:42:10 +1100 Subject: [PATCH 24/42] handle % and UTF-8 decoding errors --- .../servlet/http/CanonicalUriPathTest.java | 18 ++++++++++++++++-- spec/src/main/asciidoc/servlet-spec-body.adoc | 4 ++++ 2 files changed, 20 insertions(+), 2 deletions(-) diff --git a/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java index 2081ab744..6c3f37a78 100644 --- a/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java +++ b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java @@ -1,6 +1,10 @@ package jakarta.servlet.http; import java.io.ByteArrayOutputStream; +import java.nio.ByteBuffer; +import java.nio.CharBuffer; +import java.nio.charset.CharacterCodingException; +import java.nio.charset.CodingErrorAction; import java.nio.charset.StandardCharsets; import java.util.ArrayList; import java.util.Arrays; @@ -150,7 +154,7 @@ private static String decode(String segment) { i += 2; } else { if (utf8.size() > 0) { - buf.append(new String(utf8.toByteArray(), StandardCharsets.UTF_8)); + buf.append(fromUtf8(utf8.toByteArray())); utf8.reset(); } @@ -158,7 +162,7 @@ private static String decode(String segment) { } } if (utf8.size() > 0) { - buf.append(new String(utf8.toByteArray(), StandardCharsets.UTF_8)); + buf.append(fromUtf8(utf8.toByteArray())); utf8.reset(); } segment = buf.toString(); @@ -166,6 +170,14 @@ private static String decode(String segment) { return segment; } + private static CharBuffer fromUtf8(byte[] bytes) { + try { + return StandardCharsets.UTF_8.newDecoder().onMalformedInput(CodingErrorAction.REPORT).decode(ByteBuffer.wrap(bytes)); + } catch (CharacterCodingException e) { + throw new IllegalArgumentException(e); + } + } + @ParameterizedTest @ValueSource(strings = { "foo/bar", @@ -222,6 +234,8 @@ private static String decode(String segment) { "/foo%E2%82%ACbar", "/foo%20bar", + "/foo%E2%82", + "/foo%E2%82%AC/%E2%82%ACfoo%E2%820bar", "/foo%-1bar", "/foo%XX/bar", "/foo%/bar", diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index f0c64e4ad..335cdb87e 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -1375,6 +1375,8 @@ If suspicious sequences are discovered during the prior processing steps, the re * Any empty segment with parameters (e.g. `/path/;param/info` ) * The `"\"` character encoded or not. (e.g. `/path\info`) * Any control characters either encoded or not. (e.g. `/path%00/info`) + * Any illegal hex sequences following a % character + * Any illegal UTF-8 code sequences. ==== Example URIs > TODO more work needed on this section. Perhaps some parts need to be moved to other sections. @@ -1433,6 +1435,8 @@ Consider the following URIs with a forbidden security constraint at `/secret/*` | `/foo/;/../bar` | `/bar` | 400 empty segment with parameters | `/foo%E2%82%ACbar` | `/foo€bar` | | `/foo%20bar` | `/foo bar` | +| `/foo%E2%82` | `/foo%E2%82` | 400 decode error +| `/foo%E2%82%AC/%E2%82%ACfoo%E2%820bar` | `/foo€/%E2%82%ACfoo%E2%820bar` | 400 decode error | `/foo%-1bar` | `/foo%-1bar` | 400 decode error | `/foo%XX/bar` | `/foo%XX/bar` | 400 decode error | `/foo%/bar` | `/foo%/bar` | 400 decode error From 2b4d50ca942c528ed9a4e0acaa026a872109bc54 Mon Sep 17 00:00:00 2001 From: Greg Wilkins Date: Sun, 10 Oct 2021 18:25:16 +1100 Subject: [PATCH 25/42] better handling of rejections java 8 --- .../servlet/http/CanonicalUriPathTest.java | 68 ++++++++++--------- spec/src/main/asciidoc/servlet-spec-body.adoc | 13 ++-- 2 files changed, 43 insertions(+), 38 deletions(-) diff --git a/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java index 6c3f37a78..40b1a2925 100644 --- a/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java +++ b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java @@ -12,23 +12,24 @@ import java.util.List; import java.util.ListIterator; import java.util.Set; +import java.util.TreeMap; +import java.util.function.Consumer; import org.junit.jupiter.params.ParameterizedTest; import org.junit.jupiter.params.provider.ValueSource; public class CanonicalUriPathTest { - private static final Set ENCODED_DOT_SEGMENT = Collections.unmodifiableSet(Set.of( - "%2e", - "%2E", - "%2e%2e", - "%2e%2E", - "%2E%2e", - "%2E%2E", - ".%2e", - ".%2E", - "%2e.", - "%2E.")); - public static String canonicalUriPath(String uriPath) { + private static final Set ENCODED_DOT_SEGMENT; + static { + Set set = Collections.newSetFromMap(new TreeMap<>(String.CASE_INSENSITIVE_ORDER)); + set.add("%2e"); + set.add("%2e%2e"); + set.add("%2e."); + set.add(".%2e"); + ENCODED_DOT_SEGMENT = Collections.unmodifiableSet(set); + } + + public static String canonicalUriPath(String uriPath, Consumer rejection) { if (uriPath == null) throw new IllegalArgumentException("null path"); @@ -96,47 +97,40 @@ public static String canonicalUriPath(String uriPath) { path = buf.toString(); // Rejecting Errors and Suspicious Sequences - String reject = ""; if (decodeError) - reject += "decode error; "; + rejection.accept("decode error"); // Any path not starting with the `"/"` character if (!startsWithSlash) - reject += "must start with /; "; + rejection.accept("must start with /"); // Any path starting with an initial segment of `".."` if (!segments.isEmpty() && segments.get(0).equals("..")) - reject += "leading dot-dot-segment; "; + rejection.accept("leading dot-dot-segment"); // The encoded `"/"` character if (uriPath.contains("%2f") || uriPath.contains("%2F")) - reject += "encoded /; "; + rejection.accept("encoded /"); // Any `"."` or `".."` segment that had a path parameter if (dotSegmentWithParam) - reject += "dot segment with parameter; "; + rejection.accept("dot segment with parameter"); // Any `"."` or `".."` segment with any encoded characters if (encodedDotSegment) - reject += "encoded dot segment; "; + rejection.accept("encoded dot segment"); // Any `".."` segment preceded by an empty segment if (emptySegmentBeforeDotDot) - reject += "empty segment before dot dot; "; + rejection.accept("empty segment before dot dot"); // Any empty segment with parameters if (emptySegmentWithParam) - reject += "empty segment with parameters; "; + rejection.accept("empty segment with parameters"); // The `"\"` character encoded or not. if (path.contains("\\")) - reject += "backslash character; "; + rejection.accept("backslash character"); // Any control characters either encoded or not. - for (char c : path.toCharArray()) { if (c < 0x20 || c == 0x7f) { - reject = "control character; "; + rejection.accept("control character"); break; } } - System.err.printf("| `%s` | `%s` | %s %n", - uriPath, - path, - reject.length() == 0 ? "" : ("400 " + reject.substring(0, reject.length() - 2))); - return path; } @@ -235,12 +229,12 @@ private static CharBuffer fromUtf8(byte[] bytes) { "/foo%E2%82%ACbar", "/foo%20bar", "/foo%E2%82", - "/foo%E2%82%AC/%E2%82%ACfoo%E2%820bar", + "/foo%E2%82bar", "/foo%-1bar", "/foo%XX/bar", "/foo%/bar", "/foo/bar%0", - + "/good%20/bad%/%20mix%", "/foo/bar?q", "/foo/bar#f", "/foo/bar?q#f", @@ -268,6 +262,16 @@ private static CharBuffer fromUtf8(byte[] bytes) { }) public void testCanonicalUriPath(String path) { - canonicalUriPath(path); + List rejections = new ArrayList<>(); + String canonical = canonicalUriPath(path, rejections::add); + + System.err.printf("| `%s` | `%s` | ", path, canonical); + if (!rejections.isEmpty()) { + for (int i = 0; i < rejections.size(); i++) { + System.err.print(i == 0 ? "400 " : " & "); + System.err.print(rejections.get(i)); + } + } + System.err.println(); } } diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index 335cdb87e..d5a362cfa 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -1416,7 +1416,7 @@ Consider the following URIs with a forbidden security constraint at `/secret/*` | `/foo/../../bar` | `/../bar` | 400 leading dot-dot-segment | `/../foo/bar` | `/../foo/bar` | 400 leading dot-dot-segment | `/foo/%2e%2E/bar` | `/bar` | 400 encoded dot segment -| `/foo/%2e%2e/%2E%2E/bar` | `/../bar` | 400 leading dot-dot-segment; encoded dot segment +| `/foo/%2e%2e/%2E%2E/bar` | `/../bar` | 400 leading dot-dot-segment & encoded dot segment | `/foo/./../bar` | `/bar` | | `/foo/..;/bar` | `/bar` | 400 dot segment with parameter | `/foo/%2e%2E;/bar` | `/bar` | 400 encoded dot segment @@ -1436,11 +1436,12 @@ Consider the following URIs with a forbidden security constraint at `/secret/*` | `/foo%E2%82%ACbar` | `/foo€bar` | | `/foo%20bar` | `/foo bar` | | `/foo%E2%82` | `/foo%E2%82` | 400 decode error -| `/foo%E2%82%AC/%E2%82%ACfoo%E2%820bar` | `/foo€/%E2%82%ACfoo%E2%820bar` | 400 decode error +| `/foo%E2%82bar` | `/foo%E2%82bar` | 400 decode error | `/foo%-1bar` | `/foo%-1bar` | 400 decode error | `/foo%XX/bar` | `/foo%XX/bar` | 400 decode error | `/foo%/bar` | `/foo%/bar` | 400 decode error | `/foo/bar%0` | `/foo/bar%0` | 400 decode error +| `/good%20/bad%/%20mix%` | `/good /bad%/%20mix%` | 400 decode error | `/foo/bar?q` | `/foo/bar` | | `/foo/bar#f` | `/foo/bar` | | `/foo/bar?q#f` | `/foo/bar` | @@ -1459,10 +1460,10 @@ Consider the following URIs with a forbidden security constraint at `/secret/*` | `/../` | `/../` | 400 leading dot-dot-segment | `foo/bar/` | `/foo/bar/` | 400 must start with / | `./foo/bar/` | `/foo/bar/` | 400 must start with / -| `%2e/foo/bar/` | `/foo/bar/` | 400 must start with /; encoded dot segment -| `../foo/bar/` | `/../foo/bar/` | 400 must start with /; leading dot-dot-segment -| `.%2e/foo/bar/` | `/../foo/bar/` | 400 must start with /; leading dot-dot-segment; encoded dot segment -| `;/foo/bar/` | `/foo/bar/` | 400 must start with /; empty segment with parameters +| `%2e/foo/bar/` | `/foo/bar/` | 400 must start with / & encoded dot segment +| `../foo/bar/` | `/../foo/bar/` | 400 must start with / & leading dot-dot-segment +| `.%2e/foo/bar/` | `/../foo/bar/` | 400 must start with / & leading dot-dot-segment & encoded dot segment +| `;/foo/bar/` | `/foo/bar/` | 400 must start with / & empty segment with parameters |=== From 7e56e060c2ecf1d85424d8dfeb13a09340f21f72 Mon Sep 17 00:00:00 2001 From: Mark Thomas Date: Tue, 12 Oct 2021 13:52:10 +0100 Subject: [PATCH 26/42] Drop reference to HTTP/0.9 --- spec/src/main/asciidoc/servlet-spec-body.adoc | 2 -- 1 file changed, 2 deletions(-) diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index d5a362cfa..16aaff90d 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -1309,8 +1309,6 @@ The path portion of the URI of an HTTP identifies the resource to be processed. The process described here adapts and extends the URI canonicalization process described in [RFC 3986](https://datatracker.ietf.org/doc/html/rfc3986) to create a standard Servlet URI path canonicalization process that ensures that URIs can be mapped to Servlets, Filters and security constraints in an unambiguous manner. It is also intended to provide information to reverse proxy implementations so they are aware of how requests they pass to servlet containers will be processed. ==== Obtaining the URI Path -HTTP/0.9:: The URI path is the `document address` as defined by the W3C [Original HTTP](https://www.w3.org/Protocols/HTTP/AsImplemented.html) document. - HTTP/1.0:: The URI path is extracted from the `Request-URI` in the `Request-Line` as defined by [RFC 1945](https://datatracker.ietf.org/doc/html/rfc1945#section-5.1). URIs in `abs_path` form are the URI path. URIs in `absoluteURI` have the protocol and authority removed to convert them to `origin-form` and thus obtain the URI path. HTTP/1.1:: The URI path is extracted from the `request-target` as defined by [RFC 7230](https://datatracker.ietf.org/doc/html/rfc7230#section-3.1.1). URIs in `origin-form` are the URI path. URIs in `absolute-form` have the protocol and authority removed to convert them to `origin-form` and thus obtain the URI path. URIs in `authority-form` or `asterisk-form` are outside of the scope of this specification. From f77f904d8f6f3d0e575822acc5811ec39cf48936 Mon Sep 17 00:00:00 2001 From: Mark Thomas Date: Tue, 12 Oct 2021 13:53:45 +0100 Subject: [PATCH 27/42] Fix formatting. --- spec/src/main/asciidoc/servlet-spec-body.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index 16aaff90d..027dc188a 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -1329,7 +1329,7 @@ The path is split by the first occurrence of any `"\#"` character. The `"#"` and . **Separation of path and query.** + -The URI is split by the first occurrence of any '"?"' character to path and query. The query is preserved for later handling and the following steps applied to the path. +The URI is split by the first occurrence of any `"?"` character to path and query. The query is preserved for later handling and the following steps applied to the path. . **Split path into segments.** + From 7209be4a10daab1b2957fd3728565f048d7ae9ba Mon Sep 17 00:00:00 2001 From: Mark Thomas Date: Tue, 12 Oct 2021 14:01:52 +0100 Subject: [PATCH 28/42] Possible alternative --- spec/src/main/asciidoc/servlet-spec-body.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index 027dc188a..43e1fa407 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -1333,7 +1333,7 @@ The URI is split by the first occurrence of any `"?"` character to path and quer . **Split path into segments.** + -The path is split into multiple segments divided by the `"/"` character. A leading `"/"` does not indicate an initial empty segment. +The path is split into segments using the `"/"` character as a separator. The separating `"/"` does not form part of the resulting segments . **Remove path parameters.** + From ecdaf3b150bf3d802f8ccd95c12ca5a46019e05e Mon Sep 17 00:00:00 2001 From: Mark Thomas Date: Tue, 12 Oct 2021 14:02:37 +0100 Subject: [PATCH 29/42] Re-word decode --- spec/src/main/asciidoc/servlet-spec-body.adoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index 43e1fa407..26d999f2d 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -1339,9 +1339,9 @@ The path is split into segments using the `"/"` character as a separator. The se + Any segment containing the `";"` character is split at the first occurrence of `";"`. The segment is replaced by the character sequence preceding the `";"`. The characters following the `";"` are considered path parameters and may be preserved by the container for later decoding and/or processing (eg `jsessionid`). -. **Decode characters.** +. **Decode.** + -Characters that are encoded in `%nn` form are decoded in each segment. The resulting octet sequences is treated as UTF-8 and converted to a character sequence that replaces the segment. +Octets that are encoded in `%nn` form are decoded in each segment. The resulting octet sequence is treated as UTF-8 and converted to a character sequence that replaces the segment. . **Remove Empty Segments.** + From be4d3668cf733a0673218d5d10b44ab2f0f4779f Mon Sep 17 00:00:00 2001 From: Mark Thomas Date: Tue, 12 Oct 2021 14:03:10 +0100 Subject: [PATCH 30/42] Format / typos --- spec/src/main/asciidoc/servlet-spec-body.adoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index 26d999f2d..40d42ff4f 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -1349,11 +1349,11 @@ Empty segments are removed. Containers may be configured to retain empty segmen . **Remove dot-segments.** + -All segments that are exactly `"."` are removed from the segment series. Segments that are exactly `".."` and that are preceded by a non `".."` segment are removed together with the preceding segment. This normalization differs from RFC3986 in that segments with parameters may be treated as dot segments. +All segments that are exactly `"."` are removed from the segment series. Segments that are exactly `".."` and that are preceded by a non `".."` segment are removed together with the preceding segment. This normalization differs from RFC3986 in that segments with parameters may be treated as dot segments. . **Concatenate segments.** + -The segments are concatenated into a single path string with each segment preceded by the `"/"` character. If the original path after stripping query and fragment ends with a `"/"` or all segmentes have been removed, then append a final `"/"` character. +The segments are concatenated into a single path string with each segment preceded by the `"/"` character. If the original path after stripping query and fragment ends with a `"/"` or all segments have been removed, then append a final `"/"` character. If a segment contains the `"/"` or `"%"` characters, and the container is configured to not reject the request, then the container should re-encode those characters to the `%nn` form. If any characters are re-encoded, then the `"%"` must also be re-encoded. . **Mapping URI to context and resource.** From b4b6d3c81057b3a95e5e9abc0986ed7404e20669 Mon Sep 17 00:00:00 2001 From: Mark Thomas Date: Tue, 12 Oct 2021 14:06:05 +0100 Subject: [PATCH 31/42] Typo --- spec/src/main/asciidoc/servlet-spec-body.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index 40d42ff4f..8074ca1a6 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -1362,7 +1362,7 @@ The decoded path is used to map the request to a context and resource within the . **Rejecting Suspicious Sequences.** + -If suspicious sequences are discovered during the prior processing steps, the request must be rejected with a 400 bad request rather than dispatched to the target servlet. If a context is matched then the error handling of the context may be used to generate the 400 response. By default, the set of suspicious sequences below, but may be configured differently by a container: +If suspicious sequences are discovered during the prior processing steps, the request must be rejected with a 400 bad request rather than dispatched to the target servlet. If a context is matched then the error handling of the context may be used to generate the 400 response. By default, the set of suspicious sequences is defined below, but may be configured differently by a container: * Any path not starting with the `"/"` character (e.g. `path/info`) * Any path starting with an initial segment of `".."` (e.g. `/../path/info`) From 238a77fac83cff287a1dab8513735c18fb93cfbf Mon Sep 17 00:00:00 2001 From: Mark Thomas Date: Tue, 12 Oct 2021 14:30:51 +0100 Subject: [PATCH 32/42] Align output with text description --- .../servlet/http/CanonicalUriPathTest.java | 17 +++++++++-------- spec/src/main/asciidoc/servlet-spec-body.adoc | 6 +++--- 2 files changed, 12 insertions(+), 11 deletions(-) diff --git a/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java index 40b1a2925..f6cc3ec01 100644 --- a/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java +++ b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java @@ -33,15 +33,7 @@ public static String canonicalUriPath(String uriPath, Consumer rejection if (uriPath == null) throw new IllegalArgumentException("null path"); - // Remember start/end conditions String path = uriPath; - boolean startsWithSlash = path.startsWith("/"); - boolean endsWithSlash = path.endsWith("/"); - boolean dotSegmentWithParam; - boolean encodedDotSegment; - boolean emptySegmentWithParam; - boolean emptySegmentBeforeDotDot = false; - boolean decodeError = false; // Discard fragment. if (path.contains("#")) @@ -51,6 +43,15 @@ public static String canonicalUriPath(String uriPath, Consumer rejection if (path.contains("?")) path = path.substring(0, path.indexOf('?')); + // Remember start/end conditions + boolean startsWithSlash = path.startsWith("/"); + boolean endsWithSlash = path.endsWith("/"); + boolean dotSegmentWithParam; + boolean encodedDotSegment; + boolean emptySegmentWithParam; + boolean emptySegmentBeforeDotDot = false; + boolean decodeError = false; + // Split path into segments. List segments = new ArrayList<>(Arrays.asList(path.substring(startsWithSlash ? 1 : 0).split("/"))); diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index 8074ca1a6..bbd59ccef 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -1443,9 +1443,9 @@ Consider the following URIs with a forbidden security constraint at `/secret/*` | `/foo/bar?q` | `/foo/bar` | | `/foo/bar#f` | `/foo/bar` | | `/foo/bar?q#f` | `/foo/bar` | -| `/foo/bar/?q` | `/foo/bar` | -| `/foo/bar/#f` | `/foo/bar` | -| `/foo/bar/?q#f` | `/foo/bar` | +| `/foo/bar/?q` | `/foo/bar/` | +| `/foo/bar/#f` | `/foo/bar/` | +| `/foo/bar/?q#f` | `/foo/bar/` | | `/foo/bar;?q` | `/foo/bar` | | `/foo/bar;#f` | `/foo/bar` | | `/foo/bar;?q#f` | `/foo/bar` | From 5e1e9d062dd5e68dded56c0eab4142a45dbff23a Mon Sep 17 00:00:00 2001 From: Mark Thomas Date: Tue, 12 Oct 2021 14:42:47 +0100 Subject: [PATCH 33/42] Fix formatting --- .../test/java/jakarta/servlet/http/CanonicalUriPathTest.java | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java index f6cc3ec01..bbc68c94b 100644 --- a/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java +++ b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java @@ -51,7 +51,7 @@ public static String canonicalUriPath(String uriPath, Consumer rejection boolean emptySegmentWithParam; boolean emptySegmentBeforeDotDot = false; boolean decodeError = false; - + // Split path into segments. List segments = new ArrayList<>(Arrays.asList(path.substring(startsWithSlash ? 1 : 0).split("/"))); From fb26bc2d5abc64ca52cc179da0868a17ed1c90b2 Mon Sep 17 00:00:00 2001 From: Greg Wilkins Date: Wed, 13 Oct 2021 09:10:12 +1100 Subject: [PATCH 34/42] Unit test with assertions. Updated test with expected results that are authored (rather than generated). Test expected to fail at this stage until we fix trailing '/' handling. --- .../servlet/http/CanonicalUriPathTest.java | 175 +++++++++--------- 1 file changed, 86 insertions(+), 89 deletions(-) diff --git a/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java index bbc68c94b..74ffec3dd 100644 --- a/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java +++ b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java @@ -14,8 +14,11 @@ import java.util.Set; import java.util.TreeMap; import java.util.function.Consumer; +import java.util.stream.Stream; +import org.junit.jupiter.api.Assertions; import org.junit.jupiter.params.ParameterizedTest; -import org.junit.jupiter.params.provider.ValueSource; +import org.junit.jupiter.params.provider.Arguments; +import org.junit.jupiter.params.provider.MethodSource; public class CanonicalUriPathTest { @@ -173,99 +176,93 @@ private static CharBuffer fromUtf8(byte[] bytes) { } } - @ParameterizedTest - @ValueSource(strings = { - "foo/bar", - "/foo/bar", - - "/foo/bar/", - "/foo;/bar;", - "/foo;/bar;/;", - "/foo%00/bar/", - "/foo%7F/bar/", - "/foo%2Fbar", - "/foo\\bar", - "/foo%5Cbar", - "/foo;%2E/bar", - "/foo;%2F/bar", - - "/foo/./bar", - "/foo/././bar", - "/./foo/bar", - "/foo/%2e/bar", - "/foo/.;/bar", - "/foo/%2e;/bar", - "/foo/.%2Fbar", - "/foo/.%5Cbar", - "/foo/bar/.", - "/foo/bar/./", - "/foo/bar/.;", - "/foo/bar/./;", - "/foo/.bar", - - "/foo/../bar", - "/foo/../../bar", - "/../foo/bar", - "/foo/%2e%2E/bar", - "/foo/%2e%2e/%2E%2E/bar", - "/foo/./../bar", - "/foo/..;/bar", - "/foo/%2e%2E;/bar", - "/foo/..%2Fbar", - "/foo/..%5Cbar", - "/foo/bar/..", - "/foo/bar/../", - "/foo/bar/..;", - "/foo/bar/../;", - "/foo/..bar", - - "/foo/.../bar", - - "/foo//bar", - "//foo//bar//", - "/;/foo;/;/bar/;/;", - "/foo//../bar", - "/foo/;/../bar", + public static Stream data() { + List data = new ArrayList<>(); + data.add(new Object[] { "foo/bar", "/foo/bar", true }); + data.add(new Object[] { "/foo/bar", "/foo/bar", false }); + data.add(new Object[] { "/foo/bar;jsessionid=1234", "/foo/bar", false }); + data.add(new Object[] { "/foo/bar/", "/foo/bar/", false }); + data.add(new Object[] { "/foo/bar/;jsessionid=1234", "/foo/bar/", false }); + data.add(new Object[] { "/foo;/bar;", "/foo/bar", false }); + data.add(new Object[] { "/foo;/bar;/;", "/foo/bar/", false }); + data.add(new Object[] { "/foo%00/bar/", "/foo\00/bar/", true }); + data.add(new Object[] { "/foo%2Fbar", "/foo/bar", true }); + data.add(new Object[] { "/foo\\bar", "/foo\\bar", true }); + data.add(new Object[] { "/foo%5Cbar", "/foo\\bar", true }); + data.add(new Object[] { "/foo;%2F/bar", "/foo/bar", true }); + data.add(new Object[] { "/foo/./bar", "/foo/bar", false }); + data.add(new Object[] { "/foo/././bar", "/foo/bar", false }); + data.add(new Object[] { "/./foo/bar", "/foo/bar", false }); + data.add(new Object[] { "/foo/%2e/bar", "/foo/bar", true }); + data.add(new Object[] { "/foo/.;/bar", "/foo/bar", true }); + data.add(new Object[] { "/foo/%2e;/bar", "/foo/bar", true }); + data.add(new Object[] { "/foo/.%2Fbar", "/foo/./bar", true }); + data.add(new Object[] { "/foo/.%5Cbar", "/foo/.\\bar", true }); + data.add(new Object[] { "/foo/bar/.", "/foo/bar", false }); + data.add(new Object[] { "/foo/bar/./", "/foo/bar/", false }); + data.add(new Object[] { "/foo/bar/.;", "/foo/bar", true }); + data.add(new Object[] { "/foo/bar/./;", "/foo/bar/", false }); + data.add(new Object[] { "/foo/.bar", "/foo/.bar", false }); + data.add(new Object[] { "/foo/../bar", "/bar", false }); + data.add(new Object[] { "/foo/../../bar", "/../bar", true }); + data.add(new Object[] { "/../foo/bar", "/../foo/bar", true }); + data.add(new Object[] { "/foo/%2e%2E/bar", "/bar", true }); + data.add(new Object[] { "/foo/%2e%2e/%2E%2E/bar", "/../bar", true }); + data.add(new Object[] { "/foo/./../bar", "/bar", false }); + data.add(new Object[] { "/foo/..;/bar", "/bar", true }); + data.add(new Object[] { "/foo/%2e%2E;/bar", "/bar", true }); + data.add(new Object[] { "/foo/..%2Fbar", "/foo/../bar", true }); + data.add(new Object[] { "/foo/..%5Cbar", "/foo/..\\bar", true }); + data.add(new Object[] { "/foo/bar/..", "/foo", false }); + data.add(new Object[] { "/foo/bar/../", "/foo/", false }); + data.add(new Object[] { "/foo/bar/..;", "/foo", true }); + data.add(new Object[] { "/foo/bar/../;", "/foo/", false }); + data.add(new Object[] { "/foo/..bar", "/foo/..bar", false }); + data.add(new Object[] { "/foo/.../bar", "/foo/.../bar", false }); + data.add(new Object[] { "/foo//bar", "/foo/bar", false }); + data.add(new Object[] { "//foo//bar//", "/foo/bar/", false }); + data.add(new Object[] { "/;/foo;/;/bar/;/;", "/foo/bar/", true }); + data.add(new Object[] { "/foo//../bar", "/bar", false }); + data.add(new Object[] { "/foo/;/../bar", "/bar", true }); + data.add(new Object[] { "/foo%E2%82%ACbar", "/foo€bar", false }); + data.add(new Object[] { "/foo%20bar", "/foo bar", false }); + data.add(new Object[] { "/foo%E2%82", "/foo%E2%82", true }); + data.add(new Object[] { "/foo%E2%82bar", "/foo%E2%82bar", true }); + data.add(new Object[] { "/foo%XX/bar", "/foo%XX/bar", true }); + data.add(new Object[] { "/foo%/bar", "/foo%/bar", true }); + data.add(new Object[] { "/foo/bar%0", "/foo/bar%0", true }); + data.add(new Object[] { "/good%20/bad%/%20mix%", "/good /bad%/%20mix%", true }); + data.add(new Object[] { "/foo/bar#f", "/foo/bar", false }); + data.add(new Object[] { "/foo/bar?q#f", "/foo/bar", false }); + data.add(new Object[] { "/foo/bar/?q", "/foo/bar", false }); + data.add(new Object[] { "/foo/bar/#f", "/foo/bar", false }); + data.add(new Object[] { "/foo/bar/?q#f", "/foo/bar", false }); + data.add(new Object[] { "/foo/bar;?q", "/foo/bar", false }); + data.add(new Object[] { "/foo/bar;#f", "/foo/bar", false }); + data.add(new Object[] { "/foo/bar;?q#f", "/foo/bar", false }); + data.add(new Object[] { "/", "/", false }); + data.add(new Object[] { "//", "/", false }); + data.add(new Object[] { "/;/", "/", true }); + data.add(new Object[] { "/.", "/", false }); + data.add(new Object[] { "/..", "/..", true }); + data.add(new Object[] { "/./", "/", false }); + data.add(new Object[] { "/../", "/../", true }); + data.add(new Object[] { "foo/bar/", "/foo/bar/", true }); + data.add(new Object[] { ";/foo/bar/", "/foo/bar/", true }); - "/foo%E2%82%ACbar", - "/foo%20bar", - "/foo%E2%82", - "/foo%E2%82bar", - "/foo%-1bar", - "/foo%XX/bar", - "/foo%/bar", - "/foo/bar%0", - "/good%20/bad%/%20mix%", - "/foo/bar?q", - "/foo/bar#f", - "/foo/bar?q#f", - "/foo/bar/?q", - "/foo/bar/#f", - "/foo/bar/?q#f", - "/foo/bar;?q", - "/foo/bar;#f", - "/foo/bar;?q#f", - - "/", - "//", - "/;/", - "/.", - "/..", - "/./", - "/../", - - "foo/bar/", - "./foo/bar/", - "%2e/foo/bar/", - "../foo/bar/", - ".%2e/foo/bar/", - ";/foo/bar/", - }) + return data.stream().map(Arguments::of); + } - public void testCanonicalUriPath(String path) { + @ParameterizedTest + @MethodSource("data") + public void testCanonicalUriPath(String path, String expected, boolean rejected) { List rejections = new ArrayList<>(); String canonical = canonicalUriPath(path, rejections::add); + Assertions.assertEquals(expected, canonical); + Assertions.assertEquals(rejected, !rejections.isEmpty()); + + // print for inclusion in adoc System.err.printf("| `%s` | `%s` | ", path, canonical); if (!rejections.isEmpty()) { for (int i = 0; i < rejections.size(); i++) { From e8f0f10fae307dbadb65274e46087fae28d97f7f Mon Sep 17 00:00:00 2001 From: Greg Wilkins Date: Wed, 13 Oct 2021 09:57:54 +1100 Subject: [PATCH 35/42] Handle trailing / with last empty segment --- .../servlet/http/CanonicalUriPathTest.java | 42 ++++++++++++------- spec/src/main/asciidoc/servlet-spec-body.adoc | 24 +++++------ 2 files changed, 38 insertions(+), 28 deletions(-) diff --git a/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java index 74ffec3dd..5bbd63f94 100644 --- a/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java +++ b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java @@ -13,6 +13,7 @@ import java.util.ListIterator; import java.util.Set; import java.util.TreeMap; +import java.util.concurrent.atomic.AtomicInteger; import java.util.function.Consumer; import java.util.stream.Stream; import org.junit.jupiter.api.Assertions; @@ -48,18 +49,17 @@ public static String canonicalUriPath(String uriPath, Consumer rejection // Remember start/end conditions boolean startsWithSlash = path.startsWith("/"); - boolean endsWithSlash = path.endsWith("/"); boolean dotSegmentWithParam; boolean encodedDotSegment; - boolean emptySegmentWithParam; + boolean emptyNonLastSegmentWithParam; boolean emptySegmentBeforeDotDot = false; boolean decodeError = false; // Split path into segments. - List segments = new ArrayList<>(Arrays.asList(path.substring(startsWithSlash ? 1 : 0).split("/"))); + List segments = new ArrayList<>(Arrays.asList(path.substring(startsWithSlash ? 1 : 0).split("/", -1))); // Remove path parameters. - emptySegmentWithParam = segments.stream().anyMatch(s -> s.startsWith(";")); + emptyNonLastSegmentWithParam = segments.stream().limit(segments.size() - 1).anyMatch(s -> s.startsWith(";")); dotSegmentWithParam = segments.stream().anyMatch(s -> s.startsWith(".;") || s.startsWith("..;")); segments.replaceAll(s -> (s.contains(";")) ? s.substring(0, s.indexOf(';')) : s); @@ -71,8 +71,9 @@ public static String canonicalUriPath(String uriPath, Consumer rejection decodeError = true; } - // Remove Empty Segments - segments.removeIf(s -> s.length() == 0); + // Remove Empty Segments other than the last + AtomicInteger last = new AtomicInteger(segments.size()); + segments.removeIf(s -> last.decrementAndGet() != 0 && s.length() == 0); // Remove dot-segments int count = 0; @@ -94,11 +95,13 @@ public static String canonicalUriPath(String uriPath, Consumer rejection } // Concatenate segments - StringBuilder buf = new StringBuilder(); - segments.forEach(s -> buf.append("/").append(s)); - if (endsWithSlash || segments.size() == 0) - buf.append("/"); - path = buf.toString(); + if (segments.size() == 0) + path = "/"; + else { + StringBuilder buf = new StringBuilder(); + segments.forEach(s -> buf.append("/").append(s)); + path = buf.toString(); + } // Rejecting Errors and Suspicious Sequences if (decodeError) @@ -122,7 +125,7 @@ public static String canonicalUriPath(String uriPath, Consumer rejection if (emptySegmentBeforeDotDot) rejection.accept("empty segment before dot dot"); // Any empty segment with parameters - if (emptySegmentWithParam) + if (emptyNonLastSegmentWithParam) rejection.accept("empty segment with parameters"); // The `"\"` character encoded or not. if (path.contains("\\")) @@ -185,7 +188,8 @@ public static Stream data() { data.add(new Object[] { "/foo/bar/;jsessionid=1234", "/foo/bar/", false }); data.add(new Object[] { "/foo;/bar;", "/foo/bar", false }); data.add(new Object[] { "/foo;/bar;/;", "/foo/bar/", false }); - data.add(new Object[] { "/foo%00/bar/", "/foo\00/bar/", true }); + data.add(new Object[] { "/foo%00/bar/", "/foo\000/bar/", true }); + data.add(new Object[] { "/foo%7Fbar", "/foo\177bar", true }); data.add(new Object[] { "/foo%2Fbar", "/foo/bar", true }); data.add(new Object[] { "/foo\\bar", "/foo\\bar", true }); data.add(new Object[] { "/foo%5Cbar", "/foo\\bar", true }); @@ -228,15 +232,17 @@ public static Stream data() { data.add(new Object[] { "/foo%20bar", "/foo bar", false }); data.add(new Object[] { "/foo%E2%82", "/foo%E2%82", true }); data.add(new Object[] { "/foo%E2%82bar", "/foo%E2%82bar", true }); + data.add(new Object[] { "/foo%-1/bar", "/foo%-1/bar", true }); data.add(new Object[] { "/foo%XX/bar", "/foo%XX/bar", true }); data.add(new Object[] { "/foo%/bar", "/foo%/bar", true }); data.add(new Object[] { "/foo/bar%0", "/foo/bar%0", true }); data.add(new Object[] { "/good%20/bad%/%20mix%", "/good /bad%/%20mix%", true }); + data.add(new Object[] { "/foo/bar?q", "/foo/bar", false }); data.add(new Object[] { "/foo/bar#f", "/foo/bar", false }); data.add(new Object[] { "/foo/bar?q#f", "/foo/bar", false }); - data.add(new Object[] { "/foo/bar/?q", "/foo/bar", false }); - data.add(new Object[] { "/foo/bar/#f", "/foo/bar", false }); - data.add(new Object[] { "/foo/bar/?q#f", "/foo/bar", false }); + data.add(new Object[] { "/foo/bar/?q", "/foo/bar/", false }); + data.add(new Object[] { "/foo/bar/#f", "/foo/bar/", false }); + data.add(new Object[] { "/foo/bar/?q#f", "/foo/bar/", false }); data.add(new Object[] { "/foo/bar;?q", "/foo/bar", false }); data.add(new Object[] { "/foo/bar;#f", "/foo/bar", false }); data.add(new Object[] { "/foo/bar;?q#f", "/foo/bar", false }); @@ -248,6 +254,10 @@ public static Stream data() { data.add(new Object[] { "/./", "/", false }); data.add(new Object[] { "/../", "/../", true }); data.add(new Object[] { "foo/bar/", "/foo/bar/", true }); + data.add(new Object[] { "./foo/bar/", "/foo/bar/", true }); + data.add(new Object[] { "%2e/foo/bar/", "/foo/bar/", true }); + data.add(new Object[] { "../foo/bar/", "/../foo/bar/", true }); + data.add(new Object[] { ".%2e/foo/bar/", "/../foo/bar/", true }); data.add(new Object[] { ";/foo/bar/", "/foo/bar/", true }); return data.stream().map(Arguments::of); diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index bbd59ccef..a1eb3a987 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -1333,7 +1333,7 @@ The URI is split by the first occurrence of any `"?"` character to path and quer . **Split path into segments.** + -The path is split into segments using the `"/"` character as a separator. The separating `"/"` does not form part of the resulting segments +The path is split into segments using the `"/"` character as a prefix to each segment. The separating `"/"` does not form part of the resulting segments. For example, the path `"/foo/bar/"` is split into 3 segments: `"foo"`, `"bar"` and `""`. The prefix `"/"` for the fist segment is optional, but URIs without a leading `"/"` should be rejected below. . **Remove path parameters.** + @@ -1345,7 +1345,7 @@ Octets that are encoded in `%nn` form are decoded in each segment. The resulting . **Remove Empty Segments.** + -Empty segments are removed. Containers may be configured to retain empty segments. +Empty segments, other than the last segment, are removed. Containers may be configured to retain all empty segments. . **Remove dot-segments.** + @@ -1353,8 +1353,7 @@ All segments that are exactly `"."` are removed from the segment series. Segment . **Concatenate segments.** + -The segments are concatenated into a single path string with each segment preceded by the `"/"` character. If the original path after stripping query and fragment ends with a `"/"` or all segments have been removed, then append a final `"/"` character. - If a segment contains the `"/"` or `"%"` characters, and the container is configured to not reject the request, then the container should re-encode those characters to the `%nn` form. If any characters are re-encoded, then the `"%"` must also be re-encoded. +The segments are concatenated into a single path string with each segment preceded by the `"/"` character. If there are no segments remaining, the resulting path is `"/"`. If a segment contains the `"/"` or `"%"` characters, and the container is configured to not reject the request, then the container should re-encode those characters to the `%nn` form. If any characters are re-encoded, then the `"%"` must also be re-encoded. . **Mapping URI to context and resource.** + @@ -1370,7 +1369,7 @@ If suspicious sequences are discovered during the prior processing steps, the re * Any `"."` or `".."` segment that had a path parameter (e.g. `/path/..;/info`) * Any `"."` or `".."` segment with any encoded characters (e.g. `/path/%2e%2e/info`) * Any `".."` segment preceded by an empty segment (e.g. `/path//../info`) - * Any empty segment with parameters (e.g. `/path/;param/info` ) + * Any empty segment other than the last segment, with parameters (e.g. `/path/;param/info` ) * The `"\"` character encoded or not. (e.g. `/path\info`) * Any control characters either encoded or not. (e.g. `/path%00/info`) * Any illegal hex sequences following a % character @@ -1387,15 +1386,16 @@ Consider the following URIs with a forbidden security constraint at `/secret/*` | `foo/bar` | `/foo/bar` | 400 must start with / | `/foo/bar` | `/foo/bar` | +| `/foo/bar;jsessionid=1234` | `/foo/bar` | | `/foo/bar/` | `/foo/bar/` | +| `/foo/bar/;jsessionid=1234` | `/foo/bar/` | | `/foo;/bar;` | `/foo/bar` | -| `/foo;/bar;/;` | `/foo/bar` | 400 empty segment with parameters +| `/foo;/bar;/;` | `/foo/bar/` | | `/foo%00/bar/` | `/foo/bar/` | 400 control character -| `/foo%7F/bar/` | `/foo/bar/` | 400 control character +| `/foo%7Fbar` | `/foobar` | 400 control character | `/foo%2Fbar` | `/foo/bar` | 400 encoded / | `/foo\bar` | `/foo\bar` | 400 backslash character | `/foo%5Cbar` | `/foo\bar` | 400 backslash character -| `/foo;%2E/bar` | `/foo/bar` | | `/foo;%2F/bar` | `/foo/bar` | 400 encoded / | `/foo/./bar` | `/foo/bar` | | `/foo/././bar` | `/foo/bar` | @@ -1408,7 +1408,7 @@ Consider the following URIs with a forbidden security constraint at `/secret/*` | `/foo/bar/.` | `/foo/bar` | | `/foo/bar/./` | `/foo/bar/` | | `/foo/bar/.;` | `/foo/bar` | 400 dot segment with parameter -| `/foo/bar/./;` | `/foo/bar` | 400 empty segment with parameters +| `/foo/bar/./;` | `/foo/bar/` | | `/foo/.bar` | `/foo/.bar` | | `/foo/../bar` | `/bar` | | `/foo/../../bar` | `/../bar` | 400 leading dot-dot-segment @@ -1423,19 +1423,19 @@ Consider the following URIs with a forbidden security constraint at `/secret/*` | `/foo/bar/..` | `/foo` | | `/foo/bar/../` | `/foo/` | | `/foo/bar/..;` | `/foo` | 400 dot segment with parameter -| `/foo/bar/../;` | `/foo` | 400 empty segment with parameters +| `/foo/bar/../;` | `/foo/` | | `/foo/..bar` | `/foo/..bar` | | `/foo/.../bar` | `/foo/.../bar` | | `/foo//bar` | `/foo/bar` | | `//foo//bar//` | `/foo/bar/` | -| `/;/foo;/;/bar/;/;` | `/foo/bar` | 400 empty segment with parameters +| `/;/foo;/;/bar/;/;` | `/foo/bar/` | 400 empty segment with parameters | `/foo//../bar` | `/bar` | | `/foo/;/../bar` | `/bar` | 400 empty segment with parameters | `/foo%E2%82%ACbar` | `/foo€bar` | | `/foo%20bar` | `/foo bar` | | `/foo%E2%82` | `/foo%E2%82` | 400 decode error | `/foo%E2%82bar` | `/foo%E2%82bar` | 400 decode error -| `/foo%-1bar` | `/foo%-1bar` | 400 decode error +| `/foo%-1/bar` | `/foo%-1/bar` | 400 decode error | `/foo%XX/bar` | `/foo%XX/bar` | 400 decode error | `/foo%/bar` | `/foo%/bar` | 400 decode error | `/foo/bar%0` | `/foo/bar%0` | 400 decode error From e47f03a545c7bb009c12ad9af77434780300eb10 Mon Sep 17 00:00:00 2001 From: Greg Wilkins Date: Wed, 13 Oct 2021 13:37:25 +1100 Subject: [PATCH 36/42] Handle trailing / with last empty segment --- spec/src/main/asciidoc/servlet-spec-body.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index a1eb3a987..9c80dedcf 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -1368,7 +1368,7 @@ If suspicious sequences are discovered during the prior processing steps, the re * The encoded `"/"` character (e.g. `/path%2Finfo`) * Any `"."` or `".."` segment that had a path parameter (e.g. `/path/..;/info`) * Any `"."` or `".."` segment with any encoded characters (e.g. `/path/%2e%2e/info`) - * Any `".."` segment preceded by an empty segment (e.g. `/path//../info`) + * If empty segments are not removed, then `".."` segment preceded by an empty segment (e.g. `/path//../info`) * Any empty segment other than the last segment, with parameters (e.g. `/path/;param/info` ) * The `"\"` character encoded or not. (e.g. `/path\info`) * Any control characters either encoded or not. (e.g. `/path%00/info`) From 2c3e13662158cbd055c065b3eea4d692de227057 Mon Sep 17 00:00:00 2001 From: Mark Thomas Date: Wed, 13 Oct 2021 15:15:35 +0100 Subject: [PATCH 37/42] Treat fragments as suspicious --- .../servlet/http/CanonicalUriPathTest.java | 40 ++++++++++++------- spec/src/main/asciidoc/servlet-spec-body.adoc | 1 + 2 files changed, 27 insertions(+), 14 deletions(-) diff --git a/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java index 5bbd63f94..82a7f23e8 100644 --- a/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java +++ b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java @@ -39,21 +39,27 @@ public static String canonicalUriPath(String uriPath, Consumer rejection String path = uriPath; + // Remember start/end conditions + boolean fragment = false; + boolean startsWithSlash; + boolean dotSegmentWithParam; + boolean encodedDotSegment; + boolean emptyNonLastSegmentWithParam; + boolean emptySegmentBeforeDotDot = false; + boolean decodeError = false; + // Discard fragment. - if (path.contains("#")) + if (path.contains("#")) { path = path.substring(0, path.indexOf('#')); + fragment = true; + } // Separation of path and query. if (path.contains("?")) path = path.substring(0, path.indexOf('?')); - // Remember start/end conditions - boolean startsWithSlash = path.startsWith("/"); - boolean dotSegmentWithParam; - boolean encodedDotSegment; - boolean emptyNonLastSegmentWithParam; - boolean emptySegmentBeforeDotDot = false; - boolean decodeError = false; + // This needs to be checked after removal ot + startsWithSlash = path.startsWith("/"); // Split path into segments. List segments = new ArrayList<>(Arrays.asList(path.substring(startsWithSlash ? 1 : 0).split("/", -1))); @@ -104,6 +110,8 @@ public static String canonicalUriPath(String uriPath, Consumer rejection } // Rejecting Errors and Suspicious Sequences + if (fragment) + rejection.accept("fragment"); if (decodeError) rejection.accept("decode error"); // Any path not starting with the `"/"` character @@ -238,14 +246,14 @@ public static Stream data() { data.add(new Object[] { "/foo/bar%0", "/foo/bar%0", true }); data.add(new Object[] { "/good%20/bad%/%20mix%", "/good /bad%/%20mix%", true }); data.add(new Object[] { "/foo/bar?q", "/foo/bar", false }); - data.add(new Object[] { "/foo/bar#f", "/foo/bar", false }); - data.add(new Object[] { "/foo/bar?q#f", "/foo/bar", false }); + data.add(new Object[] { "/foo/bar#f", "/foo/bar", true }); + data.add(new Object[] { "/foo/bar?q#f", "/foo/bar", true }); data.add(new Object[] { "/foo/bar/?q", "/foo/bar/", false }); - data.add(new Object[] { "/foo/bar/#f", "/foo/bar/", false }); - data.add(new Object[] { "/foo/bar/?q#f", "/foo/bar/", false }); + data.add(new Object[] { "/foo/bar/#f", "/foo/bar/", true }); + data.add(new Object[] { "/foo/bar/?q#f", "/foo/bar/", true }); data.add(new Object[] { "/foo/bar;?q", "/foo/bar", false }); - data.add(new Object[] { "/foo/bar;#f", "/foo/bar", false }); - data.add(new Object[] { "/foo/bar;?q#f", "/foo/bar", false }); + data.add(new Object[] { "/foo/bar;#f", "/foo/bar", true }); + data.add(new Object[] { "/foo/bar;?q#f", "/foo/bar", true }); data.add(new Object[] { "/", "/", false }); data.add(new Object[] { "//", "/", false }); data.add(new Object[] { "/;/", "/", true }); @@ -259,6 +267,10 @@ public static Stream data() { data.add(new Object[] { "../foo/bar/", "/../foo/bar/", true }); data.add(new Object[] { ".%2e/foo/bar/", "/../foo/bar/", true }); data.add(new Object[] { ";/foo/bar/", "/foo/bar/", true }); + data.add(new Object[] { "/#f", "/", true }); + data.add(new Object[] { "#f", "/", true }); + data.add(new Object[] { "/?q", "/", false }); + data.add(new Object[] { "?q", "/", true }); return data.stream().map(Arguments::of); } diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index 877498041..2f352d145 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -1363,6 +1363,7 @@ The decoded path is used to map the request to a context and resource within the + If suspicious sequences are discovered during the prior processing steps, the request must be rejected with a 400 bad request rather than dispatched to the target servlet. If a context is matched then the error handling of the context may be used to generate the 400 response. By default, the set of suspicious sequences is defined below, but may be configured differently by a container: + * The presence of a fragment in the URI * Any path not starting with the `"/"` character (e.g. `path/info`) * Any path starting with an initial segment of `".."` (e.g. `/../path/info`) * The encoded `"/"` character (e.g. `/path%2Finfo`) From e5b0bb008c6ee692b240af91d9f13b49666545bf Mon Sep 17 00:00:00 2001 From: Mark Thomas Date: Wed, 13 Oct 2021 18:10:54 +0100 Subject: [PATCH 38/42] Re-encode / as %2F (and % as %25) --- .../servlet/http/CanonicalUriPathTest.java | 19 +++++++++++++++---- 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java index 82a7f23e8..93c5bd000 100644 --- a/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java +++ b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java @@ -58,7 +58,7 @@ public static String canonicalUriPath(String uriPath, Consumer rejection if (path.contains("?")) path = path.substring(0, path.indexOf('?')); - // This needs to be checked after removal ot + // This needs to be checked after removal of path and query startsWithSlash = path.startsWith("/"); // Split path into segments. @@ -105,6 +105,9 @@ public static String canonicalUriPath(String uriPath, Consumer rejection path = "/"; else { StringBuilder buf = new StringBuilder(); + if (!decodeError) { + segments.replaceAll(CanonicalUriPathTest::encode); + } segments.forEach(s -> buf.append("/").append(s)); path = buf.toString(); } @@ -179,6 +182,14 @@ private static String decode(String segment) { return segment; } + private static String encode(String segment) { + if (segment.contains("%") || segment.contains("/")) { + segment = segment.replace("%", "%25"); + segment = segment.replace("/", "%2F"); + } + return segment; + } + private static CharBuffer fromUtf8(byte[] bytes) { try { return StandardCharsets.UTF_8.newDecoder().onMalformedInput(CodingErrorAction.REPORT).decode(ByteBuffer.wrap(bytes)); @@ -198,7 +209,7 @@ public static Stream data() { data.add(new Object[] { "/foo;/bar;/;", "/foo/bar/", false }); data.add(new Object[] { "/foo%00/bar/", "/foo\000/bar/", true }); data.add(new Object[] { "/foo%7Fbar", "/foo\177bar", true }); - data.add(new Object[] { "/foo%2Fbar", "/foo/bar", true }); + data.add(new Object[] { "/foo%2Fbar", "/foo%2Fbar", true }); data.add(new Object[] { "/foo\\bar", "/foo\\bar", true }); data.add(new Object[] { "/foo%5Cbar", "/foo\\bar", true }); data.add(new Object[] { "/foo;%2F/bar", "/foo/bar", true }); @@ -208,7 +219,7 @@ public static Stream data() { data.add(new Object[] { "/foo/%2e/bar", "/foo/bar", true }); data.add(new Object[] { "/foo/.;/bar", "/foo/bar", true }); data.add(new Object[] { "/foo/%2e;/bar", "/foo/bar", true }); - data.add(new Object[] { "/foo/.%2Fbar", "/foo/./bar", true }); + data.add(new Object[] { "/foo/.%2Fbar", "/foo/.%2Fbar", true }); data.add(new Object[] { "/foo/.%5Cbar", "/foo/.\\bar", true }); data.add(new Object[] { "/foo/bar/.", "/foo/bar", false }); data.add(new Object[] { "/foo/bar/./", "/foo/bar/", false }); @@ -223,7 +234,7 @@ public static Stream data() { data.add(new Object[] { "/foo/./../bar", "/bar", false }); data.add(new Object[] { "/foo/..;/bar", "/bar", true }); data.add(new Object[] { "/foo/%2e%2E;/bar", "/bar", true }); - data.add(new Object[] { "/foo/..%2Fbar", "/foo/../bar", true }); + data.add(new Object[] { "/foo/..%2Fbar", "/foo/..%2Fbar", true }); data.add(new Object[] { "/foo/..%5Cbar", "/foo/..\\bar", true }); data.add(new Object[] { "/foo/bar/..", "/foo", false }); data.add(new Object[] { "/foo/bar/../", "/foo/", false }); From 3a8db59df2ac7fcf0fcf93398d22fbe7d04938d9 Mon Sep 17 00:00:00 2001 From: Mark Thomas Date: Wed, 13 Oct 2021 18:16:18 +0100 Subject: [PATCH 39/42] Update table. Fix formatting --- .../servlet/http/CanonicalUriPathTest.java | 4 +- spec/src/main/asciidoc/servlet-spec-body.adoc | 88 ++++++++++--------- 2 files changed, 48 insertions(+), 44 deletions(-) diff --git a/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java index 93c5bd000..3da534357 100644 --- a/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java +++ b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java @@ -106,7 +106,7 @@ public static String canonicalUriPath(String uriPath, Consumer rejection else { StringBuilder buf = new StringBuilder(); if (!decodeError) { - segments.replaceAll(CanonicalUriPathTest::encode); + segments.replaceAll(CanonicalUriPathTest::encode); } segments.forEach(s -> buf.append("/").append(s)); path = buf.toString(); @@ -189,7 +189,7 @@ private static String encode(String segment) { } return segment; } - + private static CharBuffer fromUtf8(byte[] bytes) { try { return StandardCharsets.UTF_8.newDecoder().onMalformedInput(CodingErrorAction.REPORT).decode(ByteBuffer.wrap(bytes)); diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index 2f352d145..e8b16befc 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -1386,54 +1386,54 @@ Consider the following URIs with a forbidden security constraint at `/secret/*` | Encoded URI path | Decoded Path | Rejected | `foo/bar` | `/foo/bar` | 400 must start with / -| `/foo/bar` | `/foo/bar` | -| `/foo/bar;jsessionid=1234` | `/foo/bar` | -| `/foo/bar/` | `/foo/bar/` | -| `/foo/bar/;jsessionid=1234` | `/foo/bar/` | -| `/foo;/bar;` | `/foo/bar` | -| `/foo;/bar;/;` | `/foo/bar/` | -| `/foo%00/bar/` | `/foo/bar/` | 400 control character +| `/foo/bar` | `/foo/bar` | +| `/foo/bar;jsessionid=1234` | `/foo/bar` | +| `/foo/bar/` | `/foo/bar/` | +| `/foo/bar/;jsessionid=1234` | `/foo/bar/` | +| `/foo;/bar;` | `/foo/bar` | +| `/foo;/bar;/;` | `/foo/bar/` | +| `/foo%00/bar/` | `/foo | `/foo%7Fbar` | `/foobar` | 400 control character -| `/foo%2Fbar` | `/foo/bar` | 400 encoded / +| `/foo%2Fbar` | `/foo%2Fbar` | 400 encoded / | `/foo\bar` | `/foo\bar` | 400 backslash character | `/foo%5Cbar` | `/foo\bar` | 400 backslash character | `/foo;%2F/bar` | `/foo/bar` | 400 encoded / -| `/foo/./bar` | `/foo/bar` | -| `/foo/././bar` | `/foo/bar` | -| `/./foo/bar` | `/foo/bar` | +| `/foo/./bar` | `/foo/bar` | +| `/foo/././bar` | `/foo/bar` | +| `/./foo/bar` | `/foo/bar` | | `/foo/%2e/bar` | `/foo/bar` | 400 encoded dot segment | `/foo/.;/bar` | `/foo/bar` | 400 dot segment with parameter | `/foo/%2e;/bar` | `/foo/bar` | 400 encoded dot segment -| `/foo/.%2Fbar` | `/foo/./bar` | 400 encoded / +| `/foo/.%2Fbar` | `/foo/.%2Fbar` | 400 encoded / | `/foo/.%5Cbar` | `/foo/.\bar` | 400 backslash character -| `/foo/bar/.` | `/foo/bar` | -| `/foo/bar/./` | `/foo/bar/` | +| `/foo/bar/.` | `/foo/bar` | +| `/foo/bar/./` | `/foo/bar/` | | `/foo/bar/.;` | `/foo/bar` | 400 dot segment with parameter -| `/foo/bar/./;` | `/foo/bar/` | -| `/foo/.bar` | `/foo/.bar` | -| `/foo/../bar` | `/bar` | +| `/foo/bar/./;` | `/foo/bar/` | +| `/foo/.bar` | `/foo/.bar` | +| `/foo/../bar` | `/bar` | | `/foo/../../bar` | `/../bar` | 400 leading dot-dot-segment | `/../foo/bar` | `/../foo/bar` | 400 leading dot-dot-segment | `/foo/%2e%2E/bar` | `/bar` | 400 encoded dot segment | `/foo/%2e%2e/%2E%2E/bar` | `/../bar` | 400 leading dot-dot-segment & encoded dot segment -| `/foo/./../bar` | `/bar` | +| `/foo/./../bar` | `/bar` | | `/foo/..;/bar` | `/bar` | 400 dot segment with parameter | `/foo/%2e%2E;/bar` | `/bar` | 400 encoded dot segment -| `/foo/..%2Fbar` | `/foo/../bar` | 400 encoded / +| `/foo/..%2Fbar` | `/foo/..%2Fbar` | 400 encoded / | `/foo/..%5Cbar` | `/foo/..\bar` | 400 backslash character -| `/foo/bar/..` | `/foo` | -| `/foo/bar/../` | `/foo/` | +| `/foo/bar/..` | `/foo` | +| `/foo/bar/../` | `/foo/` | | `/foo/bar/..;` | `/foo` | 400 dot segment with parameter -| `/foo/bar/../;` | `/foo/` | -| `/foo/..bar` | `/foo/..bar` | -| `/foo/.../bar` | `/foo/.../bar` | -| `/foo//bar` | `/foo/bar` | -| `//foo//bar//` | `/foo/bar/` | +| `/foo/bar/../;` | `/foo/` | +| `/foo/..bar` | `/foo/..bar` | +| `/foo/.../bar` | `/foo/.../bar` | +| `/foo//bar` | `/foo/bar` | +| `//foo//bar//` | `/foo/bar/` | | `/;/foo;/;/bar/;/;` | `/foo/bar/` | 400 empty segment with parameters -| `/foo//../bar` | `/bar` | +| `/foo//../bar` | `/bar` | | `/foo/;/../bar` | `/bar` | 400 empty segment with parameters -| `/foo%E2%82%ACbar` | `/foo€bar` | -| `/foo%20bar` | `/foo bar` | +| `/foo%E2%82%ACbar` | `/foo€bar` | +| `/foo%20bar` | `/foo bar` | | `/foo%E2%82` | `/foo%E2%82` | 400 decode error | `/foo%E2%82bar` | `/foo%E2%82bar` | 400 decode error | `/foo%-1/bar` | `/foo%-1/bar` | 400 decode error @@ -1441,21 +1441,21 @@ Consider the following URIs with a forbidden security constraint at `/secret/*` | `/foo%/bar` | `/foo%/bar` | 400 decode error | `/foo/bar%0` | `/foo/bar%0` | 400 decode error | `/good%20/bad%/%20mix%` | `/good /bad%/%20mix%` | 400 decode error -| `/foo/bar?q` | `/foo/bar` | -| `/foo/bar#f` | `/foo/bar` | -| `/foo/bar?q#f` | `/foo/bar` | -| `/foo/bar/?q` | `/foo/bar/` | -| `/foo/bar/#f` | `/foo/bar/` | -| `/foo/bar/?q#f` | `/foo/bar/` | -| `/foo/bar;?q` | `/foo/bar` | -| `/foo/bar;#f` | `/foo/bar` | -| `/foo/bar;?q#f` | `/foo/bar` | -| `/` | `/` | -| `//` | `/` | +| `/foo/bar?q` | `/foo/bar` | +| `/foo/bar#f` | `/foo/bar` | 400 fragment +| `/foo/bar?q#f` | `/foo/bar` | 400 fragment +| `/foo/bar/?q` | `/foo/bar/` | +| `/foo/bar/#f` | `/foo/bar/` | 400 fragment +| `/foo/bar/?q#f` | `/foo/bar/` | 400 fragment +| `/foo/bar;?q` | `/foo/bar` | +| `/foo/bar;#f` | `/foo/bar` | 400 fragment +| `/foo/bar;?q#f` | `/foo/bar` | 400 fragment +| `/` | `/` | +| `//` | `/` | | `/;/` | `/` | 400 empty segment with parameters -| `/.` | `/` | +| `/.` | `/` | | `/..` | `/..` | 400 leading dot-dot-segment -| `/./` | `/` | +| `/./` | `/` | | `/../` | `/../` | 400 leading dot-dot-segment | `foo/bar/` | `/foo/bar/` | 400 must start with / | `./foo/bar/` | `/foo/bar/` | 400 must start with / @@ -1463,6 +1463,10 @@ Consider the following URIs with a forbidden security constraint at `/secret/*` | `../foo/bar/` | `/../foo/bar/` | 400 must start with / & leading dot-dot-segment | `.%2e/foo/bar/` | `/../foo/bar/` | 400 must start with / & leading dot-dot-segment & encoded dot segment | `;/foo/bar/` | `/foo/bar/` | 400 must start with / & empty segment with parameters +| `/#f` | `/` | 400 fragment +| `#f` | `/` | 400 fragment & must start with / +| `/?q` | `/` | +| `?q` | `/` | 400 must start with / |=== From 74b39c19e9193af546c0b2b1263ad43ba4e8f1bb Mon Sep 17 00:00:00 2001 From: Greg Wilkins Date: Thu, 14 Oct 2021 17:29:32 +1100 Subject: [PATCH 40/42] only leave % and / encoded if there is an encoded / --- .../jakarta/servlet/http/CanonicalUriPathTest.java | 11 ++++++++--- spec/src/main/asciidoc/servlet-spec-body.adoc | 12 +++++++----- 2 files changed, 15 insertions(+), 8 deletions(-) diff --git a/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java index 3da534357..08f8fedf2 100644 --- a/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java +++ b/api/src/test/java/jakarta/servlet/http/CanonicalUriPathTest.java @@ -34,6 +34,10 @@ public class CanonicalUriPathTest { } public static String canonicalUriPath(String uriPath, Consumer rejection) { + + // The code presented here is a non-normative implementation of the algorithm + // from section 3.5 of the specification. + if (uriPath == null) throw new IllegalArgumentException("null path"); @@ -105,7 +109,7 @@ public static String canonicalUriPath(String uriPath, Consumer rejection path = "/"; else { StringBuilder buf = new StringBuilder(); - if (!decodeError) { + if (!decodeError && uriPath.toLowerCase().contains("%2f")) { segments.replaceAll(CanonicalUriPathTest::encode); } segments.forEach(s -> buf.append("/").append(s)); @@ -124,7 +128,7 @@ public static String canonicalUriPath(String uriPath, Consumer rejection if (!segments.isEmpty() && segments.get(0).equals("..")) rejection.accept("leading dot-dot-segment"); // The encoded `"/"` character - if (uriPath.contains("%2f") || uriPath.contains("%2F")) + if (uriPath.toLowerCase().contains("%2f")) rejection.accept("encoded /"); // Any `"."` or `".."` segment that had a path parameter if (dotSegmentWithParam) @@ -169,7 +173,6 @@ private static String decode(String segment) { buf.append(fromUtf8(utf8.toByteArray())); utf8.reset(); } - buf.append(c); } } @@ -210,6 +213,8 @@ public static Stream data() { data.add(new Object[] { "/foo%00/bar/", "/foo\000/bar/", true }); data.add(new Object[] { "/foo%7Fbar", "/foo\177bar", true }); data.add(new Object[] { "/foo%2Fbar", "/foo%2Fbar", true }); + data.add(new Object[] { "/foo%2Fb%25r", "/foo%2Fb%25r", true }); + data.add(new Object[] { "/foo/b%25r", "/foo/b%r", false }); data.add(new Object[] { "/foo\\bar", "/foo\\bar", true }); data.add(new Object[] { "/foo%5Cbar", "/foo\\bar", true }); data.add(new Object[] { "/foo;%2F/bar", "/foo/bar", true }); diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index e8b16befc..80c83772b 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -1353,7 +1353,11 @@ All segments that are exactly `"."` are removed from the segment series. Segment . **Concatenate segments.** + -The segments are concatenated into a single path string with each segment preceded by the `"/"` character. If there are no segments remaining, the resulting path is `"/"`. If a segment contains the `"/"` or `"%"` characters, and the container is configured to not reject the request, then the container should re-encode those characters to the `%nn` form. If any characters are re-encoded, then the `"%"` must also be re-encoded. +The segments are concatenated into a single path string with each segment preceded by the `"/"` character. If there are no segments remaining, the resulting path is `"/"`. + +If a segment contains the "/" or "%" characters, and the container is configured to not reject the request for containing an encoded `"/"`, then the container should re-encode those characters to the %nn form. + +If a segment contains the `"/"` or `"%"` characters, and the container is configured to not reject the request, then the container should re-encode those characters to the `%nn` form. If any characters are re-encoded, then the `"%"` must also be re-encoded. . **Mapping URI to context and resource.** + @@ -1377,10 +1381,6 @@ If suspicious sequences are discovered during the prior processing steps, the re * Any illegal UTF-8 code sequences. ==== Example URIs -> TODO more work needed on this section. Perhaps some parts need to be moved to other sections. - -Consider the following URIs with a forbidden security constraint at `/secret/*` and a servlet registered at /dispatch/* that forwards request to the string return from pathInfo() - . Example URIs |=== | Encoded URI path | Decoded Path | Rejected @@ -1395,6 +1395,8 @@ Consider the following URIs with a forbidden security constraint at `/secret/*` | `/foo%00/bar/` | `/foo | `/foo%7Fbar` | `/foobar` | 400 control character | `/foo%2Fbar` | `/foo%2Fbar` | 400 encoded / +| `/foo%2Fb%25r` | `/foo%2Fb%25r` | 400 encoded / +| `/foo/b%25r` | `/foo/b%r` | | `/foo\bar` | `/foo\bar` | 400 backslash character | `/foo%5Cbar` | `/foo\bar` | 400 backslash character | `/foo;%2F/bar` | `/foo/bar` | 400 encoded / From ec8d4bb11c638348234bfab9687315a88d413be3 Mon Sep 17 00:00:00 2001 From: Greg Wilkins Date: Fri, 15 Oct 2021 08:31:09 +1100 Subject: [PATCH 41/42] removed duplication --- spec/src/main/asciidoc/servlet-spec-body.adoc | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/spec/src/main/asciidoc/servlet-spec-body.adoc b/spec/src/main/asciidoc/servlet-spec-body.adoc index 80c83772b..170a74fa8 100644 --- a/spec/src/main/asciidoc/servlet-spec-body.adoc +++ b/spec/src/main/asciidoc/servlet-spec-body.adoc @@ -1353,11 +1353,7 @@ All segments that are exactly `"."` are removed from the segment series. Segment . **Concatenate segments.** + -The segments are concatenated into a single path string with each segment preceded by the `"/"` character. If there are no segments remaining, the resulting path is `"/"`. - -If a segment contains the "/" or "%" characters, and the container is configured to not reject the request for containing an encoded `"/"`, then the container should re-encode those characters to the %nn form. - -If a segment contains the `"/"` or `"%"` characters, and the container is configured to not reject the request, then the container should re-encode those characters to the `%nn` form. If any characters are re-encoded, then the `"%"` must also be re-encoded. +The segments are concatenated into a single path string with each segment preceded by the `"/"` character. If there are no segments remaining, the resulting path is `"/"`. If a segment contains the "/" or "%" characters, and the container is configured to not reject the request for containing an encoded `"/"`, then the container should re-encode those characters to the %nn form. If any characters are re-encoded, then the `"%"` must also be re-encoded. . **Mapping URI to context and resource.** + From e1292f9bdc2cb831ee2117946bf6d1cbd8dd366d Mon Sep 17 00:00:00 2001 From: Mark Thomas Date: Fri, 15 Oct 2021 09:12:04 +0100 Subject: [PATCH 42/42] Add canonicalization text to getPathInfo() and getContextPath() --- .../servlet/http/HttpServletRequest.java | 20 ++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/api/src/main/java/jakarta/servlet/http/HttpServletRequest.java b/api/src/main/java/jakarta/servlet/http/HttpServletRequest.java index 8c0b17857..de56f21ff 100644 --- a/api/src/main/java/jakarta/servlet/http/HttpServletRequest.java +++ b/api/src/main/java/jakarta/servlet/http/HttpServletRequest.java @@ -249,9 +249,14 @@ public String toString() { *

* This method returns null if there was no extra path information. * - * @return a String, decoded by the web container, specifying extra path information that comes after the - * servlet path but before the query string in the request URL; or null if the URL does not have any extra - * path information + * @return a String specifying extra path information that comes after the servlet path but before the + * query string in the request URL; or null if the URL does not have any extra path information. The path + * will be canonicalized as per section 3.5 of the specification. This method will not return any encoded characters + * unless the container is configured specifically to allow them. + * @throws IllegalArgumentException In standard configuration, this method will never throw. However, a container may be + * configured to not reject some suspicious sequences identified by 3.5.2, furthermore the container may be configured + * to allow such paths to only be accessed via safer methods like {@link #getRequestURI()} and to throw + * IllegalArgumentException if this method is called for such suspicious paths. */ public String getPathInfo(); @@ -299,8 +304,13 @@ default public PushBuilder newPushBuilder() { * {@link jakarta.servlet.ServletContext#getContextPath()} should be considered as the prime or preferred context path * of the application. * - * @return a String specifying the portion of the request URI that indicates the context of the request - * + * @return a String specifying the portion of the request URI that indicates the context of the request. + * The path will be canonicalized as per section 3.5 of the specification. This method will not return any encoded + * characters unless the container is configured specifically to allow them. + * @throws IllegalArgumentException In standard configuration, this method will never throw. However, a container may be + * configured to not reject some suspicious sequences identified by 3.5.2, furthermore the container may be configured + * to allow such paths to only be accessed via safer methods like {@link #getRequestURI()} and to throw + * IllegalArgumentException if this method is called for such suspicious paths. * @see jakarta.servlet.ServletContext#getContextPath() */ public String getContextPath();