Issue #18 URI path processing #424

Merged
merged 45 commits into from
Oct 19, 2021
Changes from 2 commits
Commits
bd12ce5
Issue #18 URI path processing
gregw Oct 7, 2021
fabaadd
Issue #18 URI path processing
gregw Oct 7, 2021
c05f08b
Issue #18 URI path processing
gregw Oct 7, 2021
ad8f3d1
Issue #18 URI path processing
gregw Oct 7, 2021
f0d56b0
Issue #18 URI path processing
gregw Oct 7, 2021
bb45889
Issue #18 URI path processing
gregw Oct 7, 2021
c46ed03
Issue 225 do head content length (#405)
gregw Oct 8, 2021
7b5ea0d
Merge branch 'master' into issue-18-uri-handling
gregw Oct 8, 2021
c4a20a7
added change note
gregw Oct 8, 2021
fc4f32d
more URI example updates
gregw Oct 8, 2021
ec63dfc
alternate segment based algorithm
gregw Oct 8, 2021
1d77733
Merge remote-tracking branch 'jakarta/master' into issue-18-uri-handling
gregw Oct 8, 2021
ee3b679
First attempt at generating URI example table
gregw Oct 8, 2021
77f602e
handle final slash
gregw Oct 8, 2021
639ad85
revert auto change to pom.xml
gregw Oct 8, 2021
38f3cdd
More rejections in the uri test
gregw Oct 8, 2021
4c9d28b
Fixed trailing / issue
gregw Oct 8, 2021
c8c86e6
should re-encode / if not rejected
gregw Oct 8, 2021
d433fc8
discard fragment before query
gregw Oct 8, 2021
223f75d
encoded ``%2e` is equivalent to `.` for normilization.
gregw Oct 8, 2021
c304e6b
empty segment with parameters
gregw Oct 8, 2021
049dc01
lame attempt at better verbage.
gregw Oct 8, 2021
8381785
format
gregw Oct 8, 2021
35a24f2
+ simplified code
gregw Oct 10, 2021
4bfdd07
handle /.
gregw Oct 10, 2021
d475532
handle % and UTF-8 decoding errors
gregw Oct 10, 2021
2b4d50c
better handling of rejections
gregw Oct 10, 2021
7e56e06
Drop reference to HTTP/0.9
markt-asf Oct 12, 2021
f77f904
Fix formatting.
markt-asf Oct 12, 2021
7209be4
Possible alternative
markt-asf Oct 12, 2021
ecdaf3b
Re-word decode
markt-asf Oct 12, 2021
be4d366
Format / typos
markt-asf Oct 12, 2021
b4b6d3c
Typo
markt-asf Oct 12, 2021
238a77f
Align output with text description
markt-asf Oct 12, 2021
5e1e9d0
Fix formatting
markt-asf Oct 12, 2021
fb26bc2
Unit test with assertions.
gregw Oct 12, 2021
e8f0f10
Handle trailing / with last empty segment
gregw Oct 12, 2021
e47f03a
Handle trailing / with last empty segment
gregw Oct 13, 2021
98cef97
Merge branch 'master' into issue-18-uri-handling
gregw Oct 13, 2021
2c3e136
Treat fragments as suspicious
markt-asf Oct 13, 2021
e5b0bb0
Re-encode / as %2F (and % as %25)
markt-asf Oct 13, 2021
3a8db59
Update table. Fix formatting
markt-asf Oct 13, 2021
74b39c1
only leave % and / encoded if there is an encoded /
gregw Oct 14, 2021
ec8d4bb
removed duplication
gregw Oct 14, 2021
e1292f9
Add canonicalization text to getPathInfo() and getContextPath()
markt-asf Oct 15, 2021
53 changes: 53 additions & 0 deletions spec/src/main/asciidoc/servlet-spec-body.adoc
@@ -1300,6 +1300,59 @@ the header value to an `int`, a `NumberFormatException` is thrown. If
the `getDateHeader` method cannot translate the header to a `Date`
object, an `IllegalArgumentException` is thrown.

=== Request URI Processing
The process described here adapts and extends the URI normalization process described in https://datatracker.ietf.org/doc/html/rfc3986[RFC 3986] to create a standard Servlet URI path canonicalization process that ensures URIs can be mapped to Servlets, Filters and security constraints unambiguously. It is also intended to inform reverse proxy implementations how the requests they pass to servlet containers will be processed.

Servlet containers may implement the standard Servlet URI path canonicalization in any manner they see fit, as long as the end result is identical to that of the process described here. Servlet containers may provide container-specific configuration options to vary the standard canonicalization process. Any such variation may have security implications, so both Servlet container implementors and users should make sure they understand the implications of any container-specific canonicalization options they use.

==== URI Transport
===== HTTP/1.1
The URI is extracted from the `request-target` as defined by https://datatracker.ietf.org/doc/html/rfc7230#section-3.1.1[RFC 7230]. URIs in `origin-form` or `asterisk-form` are passed unchanged to the next stage (separation of path and query). URIs in `absolute-form` have the scheme and authority removed to convert them to `origin-form` and are then passed to the next stage. URIs in `authority-form` are outside the scope of this specification.
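As a minimal sketch of the `absolute-form` conversion, assuming the request-target has already been read from the request line as a `String`, the following strips `scheme://authority` and leaves other forms untouched. The class `RequestTargetSketch` and method `toOriginForm` are hypothetical, and treating an empty path as `/` is an assumption of the sketch.

```java
final class RequestTargetSketch {
    private RequestTargetSketch() {}

    /**
     * Hypothetical helper: converts an absolute-form request-target such as
     * "http://example.com/a/b?q=1" to origin-form ("/a/b?q=1").
     * origin-form and asterisk-form targets are returned unchanged.
     */
    static String toOriginForm(String target) {
        if (target.equals("*") || target.startsWith("/")) {
            return target; // already asterisk-form or origin-form
        }
        int schemeEnd = target.indexOf("://");
        if (schemeEnd < 0) {
            // authority-form (e.g. "example.com:443") is out of scope here
            throw new IllegalArgumentException("Unsupported request-target: " + target);
        }
        // The authority ends at the first '/', '?' or '#' after "://"
        int pathStart = target.length();
        for (int i = schemeEnd + 3; i < target.length(); i++) {
            char c = target.charAt(i);
            if (c == '/' || c == '?' || c == '#') {
                pathStart = i;
                break;
            }
        }
        String rest = target.substring(pathStart);
        // Assumption: an empty path is treated as "/"
        return rest.startsWith("/") ? rest : "/" + rest;
    }
}
```

For example, `http://example.com/a/b?q=1` becomes `/a/b?q=1`, while `/a/b` and `*` pass through as-is.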

===== HTTP/2
The URI is the `:path` pseudo-header as defined by https://datatracker.ietf.org/doc/html/rfc7540#section-8.1.2.3[RFC 7540] and is passed unchanged to the next stage (separation of path and query).

===== Other protocols
Containers may support other protocols. Such containers should extract an appropriate URI for the request from the protocol and pass it to the next stage (separation of path and query).

==== Separation of path and query
The URI is split at the first occurrence of the `"?"` character into a path and a query. The query is preserved for later handling and the following steps are applied to the path.
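A minimal sketch of this split, assuming the URI is held as a `String`: only the first `?` separates path from query, and any later `?` stays in the query. The class `PathQuerySketch` and method `splitPathAndQuery` are hypothetical.

```java
final class PathQuerySketch {
    private PathQuerySketch() {}

    /**
     * Hypothetical helper: splits a URI at the first '?' and returns
     * {path, query}. The query element is null when the URI has no '?'.
     */
    static String[] splitPathAndQuery(String uri) {
        int q = uri.indexOf('?');
        if (q < 0) {
            return new String[] {uri, null};
        }
        // Later '?' characters stay in the query unchanged
        return new String[] {uri.substring(0, q), uri.substring(q + 1)};
    }
}
```

For `/shop/cart?item=1?x`, the path is `/shop/cart` and the query is `item=1?x`.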

==== Discard fragment
A fragment in the path is indicated by the first occurrence of a `\#` character. The `#` character and any following fragment are removed from the path and discarded.
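Sketched in the same style, assuming the path produced by the previous step is a `String`, the fragment removal truncates at the first `#`. The class `FragmentSketch` and method `discardFragment` are hypothetical.

```java
final class FragmentSketch {
    private FragmentSketch() {}

    /** Hypothetical helper: drops the first '#' and everything after it. */
    static String discardFragment(String path) {
        int hash = path.indexOf('#');
        return hash < 0 ? path : path.substring(0, hash);
    }
}
```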

==== Decoding of non-special characters
Characters other than `/`, `;` and `%` that are encoded in `%nn` form are decoded, and the resulting octet sequence is treated as UTF-8 and converted to a character sequence.
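The decoding step can be sketched as follows. This is a minimal illustration, not container code: the class `NonSpecialDecodeSketch` and method `decodeNonSpecial` are hypothetical, the raw path is assumed to be ASCII, and signalling a truncated `%` sequence, a non-hex `%nn` sequence or invalid UTF-8 with `IllegalArgumentException` (standing in for a 400 response) is an assumption. Strict UTF-8 decoding uses the standard `CharsetDecoder` with `CodingErrorAction.REPORT`.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class NonSpecialDecodeSketch {
    private NonSpecialDecodeSketch() {}

    /**
     * Hypothetical helper: decodes %nn sequences except those for '/', ';'
     * and '%', then interprets the octets as UTF-8. Malformed input throws
     * IllegalArgumentException, standing in for a 400 response.
     */
    static String decodeNonSpecial(String path) {
        // Assumption: a raw request path contains only ASCII characters
        if (path.chars().anyMatch(ch -> ch > 0x7F)) {
            throw new IllegalArgumentException("Non-ASCII character in path");
        }
        ByteArrayOutputStream octets = new ByteArrayOutputStream();
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (c != '%') {
                octets.write(c);
                continue;
            }
            if (i + 2 >= path.length()) {
                throw new IllegalArgumentException("Truncated %nn sequence");
            }
            int hi = Character.digit(path.charAt(i + 1), 16);
            int lo = Character.digit(path.charAt(i + 2), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Invalid %nn sequence");
            }
            int octet = (hi << 4) | lo;
            if (octet == '/' || octet == ';' || octet == '%') {
                // Special characters stay encoded, copied exactly as received
                octets.write('%');
                octets.write(path.charAt(i + 1));
                octets.write(path.charAt(i + 2));
            } else {
                octets.write(octet);
            }
            i += 2;
        }
        try {
            // Strict decoding: malformed UTF-8 is reported, not replaced
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(octets.toByteArray()))
                    .toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("Path is not valid UTF-8", e);
        }
    }
}
```

Reporting rather than replacing malformed octets keeps two different byte sequences from silently decoding to the same path.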

==== Collapse sequences of multiple `"/"` characters
Any sequence of more than one `"/"` character in the path must be replaced with a single `"/"`.
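A one-line sketch of the collapse, using `String.replaceAll` with the regular expression `/{2,}`. The class `SlashSketch` and method `collapseSlashes` are hypothetical. Encoded slashes (`%2F`) are still encoded at this stage, so they are not collapsed.

```java
final class SlashSketch {
    private SlashSketch() {}

    /** Hypothetical helper: replaces every run of two or more '/' with one '/'. */
    static String collapseSlashes(String path) {
        // "%2F" sequences are still encoded here and are left untouched
        return path.replaceAll("/{2,}", "/");
    }
}
```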

==== Remove dot-segments
* A path not starting with `"/"` must be rejected with a 400 response.
* Sequences of the form `"/./"` must be replaced with `"/"`.
* Sequences of the form `"/" + segment + "/../"` must be replaced with `"/"`.
* If a `".."` segment has no preceding segment, the request must be rejected with a 400 response.
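The rules above can be sketched with a segment-based algorithm: split the path on `/`, keep a list of output segments, drop `.` segments, and let each `..` remove the preceding output segment. This is a minimal sketch under stated assumptions: the class `DotSegmentSketch` and method `removeDotSegments` are hypothetical, `IllegalArgumentException` stands in for a 400 response, and a trailing `/.` or `/..` is assumed to leave a final `/` (as in RFC 3986 `remove_dot_segments`).

```java
import java.util.ArrayList;
import java.util.List;

final class DotSegmentSketch {
    private DotSegmentSketch() {}

    /**
     * Hypothetical segment-based helper. Rejections are signalled with
     * IllegalArgumentException, standing in for a 400 response.
     */
    static String removeDotSegments(String path) {
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("400: path does not start with /");
        }
        String[] segments = path.substring(1).split("/", -1);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < segments.length; i++) {
            String segment = segments[i];
            boolean last = i == segments.length - 1;
            if (segment.equals(".")) {
                // "/./" becomes "/"; assumption: a trailing "/." becomes "/"
                if (last) {
                    out.add("");
                }
            } else if (segment.equals("..")) {
                if (out.isEmpty()) {
                    throw new IllegalArgumentException("400: no segment precedes ..");
                }
                out.remove(out.size() - 1); // drop the preceding segment
                if (last) {
                    out.add(""); // assumption: a trailing "/.." keeps a final "/"
                }
            } else {
                out.add(segment);
            }
        }
        return "/" + String.join("/", out);
    }
}
```

For example, `/a/b/../c` becomes `/a/c`, and `/../a` is rejected because its `..` segment has no preceding segment.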

==== Removal of path parameters
A path segment containing the `";"` character is split at the first occurrence of `";"`. The segment is replaced by the character sequence preceding the `";"`. The characters following the `";"` are considered path parameters and may be preserved by the container for later processing (e.g. `jsessionid`).
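A minimal sketch of this step, assuming the container simply collects the parameter text of each segment in order. The class `PathParamSketch`, its nested `Result` type and the method `removePathParameters` are hypothetical. Because `%3B` was left encoded by the earlier decoding step, only a literal `;` acts as a separator here.

```java
import java.util.ArrayList;
import java.util.List;

final class PathParamSketch {
    private PathParamSketch() {}

    /** Hypothetical result: the stripped path plus each segment's parameter text. */
    static final class Result {
        final String path;
        final List<String> parameters;

        Result(String path, List<String> parameters) {
            this.path = path;
            this.parameters = parameters;
        }
    }

    static Result removePathParameters(String path) {
        String[] segments = path.split("/", -1);
        List<String> parameters = new ArrayList<>();
        StringBuilder out = new StringBuilder(path.length());
        for (int i = 0; i < segments.length; i++) {
            if (i > 0) {
                out.append('/');
            }
            String segment = segments[i];
            int semi = segment.indexOf(';');
            if (semi >= 0) {
                // Keep everything after the first ';' for later processing
                parameters.add(segment.substring(semi + 1));
                segment = segment.substring(0, semi);
            }
            out.append(segment);
        }
        return new Result(out.toString(), parameters);
    }
}
```

For `/a;jsessionid=123/b;x=1;y=2`, the path becomes `/a/b` and the preserved parameters are `jsessionid=123` and `x=1;y=2`.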

==== Decoding of remaining `%nn` sequences
Any remaining `%nn` sequences in the path should be decoded. Some containers may be configured to leave specific characters encoded (e.g. a container configuration may leave the characters `"/"` and `"%"` encoded).
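A minimal sketch of the final decoding pass, assuming it runs on the output of the earlier decoding step, so only `%nn` sequences for `/`, `;` and `%` can remain. The flag `keepSlashAndPercent` models the optional container configuration; it and the class `FinalDecodeSketch` are hypothetical. The pass is single, so `%252F` decodes to the literal text `%2F` rather than to `/`.

```java
final class FinalDecodeSketch {
    private FinalDecodeSketch() {}

    /**
     * Hypothetical helper: decodes the %nn sequences left by the earlier step.
     * If keepSlashAndPercent is true, %2F and %25 are left encoded.
     */
    static String decodeRemaining(String path, boolean keepSlashAndPercent) {
        StringBuilder out = new StringBuilder(path.length());
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (c != '%') {
                out.append(c);
                continue;
            }
            if (i + 2 >= path.length()) {
                throw new IllegalArgumentException("Truncated %nn sequence");
            }
            int hi = Character.digit(path.charAt(i + 1), 16);
            int lo = Character.digit(path.charAt(i + 2), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Invalid %nn sequence");
            }
            char decoded = (char) ((hi << 4) | lo);
            if (decoded > 0x7F) {
                // The earlier step already decoded every non-ASCII octet
                throw new IllegalArgumentException("Unexpected non-ASCII %nn sequence");
            }
            if (keepSlashAndPercent && (decoded == '/' || decoded == '%')) {
                out.append(path, i, i + 3); // leave "%2F" / "%25" as received
            } else {
                out.append(decoded);
            }
            i += 2; // single pass: the decoded character is never re-examined
        }
        return out.toString();
    }
}
```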

==== Mapping URI to context and resource
The decoded path is used to map the request to a context and to a resource within that context. This form of the URI path is used for all subsequent mapping (web applications, servlets, filters and security constraints).

==== Rejecting Suspicious Sequences
If suspicious sequences are discovered during the prior steps, the request must be rejected with a 400 (Bad Request) response, using the error handling of the matched context. By default, the set of suspicious sequences includes:

* The encoded `"/"` character
* Any `"."` or `".."` segment that had a path parameter
* Any `"."` or `".."` segment with any encoded characters
* The `"\"` character, encoded or not.
* Any control character, encoded or not.

A container or context may be configured to have a different set of rejected sequences.
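The default set of suspicious sequences can be sketched as a single predicate. This is a minimal illustration under stated assumptions: the class `SuspiciousSketch` and method `isSuspicious` are hypothetical; the check runs on the raw path after query and fragment removal but before any `%nn` decoding; "control character" is taken to mean the C0 controls and DEL; and `%2e` (any case) is assumed to be the only encoding of `.` worth detecting.

```java
import java.util.Locale;
import java.util.regex.Pattern;

final class SuspiciousSketch {
    private SuspiciousSketch() {}

    // %00-%1F and %7F: encoded C0 control characters and DEL
    private static final Pattern ENCODED_CONTROL = Pattern.compile("%(?:[01][0-9a-f]|7f)");

    /** Hypothetical check of the default suspicious sequences on a raw path. */
    static boolean isSuspicious(String rawPath) {
        String lower = rawPath.toLowerCase(Locale.ROOT);
        // Encoded "/", and "\" either encoded or not
        if (lower.contains("%2f") || lower.contains("%5c") || rawPath.indexOf('\\') >= 0) {
            return true;
        }
        // Control characters, raw or encoded
        if (rawPath.chars().anyMatch(c -> c < 0x20 || c == 0x7F)
                || ENCODED_CONTROL.matcher(lower).find()) {
            return true;
        }
        for (String segment : rawPath.split("/", -1)) {
            int semi = segment.indexOf(';');
            String base = semi < 0 ? segment : segment.substring(0, semi);
            // Assumption: %2e is the only encoded form of '.' considered
            String decoded = base.toLowerCase(Locale.ROOT).replace("%2e", ".");
            boolean dotSegment = decoded.equals(".") || decoded.equals("..");
            // "." or ".." with a path parameter, or with any encoded character
            if (dotSegment && (semi >= 0 || !decoded.equals(base))) {
                return true;
            }
        }
        return false;
    }
}
```

Plain `.` and `..` segments are not suspicious on their own; they are handled by dot-segment removal.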

=== Request Path Elements

The request path that leads to a servlet