Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Work in progress for Resource cleanup #6518

Closed

Conversation

joakime
Copy link
Contributor

@joakime joakime commented Jul 15, 2021

Responsibilities:

  • Resource.addPath(String) should enforce base / child behaviors
  • Accessing Resources via HTTP should support proper URI encoding.

Signed-off-by: Joakim Erdfelt joakim.erdfelt@gmail.com

Signed-off-by: Joakim Erdfelt <joakim.erdfelt@gmail.com>
@joakime joakime requested review from lachlan-roberts and gregw July 15, 2021 00:10
* <li>The path provided is ensured to be relative and belongs within the Resource, attempts to reference a path
* outside of the Resource will result in an exception.</li>
* </ul>
* </p>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gregw @lachlan-roberts I could use feedback on this clarification on addPath

* <p>
* Constraints:
* <ul>
* <li>The path is expected to be in URI path-like format.</li>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This "URI path-like" format is the ambiguous decoded path space that is returned from getPathInfo etc.
I think it is this space that needs to be better defined, which is part of the conversation at: jakartaee/servlet#18 (comment)

The path is "URI-like" in that it is / separated segments, but it is not a URI as it cannot contain encoded characters, specifically it cannot contain an encoded /, so there are some paths that cannot be specified.

* Constraints:
* <ul>
* <li>The path is expected to be in URI path-like format.</li>
* <li>The path provided will not be URI decoded by the Resource implementation.</li>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a bit of a convoluated way of saying that the path is decoded. Ie any % characters will not result in further decoding, but will themselves be encoded

Comment on lines +426 to +428
* <li>URI canonicalization will be attempted before being used against the Resource.
* This can result in a normal Resource, perhaps with it being an alias to a different Resource,
* perhaps even rejected with an Exception if there is an attempt to navigate above this Resource.</li>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is exactly correct. Rather the path is checked to see if conanicalized, would it navigate above the base and if so an MUE us thrown. Otherwise the possibly non-canonical path is used, which may result in a non-normal Resource which will return true from isAlias

Comment on lines +429 to +430
* <li>The path provided will be given to the Resource implementation for interpretation on how it
* will behave (such as using "a\\b.txt" on a Path based in linux vs windows).</li>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm probably true... but really not very desirable.
Ideally \ would never be treated as a separator as the separator is '/' in this space.

import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.equalTo;

public class ResourceAddPathTest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd much rather test cases be added to the existing ResourceTest rather than create a special purpose unit test. By creating multiple special purpose test that overlap with other test cases it just becomes too complex to see what is tested and then gaps emerge. We have already gone down this path to some extent as we have ResourceTest, FileSystemResourceTest, PathResourceTest, ResourceAliasTest.

We need to simplify and rationalize the testing rather than duplicate tests, so that ResourceTest does all the tests it can over all the different implementations of Resource. Then we can have a PathResourceTest iff there are some tests that can only be done on PathResources and not other resources in general. We should strive to get rid of ResourceAliasTest, FileSystemResourceTest and definitely not add ResourceAddPathTest

/**
* Collection of Path encoding / decoding utilities for URI, Http Headers, and HTML
*/
public class PathUtil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is well named. It reads like it is for utility methods for the Path class.

Instead it has methods that work on the strange decoded URI path space that is ill defined by the servlet specification. Currently we have such methods on URIUtil, which I guess is similarly badly named as they are not URI paths either.... well I guess they are decoded URI paths, so it's a little bit better.

Eitherway, I think we should strive to not add extra classes like this, unless their purpose is clearly defined.

Comment on lines +31 to +40
URI_ENCODED_CHARS = new String[127];
Arrays.fill(URI_ENCODED_CHARS, null);

// URI Reserved Chars
String uriReservedGenDelims = ":?#[]@"; // intentionally missing "/"
String uriReservedSubDelims = "!$&'()*+,;=";
// Extra Reserved Chars (specified by Jetty)
String jettyReserved = "%\"<> \\^`{}|";

String reserved = uriReservedGenDelims + uriReservedSubDelims + jettyReserved;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this array approach, but I think we should switch around how we create it to defining what characters will not be encoded rather than which characters will be encoded.

We should not encode / and the unreserved characters from the URI spec:

  • ALPHA: A - Z + a - z
  • DIGIT: 0 - 9
  • - / . / _ / ~

I think we should encode everything else.

return buf.toString();
}

public static String encodePathDelayAlloc(String rawpath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the JMH has indicated enough that this is a better algorithm than the URIUtil.encodePath. Just update that method and don't need to check in and maintain the jmh or alternatives.

If we do want to check in a JMH, it should be a general comparison between doing a static array lookup vs switch(char)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be interesting to jmh compare the approach used in HttpParser, which looks up characters in an array of Token[256] and then does a switch on the enum obtained from token.getType.

@joakime joakime closed this Sep 15, 2021
@joakime joakime deleted the jetty-10.0.x-resources-addpath-constraint branch September 15, 2021 17:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants