-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved URIUtil.safeDecodePath(String)
#9462
Conversation
* will use replacement characters instead of throwing an error.
buffer.append(bytes[i]); | ||
} | ||
|
||
assertEquals("ةة", buffer.toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test case seems out of place for this PR.
But I thought I discovered a bug in Utf8Appendable with bidi unicode glyphs.
Turns out it works as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to protect control characters.
I'm moderately sure we don't need extra protection of bad UTF-8
I'm pondering if we should ever fallback to ISO8859-1 and would like to work out why we do that or if anybody else does. Maybe time to stop it.
jetty-core/jetty-util/src/main/java/org/eclipse/jetty/util/URIUtil.java
Outdated
Show resolved
Hide resolved
if (!fallbackToIso88591) | ||
throw new IllegalArgumentException("Not UTF-8, not decoding in ISO-8859-1", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also don't think we need this... unless an illegal UTF8 character can be decoded as /
in iso8859-1, which I don't believe is possible... but even if it were, then we'd leave the /
encoded and there would be nothing ambigous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if anything, I'm dubious that we should ever fall back to iso8859-1
buffer.append(bytes[i]); | ||
} | ||
|
||
assertEquals("ةة", buffer.toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this test.
|
||
@ParameterizedTest | ||
@MethodSource("safeDecodePathSource") | ||
public void testSafeDecodePath(String encodedPath, String decodedPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A good test to have, but I think you are making too many things safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll defer to @gregw.
I think falling back to ISO 8859-1 is a thing of the past, as it seems to me that every new spec, everybody and everything is on UTF-8, so I won't do it.
Arguments.of("/foo%2fbar", "/foo%2Fbar"), | ||
Arguments.of("/foo%252fbar", "/foo%252fbar"), | ||
Arguments.of("/foo%3bbar", "/foo;bar"), | ||
Arguments.of("/foo%3fbar", "/foo%3Fbar"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why were the f
in %2f
capitalized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's because of our canonical path mechanism. We produce an encoded path that can be string compared. I.e. we take out all variations, leave only characters that need encoding encoded and uniform capitalization
@@ -81,6 +81,7 @@ public abstract class Utf8Appendable implements CharsetStringBuilder | |||
}; | |||
|
|||
private int _codep; | |||
private boolean _throwOnInvalid = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need this?
If we keep it, it cannot be mutable, should be set as final
in the constructor and never modified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think not. See my other PR that is taking another approach to solving this problem.
More testing of Issue #9444 revealed other cases where we should be safer with our decoding.
This includes new tests for
URIUtil.safeDecodePath(String)
, along with the ability to configureUtf8Appendable
instances to use a replacement character instead of always throwing an exception in the case of bad UTF-8 sequences.