Improved `URIUtil.safeDecodePath(String)` #9462

joakime · 2023-03-03T16:46:29Z

More testing of Issue #9444 revealed other cases where we should be safer with our decoding.

This includes new tests for URIUtil.safeDecodePath(String), along with the ability to configure Utf8Appendable instances to use a replacement character instead of always throwing an exception in the case of bad UTF-8 sequences.

* When configured this way, it will use replacement characters instead of throwing an error.
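
For readers unfamiliar with the two behaviours being discussed, here is a minimal sketch using only the JDK's `CharsetDecoder`, not Jetty's `Utf8Appendable`. The class and method names (`Utf8DecodeModes`, `decodeStrict`, `decodeReplacing`) are hypothetical and chosen for illustration; the PR's actual configuration mechanism may look quite different.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of Jetty.
final class Utf8DecodeModes
{
    // Strict mode: a malformed sequence raises CharacterCodingException.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException
    {
        return StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT)
            .decode(ByteBuffer.wrap(bytes))
            .toString();
    }

    // Lenient mode: a malformed sequence is replaced with U+FFFD.
    static String decodeReplacing(byte[] bytes)
    {
        try
        {
            return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
        }
        catch (CharacterCodingException e)
        {
            // Not expected when both actions are REPLACE.
            throw new IllegalStateException(e);
        }
    }
}
```

With the input bytes `'a', 0xFF, 'b'`, the strict variant throws, while the lenient variant returns `a`, U+FFFD, `b`.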

joakime · 2023-03-03T16:48:07Z

jetty-core/jetty-util/src/test/java/org/eclipse/jetty/util/Utf8AppendableTest.java

+            buffer.append(bytes[i]);
+        }
+
+        assertEquals("ةة", buffer.toString());


This test case seems out of place for this PR.
But I thought I discovered a bug in Utf8Appendable with bidi unicode glyphs.
Turns out it works as expected.

I like this test.

gregw

I don't think we need to protect control characters.
I'm moderately sure we don't need extra protection of bad UTF-8.
I'm pondering whether we should ever fall back to ISO-8859-1, and would like to work out why we do that and whether anybody else does. Maybe it's time to stop it.

gregw · 2023-03-03T19:31:40Z

jetty-core/jetty-util/src/main/java/org/eclipse/jetty/util/URIUtil.java

+            if (!fallbackToIso88591)
+                throw new IllegalArgumentException("Not UTF-8, not decoding in ISO-8859-1", e);


I also don't think we need this... unless an illegal UTF-8 character can be decoded as `/` in ISO-8859-1, which I don't believe is possible... but even if it were, we'd leave the `/` encoded and there would be nothing ambiguous.

If anything, I'm dubious that we should ever fall back to ISO-8859-1.
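
The claim about `/` can be checked with a quick sketch. It assumes that the bytes which make a UTF-8 sequence malformed are always in the range 0x80 to 0xFF (ASCII bytes are valid on their own), and it uses only the JDK. The class name `Iso88591SlashCheck` is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of Jetty.
final class Iso88591SlashCheck
{
    // Returns true if any single byte in 0x80..0xFF decodes to '/' in ISO-8859-1.
    static boolean anyHighByteDecodesToSlash()
    {
        for (int b = 0x80; b <= 0xFF; b++)
        {
            String s = new String(new byte[]{(byte)b}, StandardCharsets.ISO_8859_1);
            if (s.indexOf('/') >= 0)
                return true;
        }
        return false;
    }
}
```

ISO-8859-1 maps each byte to the code point of the same value, so only the byte 0x2F can produce `/`, and the method returns `false`.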

gregw · 2023-03-03T19:34:50Z

jetty-core/jetty-util/src/test/java/org/eclipse/jetty/util/Utf8AppendableTest.java

+            buffer.append(bytes[i]);
+        }
+
+        assertEquals("ةة", buffer.toString());


I like this test.

gregw · 2023-03-03T19:35:36Z

jetty-core/jetty-util/src/test/java/org/eclipse/jetty/util/URIUtilTest.java

+
+    @ParameterizedTest
+    @MethodSource("safeDecodePathSource")
+    public void testSafeDecodePath(String encodedPath, String decodedPath)


A good test to have, but I think you are making too many things safe.

sbordet

I'll defer to @gregw.

I think falling back to ISO-8859-1 is a thing of the past. It seems to me that every new spec, everybody and everything is on UTF-8, so I wouldn't do it.

sbordet · 2023-03-06T09:13:01Z

jetty-core/jetty-util/src/test/java/org/eclipse/jetty/util/URIUtilTest.java

+            Arguments.of("/foo%2fbar", "/foo%2Fbar"),
+            Arguments.of("/foo%252fbar", "/foo%252fbar"),
+            Arguments.of("/foo%3bbar", "/foo;bar"),
+            Arguments.of("/foo%3fbar", "/foo%3Fbar"),


Why was the `f` in `%2f` capitalized?

That's because of our canonical path mechanism. We produce an encoded path that can be compared as a string: we remove all variations, leave encoded only the characters that need encoding, and use uniform capitalization for the hex digits.
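
As a rough sketch of the capitalization part only (it does not decode anything, so it would not turn `%3b` into `;`), the following hypothetical helper upper-cases the hex digits of each percent escape. It is an assumption made for illustration, not Jetty's canonical path implementation.

```java
// Hypothetical illustration class, not part of Jetty.
final class PercentCase
{
    // Upper-cases the two hex digits after each '%', leaving everything else untouched.
    static String upperCaseEscapes(String path)
    {
        StringBuilder out = new StringBuilder(path.length());
        for (int i = 0; i < path.length(); i++)
        {
            char c = path.charAt(i);
            out.append(c);
            // Only treat '%' as an escape when two hex digits follow it.
            if (c == '%' && i + 2 < path.length()
                && Character.digit(path.charAt(i + 1), 16) >= 0
                && Character.digit(path.charAt(i + 2), 16) >= 0)
            {
                out.append(Character.toUpperCase(path.charAt(i + 1)));
                out.append(Character.toUpperCase(path.charAt(i + 2)));
                i += 2;
            }
        }
        return out.toString();
    }
}
```

With this helper, `/foo%2fbar` becomes `/foo%2Fbar`, while `/foo%252fbar` is unchanged because the escape there is `%25` and the trailing `2f` is literal text.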

sbordet · 2023-03-06T09:15:37Z

jetty-core/jetty-util/src/main/java/org/eclipse/jetty/util/Utf8Appendable.java

@@ -81,6 +81,7 @@ public abstract class Utf8Appendable implements CharsetStringBuilder
    };

    private int _codep;
+    private boolean _throwOnInvalid = true;


Do we really need this?
If we keep it, it cannot be mutable, should be set as final in the constructor and never modified.

I think not. See my other PR that is taking another approach to solving this problem.
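
For reference, the immutable variant suggested in the review could look roughly like the sketch below. `InvalidSequencePolicy` and `onInvalid()` are hypothetical names used for illustration; this is not the shape of `Utf8Appendable` in either PR.

```java
// Hypothetical illustration class, not part of Jetty.
final class InvalidSequencePolicy
{
    // Set once in the constructor and never modified afterwards.
    private final boolean throwOnInvalid;

    InvalidSequencePolicy(boolean throwOnInvalid)
    {
        this.throwOnInvalid = throwOnInvalid;
    }

    // Called when a malformed sequence is detected: either throws or
    // returns the Unicode replacement character U+FFFD.
    char onInvalid()
    {
        if (throwOnInvalid)
            throw new IllegalArgumentException("Not valid UTF-8");
        return '\uFFFD';
    }
}
```

Because the flag is `final`, an instance cannot switch behaviour halfway through decoding a sequence.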

joakime · 2023-03-13T21:27:09Z

Closing as PR #9479 was the final version that was merged.

PR #9496 has some of the bits around removing the ISO-8859-1 fallback.

joakime added 2 commits March 3, 2023 10:44

Fixes #9444 - Allow configuring Utf8Appendable to not throw exception.

fcc96dd

* will use replacement characters instead of throwing an error.

Fixes #9444 - Improved URIUtil.safeDecodePath with tests

8a38518

joakime added the Jetty 12 label Mar 3, 2023

joakime requested review from gregw and sbordet March 3, 2023 16:46

joakime self-assigned this Mar 3, 2023

joakime added this to the 12.0.x milestone Mar 3, 2023

joakime linked an issue Mar 3, 2023 that may be closed by this pull request

Unexpected encoding in request.getPathInfo() with Jetty 12 beta 0 #9444

Closed

joakime commented Mar 3, 2023

gregw requested changes Mar 3, 2023

joakime added 2 commits March 3, 2023 13:52

Fixes #9444 - Remove protection of control characters

77b9a3c

Fixes #9444 - One more test case

c7f2163

sbordet reviewed Mar 6, 2023

This was referenced Mar 9, 2023

Fully decode #9444 #9465

Closed

Jetty 12.0.x 9444 servlet paths fully decoded #9479

Merged

joakime mentioned this pull request Mar 10, 2023

Jetty 12 - Remove ISO-8859-1 fallback during URIUtil.decodePath(String) #9489

Closed

joakime closed this Mar 13, 2023

joakime deleted the fix/12.0.x/safedecodepath-testing branch March 13, 2023 21:27
