Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix #1076: Dialogue allows colons in http paths and query parameters #2360

Merged
merged 10 commits into from
Sep 23, 2024
13 changes: 13 additions & 0 deletions changelog/@unreleased/pr-2360.v2.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
type: break
break:
description: |-
Dialogue more closely follows the URI specification as defined in [rfc3986 section 3.3](https://datatracker.ietf.org/doc/html/rfc3986#section-3.3), and allows colons in http paths and query parameters.

Note that this is not an API break, however we're using a breaking changelog entry for visibility in case of unknown non-compliant servers.

Previously the `:` character would be encoded as `%3A`, which is also allowed by rfc3986, however not required. Some server implementations, GCP APIs in particular, require colons in path strings not to be encoded.
This encoding is an implementation detail within dialogue, where either way is valid for servers which are compliant with the rfc.

It is possible, though unlikely, that some custom servers or proxies do not handle unencoded colons correctly. Please reach out to us if you find cases where servers do not behave as expected!
links:
- https://github.com/palantir/dialogue/pull/2360
Original file line number Diff line number Diff line change
Expand Up @@ -239,16 +239,37 @@ public String toString() {
static class UrlEncoder {
private static final CharMatcher DIGIT = CharMatcher.inRange('0', '9');
private static final CharMatcher ALPHA = CharMatcher.inRange('a', 'z').or(CharMatcher.inRange('A', 'Z'));

// unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
private static final CharMatcher UNRESERVED = DIGIT.or(ALPHA).or(CharMatcher.anyOf("-._~"));

// sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
private static final CharMatcher SUB_DELIMS = CharMatcher.anyOf("!$&'()*+,;=");
private static final CharMatcher IS_HOST = UNRESERVED.or(SUB_DELIMS);
private static final CharMatcher IS_P_CHAR = UNRESERVED;
private static final CharMatcher IS_PATH = UNRESERVED.or(SUB_DELIMS).or(CharMatcher.anyOf("/"));

// The RFC permits percent-encoding any character. We also percent encode sub-delimiters to avoid
// incompatibilities with http specification beyond the general URI definition per
// https://tools.ietf.org/html/rfc3986#section-3.3
// > URI producing applications often use the reserved characters allowed in a segment to
// > delimit scheme-specific or dereference-handler-specific subcomponents.

// If memory serves, we've encountered issues with '+' being interpreted by some servers as an
// encoded space, and ';' in paths being interpreted as the beginning of http matrix parameters.

// pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
// A strict implementation of the rfc allow 'UNRESERVED.or(CharMatcher.anyOf(":@"))',
// however we have updated it from 'UNRESERVED' to allow colons, and would like to
// make changes as slowly as possible.
private static final CharMatcher IS_P_CHAR = UNRESERVED.or(CharMatcher.is(':'));

// For historical reasons, we allow sub-delims in the base uri if provided directly,
// but escape them in paths we create/encode. Base-uri paths tend to be simple, so I wouldn't
// expect the sub-delims to be relevant here in the vast majority of cases. With sufficient
// research and testing, we should incorporate relevant sub-delims into IS_P_CHAR and update this
// to: 'IS_P_CHAR.or(CharMatcher.is('/'))'.
private static final CharMatcher IS_PATH = IS_P_CHAR.or(SUB_DELIMS).or(CharMatcher.is('/'));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated IS_PATH to be derived from IS_P_CHAR so we don't have to worry about additions to IS_P_CHAR getting out of sync. This matcher is only used on the input base-url path component, not subsequent sections.


// query = *( pchar / "/" / "?" )
private static final CharMatcher IS_QUERY_CHAR = IS_P_CHAR.or(CharMatcher.anyOf("/?"));

static boolean isHost(String maybeHost) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -74,13 +74,13 @@ public void encodesPaths() throws Exception {
.build()
.toString())
.isEqualTo("http://host:80/%21%40%23%24%25%5E%26%2A%28%29_%2B%7B%7D"
+ "%5B%5D%7C%5C%7C%22%27%3A%3B%2F%3F.%3E%2C%3C~%60");
+ "%5B%5D%7C%5C%7C%22%27:%3B%2F%3F.%3E%2C%3C~%60");
assertThat(minimalUrl()
.pathSegments(List.of("!@#$%^&*()_+{}", "[]|\\|\"':;/?.>,<~`"))
.build()
.toString())
.isEqualTo("http://host:80/%21%40%23%24%25%5E%26%2A%28%29_%2B%7B%7D"
+ "/%5B%5D%7C%5C%7C%22%27%3A%3B%2F%3F.%3E%2C%3C~%60");
+ "/%5B%5D%7C%5C%7C%22%27:%3B%2F%3F.%3E%2C%3C~%60");
}

@Test
Expand Down Expand Up @@ -135,6 +135,7 @@ public void encodesQueryParams() throws Exception {
assertThat(minimalUrl().queryParam("foo", "bar").build().toString()).isEqualTo("http://host:80?foo=bar");
assertThat(minimalUrl().queryParam("question?&", "answer!&").build().toString())
.isEqualTo("http://host:80?question?%26=answer%21%26");
assertThat(minimalUrl().queryParam("q", ":").build().toString()).isEqualTo("http://host:80?q=:");
}

@Test
Expand Down Expand Up @@ -190,7 +191,24 @@ public void urlEncoder_encodeQuery_onlyEncodesNonReservedChars() {

@Test
public void urlEncoder_encodeQuery_encodesPlusSign() {
assertThat(BaseUrl.UrlEncoder.encodeQueryNameOrValue("+")).isEqualTo("%2B");
assertThat(BaseUrl.UrlEncoder.encodeQueryNameOrValue("+"))
.as("'+' must be encoded, otherwise many servers will interpret it as a url encoded space")
.isEqualTo("%2B");
}

@Test
public void urlEncoder_encodePath_encodesSemicolon() {
assertThat(BaseUrl.UrlEncoder.encodePathSegment(";"))
.as("';' must be encoded, otherwise many servers will "
+ "interpret as a delimiter for matrix parameters")
.isEqualTo("%3B");
}

@Test
public void urlEncoder_encodePath_allowsColon() {
assertThat(BaseUrl.UrlEncoder.encodePathSegment(":"))
.as("Several GCP APIs use ':' within paths, and do not handle encoded colons")
.isEqualTo(":");
}

@Test
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -165,6 +165,15 @@ public void encodesPathParameters() throws InterruptedException {
assertThat(server.takeRequest().getRequestUrl()).isEqualTo(server.url("/%2F%C3%BC%2F"));
}

@Test
public void allowsColonPathParameter() throws InterruptedException {
endpoint.renderPath = (_params, url) -> url.pathSegment("foo:bar");
channel.execute(endpoint, request);
assertThat(server.takeRequest().getRequestUrl())
.as("Several GCP APIs require colons in url paths")
.isEqualTo(server.url("/foo:bar"));
}

@Test
public void fillsHeaders() throws Exception {
request = Request.builder()
Expand Down
Loading