Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix #1076: Dialogue allows colons in http paths and query parameters #2360

Merged
merged 10 commits into from
Sep 23, 2024

Conversation

carterkozak
Copy link
Contributor

@carterkozak carterkozak commented Sep 23, 2024

This is required by some GCP APIs, where encoded colons are not respected.

Most places where we'd end up with colons in URIs are within path variable values, where we'd expect values to be handled correctly regardless of percent encoding. For the most part, percent encoding should work fine unless a server uses precise matching on uri segments, where we run into issues.

rfc3986 section 3.3 clearly allows colon as a pchar, however not all servers are necessarily fully compliant with the rfc. We've had issues with specific sub-delimiters in the past. This shouldn't be terribly risky, esepcially given this implementaiton has never allowed unencoded colons before, and we have no evidence to suggest they should be problematic.

Before this PR

After this PR

==COMMIT_MSG==
fix #1076: Dialogue allows colons in http paths and query parameters
==COMMIT_MSG==

Possible downsides?

It's possible that some servers, proxies, etc do not handle unencoded colons well in paths.

Behavior of common http clients:

Note that the default behavior of these clients doesn't necessarily mean that it's unsafe to do otherwise, however we should tread carefully around nonstandard behavior.

🔴 Python 3.9.6 urllib.parse

import urllib.parse
urllib.parse.quote('foo:bar')
'foo%3Abar'

🔴 Apache HttpComponents (httpcore-5.3) UriBuilder

System.out.println(new org.apache.hc.core5.net.URIBuilder()
        .setScheme("https")
        .setHost("localhost")
        .setPort(8443)
        .appendPathSegments(Collections.singletonList("foo:bar"))
        .toString());
// Produces: https://localhost:8443/foo%3Abar

🟢 Curl 8.10.1

caveat -- curl sends what you give it with the exception of unicode characters. Not clear that this is a reasonable test, since the raw URI is provided as an input.

curl 'http://localhost:8443/foo:bar'

GET /foo:bar HTTP/1.1
Host: localhost:8443
User-Agent: curl/8.10.1
Accept: */*

🟢 OkHttp

similar to curl, the uri is provided as an input, and it doesn't further escape colons

This is required by some GCP APIs, where encoded colons are not
respected.
[rfc3986 section 3.3](https://datatracker.ietf.org/doc/html/rfc3986#section-3.3)
clearly allows colon as a pchar, however not all servers are necessarily fully
compliant with the rfc. We've had issues with specific sub-delimiters
in the past. This shouldn't be terribly risky, esepcially given this
implementaiton has never allowed unencoded colons before, and we have
no evidence to suggest they should be problematic.
@changelog-app
Copy link

changelog-app bot commented Sep 23, 2024

Generate changelog in changelog/@unreleased

What do the change types mean?
  • feature: A new feature of the service.
  • improvement: An incremental improvement in the functionality or operation of the service.
  • fix: Remedies the incorrect behaviour of a component of the service in a backwards-compatible way.
  • break: Has the potential to break consumers of this service's API, inclusive of both Palantir services
    and external consumers of the service's API (e.g. customer-written software or integrations).
  • deprecation: Advertises the intention to remove service functionality without any change to the
    operation of the service itself.
  • manualTask: Requires the possibility of manual intervention (running a script, eyeballing configuration,
    performing database surgery, ...) at the time of upgrade for it to succeed.
  • migration: A fully automatic upgrade migration task with no engineer input required.

Note: only one type should be chosen.

How are new versions calculated?
  • ❗The break and manual task changelog types will result in a major release!
  • 🐛 The fix changelog type will result in a minor release in most cases, and a patch release version for patch branches. This behaviour is configurable in autorelease.
  • ✨ All others will result in a minor version release.

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

Dialogue more closely follows the URI specification as defined in rfc3986 section 3.3, and allows colons in http paths and query parameters.

Note that this is not an API break, however we're using a breaking changelog entry for visibility in case of unknown non-compliant servers.

Previously the : character would be encoded as %3A, which is also allowed by rfc3986, however not required. Some server implementations, GCP APIs in particular, require colons in path strings not to be encoded.
This encoding is an implementation detail within dialogue, where either way is valid for servers which are compliant with the rfc.

It is possible, though unlikely, that some custom servers or proxies do not handle unencoded colons correctly. Please reach out to us if you find cases where servers do not behave as expected!

Check the box to generate changelog(s)

  • Generate changelog entry

// expect the sub-delims to be relevant here in the vast majority of cases. With sufficient
// research and testing, we should incorporate relevant sub-delims into IS_P_CHAR and update this
// to: 'IS_P_CHAR.or(CharMatcher.is('/'))'.
private static final CharMatcher IS_PATH = IS_P_CHAR.or(SUB_DELIMS).or(CharMatcher.is('/'));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated IS_PATH to be derived from IS_P_CHAR so we don't have to worry about additions to IS_P_CHAR getting out of sync. This matcher is only used on the input base-url path component, not subsequent sections.

@bjlaub
Copy link
Contributor

bjlaub commented Sep 23, 2024

Shall we update the encodesQueryParams test to ensure colons are not encoded there as well? https://github.com/palantir/dialogue/blob/develop/dialogue-core/src/test/java/com/palantir/dialogue/core/UrlBuilderTest.java#L133-L138

@carterkozak
Copy link
Contributor Author

good call, updated!

@bjlaub
Copy link
Contributor

bjlaub commented Sep 23, 2024

👍 lgtm

@bulldozer-bot bulldozer-bot bot merged commit dc18fbf into develop Sep 23, 2024
6 checks passed
@bulldozer-bot bulldozer-bot bot deleted the ckozak/colons branch September 23, 2024 19:45
@autorelease3
Copy link

autorelease3 bot commented Sep 23, 2024

⚠️ Release Failed

Label releases are not allowed to increment the major version number

If you are unsure how to resolve this error, please reach out to #help-devtools for assistance and include the following information:

Error ID: crosakk1qf3g64lmmdng

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

URL validation fails when absolute path contains a colon
3 participants