[Draft] Include ':' and '@' in pchar definition for url encoding #2298

Closed
wants to merge 1 commit into from

@fawind fawind commented Jun 10, 2024

Before this PR

Putting this up for potential discussion - not sure if we actually want to make this change.

Context: We ran into the case where requests of a dialogue client get rejected by google-container-registry because dialogue would url encode the colon : in path segments (e.g. sha256:c48bxxx) while GCR only accepts non-encoded : in path segments.

Looking into Dialogue's url encoding, I noticed that Dialogue's implementation doesn't fully match the referenced RFC-3986. Most notably, Dialogue is defining the pchar matcher as pchar = unreserved, while the RFC is a bit more permissive here and also includes sub-delims, :, and @:

pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"

Note that we have another explicit divergence for query params but this one is well documented and for compatibility reasons:

// The RFC permits percent-encoding any character. We also percent encode sub-delimiters to avoid
// incompatibilities with http specification beyond the general URI definition per
// https://tools.ietf.org/html/rfc3986#section-3.3
// > URI producing applications often use the reserved characters allowed in a segment to
// > delimit scheme-specific or dereference-handler-specific subcomponents.

Unclear points:

  • In the RFC, pchar also includes sub-delims. But given the comment above, it seems like we want to purposefully encode sub-delims?
  • This logic has been around since early 2019 (PR). Given this never came up as an issue, maybe we don't feel like it's worth touching this code?
  • This is just a spec, and it's hard for me to judge the impact of such a change across all the consumers.

After this PR

Extend the pchar matcher to also include : and @. This will result in those characters no longer being url encoded in path segments.
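
For illustration only, here is a minimal sketch of what the proposed change means for a path segment. It is not Dialogue's actual implementation: the class name, the method names, the `allowColonAndAt` flag, and the sample digest-like value are all made up for this example. Sub-delims stay encoded in both modes, matching the existing behaviour described above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the real Dialogue encoder.
final class PathSegmentEncoderSketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" (RFC 3986, section 2.3)
    static boolean isUnreserved(int c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == '_' || c == '~';
    }

    // Assumed shape of the proposal: unreserved plus ':' and '@'.
    // Sub-delims are deliberately left out, so they are still percent-encoded.
    static boolean isExtendedPchar(int c) {
        return isUnreserved(c) || c == ':' || c == '@';
    }

    // Percent-encodes every UTF-8 byte of the segment that the chosen matcher rejects.
    static String encodeSegment(String segment, boolean allowColonAndAt) {
        StringBuilder out = new StringBuilder();
        for (byte b : segment.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean keep = allowColonAndAt ? isExtendedPchar(c) : isUnreserved(c);
            if (keep) {
                out.append((char) c);
            } else {
                out.append('%').append(HEX[c >> 4]).append(HEX[c & 0x0F]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeSegment("sha256:c48b", false)); // sha256%3Ac48b
        System.out.println(encodeSegment("sha256:c48b", true));  // sha256:c48b
        System.out.println(encodeSegment("a+b@c", true));        // a%2Bb@c
    }
}
```

With the flag off, the colon is emitted as `%3A`, which is the form GCR reportedly rejects; with it on, `:` and `@` pass through while `+` (a sub-delim) is still encoded.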

==COMMIT_MSG==
Include ':' and '@' in pchar definition for url encoding
==COMMIT_MSG==

Possible downsides?

@changelog-app

changelog-app bot commented Jun 10, 2024

Generate changelog in changelog/@unreleased

What do the change types mean?
  • feature: A new feature of the service.
  • improvement: An incremental improvement in the functionality or operation of the service.
  • fix: Remedies the incorrect behaviour of a component of the service in a backwards-compatible way.
  • break: Has the potential to break consumers of this service's API, inclusive of both Palantir services
    and external consumers of the service's API (e.g. customer-written software or integrations).
  • deprecation: Advertises the intention to remove service functionality without any change to the
    operation of the service itself.
  • manualTask: Requires the possibility of manual intervention (running a script, eyeballing configuration,
    performing database surgery, ...) at the time of upgrade for it to succeed.
  • migration: A fully automatic upgrade migration task with no engineer input required.

Note: only one type should be chosen.

How are new versions calculated?
  • ❗The break and manual task changelog types will result in a major release!
  • 🐛 The fix changelog type will result in a minor release in most cases, and a patch release version for patch branches. This behaviour is configurable in autorelease.
  • ✨ All others will result in a minor version release.

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

Include ':' and '@' in pchar definition for url encoding

Check the box to generate changelog(s)

  • Generate changelog entry

@schlosna
Contributor

Also relevant: RFC 3986 Section 2.4, "When to Encode or Decode", which seems to indicate that it is valid to percent-encode the colon in the path component, though it would not be required, as it is not an unreserved character:

Under normal circumstances, the only time when octets within a URI
are percent-encoded is during the process of producing the URI from
its component parts. This is when an implementation determines which
of the reserved characters are to be used as subcomponent delimiters
and which can be safely used as data. Once produced, a URI is always
in its percent-encoded form.

When a URI is dereferenced, the components and subcomponents
significant to the scheme-specific dereferencing process (if any)
must be parsed and separated before the percent-encoded octets within
those components can be safely decoded, as otherwise the data may be
mistaken for component delimiters. The only exception is for
percent-encoded octets corresponding to characters in the unreserved
set, which can be decoded at any time. For example, the octet
corresponding to the tilde ("~") character is often encoded as "%7E"
by older URI processing implementations; the "%7E" can be replaced by
"~" without changing its interpretation.

Because the percent ("%") character serves as the indicator for
percent-encoded octets, it must be percent-encoded as "%25" for that
octet to be used as data within a URI. Implementations must not
percent-encode or decode the same string more than once, as decoding
an already decoded string might lead to misinterpreting a percent
data octet as the beginning of a percent-encoding, or vice versa in
the case of percent-encoding an already percent-encoded string.
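
As a small illustration of the double-encoding hazard the last quoted paragraph describes, the sketch below uses the JDK's `URLEncoder` and `URLDecoder` (Java 10+ overloads taking a `Charset`). Note the assumption: these classes implement form encoding rather than RFC 3986 path encoding, so they are used here only because they treat `%` the same way; the class and helper names are made up.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; only the handling of '%' matters here.
final class DoubleEncodingSketch {
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String once = encode("100%");   // '%' as data becomes "%25"
        String twice = encode(once);    // encoding again corrupts the value
        System.out.println(once);       // 100%25
        System.out.println(twice);      // 100%2525

        // Decoding an already decoded string: "%41" was meant as literal data,
        // but a second decode pass turns it into "A".
        String decodedOnce = decode("%2541");
        String decodedTwice = decode(decodedOnce);
        System.out.println(decodedOnce);  // %41
        System.out.println(decodedTwice); // A
    }
}
```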

@carterkozak
Contributor

We attempt to aggressively encode parameters because not all server implementations implement the same spec (or do so correctly). For most known server implementations, this produces slightly more verbose, but less ambiguous results.

It's possible that this proposal wouldn't harm compatibility with known webservers, but that's difficult to know ahead of time.

@carterkozak
Contributor

We're planning to partially implement this in #2360 as it has come up a few more times. Only for colon, not @ (not quite yet, anyhow).

@fawind
Author

fawind commented Sep 23, 2024

Nice! Looking forward to deleting my forked class 🎉

Will close this in favor of the other PR. Thanks for pushing this through!

@fawind fawind closed this Sep 23, 2024
@fawind fawind deleted the fw/extend-pchar-spec branch September 23, 2024 19:13