Better URL encoding/decoding #262

Closed
tmenier opened this issue Dec 19, 2017 · 0 comments

tmenier commented Dec 19, 2017

Flurl.Url has always contained a few undocumented public static methods for URL-encoding and decoding. In an effort to fix a couple of reported bugs and, more generally, to work around some known quirks in the .NET world, these methods have gotten an overhaul, a couple of renames, and a cohesive new story, so I'm now happy to "advertise" them. :)

What the RFC says

When dealing with URL-encoding, characters fit into one of 3 categories:

  • unreserved (legal in URLs): alphanumeric and -._~
  • reserved (legal, but may have special meaning in URLs): :/?#[]@!$&'()*+,;=
  • everything else (illegal in URLs, must be encoded)

One notable special case is the % character. When used as part of a %-encoding sequence (e.g. %20 to represent a space), it is legal in the URL. Otherwise, it must be encoded.

Another thing to note is that although the RFC says nothing about encoding space characters as +, the HTML spec does specify this for URL-encoded form data, and it is also a common practice in query strings.
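
To make the three categories and the % special case concrete, here is a minimal sketch that classifies a single character and checks whether a % begins a %-hex-hex sequence. The names Rfc3986, Classify, and IsPercentEncodedAt are hypothetical helpers written for illustration; they are not part of Flurl or .NET.

```csharp
using System;

static class Rfc3986
{
    // Character sets copied from the lists above (hypothetical helper, not a Flurl API).
    const string UnreservedPunctuation = "-._~";
    const string Reserved = ":/?#[]@!$&'()*+,;=";

    // ASCII letters and digits only; char.IsLetterOrDigit would also accept non-ASCII letters.
    static bool IsAsciiAlphanumeric(char c) =>
        (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');

    // Note: '%' on its own lands in "illegal"; use IsPercentEncodedAt to detect the legal case.
    public static string Classify(char c)
    {
        if (IsAsciiAlphanumeric(c) || UnreservedPunctuation.IndexOf(c) >= 0) return "unreserved";
        if (Reserved.IndexOf(c) >= 0) return "reserved";
        return "illegal";
    }

    // True when s[i] is a '%' that starts a %-hex-hex sequence such as "%20".
    public static bool IsPercentEncodedAt(string s, int i) =>
        s[i] == '%' && i + 2 < s.Length && Uri.IsHexDigit(s[i + 1]) && Uri.IsHexDigit(s[i + 2]);
}

static class Program
{
    static void Main()
    {
        Console.WriteLine(Rfc3986.Classify('~'));                       // unreserved
        Console.WriteLine(Rfc3986.Classify('/'));                       // reserved
        Console.WriteLine(Rfc3986.Classify(' '));                       // illegal
        Console.WriteLine(Rfc3986.IsPercentEncodedAt("50%20off", 2));   // True
        Console.WriteLine(Rfc3986.IsPercentEncodedAt("100%", 3));       // False
    }
}
```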

What .NET gives us

  • Uri.EscapeDataString is our best option for encoding both illegal and reserved characters, but it has the following shortcomings:
    • It chokes with a UriFormatException at 65,520 characters, which is a realistic problem when using it to URL-encode form data.
    • It has no option to encode space characters as +.
  • Uri.EscapeUriString is our best option for encoding illegal characters only. For example, with a string like "1 2/3 4", it'll encode the spaces for you but assumes you want to keep the / as a path separator. But it has one major quirk:
    • It always encodes the % character, even if it's followed by 2 hex characters, which makes it a %-encoded sequence and perfectly legal in a URL.
  • Uri.UnescapeDataString is our best option for URL decoding, but it too has a shortcoming:
    • It has no option to interpret + characters as spaces.
  • WebUtility.UrlEncode is our best option for...pretty much nothing.
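
The behaviors listed above can be seen with a short console program. This is only an illustrative sketch: the inputs are made up, and the output comments reflect the behavior described in this issue. The #pragma lines are there because Uri.EscapeUriString is marked obsolete in .NET 6 and later.

```csharp
using System;
using System.Net;

static class Program
{
    static void Main()
    {
        // Encodes illegal AND reserved characters; spaces always become %20.
        Console.WriteLine(Uri.EscapeDataString("1 2/3 4"));   // 1%202%2F3%204

#pragma warning disable SYSLIB0013 // Uri.EscapeUriString is obsolete in .NET 6 and later
        // Encodes illegal characters only, so the / survives.
        Console.WriteLine(Uri.EscapeUriString("1 2/3 4"));    // 1%202/3%204

        // The quirk: an existing %20 sequence gets its % encoded again.
        Console.WriteLine(Uri.EscapeUriString("1%202"));      // 1%25202
#pragma warning restore SYSLIB0013

        // Decodes %20 but leaves + alone.
        Console.WriteLine(Uri.UnescapeDataString("1+2%203")); // 1+2 3

        // For comparison: always encodes spaces as +, with no way to opt out.
        Console.WriteLine(WebUtility.UrlEncode("1 2/3 4"));   // 1+2%2F3+4
    }
}
```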

How Flurl improves on these

Flurl sets out to replace the methods above and correct their quirks with the following static methods:

  • Url.Encode(string s, bool encodeSpaceAsPlus) encodes both illegal and reserved characters. It has no string size limit and gives you the option to encode space characters as +.
  • Url.EncodeIllegalCharacters(string s, bool encodeSpaceAsPlus) encodes illegal characters only, and will not encode % if it is part of a %-hex-hex sequence, so there is no worry of already-encoded strings getting double-encoded.
  • Url.Decode(string s, bool interpretPlusAsSpace) decodes any size string and gives you the option to decode + characters to spaces.
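
A quick usage sketch of the three methods, assuming a project that references the Flurl package at a version that includes this change. The method names and parameters are the ones listed above; the input strings are made up, and the output comments are what the descriptions above imply rather than captured output.

```csharp
using System;
using Flurl;

static class Program
{
    static void Main()
    {
        // Illegal and reserved characters are encoded; the flag controls how spaces come out.
        Console.WriteLine(Url.Encode("1 2/3 4", false));   // expected: 1%202%2F3%204
        Console.WriteLine(Url.Encode("1 2/3 4", true));    // expected: 1+2%2F3+4

        // Only illegal characters are encoded; the existing %20 is left as-is, not double-encoded.
        Console.WriteLine(Url.EncodeIllegalCharacters("1 2/3%204", false)); // expected: 1%202/3%204

        // Decoding, with and without treating + as a space.
        Console.WriteLine(Url.Decode("1+2%2F3+4", true));  // expected: 1 2/3 4
        Console.WriteLine(Url.Decode("1+2%2F3+4", false)); // expected: 1+2/3+4
    }
}
```

Passing the bool explicitly in every call keeps the space handling visible at the call site, which matters because query strings and path segments often want different answers.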

What's breaking?

As mentioned, Url has always had encoding/decoding methods, but with their new purpose in life, 2 have been renamed and, effectively, superseded:

  • Url.EncodeQueryParamValue is superseded by Url.Encode.
  • Url.DecodeQueryParamValue is superseded by Url.Decode.

Since these were mainly for internal use, I'm hopeful this won't cause problems for most, but they were public methods, so I want to fully disclose this breaking change.

@tmenier tmenier added this to the Flurl 2.6 milestone Dec 19, 2017
tmenier added a commit that referenced this issue Dec 19, 2017
@tmenier tmenier changed the title URL encoding/decoding improvements Better URL encoding/decoding Dec 19, 2017
@tmenier tmenier closed this as completed Dec 19, 2017