Better URL encoding/decoding #262

tmenier · 2017-12-19T17:16:54Z

Flurl.Url has always contained a few undocumented public static methods for URL-encoding and decoding. In an effort to fix a couple reported bugs and, more generally, fix some known quirks in the .NET world, these methods have gotten an overhaul, a couple renames, and a cohesive new story, so I'm now happy to "advertise" them. :)

What the RFC says

When dealing with URL-encoding, characters fit into one of 3 categories:

- unreserved (legal in URLs): alphanumeric and -._~
- reserved (legal, but may have special meaning in URLs): :/?#[]@!$&'()*+,;=
- everything else (illegal in URLs, must be encoded)

One notable special case is the % character. When used as part of a %-encoding sequence (e.g. %20 to represent a space), it is legal in the URL. Otherwise, it must be encoded.

Another thing to note is that although the RFC says nothing about encoding space characters as +, the HTML spec does specify this for URL-encoded form data, and it is also a common practice in query strings.
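
To make the three categories and the % special case concrete, here is a minimal sketch of a classifier. The type and member names (`UrlCharClassifier`, `Category`, `Classify`) are hypothetical and exist only for illustration; they are not part of Flurl or .NET. The character sets are copied from the lists above.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Flurl or .NET.
public static class UrlCharClassifier
{
    public enum Category { Unreserved, Reserved, PercentEncoded, Illegal }

    // Character sets as listed above.
    private const string UnreservedMarks = "-._~";
    private const string ReservedChars = ":/?#[]@!$&'()*+,;=";

    // Classifies the character at index i of s.
    // A '%' counts as legal only when it starts a %-hex-hex sequence.
    public static Category Classify(string s, int i)
    {
        char c = s[i];

        if (IsAsciiAlphanumeric(c) || UnreservedMarks.IndexOf(c) >= 0)
            return Category.Unreserved;

        if (ReservedChars.IndexOf(c) >= 0)
            return Category.Reserved;

        if (c == '%' && i + 2 < s.Length
            && Uri.IsHexDigit(s[i + 1]) && Uri.IsHexDigit(s[i + 2]))
            return Category.PercentEncoded;

        // Everything else, including a stray '%' and the space character.
        return Category.Illegal;
    }

    private static bool IsAsciiAlphanumeric(char c) =>
        (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
}
```

With this sketch, the `%` in `"%20"` classifies as `PercentEncoded`, while the trailing `%` in `"100%"` classifies as `Illegal` and would need to be encoded as `%25`.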

What .NET gives us

Uri.EscapeDataString is our best option for encoding both illegal and reserved characters, but it has the following shortcomings:
- It chokes with a UriFormatException at 65,520 characters, which is a realistic problem when using it to URL-encode form data.
- It has no option to encode space characters as +.
Uri.EscapeUriString is our best option for encoding illegal characters only. For example, with a string like "1 2/3 4", it'll encode the spaces for you but assumes you want to keep the / as a path separator. But it has one major quirk:
- It always encodes the % character, even if it's followed by 2 hex characters, which is a %-encoded sequence and perfectly legal in a URL.
Uri.UnescapeDataString is our best option for URL decoding, but it too has a shortcoming:
- It has no option to interpret + characters as spaces.
WebUtility.UrlEncode is our best option for...pretty much nothing.
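
The quirks above are easy to see side by side. The following sketch calls only the built-in methods named in this section; the wrapper class `BuiltInEncodingDemo` and its method names are hypothetical and are there only so the calls can be exercised. Note that `Uri.EscapeUriString` is flagged obsolete on .NET 6 and later, hence the pragma.

```csharp
using System;
using System.Net;

// Hypothetical wrapper for illustration; each method is a direct call to a built-in.
public static class BuiltInEncodingDemo
{
    // Encodes illegal AND reserved characters; spaces become %20, never +.
    public static string EscapeData(string s) => Uri.EscapeDataString(s);

#pragma warning disable SYSLIB0013 // Uri.EscapeUriString is obsolete on .NET 6+
    // Encodes illegal characters only, but always encodes '%'.
    public static string EscapeUri(string s) => Uri.EscapeUriString(s);
#pragma warning restore SYSLIB0013

    // Decodes %-sequences; leaves '+' untouched.
    public static string UnescapeData(string s) => Uri.UnescapeDataString(s);

    // Encodes reserved characters and always turns spaces into '+'.
    public static string WebEncode(string s) => WebUtility.UrlEncode(s);

    public static void Run()
    {
        Console.WriteLine(EscapeData("1 2/3 4"));    // 1%202%2F3%204
        Console.WriteLine(EscapeUri("1 2/3 4"));     // 1%202/3%204
        Console.WriteLine(EscapeUri("a%20b c"));     // a%2520b%20c (the % is double-encoded)
        Console.WriteLine(UnescapeData("a+b%20c"));  // a+b c (the + stays a +)
        Console.WriteLine(WebEncode("1 2/3 4"));     // 1+2%2F3+4
    }
}
```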

How Flurl improves on these

Flurl sets out to replace the methods above and correct their quirks with the following static methods:

Url.Encode(string s, bool encodeSpaceAsPlus) encodes both illegal and reserved characters. It has no string size limit and gives you the option to encode space characters as +.
Url.EncodeIllegalCharacters(string s, bool encodeSpaceAsPlus) encodes illegal characters only, and will not encode % if it is part of a %-hex-hex sequence, so there is no worry of already-encoded strings getting double-encoded.
Url.Decode(string s, bool interpretPlusAsSpace) decodes any size string and gives you the option to decode + characters to spaces.
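
For readers who want to see how behavior like this can be built on top of the .NET primitives, here is a rough sketch. It is not Flurl's source: the class `EncodingSketch`, its method names, and the chunk size are all assumptions made for illustration, and the legal-character sets are simply the RFC lists quoted earlier.

```csharp
using System;
using System.Text;

// Hypothetical sketch of the behavior described above; NOT Flurl's actual implementation.
public static class EncodingSketch
{
    // Assumed chunk size, chosen only to stay well under the 65,520-character limit.
    private const int ChunkSize = 32766;
    private const string LegalMarks = "-._~:/?#[]@!$&'()*+,;=";

    // Encodes illegal and reserved characters, with no practical size limit.
    public static string EncodeAll(string s, bool encodeSpaceAsPlus)
    {
        if (string.IsNullOrEmpty(s)) return s;
        var sb = new StringBuilder();
        int i = 0;
        while (i < s.Length)
        {
            int len = Math.Min(ChunkSize, s.Length - i);
            // Don't split a surrogate pair across two chunks.
            if (len > 1 && char.IsHighSurrogate(s[i + len - 1])) len--;
            sb.Append(Uri.EscapeDataString(s.Substring(i, len)));
            i += len;
        }
        // After escaping, "%20" can only mean an encoded space (a literal % is now %25).
        return encodeSpaceAsPlus ? sb.ToString().Replace("%20", "+") : sb.ToString();
    }

    // Encodes illegal characters only and leaves existing %-hex-hex sequences alone.
    public static string EncodeIllegalOnly(string s, bool encodeSpaceAsPlus)
    {
        if (string.IsNullOrEmpty(s)) return s;
        var sb = new StringBuilder(s.Length);
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            bool isEncodedSequence = c == '%' && i + 2 < s.Length
                && Uri.IsHexDigit(s[i + 1]) && Uri.IsHexDigit(s[i + 2]);

            if (c == ' ' && encodeSpaceAsPlus) sb.Append('+');
            else if (IsLegal(c) || isEncodedSequence) sb.Append(c);
            else
            {
                // Percent-encode the UTF-8 bytes, keeping surrogate pairs together.
                int count = char.IsHighSurrogate(c) && i + 1 < s.Length
                    && char.IsLowSurrogate(s[i + 1]) ? 2 : 1;
                foreach (byte b in Encoding.UTF8.GetBytes(s.Substring(i, count)))
                    sb.Append('%').Append(b.ToString("X2"));
                i += count - 1;
            }
        }
        return sb.ToString();
    }

    // Replaces + before unescaping so that an encoded %2B survives as a literal +.
    public static string DecodeAll(string s, bool interpretPlusAsSpace)
    {
        if (string.IsNullOrEmpty(s)) return s;
        return Uri.UnescapeDataString(interpretPlusAsSpace ? s.Replace("+", " ") : s);
    }

    private static bool IsLegal(char c) =>
        (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
        || LegalMarks.IndexOf(c) >= 0;
}
```

In this sketch, `EncodeIllegalOnly("a%20b c", false)` yields `a%20b%20c` rather than the double-encoded `a%2520b%20c`, and `DecodeAll("a+b%2Bc", true)` yields `a b+c`.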

What's breaking?

As mentioned, Url has always had encoding/decoding methods, but with their new purpose in life, 2 have been renamed and, effectively, superseded:

Url.EncodeQueryParamValue is superseded by Url.Encode.
Url.DecodeQueryParamValue is superseded by Url.Decode.

Since these were mainly for internal use I'm hopeful this won't cause problems for most, but they were public methods so I want to fully disclose this breaking change.

tmenier added breaking enhancement labels Dec 19, 2017

tmenier added this to the Flurl 2.6 milestone Dec 19, 2017

tmenier added a commit that referenced this issue Dec 19, 2017

#262 better Url.Decode

350173a

tmenier changed the title ~~URL encoding/decoding improvements~~ Better URL encoding/decoding Dec 19, 2017

tmenier closed this as completed Dec 19, 2017

NathanTurnbow mentioned this issue Jan 5, 2018

Update all nuget packages to latest versions available jordansjones/Draft#13

Closed
