Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add %-encoding in ClientRequest.init, remove in HTTPServerRequest.init #171

Merged
merged 1 commit into from
Mar 14, 2019

Conversation

pushkarnk
Copy link
Contributor

@pushkarnk pushkarnk commented Feb 20, 2019

This is related to a test failure in CouchDB: Kitura/Kitura-CouchDB#91

During the initialisation of ClientRequest, the path component, which, in ClientRequest, includes query component, needs to be percent encoded. On the other side, during the initialisation of an HTTPServerRequest, we'd need to remove the percent encoding.

ClientRequest.Options.path includes the fragment, path and query components. The only character which is permissible in a fragment and a query but not in a path is ?. To avoid further parsing here, we could go with percent-encoding a ? in the path string too.

A word on encoding ?
This does not pertain to the ? as in: https://66o.tech/path?q=v. This pertains to a ? as in: https://66o.tech/pa?th?q=?v which after %-encoding will become https://66o.tech/pa%3Fth?q=%3Fv, though https://66o.tech/pa%3Fth?q=?v is permissible per the definition of NSCharacterSet.urlQueryAllowed.

In the proposed change, the ? character will not be percent-encoded in the path (fragment, path and query), though it should be encoded in the "path". I'm assuming that paths like "/this?path" are rare in occurrence. Percent encoding '?' will make parsing ClientRequest.path all the more complicated and we may want to avoid it until it is explicitly asked for.

The fact that ClientRequest.Options.path includes path and query is a design weakness in this context.

Other components like host and user may also be percent-encoded, but are those cases really common?

@codecov-io
Copy link

codecov-io commented Feb 20, 2019

Codecov Report

Merging #171 into master will not change coverage.
The diff coverage is 50%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #171   +/-   ##
=======================================
  Coverage   68.64%   68.64%           
=======================================
  Files          20       20           
  Lines        1298     1298           
=======================================
  Hits          891      891           
  Misses        407      407
Flag Coverage Δ
#KituraNet 68.64% <50%> (ø) ⬆️
Impacted Files Coverage Δ
Sources/KituraNet/ClientRequest.swift 74.64% <0%> (ø) ⬆️
Sources/KituraNet/HTTP/HTTPServerRequest.swift 73.29% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a54fffe...00583e0. Read the comment docs.

@pushkarnk
Copy link
Contributor Author

@djones6 Could you please review? There are three caveats here:

  1. ClientRequest.Options.path includes path and query. Query allows ?, path doesn't. If we decide to %-encode ?, parsing the ClientRequest.Options.path to get the path and query might get challenging. If we do not encode ? we cannot support paths like /with?question, which I assume are rare. Is this acceptable for now?

  2. The host and user components may also have unsupported characters. Do you think I must include their %-encoding in this PR?

  3. Finally, adding and removing %-encoding will hit performance with Swift 5.

@djones6
Copy link
Contributor

djones6 commented Mar 5, 2019

@pushkarnk How does ClientRequest in Kitura-net handle this? (is it handled for us by curl?)
I couldn't find code (or tests) in Kitura-net relating to this, but I'm presuming it must be handled somehow given that the CouchDB tests are passing.

To answer your questions above:

  1. I think it's acceptable for now to not support embedding ? within the path. We should support it if/when ClientRequest is redesigned to have a separate path and query component.

  2. It seems very unlikely that a hostname would include disallowed characters. Maybe a username might, but the field that seems absolutely likely to contain weird characters is password. I can't see any special handling of that field currently.

Another question would be: should we instead by transmitting the credentials in an Authorization header? I gather that url encoded authorization is strongly discouraged, for reasons such as inadvertent disclosure / leakage eg. if the URL is logged or stored somewhere along the way. Doing so would avoid the need to url encode (as the username:password value is base64 encoded).

  1. I don't think there's a way to avoid the performance hit: escaping is fairly costly, but it's required for correctness, unless we require the user to escape values. (We also must be careful not to double-escape). In terms of the amount of performance cost, the whole process of making an HTTP request and parsing a response is already a fairly expensive operation, so I suspect the cost of escaping the query string won't be significant.

@pushkarnk
Copy link
Contributor Author

Thank for the review @djones6 ... To answer the two questions you asked:

  1. Yes, with Kitura-net, libcurl handles adding and dropping %-encoding.

  2. Yes, if username/password are supplied as ClientRequest.Options we could create an authorisation header instead of encoding them in the URL. And we'll have to do that on Kitura-net too. But it is going to be a breaking change (as in, users will start seeing a different value for ClientRequest.url).

if urlStringPercentEncodingRemoved == nil {
urlStringPercentEncodingRemoved = rawURLString.removingPercentEncoding ?? rawURLString
}
return urlStringPercentEncodingRemoved!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see this code is correct, but could we use guard let here so that we don't need the force-unwrap?

Copy link
Contributor Author

@pushkarnk pushkarnk Mar 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@djones6 Using guard-let and not force-unwrapping it can look only as good as this:

    private var _urlString: String {
        guard let urlStringPercentEncodingRemoved = self.urlStringPercentEncodingRemoved else {
            self.urlStringPercentEncodingRemoved = rawURLString.removingPercentEncoding ?? rawURLString
            return rawURLString.removingPercentEncoding ?? rawURLString
        }   
        return urlStringPercentEncodingRemoved
    }

Copy link
Contributor

@djones6 djones6 Mar 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

            let result = rawURLString.removingPercentEncoding ?? rawURLString
            self.urlStringPercentEncodingRemoved = result
            return result

It's important not to run removingPercentEncoding twice as it is quite expensive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks @djones6 !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants