Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

URI Unicode handling #308

Closed
noraj opened this issue Feb 17, 2023 · 6 comments
Closed

URI Unicode handling #308

noraj opened this issue Feb 17, 2023 · 6 comments

Comments

@noraj
Copy link

noraj commented Feb 17, 2023

This code will trigger URI must be ascii only (URI::InvalidURIError) when Unicode is used.

URI.parse(action).path

URI.parse(URI::Parser.new.escape(url)) should be used instead.

cf. https://stackoverflow.com/questions/46849219/ruby-uriinvalidurierror-uri-must-be-ascii-only/75487328

@jeremyevans
Copy link
Owner

Thanks for the report. Seems like such a change would break existing cases where the action is already correctly percent escaped, so this wouldn't be a backwards compatible change. Therefore, automatically escaping the argument would have to be added as a option (either plugin option, method option, or both).

@jeremyevans
Copy link
Owner

After more thought, I don't think it's worth it to add an option for this. So I'll just update the documentation to make it clear that appropriate URL encoding is expected.

@noraj
Copy link
Author

noraj commented Feb 19, 2023

URL-encoding is something that works when you directly work with Ruby URI:

image

But even when the user put URL-encoded Unicode in a parameter in it's browser that's not necessarily what will come to Roda. It will depends how the web browser, reverse proxy and application server will judge to decode it or not.

@noraj
Copy link
Author

noraj commented Feb 19, 2023

There are options to handle that and without re-encoding already encoded parts:

  • systematically URL-decoded then URL-encode
  • detect if already encoded or not and encode only if it's not already or only the parts that are not

Also see my comment here ruby/webrick#110 (comment) and here ruby/webrick#110 (comment)

@jeremyevans
Copy link
Owner

There are options to handle that and without re-encoding already encoded parts:

  • systematically URL-decoded then URL-encode
  • detect if already encoded or not and encode only if it's not already or only the parts that are not

Both approaches are slower and prone to security issues:

  1. Your application code submits expects reencoding (expects to pass Unicode), but an attacker submits already encoded data
  2. Your application code doesn't expect reencoding (expects to pass properly encoded data), but an attacker finds a way to get to pass invalid data.

In general it's a bad idea for library code to make guesses as to whether to encode. It should always work in the same way. It's simpler and backwards compatible to assume it is always already properly encoded. While we could add an option to toggle the behavior, I think it's better to document the expected behavior. Users can and should make sure the URL or URL path they are passing is valid.

@noraj
Copy link
Author

noraj commented Feb 19, 2023

Upstream issue ruby/uri#40

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants