URI Unicode handling #308

noraj · 2023-02-17T17:18:06Z

The following code raises "URI must be ascii only" (URI::InvalidURIError) when the action contains Unicode characters.

roda/lib/roda/plugins/route_csrf.rb

Line 240 in 81a93bc

URI.parse(action).path

URI.parse(URI::Parser.new.escape(url)) should be used instead.
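
A minimal sketch of the failure and of the suggested workaround, outside of Roda. The `action` value is a hypothetical form action, and `URI::RFC2396_Parser` is used explicitly here as an assumption, in place of the `URI::Parser` named above, because it is the class in Ruby's uri library that implements `escape`.

```ruby
require "uri"

# Hypothetical form action containing a non-ASCII character (e with acute accent).
action = "/caf\u00E9/new"

# Parsing the raw string fails because the input is not ASCII-only.
error = begin
  URI.parse(action).path
  nil
rescue URI::InvalidURIError => e
  e
end

# Percent-escaping first lets the parse succeed.
escaped = URI::RFC2396_Parser.new.escape(action)
path = URI.parse(escaped).path

puts error.class
puts path
```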

cf. https://stackoverflow.com/questions/46849219/ruby-uriinvalidurierror-uri-must-be-ascii-only/75487328


jeremyevans · 2023-02-17T17:33:11Z

Thanks for the report. Such a change would break existing cases where the action is already correctly percent-escaped, so it would not be backwards compatible. Therefore, automatically escaping the argument would have to be added as an option (either plugin option, method option, or both).
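
The compatibility concern can be illustrated with a short sketch, assuming the escaping is done with `URI::RFC2396_Parser#escape` (an assumption; the action string is hypothetical). An action that is already percent-escaped gets its `%` characters escaped a second time, so the resulting path differs from the one the caller intended.

```ruby
require "uri"

parser = URI::RFC2396_Parser.new

# Hypothetical action that the caller has already percent-escaped correctly.
already_escaped = "/caf%C3%A9/new"

# Escaping again treats "%" as an unsafe character and turns it into "%25".
double_escaped = parser.escape(already_escaped)

original_path = URI.parse(already_escaped).path
changed_path  = URI.parse(double_escaped).path

puts original_path
puts changed_path
```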

jeremyevans · 2023-02-19T04:06:30Z

After more thought, I don't think it's worth it to add an option for this. So I'll just update the documentation to make it clear that appropriate URL encoding is expected.

noraj · 2023-02-19T15:05:23Z

URL-encoding is something that works when you directly work with Ruby URI:

But even when the user puts URL-encoded Unicode in a parameter in their browser, that is not necessarily what will reach Roda. It depends on whether the web browser, reverse proxy, and application server each decide to decode it or not.

noraj · 2023-02-19T15:28:38Z

There are options to handle that without re-encoding already encoded parts:

systematically URL-decode, then URL-encode
detect whether the input is already encoded, and encode it only if it is not, or encode only the parts that are not
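
The two options above could be sketched roughly as follows. Both lambdas are hypothetical helpers written for illustration, not part of Roda or of Ruby's uri library, and `URI::RFC2396_Parser` is an assumed choice of escaping implementation.

```ruby
require "uri"

parser = URI::RFC2396_Parser.new

# Option 1 (hypothetical helper): always decode, then encode.
decode_then_encode = ->(s) { parser.escape(parser.unescape(s)) }

# Option 2 (hypothetical helper): leave "%XX" sequences untouched and
# escape everything else, including a lone "%".
encode_missing = lambda do |s|
  s.gsub(/%\h{2}|%|[^%]+/) do |part|
    part.match?(/\A%\h{2}\z/) ? part : parser.escape(part)
  end
end

# Hypothetical input mixing a raw Unicode character with an encoded one.
mixed = "/caf\u00E9/%C3%A9"

result_one = decode_then_encode.call(mixed)
result_two = encode_missing.call(mixed)

puts result_one
puts result_two
```

For this input both helpers produce the same fully encoded path; they differ on inputs where a `%` sequence is meant literally or where an encoded reserved character such as `%2F` must be preserved.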

Also see my comment here ruby/webrick#110 (comment) and here ruby/webrick#110 (comment)

jeremyevans · 2023-02-19T17:35:59Z

> There are options to handle that without re-encoding already encoded parts:
>
> systematically URL-decode, then URL-encode
>
> detect whether the input is already encoded, and encode it only if it is not, or encode only the parts that are not

Both approaches are slower and prone to security issues:

Your application code expects reencoding (expects to pass Unicode), but an attacker submits already encoded data.
Your application code doesn't expect reencoding (expects to pass properly encoded data), but an attacker finds a way to pass invalid data.
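
One way to see the ambiguity, as a sketch under the assumption that reencoding is implemented as unescape-then-escape with `URI::RFC2396_Parser` (the paths are hypothetical): an encoded slash and a real path separator become indistinguishable after reencoding, whereas unconditional escaping keeps them apart.

```ruby
require "uri"

parser = URI::RFC2396_Parser.new

# Assumed reencoding strategy: decode first, then encode.
reencode = ->(s) { parser.escape(parser.unescape(s)) }

# "%2F" (an encoded slash inside one segment) and "/" (a separator)
# collapse to the same path once the input is decoded.
from_encoded = reencode.call("/files/a%2Fb")
from_plain   = reencode.call("/files/a/b")

# Always escaping, with no guessing, keeps the two inputs distinct.
always_escaped = parser.escape("/files/a%2Fb")

puts from_encoded
puts from_plain
puts always_escaped
```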

In general it's a bad idea for library code to make guesses as to whether to encode. It should always work in the same way. It's simpler and backwards compatible to assume it is always already properly encoded. While we could add an option to toggle the behavior, I think it's better to document the expected behavior. Users can and should make sure the URL or URL path they are passing is valid.

noraj · 2023-02-19T23:53:56Z

Upstream issue ruby/uri#40

jeremyevans closed this as completed in 365b18d Feb 19, 2023
