url::from_str fails on strings containing non-ascii #8486

Closed
metajack opened this issue Aug 13, 2013 · 4 comments · Fixed by #16076
Labels
A-Unicode Area: Unicode

Comments

@metajack
Contributor

Test case:

badpath.rs:

extern mod extra;

use std::os;
use extra::url;

fn main() {
    let p = ~"../foo";
    let p = ~"file://" + os::getcwd().push(p).to_str();
    let u = url::from_str(p);
    printfln!("%?", u);
}

to reproduce:

mkdir 例; cd 例; ../badpath

results:

Err(~"Invalid character in path.")

Originally reported as servo/servo#722

@daniel-dressler

Thank you for opening this, Jack.

Assuming my understanding of the abstraction is correct, I'm inclined to make two fixes: one here in Rust and one in Servo. In Rust, from_str() should fail with a better error message if the string is missing a defined scheme. In Servo, the scheme should be autodetected with heuristics matching Firefox's (so file: if substring(0,1) == '/').

An alternative approach would be for from_str() to perform the autodetection itself. Would that be useful or frustrating?

Sorry Jack, I appear to be failing at keeping my thought train in one bug report.
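
For illustration, here is a minimal sketch in current Rust of the scheme autodetection proposed above. It assumes only the single rule stated in the comment (a leading '/' means file:) and does not attempt to reproduce Firefox's actual heuristics. The helper names `has_scheme` and `with_guessed_scheme` are hypothetical and are not part of extra::url or any other library.

```rust
/// Hypothetical helper: returns true if `input` starts with something that
/// looks like a URL scheme (an ASCII letter, then letters, digits, '+', '-'
/// or '.', terminated by ':').
fn has_scheme(input: &str) -> bool {
    match input.find(':') {
        Some(idx) => {
            let mut chars = input[..idx].chars();
            match chars.next() {
                Some(first) if first.is_ascii_alphabetic() => {
                    chars.all(|c| c.is_ascii_alphanumeric() || matches!(c, '+' | '-' | '.'))
                }
                _ => false,
            }
        }
        None => false,
    }
}

/// Hypothetical helper: keeps input that already has a scheme, treats an
/// absolute path as a file: URL, and otherwise reports the missing scheme
/// explicitly rather than a generic "invalid character" error.
fn with_guessed_scheme(input: &str) -> Result<String, String> {
    if has_scheme(input) {
        Ok(input.to_string())
    } else if input.starts_with('/') {
        Ok(format!("file://{}", input))
    } else {
        Err(format!("missing scheme in {:?}", input))
    }
}

fn main() {
    for candidate in ["/tmp/例/foo", "http://example.com/", "../foo"] {
        println!("{:?} -> {:?}", candidate, with_guessed_scheme(candidate));
    }
}
```

Whether this logic belongs in the parser or in the caller is exactly the open question raised in the comment; the sketch only shows that both the guess and the clearer error are cheap to express.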

@daniel-dressler

Update on my search: I was wrong about this being a Servo issue. Rust does not accept any URL containing Unicode. A URL's Unicode must be encoded before it reaches DNS, but never for local files.

My understanding is thus: Rust should allow Unicode URLs, but encode them if the scheme is not file:

@daniel-dressler

I am not sure how to handle Unicode. It appears my general issue is that URLs are ASCII only, and it is IRIs (https://en.wikipedia.org/wiki/Internationalized_Resource_Identifier) which provide Unicode support.

So in theory the URL class should remain ASCII only, and a second IRI class should exist that wraps URL and performs encoding. Yet that might just confuse developers, who will default to using url.

So the perfect solution should provide Unicode support without developers having to pay attention. My current take is to extend the URL class to act as an IRI. I'll read the IRI RFC and, assuming no one has any objections, I'll extend the URL class to be an IRI class, with no name change of course.

@SimonSapin
Contributor

The URL Standard describes how to parse a URL containing non-ASCII characters, and what to do for the various parts of the parsed URL (IDNA/Punycode, UTF-8 + percent-encoding, etc.).
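
As a rough illustration of the "UTF-8 + percent-encoding" step for the path component, here is a minimal sketch in current Rust using only the standard library. The function name `percent_encode_path` is hypothetical, and the set of escaped ASCII characters is a simplified assumption rather than the exact percent-encode set defined by the URL Standard; host handling (IDNA/Punycode) is not covered.

```rust
/// Hypothetical helper: percent-encodes every byte of `path` that is not a
/// printable ASCII character, plus a few ASCII characters that are unsafe in
/// a path. Non-ASCII characters are encoded byte by byte from their UTF-8
/// representation. Existing '%' characters are left untouched.
fn percent_encode_path(path: &str) -> String {
    let mut out = String::with_capacity(path.len());
    for &byte in path.as_bytes() {
        // Simplified assumption: keep printable ASCII (0x21..=0x7E) except
        // the characters listed here; escape everything else.
        let keep = byte.is_ascii_graphic()
            && !matches!(byte, b'"' | b'#' | b'<' | b'>' | b'?' | b'`' | b'{' | b'}');
        if keep {
            out.push(byte as char);
        } else {
            out.push_str(&format!("%{:02X}", byte));
        }
    }
    out
}

fn main() {
    // The directory name from the test case in this issue.
    let encoded = percent_encode_path("/tmp/例/foo");
    println!("file://{}", encoded); // prints file:///tmp/%E4%BE%8B/foo
}
```

With an encoding step like this applied to the path, a directory named 例 no longer has to be rejected as an invalid character.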

@pzol added the A-unicode label Feb 26, 2014
SimonSapin added a commit to SimonSapin/rust that referenced this issue Jul 30, 2014
bors added a commit that referenced this issue Jul 31, 2014
The replacement is [rust-url](https://github.com/servo/rust-url), which can be used with Cargo.

Fix #15874
Fix #10707
Close #10706
Close #10705
Close #8486
flip1995 pushed a commit to flip1995/rust that referenced this issue Jun 30, 2022
Fix `let_underscore_lock` false-positive when binding without locking

Fixes rust-lang#8486.

changelog: Fix `let_underscore_lock` false-positive when binding without locking.