Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IDNA support #119

Merged
merged 12 commits into from
Jan 18, 2016
Merged

IDNA support #119

merged 12 commits into from
Jan 18, 2016

Conversation

valenting
Copy link
Collaborator

I wrote a UTS 46 implementation based on http://intertwingly.net/projects/pegurl/idna.js
It needs a fast path for ascii codepoints, but I wanted to get some feedback on it first.

Review on Reviewable

# except according to those terms.


# Run as: python make_idna_table.py idna_table.txt > src/idna_table.rs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is idna_table.txt provided by Unicode? Where can it be found?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add this URL in a code comment.

@SimonSapin
Copy link
Member

Could you also integrate the tests form http://www.unicode.org/Public/idna/latest/IdnaTest.txt ?


I’m considering rewriting the Python scripts to Rust and running them as Cargo build scripts to avoid having generated files in the repository. However in Gecko it’s not as easy if you’re not using Cargo. It would be a separate .rs file to be compiled for the host (does Gecko do cross-compilation?), run with an OUT_DIR environment variable, and have that run be a dependency of building the url crate, which should be done with the same environment variable.

How painful is it to do in Gecko? If it would make your like more difficult I can stick to Python and generated files in the repository.

@valenting
Copy link
Collaborator Author

It would be a bit harder to have the Gecko build system run the rust files at build time, but at the moment that's not a blocker. We can include the the generated rust-generated files just as we do with the python generated ones at the moment.

@bors-servo
Copy link
Contributor

☔ The latest upstream changes (presumably #125) made this pull request unmergeable. Please resolve the merge conflicts.

@bors-servo
Copy link
Contributor

☔ The latest upstream changes (presumably #129) made this pull request unmergeable. Please resolve the merge conflicts.

@SimonSapin SimonSapin mentioned this pull request Dec 5, 2015
@SimonSapin
Copy link
Member

See #152

@valenting
Copy link
Collaborator Author

I've implemented the BIDI checks, and now all of the tests pass (except the contextJ tests, which client software isn't required to pass).
For combining_mark I call into unicode_normalization::char::canonical_combining_class - which I fear might not map correctly to General_Category=Mark. The unicode-normalization table generator has this data, but doesn't use it:
https://github.com/unicode-rs/unicode-normalization/blob/master/scripts/unicode.py#L92

@SimonSapin
Copy link
Member

Other than a few nits I think this is ready to land. Thanks again!

@SimonSapin
Copy link
Member

@bors-servo r+

@SimonSapin
Copy link
Member

Not sure what’s happening with homu, but the Travis build for the PR push is green.

SimonSapin added a commit that referenced this pull request Jan 18, 2016
@SimonSapin SimonSapin merged commit d6e1ac5 into servo:master Jan 18, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants