Initial implementation for unic-ucd-unihan #225
base: master
The failures are unrelated; I'll fix them as soon as I get a chance this weekend. At a glance, this looks good; I'll do a more detailed pass this weekend.
Thanks for building this, @eyeplum! I'm in the process of moving the source data files out of the repository so they can be imported as submodules. (The download+unzip scripts will be in Python, I guess.) With the data kept externally, we won't have to deal with downloading and unzipping ourselves, which is much better for this repo. It will also make it easier to add other sources and models. If you like, we can rebase and try to land this work, and drop the data-retrieval parts later. Or we can just wait for the external sources and drop the data-source work from this PR. What do you think?
Sure, waiting for the external sources sounds like the better option 👍
Ping?
In #247, I have added the complete Unicode UCD data package under the new address: So, there's no Also, as a reminder, since the new source data files are imported as submodules, you need to do
Reviewed 3 of 50 files at r1.
Reviewable status: 3 of 51 files reviewed, 3 unresolved discussions (waiting on @eyeplum and @behnam)
.gitignore, line 4 at r2 (raw file):
*.rs.bk
*.rs.rustfmt
*.zip
We shouldn't need this, either, as we don't download files anymore.
data/Cargo.toml, line 22 at r2 (raw file):
# Parsing zip files in UCD
zip = "0.3"
We don't need these either.
data/sources.toml, line 23 at r2 (raw file):
"Unihan.zip" = "Unihan.zip"
This file is gone.
Hey @behnam, thanks for the review!
Reviewable status: 0 of 38 files reviewed, 3 unresolved discussions (waiting on @behnam and @eyeplum)
.gitignore, line 4 at r2 (raw file):
Previously, behnam (Behnam Esfahbod ❄) wrote…
We shouldn't need this, either, as we don't download files anymore.
Done.
data/Cargo.toml, line 22 at r2 (raw file):
Previously, behnam (Behnam Esfahbod ❄) wrote…
# Parsing zip files in UCD
zip = "0.3"
We don't need these either.
Done.
data/sources.toml, line 23 at r2 (raw file):
Previously, behnam (Behnam Esfahbod ❄) wrote…
This file is gone.
Done.
Hey @behnam, I'm planning to merge this over the weekend. Although the functionality is very limited at the moment, I imagine it might be somewhat useful for users who need Unihan. I'm planning to map most of the Unihan tables later this year; hopefully I'll have enough time to make it happen. For now, before I start mapping more Unihan content, I'm thinking about tackling the Unicode 11.0 upgrade for rust-unic first, mainly because Unicode 12.0 is coming and it would be nice to at least update to Unicode 11.0 so that future updates are more manageable. I may need some help planning the work, as there seems to be a lot involved. I'll probably create a new issue so we can have more detailed discussions there. What do you think?
The feature is named "unihan", and users need to opt in explicitly to use it.
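A minimal sketch of what such an opt-in gate can look like in Rust, assuming the super crate declares a Cargo feature named `unihan` that enables `unic-ucd-unihan` as an optional dependency. The `unihan` module and the `unihan_enabled` function below are hypothetical stand-ins, not the crate's actual API. Code marked `#[cfg(feature = "unihan")]` is compiled only when the feature is turned on.

```rust
// Hypothetical stand-in for the optional Unihan subcrate. In a real super
// crate this would be a feature-gated re-export of `unic-ucd-unihan` (assumption).
#[cfg(feature = "unihan")]
pub mod unihan {
    /// Placeholder: always reports that Unihan support is compiled in.
    pub fn is_available() -> bool {
        true
    }
}

/// Returns `true` only if this crate was built with the `unihan` feature.
pub fn unihan_enabled() -> bool {
    // `cfg!` expands to a compile-time `true`/`false` for the active features.
    cfg!(feature = "unihan")
}

fn main() {
    // This statement exists only in builds with the feature enabled.
    #[cfg(feature = "unihan")]
    println!("Unihan module available: {}", unihan::is_available());

    if unihan_enabled() {
        println!("built with the opt-in `unihan` feature");
    } else {
        println!("built without `unihan`; enable it with features = [\"unihan\"]");
    }
}
```

A dependent crate would then opt in by adding `features = ["unihan"]` to its dependency entry; builds that don't opt in skip compiling the gated code entirely.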
Is there any news?
Is this still being worked on?
@asg0451, I'm not planning to merge this anytime soon, but you can try it out in my fork if you're interested in Unihan: https://github.com/eyeplum/rust-unic
This is a partial implementation of ucd-unihan #224.
Changed areas
- gen
- ucd/unihan
Notes
One other thing to consider: since Unihan is a CJK-centric part of the Unicode standard, maybe we could make this crate an optional subcrate of the rust-unic super crate, so that users need to opt in explicitly to use it.