option to save as a dictionary instead of a list #233
Comments
This is a much more useful structure, and also unifies the file structure with the augmentation file. I've opened a ticket with unihan_etl asking to add dictionary structuring as an option: cihai/unihan-etl#233.
@garfieldnate I missed this message! Sorry about that! Is there anything I can do at this time? Looks like you have stuff going on here: https://github.com/garfieldnate/uniunihan-db
Thanks for noticing :D I obviously have a workaround already, but I do still think that a
@garfieldnate We can add it, and also make it available via the Python API.
In the most recent unihan_etl the code I pasted above fails with this error. Not sure if my usage of the API is wrong or if there's an issue in the library.
@garfieldnate Thank you! Does wiping the cache and the DB file and rerunning change anything?
That was a really fast response :D This is actually my bad; the latest unihan_etl already has a fix for this in place, and I mistakenly thought I had updated. The issue is a typo in the kRSUnicode field for 亀: https://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=%E4%BA%80. It has two apostrophes, which does not follow the syntax specified in the standard. unihan_etl has already updated its parsing to allow the second apostrophe. I did have to update my code for some unihan_etl changes, but nothing crazy.
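To illustrate the kind of parsing failure described here, a small sketch: a pattern that permits at most one apostrophe after the radical number rejects a value with two, while a relaxed pattern accepts it. Both patterns and the sample values below are illustrative assumptions; they are not unihan_etl's actual parsing code, and they are not quoted from the standard.

```python
import re

# Assumed shape of a radical-stroke value: radical number, optional
# apostrophe(s), a period, then a (possibly negative) stroke count.
# STRICT allows at most one apostrophe; RELAXED allows up to two.
STRICT = re.compile(r"^[1-9][0-9]{0,2}'?\.-?[0-9]{1,2}$")
RELAXED = re.compile(r"^[1-9][0-9]{0,2}'{0,2}\.-?[0-9]{1,2}$")


def is_valid(value, pattern):
    """Return True if the whole value matches the given pattern."""
    return pattern.match(value) is not None


# Hypothetical sample values, chosen only to exercise the two patterns.
one_apostrophe = "213'.0"
two_apostrophes = "213''.0"

print(is_valid(one_apostrophe, STRICT), is_valid(one_apostrophe, RELAXED))
print(is_valid(two_apostrophes, STRICT), is_valid(two_apostrophes, RELAXED))
```

A parser built on the stricter pattern would fail on the doubled apostrophe, which matches the symptom reported above; upgrading to a release with relaxed parsing is the fix described in the thread.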
@garfieldnate Thank you for the added information. I created an issue so that anyone who bumps into this knows that updating fixes it!
I have found that I always need to convert the data into a dictionary (instead of the default list) when I'm using it. Because of this, I decided to always store the file in dictionary format. My method for doing so is a bit hacky, and it would be great to have a --structure <dict|list> or even --dictionary parameter to do this within unihan_etl.
Here's my current code. It relies on the undocumented python formatting option:
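The snippet itself is not preserved in this copy of the thread. As a rough stand-in, here is a minimal sketch of the list-to-dictionary conversion being described, using only the standard library. It assumes each exported record is a dict carrying a `char` key (an assumption about the export shape, not something this thread confirms); `records_to_dict` and the sample records are hypothetical and are not part of unihan_etl.

```python
import json


def records_to_dict(records, key="char"):
    """Re-key a list of record dicts by one of their fields.

    Hypothetical helper: the `key` field is removed from each record and
    used as the dictionary key instead.
    """
    result = {}
    for record in records:
        fields = dict(record)  # copy so the input list is left untouched
        k = fields.pop(key)
        if k in result:
            raise ValueError(f"duplicate key: {k!r}")
        result[k] = fields
    return result


# Illustrative records in the assumed list format.
sample = [
    {"char": "一", "ucn": "U+4E00", "kTotalStrokes": "1"},
    {"char": "二", "ucn": "U+4E8C", "kTotalStrokes": "2"},
]

by_char = records_to_dict(sample)

# ensure_ascii=False keeps the characters readable in the output file.
print(json.dumps(by_char, ensure_ascii=False, indent=2))
```

With the data keyed this way, looking up a character is a single dictionary access (`by_char["一"]`) rather than a scan over the list, which is the convenience the requested option would provide directly.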