Update japan_dict.txt #13142
Update japan_dict.txt to include missing jouyou kanji
Will this affect the previous jp model?
It is my understanding that this file is only used when training a new model, so I don't think it would have any effect until a new model is trained. But there is a chance I have misunderstood and it works in a different way 😅
Since the previous model and any new model would use the same character dictionary file, running the previous model with the updated file may produce inconsistent output. I think it would be better to use a different filename (e.g., ja_ext_dict.txt).
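To illustrate why sharing one dictionary file can produce inconsistent output, here is a minimal Python sketch. It assumes a CTC-style recognizer where output class 0 is the blank and class i maps to line i - 1 of the dictionary file; the dictionary contents and the model output are hypothetical. Inserting one kanji in the middle shifts the index of every later character, so the same class ids from an old model decode to different text.

```python
# Hypothetical dictionaries: the old file, and a new file with one kanji
# inserted in the middle (line order matters, not just membership).
OLD_DICT = ["一", "二", "三", "四"]
NEW_DICT = ["一", "二", "丁", "三", "四"]


def build_charset(dict_lines):
    # Assumption: index 0 is reserved for the CTC blank, and class i >= 1
    # corresponds to line i - 1 of the dictionary file.
    return ["<blank>"] + list(dict_lines)


def greedy_ctc_decode(class_ids, charset):
    # Greedy CTC decoding: collapse repeated ids, then drop blanks.
    out = []
    prev = None
    for cid in class_ids:
        if cid != prev and cid != 0:
            out.append(charset[cid])
        prev = cid
    return "".join(out)


# Class ids that a model trained on OLD_DICT might emit for "一三四".
old_model_output = [1, 1, 0, 3, 0, 4]

print(greedy_ctc_decode(old_model_output, build_charset(OLD_DICT)))  # 一三四
print(greedy_ctc_decode(old_model_output, build_charset(NEW_DICT)))  # 一丁三
```

Under these assumptions, appending new characters at the end of the file would keep existing indices stable, but the dictionary size would still not match the old model's output layer, and the new characters would remain unreachable for it.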
If this file can indeed affect the current model, I would propose holding this PR until a new version of the Japanese model is trained. I don't really like the idea of creating a new file and calling it "extended", because this is not really extending the dictionary; it is fixing a fault in it.
@madmalkav, that makes sense.
Can I ask why this similar PR for another language was already merged? What is the difference from my PR? I want to understand it so I can see whether there is anything else I can do to move this forward.
I don't think that merged PR will work properly with the previous model.
Maybe we could keep version information for the character dictionary (e.g., a new column in the model list showing which dictionary each model uses), so that new character dictionaries can be merged while old models keep using their old dictionaries.
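A minimal sketch of that proposal, assuming the model list gains a column naming the dictionary each model was trained with. `ModelEntry`, `MODEL_LIST`, `dict_file_for`, `check_dict_matches`, and all model names, file names and class counts are hypothetical, not existing PaddleOCR code; the size check assumes exactly one extra output class for the CTC blank.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ModelEntry:
    name: str
    dict_file: str    # dictionary version the model was trained with
    num_classes: int  # size of the model's output layer


# Hypothetical model list with an extra "dictionary" column.
# Names, paths and sizes are illustrative, not real release values.
MODEL_LIST = {
    "japan_rec_old": ModelEntry("japan_rec_old", "japan_dict_v1.txt", 5),
    "japan_rec_new": ModelEntry("japan_rec_new", "japan_dict_v2.txt", 6),
}


def dict_file_for(model_name: str) -> str:
    """Return the dictionary file pinned to a model, not the latest one."""
    if model_name not in MODEL_LIST:
        raise ValueError(f"unknown model: {model_name}")
    return MODEL_LIST[model_name].dict_file


def check_dict_matches(model_name: str, dict_lines: Sequence[str]) -> None:
    """Fail fast if a dictionary cannot belong to this model.

    Assumes one extra output class for the CTC blank, so a model with
    N classes needs exactly N - 1 dictionary entries.
    """
    entry = MODEL_LIST[model_name]
    if len(dict_lines) + 1 != entry.num_classes:
        raise ValueError(
            f"{model_name} expects {entry.num_classes - 1} characters, "
            f"got {len(dict_lines)} from the dictionary"
        )
```

A length check like this only catches added or removed characters; a reordered dictionary of the same size would still pass, which is why pinning the exact file per model matters.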
Update japan_dict.txt to include missing jouyou kanji (#12940)