-
The title explains my issue. What should I do? Can it learn with Arabic letters? Any suggestions or experience with this?
Replies: 2 comments 1 reply
-
You can fine-tune one of the sal or English models by following the fine-tuning tutorial. Let us know if you have issues.
-
A response from a Riva Team member:
Use UTF-8. There are already UTF-8 characters in the n-gram, vocab, and lexicon files, even for English :)
I recommend you consider training against normalized Unicode to reduce the number of facts the network needs to learn. You can think of normalization as a tool to ensure the same glyph always gets the same encoding. In particular, NFC merges a diacritic with its base character wherever Unicode defines a precomposed form (a shorter, more direct encoding). For example, آ is one codepoint (two bytes in UTF-8) in NFC and two codepoints in NFD (the base character and the combining mark separately). Vowel marks such as the tanwin in عً have no precomposed form, so they stay as separate codepoints in both NFC and NFD; normalization still puts stacked marks into a consistent order.
There are plenty of examples of language models trained directly on even un-normalized UTF-8.
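To see what normalization does to your own text before training, here is a minimal sketch using Python's standard `unicodedata` module. The helper names `codepoints` and `normalize_line` are made up for illustration and are not part of Riva or NeMo; the sample strings are just two Arabic cases, one with a precomposed form and one without.

```python
import unicodedata


def codepoints(text: str, form: str) -> list[str]:
    """Normalize `text` to `form` (e.g. "NFC" or "NFD") and list its codepoints."""
    normalized = unicodedata.normalize(form, text)
    return [f"U+{ord(ch):04X}" for ch in normalized]


def normalize_line(line: str) -> str:
    """Hypothetical preprocessing step: NFC-normalize one line of training text."""
    return unicodedata.normalize("NFC", line)


# ALEF WITH MADDA ABOVE has a precomposed codepoint, so NFC and NFD differ.
alef_madda = "\u0622"
# AIN followed by the combining mark FATHATAN has no precomposed codepoint,
# so it stays as two codepoints in both forms.
ain_fathatan = "\u0639\u064B"

for label, sample in [("alef + madda", alef_madda), ("ain + fathatan", ain_fathatan)]:
    for form in ("NFC", "NFD"):
        print(f"{label:15} {form}: {codepoints(sample, form)}")
```

If you go this route, apply the same normalization to the transcripts, the vocab, and the lexicon, so the same word never shows up under two different encodings.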