This repository contains a grapheme-to-phoneme (grapheme2phoneme) model for Ukrainian.
It is a sequence-to-sequence model with 1024 GRU units and 10 attention units.
The model was trained on 160k normalized Ukrainian words paired with their phonetic transcriptions.
The training set contains 152k words and the validation set contains 8k.
The model was evaluated with the Word Accuracy (WAcc) metric. We trained it under four scenarios:
1. Stress removed from both the training and validation data. The model predicts phonemes without any influence from stressed letters.
2. Stressed letters kept in both the training and validation data. The model must predict not only the phonemes of the word but also the position of the stress in the sequence.
3. Stressed letters kept only in the training data. The model is trained on phoneme sequences that contain stressed letters, but the stress position is ignored during validation. This lets us use stress as a feature while ignoring its impact on the final result.
4. Stress removed and phonemes simplified (ay -> a) in all data.
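
The four scenarios differ only in how the target transcription is preprocessed, which can be sketched as follows. This is a minimal illustration, not the repository's actual code: the source does not say how stress is encoded, so the combining acute accent (`STRESS_MARK`) is an assumption, and the helper names `strip_stress`, `simplify`, and `prepare_target` are hypothetical.

```python
from typing import List

# ASSUMPTION: the source does not say how stress is encoded. Here a stressed
# phoneme is assumed to carry a combining acute accent (U+0301).
STRESS_MARK = "\u0301"


def strip_stress(phonemes: List[str], mark: str = STRESS_MARK) -> List[str]:
    """Remove the (assumed) stress mark from every phoneme."""
    return [p.replace(mark, "") for p in phonemes]


def simplify(phonemes: List[str]) -> List[str]:
    """Scenario 4 simplification: map the phoneme 'ay' to 'a'."""
    return ["a" if p == "ay" else p for p in phonemes]


def prepare_target(phonemes: List[str], scenario: int, split: str) -> List[str]:
    """Return the target sequence for a scenario (1-4) and a split ('train' or 'val')."""
    if scenario not in (1, 2, 3, 4):
        raise ValueError("scenario must be 1, 2, 3 or 4")
    if split not in ("train", "val"):
        raise ValueError("split must be 'train' or 'val'")
    # Stress is kept in scenario 2, and in the training split of scenario 3.
    if scenario == 2 or (scenario == 3 and split == "train"):
        return list(phonemes)
    # Scenarios 1 and 4, and the validation split of scenario 3, drop stress.
    result = strip_stress(phonemes)
    if scenario == 4:
        result = simplify(result)
    return result
```

Under these assumptions, a model trained in scenario 3 still emits stress marks, so its predictions would presumably also need to pass through `strip_stress` before being compared with the validation targets.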
Scenario | Train WAcc, % | Val WAcc, % |
---|---|---|
1 | - | 87.10 |
2 | 87.23 | 83.45 |
3 | 92.46 | 90.60 |
4 | 99.4 | 99.0 |
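
The source does not spell out how WAcc is computed. The sketch below assumes the common definition, where a word counts as correct only if its whole predicted phoneme sequence matches the reference exactly. The function name `word_accuracy` is hypothetical.

```python
from typing import List, Sequence


def word_accuracy(predictions: Sequence[List[str]],
                  references: Sequence[List[str]]) -> float:
    """Percentage of words whose predicted phoneme sequence equals the reference.

    ASSUMPTION: exact sequence match per word; the source only names the metric.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for p, r in zip(predictions, references) if list(p) == list(r))
    return 100.0 * correct / len(references)
```

For example, with two words where only the first transcription is fully correct, `word_accuracy([["m", "a"], ["t", "o"]], [["m", "a"], ["t", "a"]])` gives 50.0. A single wrong phoneme makes the whole word count as an error, which is why the metric is sensitive to choices such as whether stress is part of the target.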
Example 1:
Example 2: