g2p_uk

This repository contains a Ukrainian grapheme-to-phoneme (g2p) model.

It is a sequence-to-sequence model with 1024 GRU units and 10 attention units.

The model was trained on 160k normalized Ukrainian words with their phonetic transcriptions.
Training set: 152k words; validation set: 8k words.
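
The 152k/8k split holds out 5% of the words for validation. The README does not say how the split was made, so the following is only a minimal sketch assuming a seeded random split; the function name `split_train_val` and the seed are hypothetical and not taken from this repository.

```python
import random


def split_train_val(pairs, val_size, seed=0):
    """Shuffle (word, transcription) pairs and hold out `val_size` of them.

    Assumption: a plain random split; the repository may split differently.
    """
    shuffled = list(pairs)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    val = shuffled[:val_size]
    train = shuffled[val_size:]
    return train, val
```

With 160,000 pairs and `val_size=8000`, this returns 152,000 training pairs and 8,000 validation pairs.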

The model was evaluated with the word accuracy metric (WAcc). We trained it under several scenarios:

1. Removing stress from the training and validation data. Here we predict phonemes without the effect of stressed letters.
2. Keeping stressed letters in the training and validation data. Here we predict not only the phonemes of the word but also the position of the stress in the sequence.
3. Keeping stressed letters only in the training data. Here we train the model on phonemes that contain stressed letters, but ignore the position of the stress during validation. This lets us include stress as a feature while ignoring its impact on the final result.
4. Removing stress and simplifying phonemes (a^y -> a) in all data.
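
The scenarios above differ only in how the data is preprocessed and how predictions are compared. The sketch below shows one possible way to do this. It assumes that stress is marked with the combining acute accent (U+0301) and that phonemes are space-separated tokens; neither detail is stated in this README. The function names are hypothetical, and the mapping table contains only the single `a^y -> a` rule mentioned above.

```python
STRESS_MARK = "\u0301"  # assumption: stress is a combining acute accent

# Only the one simplification named in the README; the real table may be larger.
PHONEME_SIMPLIFICATIONS = {"a^y": "a"}


def strip_stress(sequence):
    """Drop the assumed stress mark from a word or phoneme string."""
    return sequence.replace(STRESS_MARK, "")


def simplify_phonemes(phonemes):
    """Map each space-separated phoneme token through the simplification table."""
    return " ".join(PHONEME_SIMPLIFICATIONS.get(p, p) for p in phonemes.split())


def word_accuracy(predicted, reference, ignore_stress=False):
    """Fraction of words whose whole predicted transcription matches the reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    if ignore_stress:
        predicted = [strip_stress(p) for p in predicted]
        reference = [strip_stress(r) for r in reference]
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)
```

Calling `word_accuracy(..., ignore_stress=True)` corresponds to the third scenario, where stress is kept during training but ignored when scoring. Applying `strip_stress` to all data before training corresponds to the first scenario, and additionally applying `simplify_phonemes` corresponds to the fourth.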

