This speech-to-text (STT) system was trained with the Kaldi framework.
It consists of a g2p (grapheme-to-phoneme) model and a Kaldi training recipe.
g2p: https://github.com/kotestyle/g2p_uk
The g2p model was trained separately with TensorFlow; see its repo for details.
asr model: trained on 84 hours of voxforge and librivox data.
Training recipe:
Language model: SRILM
Audio features: MFCC and CMVN
Acoustic model: HMM-GMM
Training: Delta+delta-delta, LDA-MLLT, SAT
Alignment: fMLLR
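
The order of the stages above can be sketched as a dry-run plan. This is only an illustration, not the actual recipe from this project: the script names are the standard ones from Kaldi's `steps/` directory, while the directory names (`data/train`, `data/lang`, `exp/...`), the job count and the leaves/Gaussians numbers are assumptions picked for the example. The real recipe may repeat some passes (the results table mentions a `Tri5` model).

```python
from typing import List


def build_recipe_plan(data: str = "data/train", lang: str = "data/lang",
                      exp: str = "exp", nj: int = 4) -> List[str]:
    """Return Kaldi commands for a GMM-HMM recipe, in execution order.

    Nothing is executed; the function only builds the command strings.
    All paths and numeric settings are illustrative assumptions.
    """
    par = f"--nj {nj}"
    return [
        # Audio features: MFCC extraction, then CMVN statistics.
        f"steps/make_mfcc.sh {par} {data} {exp}/make_mfcc mfcc",
        f"steps/compute_cmvn_stats.sh {data} {exp}/make_mfcc mfcc",
        # Monophone model and its alignments.
        f"steps/train_mono.sh {par} {data} {lang} {exp}/mono",
        f"steps/align_si.sh {par} {data} {lang} {exp}/mono {exp}/mono_ali",
        # Triphone pass 1: delta + delta-delta features.
        f"steps/train_deltas.sh 2000 10000 {data} {lang} {exp}/mono_ali {exp}/tri1",
        f"steps/align_si.sh {par} {data} {lang} {exp}/tri1 {exp}/tri1_ali",
        # Triphone pass 2: LDA + MLLT.
        f"steps/train_lda_mllt.sh 2500 15000 {data} {lang} {exp}/tri1_ali {exp}/tri2",
        f"steps/align_si.sh {par} {data} {lang} {exp}/tri2 {exp}/tri2_ali",
        # Triphone pass 3: speaker adaptive training (SAT).
        f"steps/train_sat.sh 2500 15000 {data} {lang} {exp}/tri2_ali {exp}/tri3",
        # Final alignment with fMLLR transforms.
        f"steps/align_fmllr.sh {par} {data} {lang} {exp}/tri3 {exp}/tri3_ali",
    ]


if __name__ == "__main__":
    for step in build_recipe_plan():
        print(step)
```

Printing the plan first makes it easy to compare against the edited recipe before anything is launched.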
Model | LM order (SRILM) | Train / test, hours | Word accuracy (WAcc), % |
---|---|---|---|
mono | 2 | 1 / 0.1 | 4 |
mono | 2 | 5 / 1 | 9 |
Tri5 (LDA + MLLT + SAT) | 2-3 | 83 / 1 | 31.13 |
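
For reference, here is a minimal sketch of how a word accuracy figure like the ones in the table can be computed. It assumes the common definition WAcc = 100 * (1 - WER), where WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. This README does not say how its numbers were scored, so treat the definition and the function names as assumptions.

```python
from typing import List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance between reference and hypothesis."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference
    for i, r in enumerate(ref, 1):
        cur = [i]  # distance from ref[:i] to empty hypothesis
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Return word accuracy in percent, assuming WAcc = 100 * (1 - WER)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    errors = edit_distance(ref_words, hypothesis.split())
    return 100.0 * (1.0 - errors / len(ref_words))


if __name__ == "__main__":
    # One substitution and one deletion against four reference words.
    print(word_accuracy("a b c d", "a x c"))
```

With this definition the value can drop below zero when the hypothesis contains many inserted words.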
Source | link |
---|---|
voxforge | http://www.repository.voxforge1.org/downloads/uk/ |
librivox | https://www.caito.de/2019/01/the-m-ailabs-speech-dataset/ |
youtube (in progress) | youtube-data.xlsx |

To train the model:
- build Kaldi from source
- prepare the voxforge data with the sample notebook
- prepare the librivox data with the sample notebook
- prepare the Kaldi project with the project notebook
- make changes to the configs and recipe
- cross fingers and hope it runs w/o errors :-)

To decode audio with a trained model:
- place the model in the appropriate folder in the Kaldi project
- fill in config.py
- run decode_kaldi.py with the file_path argument
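
The data preparation steps above ultimately have to produce a Kaldi data directory. As a rough sketch of what that involves (not the code from the notebooks), the function below writes the four standard files `wav.scp`, `text`, `utt2spk` and `spk2utt`. The function name, the tuple layout and the example ids are hypothetical.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

# (speaker_id, utterance_id, wav_path, transcript); layout is an assumption.
Utterance = Tuple[str, str, str, str]


def write_kaldi_data_dir(utterances: Iterable[Utterance], out_dir: str) -> Path:
    """Write wav.scp, text, utt2spk and spk2utt for a Kaldi data directory."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Kaldi expects sorted files; prefixing utterance ids with the speaker id
    # keeps utterance order and speaker order consistent (ASCII ids assumed).
    rows = sorted(utterances, key=lambda u: u[1])
    spk2utt: Dict[str, List[str]] = defaultdict(list)
    with open(out / "wav.scp", "w", encoding="utf-8") as wav_f, \
         open(out / "text", "w", encoding="utf-8") as text_f, \
         open(out / "utt2spk", "w", encoding="utf-8") as u2s_f:
        for spk, utt, wav_path, transcript in rows:
            wav_f.write(f"{utt} {wav_path}\n")
            text_f.write(f"{utt} {transcript}\n")
            u2s_f.write(f"{utt} {spk}\n")
            spk2utt[spk].append(utt)
    with open(out / "spk2utt", "w", encoding="utf-8") as s2u_f:
        for spk in sorted(spk2utt):
            s2u_f.write(f"{spk} {' '.join(spk2utt[spk])}\n")
    return out
```

Kaldi ships `utils/validate_data_dir.sh`, which can be run on the resulting directory to catch sorting or formatting problems before training.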