scott--/kaldi-helpers

 
 


CoEDL Kaldi pipeline

A set of scripts for preparing a corpus for speech-to-text processing with Kaldi.

Read about setting up Docker to run all this.

For more information about data requirements, see the data guide.

Read about the tasks that can be run.

Workflow

Kaldi pipeline

custom_mark
digraph G {
  f1 [label="Format 1: Elan"]
  f2 [label="Format 2: Transcriber"]
  f3 [label="Format 3: Praat"]
  conversion [shape="box", label="Conversion", fontsize="20"]
  standard [shape="box", label="Standard format. JSON file"]
  normalise [shape="box", label="Normalisation", fontsize="20"]
  norm_model [label="Normalisation rules"]
  pronunciation [shape="box", label="Pronunciation", fontsize="20"]
  pron_model [label="Pronunciation rules"]
  kaldi [shape="box", label="Kaldi", fontsize="20"]

  f1 -> conversion
  f2 -> conversion
  f3 -> conversion
  conversion -> standard
  standard -> normalise [label="TEXT", fontcolor="green"]
  standard -> kaldi [label="AUDIO", fontcolor="green"]
  norm_model -> normalise
  normalise -> pronunciation
  pron_model -> pronunciation
  pronunciation -> kaldi
}
custom_mark
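The text branch of the workflow above (conversion, normalisation, pronunciation) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the record fields (`audio`, `start_ms`, `stop_ms`, `text`), and the rule formats are hypothetical and are not taken from the scripts in this repository.

```python
import json
import re

# Hypothetical rules, invented for illustration only.
NORM_RULES = [(r"[^\w\s]", "")]  # strip punctuation
PRON_RULES = {"ng": "N", "a": "a", "n": "n", "g": "g", "i": "i"}


def convert(records):
    """Conversion: keep only the fields of an assumed standard JSON record."""
    fields = ("audio", "start_ms", "stop_ms", "text")
    return [{name: record[name] for name in fields} for record in records]


def normalise(text, rules):
    """Normalisation: lowercase, apply regex rules in order, collapse spaces."""
    text = text.lower()
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return " ".join(text.split())


def pronounce(word, rules):
    """Pronunciation: greedy longest-match grapheme-to-phone lookup."""
    keys = sorted(rules, key=len, reverse=True)
    phones = []
    i = 0
    while i < len(word):
        for key in keys:
            if word.startswith(key, i):
                phones.append(rules[key])
                i += len(key)
                break
        else:
            i += 1  # no rule for this character: skip it
    return phones


def build_lexicon(utterances, norm_rules, pron_rules):
    """Map every normalised word in the corpus to its phone sequence."""
    lexicon = {}
    for utterance in utterances:
        for word in normalise(utterance["text"], norm_rules).split():
            lexicon[word] = pronounce(word, pron_rules)
    return lexicon


# One made-up utterance, as it might look after parsing an annotation file.
records = [
    {"audio": "a.wav", "start_ms": 0, "stop_ms": 900,
     "text": "Nganga, ani!", "tier": "default"},
]
standard = convert(records)
print(json.dumps(standard, indent=2))

lexicon = build_lexicon(standard, NORM_RULES, PRON_RULES)
for word, phones in sorted(lexicon.items()):
    # One "word phone phone ..." entry per line, as in a Kaldi lexicon file.
    print(word, " ".join(phones))
```

The audio branch is not covered here; in the diagram it passes from the standard format straight to Kaldi without going through normalisation or pronunciation.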

