CoEDL Kaldi pipeline

A set of scripts for preparing a corpus for speech-to-text processing with Kaldi.

Read about setting up Docker to run all this.

For more information about data requirements, see the data guide.

Read about the tasks that can be run.

Workflow

custom_mark
digraph G {
    f1 [label="Format 1: Elan"]
    f2 [label="Format 2: Transcriber"]
    f3 [label="Format 3: Praat"]
    conversion [shape="box", label="Conversion", fontsize="20"]
    standard [shape="box", label="Standard format. JSON file"]
    normalise [shape="box", label="Normalisation", fontsize="20"]
    norm_model [label="Normalisation rules"]
    pronunciation [shape="box", label="Pronunciation", fontsize="20"]
    pron_model [label="Pronunciation rules"]
    kaldi [shape="box", label="Kaldi", fontsize="20"]

    f1 -> conversion
    f2 -> conversion
    f3 -> conversion
    conversion -> standard
    standard -> normalise [label="TEXT", fontcolor="green"]
    standard -> kaldi [label="AUDIO", fontcolor="green"]
    norm_model -> normalise
    normalise -> pronunciation
    pron_model -> pronunciation
    pronunciation -> kaldi
}
custom_mark
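The text branch of the workflow above (standard JSON, then normalisation, then pronunciation) can be illustrated with a minimal Python sketch. This is not the pipeline's actual code: the record fields (`audio_file_name`, `transcript`, `start_ms`, `stop_ms`), the normalisation rules, and the letter-to-sound table are all assumptions made up for illustration, and the real scripts may work quite differently.

```python
import re

# Hypothetical records standing in for the "standard format" JSON file.
# The field names are assumptions, not the pipeline's real schema.
utterances = [
    {"audio_file_name": "rec1.wav", "transcript": "Hello,  World!",
     "start_ms": 0, "stop_ms": 1200},
    {"audio_file_name": "rec1.wav", "transcript": "hello again",
     "start_ms": 1200, "stop_ms": 2500},
]

# Hypothetical letter-to-sound rules; a real rule set depends on the
# orthography of the language being transcribed.
RULES = {
    "h": "h", "e": "e", "l": "l", "o": "o", "w": "w", "r": "r",
    "d": "d", "a": "a", "g": "g", "i": "i", "n": "n", "ai": "aj",
}


def normalise(text):
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s']", "", text)
    return re.sub(r"\s+", " ", text).strip()


def pronounce(word, rules):
    """Map a word to phones by greedy longest match over the rule keys.

    Characters with no matching rule are skipped.
    """
    phones = []
    max_len = max(len(key) for key in rules)
    i = 0
    while i < len(word):
        for size in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in rules:
                phones.append(rules[chunk])
                i += size
                break
        else:
            i += 1  # no rule covers this character
    return phones


def build_lexicon(records, rules):
    """Return {word: "p1 p2 ..."} for every word in the normalised transcripts."""
    words = set()
    for record in records:
        words.update(normalise(record["transcript"]).split())
    return {word: " ".join(pronounce(word, rules)) for word in sorted(words)}


lexicon = build_lexicon(utterances, RULES)
for word, phones in lexicon.items():
    print(word, phones)
```

Keeping the rules in plain data structures, separate from the functions that apply them, mirrors the diagram, where the normalisation and pronunciation rules feed into their respective steps as inputs.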

