Installation

git clone the kaldi repository and follow the instructions written in INSTALL file git clone https://github.com/kaldi-asr/kaldi.git
Go to the cloned directory and then to tools cd kaldi/tools
To check the prerequisites for Kaldi, run the following command. ./extras/check_dependencies.sh

And see if there are any packages you need to install. And Make sure you get following output after running the above command: ./extras/check_dependencies.sh: all OK.

For more info please visit Software required to install and run Kaldi
Then build the programs using make command make -j 4

Note : here 4 is the number of parallel jobs
Install the irstlm sudo ./extras/install_irstlm.sh

Note : you can also install this later
Build the programs in src cd ../src ./configure --shared make depend -j 4 make -j 4

Note : this build can take a while

Required Files, Folders and Scripts

make a directory where you will run the experiment, lets say asr_lab mkdir /home/<user>/asr_lab
copy the conf local utils steps cmd.sh path.sh from kalid/egs/wsj/s5/ to /home//asr_lab/ cd kaldi/egs/wsj/s5 cp -r conf local utils steps cmd.sh path.sh ../../../../asr_lab
Scripts needed: create_lm.sh (to create the language model) myrun.sh (to train the HMM GMM model) online_speech1.sh (to decode the audio from HMM GMM model)[optioanl] make_graph.sh (to create or update the graph for GMM HMM)[optioanl] make_graph_nnet3.sh (to create or update the graph for DNN) run_dnn.sh (to train the DNN model)[optional] online_speech_DNN.sh (to decode the audio from DNN model)[optional] easy-kaldi.sh (for data prepration)[optional] lm-tool_2 (folder for lexicon expension)[optional]

Note : you can copy some of the scripts from here here
After copying, /home/<user>/asr_lab should contains following files and folders and scripts: cmd.sh conf create_lm.sh local make_graph.sh myrun.sh online_speech1.sh path.sh steps utils easy-kaldi.sh

DATA Preparation

Split your data into train and test data which contains wav files (.wav) and its equivalent transcripted files (.txt).
- make the folder wav. Inside wav make train_data and test_Data mkdir wav wav/train_data wav/test_data
- copy the splitted training data into the train_data folder and splitted testing data into the test_data folder.
  
  For example copy the TECH_ART and STORIES folders to wav/train_data folder. Move some of the wav and their equivalent transcripted files form train_data to test_data
The structure will be something simillar to:
```
wav
├── test_data
│   ├── 43.txt
│   └── 43.wav
│        ...
└── train_data
    ├── first
    │   ├── 1.txt
    │   ├── 1.wav
    │   ├── 2.txt
    │   ├── 2.wav
    │   ├── 3.txt
    │   └── 3.wav
    │        ...
    ├── second
    │   ├── 11.txt
    │   └── 11.wav
    │        ...
    └── third
        ├── 13.txt
        ├── 13.wav
        ├── 23.txt
        └── 23.wav
             ...
        ...
```
Note : Folder name and path can be anything.
Make the data directory inside the asr_lab folder. mkdir asr_lab/data
Now we will prepare 4 files for both train_data and the test_data which are as follows - spk2utt text utt2spk wav.scp

./easy-kaldi.sh --train /home/<user>/asr_lab/wav/train_data ./easy-kaldi.sh --test /home/<user>/asr_lab/wav/test_data

Note : You can also create these files manually
Above command will create train and test folders. Move them inside data folder. mv train test data/
Create a folder local inside data folder. And plain-text folder inside local. cd data mkdir -p local/plain-text
Now we need to grab the text from train/text and test/text from data folder to create the text_c1 and lexicon cut -f2 train/text > train/train_text cut -f2 test/text > test/test_text
Combine train_text and test_text files and move it to plain-text then sort and remove the duplicates cat train/train_text test/test_text > text_c1 mv text_c1 local/plain-text/ cd local/plain-text sort text_c1 | uniq > text_c1_sorted_unique mv text_c1_sorted_unique text_c1

Note : You can also use text editer for this
In text_c1 we need to replace the starting SIL with <s> and ending SIL with </s>

sed -i 's/SIL /<s> /g' text_c1_sorted_unique sed -i 's| SIL| </s>|g' text_c1_sorted_unique

Note : You can also use text editer for this

Lexicon Expension

In order to create lexicon create dict folder inside local folder mkdir dict
Then we have to keep only unique words from plain-text/text_c1 , so replace the " "(space) with the "\n" and remove the duplicaes from this data. and also remove the fist 2 lines that contains <s> and </s>

sed 's/ /\n/g' plain-text/text_c1_sorted_unique > unique_words sort unique_words | uniq > unique_words_sorted tail -n +3 unique_words_sorted > unique_words

Note : You can also use text editer for this
Next Let us create the lexicon expension using the script lm-tools.sh cd asr_lab/lm-tools_2/ ./lm-tools.sh ../data/local/uniqeue_words Go to asr_lab dir cd ../asr_lab cp lm-tools_2/temp/tmp.parse lexicon.txt
you need to do follwing operation on lexicon.txt
1. Replace the 2 spaces followed by a single double quote( ") with tab(\t). sed -i 's/ "/\t/g' lexicon.txt
2. Replace the single double quote followed by a tab with a single space. sed -i 's/" / /g' lexicon.txt
3. Replace the double quote present at the end of the line(" ) with nothing. sed -i 's/" //g' lexicon.txt
4. Remove the first blank line if it is present from the file. sed -i '1d' lexicon.txt
5. add the following two lines at the start of the file !SIL sil SIL sil sed -i '1s/^/!SIL\tsil\nSIL\tsil\n/' lexicon.txt
The format of lexicon.txt should be something like:
```
!SIL	sil
SIL	sil
अकेले	a k ee l ee
अगर	a g a r
अगले	a g l ee
अच्छी	a c ch ii
अच्छी-अच्छी	a c ch ii a c ch ii
अंजान	a q j aa n
....
```
Note : lmtool.sh work best with hindi script only

For other indic language you can use TTS scripts present at: /media/linux/TTS/scripts/programs_pranaw/pd_for_hts/<language>/test.pl unique_words For english you can also use cmu lexicon tool here
now from this lexicon.txt run the below command to take only the second column seperated by tab. Save the output in the nonsilence_phones.txt cut -f2 lexicon.txt > nonsilence_phones_raw.txt
open the file, replace the spaces with newline and then remove the duplicates and save it. sed -i 's/ /\n/g' nonsilence_phones_raw.txt sort nonsilence_phones_raw.txt | uniq > nonsilence_phones_unique.txt
Remove the first blank line. sed -i '1d' nonsilence_phones.txt
Remove the first 2 lines from nonsilence_phones.txt which contains SIL and !SIL tail -n +3 nonsilence_phones_unique > nonsilence_phones.txt

create optional_silence.txt and silence_phones.txt file and write 'sil' at first line echo "sil" | cat >> optional_silence.txt echo "sil" | cat >> silence_phones.txt

data folder should have following structure :

├── local
│   ├── dict
│   │   ├── lexicon.txt
│   │   ├── nonsilence_phones.txt
│   │   ├── optional_silence.txt
│   │   └── silence.txt
│   └── plain-text
│        └── text_c1
├── test
│   ├── spk2utt
│   ├── text
│   ├── utt2spk
│   └── wav.scp
└── train
     ├── spk2utt
     ├── text
     ├── utt2spk
     └── wav.scp

Create LM

Now, open the path.sh file and change the kaldi path to new kaldi path where it is installed Update the kaldi path in create_lm.sh at "KALDI_ROOT" and "export IRSTLM".

export KALDI_ROOT=new_kaldi_path export IRSTLM=new_kaldi_path/tools/irstlm

Note : Change the 'n' parameter at the following line in the create_lm.sh script. Here 'n' is the no of words which should be taken into consideration by LM to predict the next word. IRSTLM/bin/build-lm.sh -i data/local/plain-text/text_c1 -n 2 -o data/local/tmp/lm_phone_bg.ilm.gz

Read more at the given link for N-gram Language Model
Now run the scirpt ./create_lm.sh

Output of the script should show the number something like 0.064155 -0.020947(values can very) but ensure first number is postitive and second is negative Note : Before running the script again you might need to remove the following file and directories:

rm -rf data/lang data/local/lang data/local/temp data/local/dict/lexiconp.txt

Create AM

myrun.sh is used to train the Acoustic Model.
- You can change the values of sen and gauss to find the optimum result
```
for sen in 400 500 600 700 800 900; do
    for gauss in 4 5 6 7 8 9 10; do
```
- while rerunning the script you can skip the specific task by switching 1 to 0
```
mfcc=1
mono=1
tri1=1
```
- parllel process can be increased or decreased by changing the nj value
```
decode_nj=4
train_nj=4
```
- change the sampling frequency to 8000 in conf/mfcc.conf echo "--sample-frequency=8000" >> mfcc.conf
Run the myrun.sh

./myrun.sh

Note : For DNN training, please run run_dnn.sh or Run_tdnn_1i.sh script. To run the scirpt in background use nohup or screen

DNN Model will be stored in following path : asr_lab/exp/chain/tdnn1a_sp_online/

DNN graph will be present at /home/anchal/gst_nnet3_model/exp/chain/tree_sp/

View WER of DNN model /exp/chain/tdnn1a_sp_online/decode_/decode_<test_set>/scoring_kaldi/best_wer

Decoding

Update the model path in online_speech1.sh to he model dir which has lower WER.
create the test folder and place the recorded or test audio wav inside it mkdir test_audio
And run the following command

./online_speech1.sh test_audio/ audio_name.wav

Note : For DNN decoding, run the online_speech_DNN.sh script. use the following command. ./online_speech_DNN.sh test_audio/ audio.wav

The transcribed text will appear on the the terminal and in test_audio/recog.txt If nothing appear you can check for error in log at test_audio/out.txt

Handling errors

ERROR: FstHeader::Read: Bad FST header: data/lang/G.fst -

If you get this error then try to copy this file from /home/rushi/kaldi/src/lmbin/arpa2fst to /usr/bin. You can use the following command => sudo cp ../../kaldi/src/lmbin/arpa2fst /usr/bin/
utils/validate_data_dir.sh: file data/train/utt2spk is not sorted or has duplicates

./utils/fix_data_dir.sh data/train/ ./utils/fix_data_dir.sh data/test/
frequency mismatch : make sure you have added folowing line in conf/mfcc.conf sample-frequency=8000

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Table of Contents

Installation

Required Files, Folders and Scripts

DATA Preparation

Lexicon Expension

Create LM

Create AM

Decoding

Handling errors

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
LICENSE		LICENSE
README.md		README.md
Run_tdnn_1i.sh		Run_tdnn_1i.sh
create_lm.sh		create_lm.sh
make_graph.sh		make_graph.sh
make_graph_nnet3.sh		make_graph_nnet3.sh
myrun.sh		myrun.sh
online_speech1.sh		online_speech1.sh
online_speech_DNN.sh		online_speech_DNN.sh
run_dnn.sh		run_dnn.sh
update_graph.sh		update_graph.sh

License

TTS-cdac-mumbai/kaldi_asr

Folders and files

Latest commit

History

Repository files navigation

Table of Contents

Installation

Required Files, Folders and Scripts

DATA Preparation

Lexicon Expension

Create LM

Create AM

Decoding

Handling errors

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages