The aim of this project is to analyse the performance of various neural networks in identifying the word spoken by a person. The data used is the Google Speech Commands dataset: it contains ~65k audio files, each of which holds a single word spoken by a person together with a tag giving the text for that audio file. There are 30 different words in the dataset, spoken by different people; the task is thus to classify the audio files by the word spoken.
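Each clip in the Speech Commands archive lives in a directory named after the spoken word, so the tag can be read straight from the file path. A minimal sketch (the file name below is hypothetical):

```python
from pathlib import Path

# In the Speech Commands layout, the parent directory name is the label.
wav_path = Path("data/yes/0a7c2a8d_nohash_0.wav")  # hypothetical file name
label = wav_path.parent.name  # -> "yes"
```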
We ran the following neural networks to perform this task:
- LeNet
- VGG
- ResNet
- CNN-RNN
- CNN-1D
- Parallel Net (a combination of two networks trained in parallel; see the sketch after this list)
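To make the last item concrete, below is a minimal sketch of a parallel net, assuming PyTorch and 2-D spectrogram-like inputs; the class name, layer sizes, and input shape are illustrative, not the repository's actual code. Two sub-networks see the same input, are trained side by side, and their features are concatenated before the final classifier.

```python
import torch
import torch.nn as nn

class ParallelNet(nn.Module):
    def __init__(self, num_classes=30):
        super().__init__()
        # Branch A: a small 2-D CNN over the input.
        self.branch_a = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Branch B: an independent CNN with a wider receptive field.
        self.branch_b = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Classifier over the concatenated branch features.
        self.classifier = nn.Linear(16 + 16, num_classes)

    def forward(self, x):
        feats = torch.cat([self.branch_a(x), self.branch_b(x)], dim=1)
        return self.classifier(feats)

model = ParallelNet()
logits = model(torch.randn(8, 1, 32, 32))  # batch of 8 dummy spectrograms -> (8, 30)
```

A single optimizer over model.parameters() updates both branches at once, which is what "trained in parallel" amounts to here.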
The output/ directory contains the training logs for these networks. The models/ directory contains the trained models for the best-performing configurations; these models can be loaded directly into memory and used for classification, for example as sketched below.
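A model saved as a whole object with torch.save(model, path) can be restored and queried like this (assuming PyTorch; the file name and input shape below are hypothetical):

```python
import torch

model = torch.load("models/cnn1d_best.pt", map_location="cpu")  # hypothetical file name
model.eval()

with torch.no_grad():
    clip = torch.randn(1, 1, 16000)  # dummy one-second clip at 16 kHz; real shape depends on the model
    predicted_class = model(clip).argmax(dim=1)
```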
Please follow these steps to train/test a model:
- mkdir bdml
- cd bdml
- git clone
- module load anaconda3/5.3.1
- conda env create -f requirements.yaml
- source activate bdml
- Download the speech data to this directory:
  wget "http://download.tensorflow.org/data/speech_commands_v0.01.tar.gz"
- gunzip speech_commands_v0.01.tar.gz
- mkdir data; mv speech_commands_v0.01.tar data
- cd data
- tar xopf speech_commands_v0.01.tar
- cd ..
- mkdir speechdata
- cd BDML
- python create_dataset.py ../data --out_path ../speechdata
- python run.py --train_path ../speechdata/train/ --valid_path ../speechdata/valid --test_path ../speechdata/test --model CNN1D
Note: You can specify or change arguments to the run.py script, such as batch_size, model, etc. Information on the other options is in the run.py script; an example with modified arguments is shown below.
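For instance, a run with a different model and batch size might look like the following (hypothetical values; the exact flag spellings, e.g. --batch_size, should be checked against run.py):
- python run.py --train_path ../speechdata/train/ --valid_path ../speechdata/valid --test_path ../speechdata/test --model VGG --batch_size 64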
We have run this project on the NYU Prince server using a Slurm batch script:
- sbatch runbatch.s
Note: You can change the arguments in the runbatch.s script to run with various network configurations.