PyTorch and Kaldi Speaker Identification and Diarization

A speaker identification and diarization solution based on PyTorch and the VoxCeleb v2 example from Kaldi.


What is this

This work is a speaker identification system based on the Kaldi VoxCeleb v2 example. It extends that example by replacing the nnet3-based neural network with one implemented in the PyTorch machine learning framework, which makes the network architecture easier to change and experiment with.

In addition to speaker identification with VoxCeleb, this project also adds the ability to run diarization tasks.

[Figure: project structure]

Setup

Make sure the requirements listed in What you need are met. Then follow the steps described in How To Install.

What you need

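Before you can run this, make sure you have the required tools available. The installation steps below rely on at least a working Kaldi installation, CUDA, and Python with virtualenv and pip.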

How To Install

Follow these steps in order to run this project. If something does not work or is unclear, please open an issue and ask. I'll be happy to help.

  1. Make sure Kaldi and CUDA are installed and work correctly.
  2. Download this repo: git clone https://github.com/theScrabi/kaldi_voxceleb_pytorch
  3. Enter the root directory of the project: cd kaldi_voxceleb_pytorch
  4. Create a new Python virtual environment: virtualenv venv
  5. Activate the virtual environment: source venv/bin/activate
  6. Install the required Python packages: pip install -r requirements.txt
  7. Edit the file sid/path.sh and set the KALDI variable to the path of your Kaldi installation (e.g. KALDI=/opt/kaldi).
  8. If you want to use diarization, also edit diarization/path.sh and set the KALDI variable there.
  9. Enter the diarization directory and run ./install.sh. This will set the required symlinks.
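
Steps 7 and 8 are easy to get wrong, so a quick sanity check can help before starting a long training run. The following is a minimal sketch, not part of this repository: the helper names `read_kaldi_path` and `check_setup` are hypothetical, and it assumes that each path.sh contains a plain assignment of the form `KALDI=/some/path` (optionally prefixed with `export`), as in the example in step 7.

```python
import re
from pathlib import Path

# Assumption: path.sh sets KALDI with a simple "KALDI=..." or "export KALDI=..." line.
_KALDI_LINE = re.compile(r"^\s*(?:export\s+)?KALDI=(.*)$")


def read_kaldi_path(path_sh):
    """Return the value assigned to KALDI in a path.sh-style file, or None."""
    for line in Path(path_sh).read_text().splitlines():
        match = _KALDI_LINE.match(line)
        if match:
            # Drop surrounding whitespace and optional quotes.
            return match.group(1).strip().strip("\"'")
    return None


def check_setup(project_root="."):
    """Report, per path.sh file, whether KALDI points to an existing directory."""
    results = {}
    for rel in ("sid/path.sh", "diarization/path.sh"):
        path_sh = Path(project_root) / rel
        if not path_sh.is_file():
            results[rel] = "missing file"
            continue
        kaldi = read_kaldi_path(path_sh)
        if kaldi is None:
            results[rel] = "KALDI not set"
        elif Path(kaldi).is_dir():
            results[rel] = "ok"
        else:
            results[rel] = "KALDI is not a directory"
    return results


if __name__ == "__main__":
    for name, status in check_setup().items():
        print(f"{name}: {status}")
```

Run it from the root directory of the project. A status other than "ok" for diarization/path.sh only matters if you intend to use diarization.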

How To Use

Use the run.sh script in the sid folder for speaker identification, or the one in the diarization folder for diarization. Each script runs training and testing.

For speaker identification please read the README.md inside the sid folder. For diarization read the README.md in the diarization folder.

Purpose

The purpose of this work was to see whether Angular Softmax combined with cosine distance comparison can improve end-to-end speaker identification and diarization. The goal was to find out whether this could eventually outperform and replace the additional PLDA step. Additionally, it was examined whether an attention layer can also improve speaker identification and diarization.

This was part of my Bachelor Thesis.
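
To make the two ideas named above more concrete, here is a minimal sketch in plain Python. It is not taken from this repository: the function names and the threshold value are illustrative assumptions. `angular_margin` follows the margin function from the SphereFace paper, psi(theta) = (-1)^k * cos(m * theta) - 2k for theta in [k*pi/m, (k+1)*pi/m], which replaces cos(theta) for the target class during training. `cosine_similarity` shows the kind of comparison that can be applied directly to two speaker embeddings at test time instead of a PLDA backend.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def angular_margin(theta, m=4):
    """SphereFace margin function psi(theta) for theta in [0, pi].

    psi is monotonically decreasing and never larger than cos(theta),
    so the target class has to be matched with a smaller angle.
    """
    k = min(int(theta * m / math.pi), m - 1)  # index of the interval containing theta
    return (-1) ** k * math.cos(m * theta) - 2 * k


def same_speaker(emb_a, emb_b, threshold=0.5):
    """Decide by cosine similarity; the threshold is an illustrative value only."""
    return cosine_similarity(emb_a, emb_b) >= threshold


if __name__ == "__main__":
    enrolled = [0.9, 0.1, 0.3]  # made-up embeddings for demonstration
    test = [0.8, 0.2, 0.25]
    print("cosine similarity:", cosine_similarity(enrolled, test))
    print("psi(pi/8) with m=4:", angular_margin(math.pi / 8))
```

In practice the decision threshold would be tuned on a development set, and the embeddings would come from the trained network rather than being written by hand.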

Also Interesting

  • SphereFace: The original implementation of the angular-margin-based softmax for face recognition.
  • SpeechBrain: An all-in-one PyTorch speech recognition framework.
  • pyannote.metrics: A framework for diarization evaluation and error analysis.
  • kaldi with tensorflow dnn: A TensorFlow implementation of the x-vector topology on top of Kaldi.