- A C++ compiler with good C++11 support (e.g. g++ >= 4.8)
- CMake (version 3.5.1 or later) and make
- flashlight is required. flashlight must be built with distributed training enabled.
- libsndfile is required for loading audio. If using wav2letter++ with FLAC files, libsndfile must be built with the Ogg, Vorbis, and FLAC libraries.
- Intel's Math Kernel Library is required for featurization.
- FFTW is required for featurization.
- KenLM is required for the decoder. One of LZMA, BZip2, or Z is required for LM compression with KenLM.
- gflags is required.
- glog is required.
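Before configuring, it can help to confirm the two minimum tool versions listed above. The script below is a minimal sketch, not part of wav2letter++: the function names `version_ge` and `check_tool_version` are hypothetical, and it assumes GNU `sort -V` (coreutils) plus the usual `cmake --version` and `g++ -dumpversion` output formats.

```bash
#!/usr/bin/env bash
# Sketch: check CMake >= 3.5.1 and g++ >= 4.8 (the minimums listed above).
# Assumes GNU coreutils `sort -V` for version ordering.

# version_ge A B: succeeds if version A >= version B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_tool_version NAME DETECTED MINIMUM: report and return nonzero on failure.
check_tool_version() {
  if [ -z "$2" ]; then
    echo "$1: not found" >&2
    return 1
  fi
  if version_ge "$2" "$3"; then
    echo "$1 $2: OK (>= $3)"
  else
    echo "$1 $2: too old (need >= $3)" >&2
    return 1
  fi
}

# "cmake version 3.10.2" -> "3.10.2"; newer g++ may print only the major version.
cmake_version=$(cmake --version 2>/dev/null | head -n 1 | awk '{print $3}')
gxx_version=$(g++ -dumpversion 2>/dev/null)

# Report both tools without aborting on the first failure.
check_tool_version cmake "$cmake_version" 3.5.1 || true
check_tool_version g++ "$gxx_version" 4.8 || true
```

The version comparison relies on `sort -V` rather than plain string comparison, so that `3.10.2` correctly sorts after `3.5.1`.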
The following dependencies are downloaded and built automatically during the build:
- gtest and gmock 1.8.1 are built if building tests.
- If using the CUDA criterion backend (see below), NVIDIA cub 1.8.0 is downloaded and linked to criterion CUDA kernels.
- flashlight requires CUDA >= 9.2; if building wav2letter++ with the CUDA criterion backend, CUDA >= 9.2 is required. Using CUDA 9.2 is recommended.
- If building with the CPU criterion backend, wav2letter++ will try to compile with OpenMP for better performance.
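The CUDA >= 9.2 requirement above can be checked before choosing a backend. This is a hedged sketch: the helper names are hypothetical, and it assumes `nvcc --version` prints a line of the form `Cuda compilation tools, release 10.1, V10.1.243`. It falls back to suggesting the CPU backend when no suitable toolkit is found.

```bash
#!/usr/bin/env bash
# Sketch: suggest a W2L_CRITERION_BACKEND value based on the CUDA toolkit on PATH.
# Assumes nvcc reports "..., release X.Y, ..." and that GNU `sort -V` exists.

version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Extract "X.Y" from nvcc's "release X.Y," line; prints nothing if absent.
parse_nvcc_release() {
  printf '%s\n' "$1" | sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# CUDA if the detected release is >= 9.2, otherwise CPU.
suggest_backend() {
  local release="$1"
  if [ -n "$release" ] && version_ge "$release" 9.2; then
    echo CUDA
  else
    echo CPU
  fi
}

nvcc_out=$(nvcc --version 2>/dev/null)
release=$(parse_nvcc_release "$nvcc_out")
echo "Detected CUDA release: ${release:-none}; suggested backend: $(suggest_backend "$release")"
```

This only inspects the `nvcc` found on `PATH`; flashlight itself must also have been built against a compatible CUDA version.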
| Option | Values | Default |
|---|---|---|
| W2L_CRITERION_BACKEND | CUDA, CPU | CUDA |
| W2L_BUILD_TESTS | ON, OFF | ON |
| CMAKE_BUILD_TYPE | Any CMake build type (e.g. Debug, Release) | Debug |
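The options in the table are passed to CMake as `-D` cache entries. The sketch below assembles them and rejects values the table does not list; `w2l_cmake_args` is a hypothetical helper, and its build-type check is limited to CMake's four standard types, so it is stricter than CMake itself.

```bash
#!/usr/bin/env bash
# Sketch: build -D arguments for the options table, using the table's defaults.
# Arguments: BACKEND (CUDA|CPU), TESTS (ON|OFF), BUILD_TYPE.
w2l_cmake_args() {
  local backend="${1:-CUDA}" tests="${2:-ON}" build_type="${3:-Debug}"
  case "$backend" in
    CUDA|CPU) ;;
    *) echo "W2L_CRITERION_BACKEND must be CUDA or CPU, got '$backend'" >&2; return 1 ;;
  esac
  case "$tests" in
    ON|OFF) ;;
    *) echo "W2L_BUILD_TESTS must be ON or OFF, got '$tests'" >&2; return 1 ;;
  esac
  # Only CMake's standard build types are accepted here.
  case "$build_type" in
    Debug|Release|RelWithDebInfo|MinSizeRel) ;;
    *) echo "unrecognized CMAKE_BUILD_TYPE '$build_type'" >&2; return 1 ;;
  esac
  echo "-DW2L_CRITERION_BACKEND=$backend -DW2L_BUILD_TESTS=$tests -DCMAKE_BUILD_TYPE=$build_type"
}

# Usage, from the build directory:
#   cmake .. $(w2l_cmake_args CPU OFF Release)
```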
First, clone the repository:
git clone --recursive https://github.com/facebookresearch/wav2letter.git
and follow the build instructions for your specific OS.
There is currently no supported install procedure for wav2letter++. Building produces three binaries in the build directory:
- Train: given a dataset of input audio and corresponding transcriptions in sub-word units (graphemes, phonemes, etc.), trains the acoustic model.
- Test: performs inference on a given dataset with an acoustic model.
- Decode: given an acoustic model or pre-computed network emissions and a language model, computes the most likely sequence of words for a given dataset.
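After a build finishes, a quick way to confirm it succeeded is to check that the three binaries exist and are executable. This is a minimal sketch assuming the binaries sit directly in the build directory under the names Train, Test, and Decode; `check_w2l_binaries` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: verify that Train, Test, and Decode exist and are executable.
# Argument: path to the build directory (defaults to the current directory).
check_w2l_binaries() {
  local build_dir="${1:-.}" missing=0 bin
  for bin in Train Test Decode; do
    if [ -x "$build_dir/$bin" ]; then
      echo "found: $build_dir/$bin"
    else
      echo "missing: $build_dir/$bin" >&2
      missing=1
    fi
  done
  # Nonzero if any binary is missing.
  return "$missing"
}

# Usage: check_w2l_binaries path/to/wav2letter/build
```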
wav2letter++ has been tested on Ubuntu 16.04 and CentOS 7.5.
Assuming you have ArrayFire, flashlight, libsndfile, and KenLM built and installed, install the remaining dependencies with apt (or your distribution's package manager):
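sudo apt-get update
# Packages, by purpose:
#   audio encoding libs for libsndfile: libasound2-dev libflac-dev libogg-dev libtool libvorbis-dev
#   FFTW for Fourier transforms: libfftw3-dev
#   compression libraries (and Boost) for KenLM: zlib1g-dev libbz2-dev liblzma-dev libboost-all-dev
#   gflags: libgflags-dev libgflags2v5
#   glog: libgoogle-glog-dev libgoogle-glog0v5
sudo apt-get install \
libasound2-dev \
libflac-dev \
libogg-dev \
libtool \
libvorbis-dev \
libfftw3-dev \
zlib1g-dev \
libbz2-dev \
liblzma-dev \
libboost-all-dev \
libgflags-dev \
libgflags2v5 \
libgoogle-glog-dev \
libgoogle-glog0v5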
MKL and KenLM aren't easily discovered by CMake by default; export environment variables to make sure they're found. On most Linux-based systems, MKL is installed in /opt/intel/mkl. Since KenLM doesn't support an install step, after building KenLM, point CMake to wherever you downloaded and built KenLM:
export MKLROOT=/opt/intel/mkl # or path to MKL
export KENLM_ROOT_DIR=[path to KenLM]
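A typo in either path typically surfaces later as a CMake "not found" error, so it can be worth checking the exported variables first. This sketch uses `printenv`, which only sees exported variables (matching the `export` lines above); `check_dir_var` is a hypothetical helper, and it checks only that each path is an existing directory, not that MKL or KenLM is actually built there.

```bash
#!/usr/bin/env bash
# Sketch: confirm that an exported variable names an existing directory.
# Argument: the variable name, e.g. MKLROOT or KENLM_ROOT_DIR.
check_dir_var() {
  local name="$1" value
  value=$(printenv "$name")
  if [ -z "$value" ]; then
    echo "$name is not set (or not exported)" >&2
    return 1
  fi
  if [ ! -d "$value" ]; then
    echo "$name=$value is not a directory" >&2
    return 1
  fi
  echo "$name=$value"
}

# Report both variables without stopping at the first problem.
check_dir_var MKLROOT || true
check_dir_var KENLM_ROOT_DIR || true
```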
Once you've downloaded wav2letter++ and built and installed the required dependencies:
# in your wav2letter++ directory
mkdir -p build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DW2L_CRITERION_BACKEND=[backend] # Replace backend with CUDA or CPU
make -j4 # (or any number of threads)
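Rather than a fixed `-j4`, the job count can follow the number of available cores. A minimal sketch, assuming GNU coreutils `nproc`; `build_jobs` is a hypothetical helper that falls back to 4 if `nproc` is missing or prints something unexpected.

```bash
#!/usr/bin/env bash
# Sketch: choose a make job count from the number of available CPUs.
build_jobs() {
  local n
  # nproc is GNU coreutils; use 4 if it is unavailable.
  n=$(nproc 2>/dev/null) || n=4
  # Guard against empty or non-numeric output.
  case "$n" in
    ''|*[!0-9]*) n=4 ;;
  esac
  echo "$n"
}

# Usage, from the build directory after running cmake:
#   make -j"$(build_jobs)"
```

Very high job counts can exhaust memory when compiling large C++/CUDA translation units, so a smaller value may be preferable on memory-constrained machines.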
wav2letter++ and its dependencies can also be built with the provided Dockerfile. Both CUDA and CPU backends are supported with Docker.
To build wav2letter++ with Docker:
- Install Docker and, if using the CUDA backend, nvidia-docker.
- Run the Docker image with the CUDA or CPU backend in a new container:
# with CUDA backend
sudo docker run --runtime=nvidia --rm -itd --ipc=host --name w2l wav2letter/wav2letter:cuda-latest
# or with CPU backend
sudo docker run --rm -itd --ipc=host --name w2l wav2letter/wav2letter:cpu-latest
sudo docker exec -it w2l bash
- To run tests inside a container:
cd /root/wav2letter/build && make test
- Build the Docker image from source (using --no-cache will provide the latest version of flashlight inside the image if you have previously built the image for earlier versions of wav2letter):
git clone --recursive https://github.com/facebookresearch/wav2letter.git
cd wav2letter
# for CUDA backend
sudo docker build --no-cache -f ./Dockerfile-CUDA -t wav2letter .
# for CPU backend
sudo docker build --no-cache -f ./Dockerfile-CPU -t wav2letter .
For logging during training, testing, or decoding inside a container, use the --logtostderr=1 flag.