Support CTC decoding #198
To my surprise, I find that only icefall and wenet provide pre-trained models in torchscript format. None of espnet, speechbrain, NeMo, torchaudio, or fairseq provides torchscript pre-trained models. I am wondering whether those frameworks care about deployment. Correct me if I am wrong.
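For readers unfamiliar with the format: a torchscript model is a serialized module that can be executed from C++ without a Python runtime, which is what makes it attractive for deployment. Below is a minimal sketch of how such a file is produced; the tiny model is a placeholder for illustration, not one of the checkpoints used in this PR:

```python
# Minimal sketch: exporting a PyTorch model to TorchScript.
# The model below is a stand-in used only to illustrate the format.
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(80, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 500),
)
model.eval()

# Compile the module to TorchScript and serialize it. The resulting
# file can be loaded from C++ via torch::jit::load(), with no Python
# dependency at inference time.
scripted = torch.jit.script(model)
scripted.save("cpu_jit.pt")
```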
CTC decoding for models from icefall

First, build sherpa and download the pre-trained model:

```
cd /path/to/sherpa
mkdir build
cd build
cmake ..
make -j 10
cd ..
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09
cd icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09
git lfs pull --include "exp/cpu_jit.pt"
git lfs pull --include "data/lang_bpe_500/tokens.txt"
git lfs pull --include "data/lang_bpe_500/words.txt"
git lfs pull --include "data/lang_bpe_500/HLG.pt" Decode with a ctc_topo (i.e., with H)cd /path/to/sherpa/build
nn_model=../icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt
tokens=../icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/tokens.txt
wave1=../icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav
wave2=../icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav
wave3=../icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
./bin/sherpa-offline \
--nn-model=$nn_model \
--tokens=$tokens \
--use-gpu=false \
$wave1 \
$wave2 \
$wave3
```

Decode with an HLG graph:

```
nn_model=../icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt
tokens=../icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt
hlg=../icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt
wave1=../icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav
wave2=../icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav
wave3=../icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
./bin/sherpa-offline \
--nn-model=$nn_model \
--tokens=$tokens \
--hlg=$hlg \
--use-gpu=false \
$wave1 \
$wave2 \
$wave3
```
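For context on the two graphs above: decoding with H (the CTC topology) alone produces token-level results, while HLG additionally composes a lexicon L and a language model G, so the decoding lattice directly yields words. Here is a hedged sketch of how H can be built with k2; sherpa constructs the decoding graph internally, and the vocabulary size 500 is taken from the BPE-500 model above:

```python
# Sketch: building the CTC topology "H" with k2 (for illustration).
import k2

# max_token is the largest token id in tokens.txt; 500 matches the
# bpe-500 model used above.
H = k2.ctc_topo(max_token=500)

# H maps framewise symbol sequences (which may contain blanks and
# repeated tokens) to collapsed token sequences. Composing H with a
# lexicon FST L and a word-level LM G gives the HLG graph loaded via
# --hlg above.
```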
CTC decoding for models from wenet

Pre-trained models for English

First, build sherpa and download the pre-trained model:

```
cd /path/to/sherpa
mkdir build
cd build
cmake ..
make -j 10
cd ..
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/csukuangfj/wenet-english-model
cd wenet-english-model
git lfs pull --include "final.zip" Decode with a ctc_topo (i.e., with H)cd /path/to/sherpa/build
nn_model=../wenet-english-model/final.zip
tokens=../wenet-english-model/units.txt
wave1=../wenet-english-model/test_wavs/1089-134686-0001.wav
wave2=../wenet-english-model/test_wavs/1221-135766-0001.wav
wave3=../wenet-english-model/test_wavs/1221-135766-0002.wav
./bin/sherpa-offline \
--nn-model=$nn_model \
--tokens=$tokens \
--use-gpu=false \
--normalize-samples=false \
$wave1 \
$wave2 \
$wave3
```
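A note on --normalize-samples=false: my understanding is that wenet models are trained on features computed from samples in the int16 range, while audio loaders such as torchaudio.load() return floats in [-1, 1], so the normalization has to be undone before feature extraction. Roughly (the scaling factor is an assumption based on wenet's fbank extraction, not sherpa's exact implementation):

```python
# Sketch of what --normalize-samples=false implies (an assumption
# for illustration, not sherpa's exact code).
import torchaudio

wave, sample_rate = torchaudio.load("test.wav")  # floats in [-1, 1]

# wenet-style feature extraction expects int16-range samples:
wave_for_wenet = wave * 32768
```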
CTC decoding for Wav2Vec 2.0 models from torchaudio

First, build sherpa and download the pre-trained model. (Requires torch >= 1.8.1)

```
cd /path/to/sherpa
mkdir build
cd build
cmake ..
make -j 10
cd ..
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/csukuangfj/wav2vec2.0-torchaudio
cd wav2vec2.0-torchaudio
git lfs pull --include "wav2vec2_asr_base_10m.pt" Decode with a ctc_topo (i.e., with H)cd /path/to/sherpa/build
nn_model=../wav2vec2.0-torchaudio/wav2vec2_asr_base_10m.pt
tokens=../wav2vec2.0-torchaudio/tokens.txt
wave1=../wav2vec2.0-torchaudio/test_wavs/1089-134686-0001.wav
wave2=../wav2vec2.0-torchaudio/test_wavs/1221-135766-0001.wav
wave3=../wav2vec2.0-torchaudio/test_wavs/1221-135766-0002.wav
./bin/sherpa-offline \
--nn-model=$nn_model \
--tokens=$tokens \
--use-gpu=false \
$wave1 \
$wave2 \
$wave3
```
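For reference, here is a hedged sketch of how a torchscript file like wav2vec2_asr_base_10m.pt can be produced from torchaudio's pipelines API (not necessarily how the checkpoint above was actually exported):

```python
# Sketch: exporting a wav2vec 2.0 ASR model from torchaudio to
# TorchScript. Requires a torchaudio version with the pipelines API.
import torch
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_10M
model = bundle.get_model()
model.eval()

scripted = torch.jit.script(model)
scripted.save("wav2vec2_asr_base_10m.pt")

# bundle.get_labels() returns the output symbols, i.e., the entries
# that the tokens.txt file above corresponds to.
print(bundle.get_labels())
```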
The following example shows how to decode a pre-trained model fine-tuned on the voxpopuli dataset (German):

```
cd /path/to/sherpa
mkdir build
cd build
cmake ..
make -j 10
cd ..
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/csukuangfj/wav2vec2.0-torchaudio
cd wav2vec2.0-torchaudio
git lfs pull --include "voxpopuli_asr_base_10k_de.pt"
cd ../build
nn_model=../wav2vec2.0-torchaudio/voxpopuli_asr_base_10k_de.pt
tokens=../wav2vec2.0-torchaudio/tokens-de.txt
wave1=../wav2vec2.0-torchaudio/test_wavs/20120315-0900-PLENARY-14-de_20120315.wav
wave2=../wav2vec2.0-torchaudio/test_wavs/20170517-0900-PLENARY-16-de_20170517.wav
./bin/sherpa-offline \
--nn-model=$nn_model \
--tokens=$tokens \
--use-gpu=false \
$wave1 \
$wave2
```

It prints the recognition results.
Note: This requires the latest k2, i.e., the master branch of k2 as of today.
Supported models:
I will add usage instructions and update links to the pre-trained models so that you can easily test this PR.