
Support CTC decoding #198

Merged (18 commits) on Nov 13, 2022
Conversation

csukuangfj
Collaborator

Note: it requires the latest k2, i.e., the k2 master branch as of today.

Supported models:


Will add usage instructions and update links to pre-trained models so that you can easily test this PR.

@csukuangfj
Collaborator Author

To my surprise, I find that only icefall and wenet provide pre-trained models in TorchScript format.

espnet, speechbrain, NeMo, torchaudio, and fairseq provide no TorchScript pre-trained models. I am wondering whether those frameworks care about deployment. Correct me if I am wrong.
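As background on why TorchScript checkpoints matter for deployment, here is a minimal sketch (not code from this PR; the module and the filename `cpu_jit_demo.pt` are made up): a scripted model can be saved and later reloaded with only libtorch, without access to the original Python class definition.

```python
import torch

class TinyEncoder(torch.nn.Module):
    """A stand-in for an acoustic encoder; real models are far larger."""
    def __init__(self, input_dim: int = 80, vocab_size: int = 500):
        super().__init__()
        self.proj = torch.nn.Linear(input_dim, vocab_size)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Return CTC log-probabilities of shape (N, T, vocab_size)
        return torch.nn.functional.log_softmax(self.proj(features), dim=-1)

model = TinyEncoder()
scripted = torch.jit.script(model)   # compile to TorchScript
scripted.save("cpu_jit_demo.pt")     # hypothetical filename

# Reloading needs neither the TinyEncoder class nor even Python in
# principle (the same file can be loaded from C++ via torch::jit::load).
loaded = torch.jit.load("cpu_jit_demo.pt")
out = loaded(torch.randn(1, 10, 80))
```

This is what a file like `cpu_jit.pt` below provides: a self-contained graph that sherpa's C++ code can execute directly.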

@csukuangfj
Collaborator Author

csukuangfj commented Nov 10, 2022

CTC decoding for models from icefall

First, build sherpa and download the pre-trained model
(We assume you have installed PyTorch, the latest k2 (from the master branch), and kaldifeat)

cd /path/to/sherpa
mkdir build
cd build
cmake ..
make -j 10

cd ..

GIT_LFS_SKIP_SMUDGE=1  git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09

cd icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09
git lfs pull --include "exp/cpu_jit.pt"
git lfs pull --include "data/lang_bpe_500/tokens.txt"
git lfs pull --include "data/lang_bpe_500/words.txt"
git lfs pull --include "data/lang_bpe_500/HLG.pt"
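The `tokens.txt` and `words.txt` files pulled above are Kaldi-style symbol tables: one `<symbol> <integer-id>` pair per line. A small parsing sketch (the function name and the demo entries are ours, not from sherpa):

```python
def read_symbol_table(lines):
    """Parse 'symbol id' lines into (sym2id, id2sym) dicts."""
    sym2id, id2sym = {}, {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        sym, idx = fields[0], int(fields[1])
        sym2id[sym] = idx
        id2sym[idx] = sym
    return sym2id, id2sym

# Made-up entries in the same two-column format:
demo = ["<blk> 0", "<sos/eos> 1", "HE 2", "LLO 3"]
sym2id, id2sym = read_symbol_table(demo)
print(id2sym[2])  # HE
```

Decoding with `tokens.txt` maps network output ids to BPE tokens; decoding with `HLG.pt` instead produces word ids, which `words.txt` maps back to words.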

Decode with a ctc_topo (i.e., with H)

cd /path/to/sherpa/build

nn_model=../icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt
tokens=../icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/tokens.txt

wave1=../icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav
wave2=../icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav
wave3=../icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav

./bin/sherpa-offline \
  --nn-model=$nn_model \
  --tokens=$tokens \
  --use-gpu=false \
  $wave1 \
  $wave2 \
  $wave3
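At its simplest, decoding with a `ctc_topo` (H) reduces to the CTC collapse rule: take the best token per frame, merge consecutive repeats, and drop blanks. sherpa implements this with k2 FSA operations; this pure-Python sketch only illustrates the rule itself (blank id 0 is an assumption for the demo).

```python
def ctc_greedy_collapse(frame_ids, blank=0):
    """Merge consecutive repeats, then remove the blank symbol."""
    out = []
    prev = None
    for i in frame_ids:
        if i != prev and i != blank:
            out.append(i)
        prev = i  # repeats are only merged when adjacent
    return out

# Per-frame argmax ids for a toy utterance: 0=<blk>, 2 and 3 are tokens.
print(ctc_greedy_collapse([0, 2, 2, 0, 3, 3, 3, 0]))  # [2, 3]
```

Note that a blank between two identical ids keeps them distinct, which is exactly why CTC needs the blank symbol.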

Decode with an HLG graph

nn_model=../icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt
tokens=../icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt
hlg=../icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt

wave1=../icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav
wave2=../icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav
wave3=../icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav

./bin/sherpa-offline \
  --nn-model=$nn_model \
  --tokens=$tokens \
  --hlg=$hlg \
  --use-gpu=false \
  $wave1 \
  $wave2 \
  $wave3

@csukuangfj
Collaborator Author

csukuangfj commented Nov 10, 2022

CTC decoding for models from wenet

Pre-trained models for English

First, build sherpa and download the pre-trained model
(We assume you have installed PyTorch, the latest k2 (from the master branch), and kaldifeat)

cd /path/to/sherpa
mkdir build
cd build
cmake ..
make -j 10

cd ..

GIT_LFS_SKIP_SMUDGE=1  git clone https://huggingface.co/csukuangfj/wenet-english-model

cd wenet-english-model
git lfs pull --include "final.zip"

Decode with a ctc_topo (i.e., with H)

cd /path/to/sherpa/build

nn_model=../wenet-english-model/final.zip
tokens=../wenet-english-model/units.txt

wave1=../wenet-english-model/test_wavs/1089-134686-0001.wav
wave2=../wenet-english-model/test_wavs/1221-135766-0001.wav
wave3=../wenet-english-model/test_wavs/1221-135766-0002.wav

./bin/sherpa-offline \
  --nn-model=$nn_model \
  --tokens=$tokens \
  --use-gpu=false \
  --normalize-samples=false \
  $wave1 \
  $wave2 \
  $wave3
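The extra `--normalize-samples=false` flag reflects a difference in sample conventions: wenet models expect features computed from raw int16-range samples, whereas models like icefall's expect floats in [-1, 1). A hedged sketch of the two conventions (the helper is illustrative, not sherpa code):

```python
def normalize(int16_samples):
    """Convert int16-range samples to floats in [-1, 1)."""
    return [s / 32768.0 for s in int16_samples]

raw = [-32768, 0, 16384, 32767]  # int16-range samples
print(normalize(raw))            # [-1.0, 0.0, 0.5, 0.999969482421875]
```

Passing `--normalize-samples=false` tells sherpa to keep samples in the int16 range instead of applying this division, matching what the wenet model saw during training.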

@csukuangfj
Collaborator Author

csukuangfj commented Nov 10, 2022

CTC decoding for Wav2Vec 2.0 models from torchaudio

First, build sherpa and download the pre-trained model.
(We assume you have installed PyTorch, the latest k2 (from the master branch), and kaldifeat)

(Requires torch >= 1.8.1)

cd /path/to/sherpa
mkdir build
cd build
cmake ..
make -j 10

cd ..

GIT_LFS_SKIP_SMUDGE=1  git clone https://huggingface.co/csukuangfj/wav2vec2.0-torchaudio
cd wav2vec2.0-torchaudio
git lfs pull --include "wav2vec2_asr_base_10m.pt"

Decode with a ctc_topo (i.e., with H)

cd /path/to/sherpa/build

nn_model=../wav2vec2.0-torchaudio/wav2vec2_asr_base_10m.pt
tokens=../wav2vec2.0-torchaudio/tokens.txt

wave1=../wav2vec2.0-torchaudio/test_wavs/1089-134686-0001.wav
wave2=../wav2vec2.0-torchaudio/test_wavs/1221-135766-0001.wav
wave3=../wav2vec2.0-torchaudio/test_wavs/1221-135766-0002.wav

./bin/sherpa-offline \
  --nn-model=$nn_model \
  --tokens=$tokens \
  --use-gpu=false \
  $wave1 \
  $wave2 \
  $wave3
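Unlike the BPE-based icefall models, torchaudio's wav2vec2 CTC models emit character-level tokens, with "|" conventionally marking word boundaries (the exact label set here is illustrative, not the `tokens.txt` from the repo above). After the usual CTC collapse, turning characters into words is a simple split:

```python
def chars_to_words(chars, delimiter="|"):
    """Join collapsed character tokens and split on the word delimiter."""
    words = "".join(chars).split(delimiter)
    return [w for w in words if w]  # drop empties from trailing delimiters

print(chars_to_words(list("HELLO|WORLD|")))  # ['HELLO', 'WORLD']
```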

@csukuangfj
Collaborator Author

csukuangfj commented Nov 10, 2022

The following example shows how to decode a pre-trained model fine-tuned on the VoxPopuli dataset (German):

cd /path/to/sherpa
mkdir build
cd build
cmake ..
make -j 10

cd ..

GIT_LFS_SKIP_SMUDGE=1  git clone https://huggingface.co/csukuangfj/wav2vec2.0-torchaudio
cd wav2vec2.0-torchaudio
git lfs pull --include "voxpopuli_asr_base_10k_de.pt"

cd ../build

nn_model=../wav2vec2.0-torchaudio/voxpopuli_asr_base_10k_de.pt
tokens=../wav2vec2.0-torchaudio/tokens-de.txt

wave1=../wav2vec2.0-torchaudio/test_wavs/20120315-0900-PLENARY-14-de_20120315.wav
wave2=../wav2vec2.0-torchaudio/test_wavs/20170517-0900-PLENARY-16-de_20170517.wav

./bin/sherpa-offline \
  --nn-model=$nn_model \
  --tokens=$tokens \
  --use-gpu=false \
  $wave1 \
  $wave2

It prints:

[I] /root/fangjun/open-source/sherpa/sherpa/csrc/parse-options.cc:495:int sherpa::ParseOptions::Read(int, const char* const*) 2022-11-10 21:10:30 ./bin/sherpa-offline --nn-model=../wav2vec2.0-torchaudio/voxpopuli_asr_base_10k_de.pt --tokens=../wav2vec2.0-torchaudio/tokens-de.txt --use-gpu=false ../wav2vec2.0-torchaudio/test_wavs/20120315-0900-PLENARY-14-de_20120315.wav ../wav2vec2.0-torchaudio/test_wavs/20170517-0900-PLENARY-16-de_20170517.wav

[I] /root/fangjun/open-source/sherpa/sherpa/cpp_api/bin/offline-recognizer.cc:330:int main(int, char**) 2022-11-10 21:10:43
filename: ../wav2vec2.0-torchaudio/test_wavs/20120315-0900-PLENARY-14-de_20120315.wav
result: natürlich gibt es übergriffe des herrscherhauses od des sicherheitsaparats und der saudis die sich dahinter verstecken und die gernbache ein eigentlich in ihren staat integrieren würden

filename: ../wav2vec2.0-torchaudio/test_wavs/20170517-0900-PLENARY-16-de_20170517.wav
result:  heute hat italien und seit neuestemau frankreich mehr absolute schulden als deutschland obwohl dennen wirtschaften spürüber kleiner sind

@csukuangfj csukuangfj merged commit 48e9c68 into k2-fsa:master Nov 13, 2022
@csukuangfj csukuangfj deleted the support-ctc-decoding-1109 branch November 13, 2022 05:36