
Support CTC decoding #198

Merged (18 commits) on Nov 13, 2022
Conversation

csukuangfj
Collaborator

Note: it requires the latest k2, i.e., the k2 master branch as of today.

Supported models:


Will add usage instructions and update links to pre-trained models so that you can easily test this PR.

@csukuangfj
Collaborator Author

To my surprise, I find that only icefall and wenet provide pre-trained models in TorchScript format.

espnet, speechbrain, NeMo, torchaudio, and fairseq provide no TorchScript pre-trained models. I am wondering whether those frameworks care about deployment. Correct me if I am wrong.
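As background on why TorchScript checkpoints matter for deployment, here is a minimal sketch (not code from this PR; the module and the filename `cpu_jit_demo.pt` are made up): a scripted model can be saved and later reloaded with only libtorch, without access to the original Python class definition.

```python
import torch

class TinyEncoder(torch.nn.Module):
    """A stand-in for an acoustic encoder; real models are far larger."""
    def __init__(self, input_dim: int = 80, vocab_size: int = 500):
        super().__init__()
        self.proj = torch.nn.Linear(input_dim, vocab_size)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Return CTC log-probabilities of shape (N, T, vocab_size)
        return torch.nn.functional.log_softmax(self.proj(features), dim=-1)

model = TinyEncoder()
scripted = torch.jit.script(model)   # compile to TorchScript
scripted.save("cpu_jit_demo.pt")     # hypothetical filename

# Reloading needs neither the TinyEncoder class nor even Python in
# principle (the same file can be loaded from C++ via torch::jit::load).
loaded = torch.jit.load("cpu_jit_demo.pt")
out = loaded(torch.randn(1, 10, 80))
```

This is what a file like `cpu_jit.pt` below provides: a self-contained graph that sherpa's C++ code can execute directly.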

@csukuangfj
Collaborator Author

csukuangfj commented Nov 10, 2022

CTC decoding for models from icefall

First, build sherpa and download the pre-trained model
(We assume you have installed PyTorch, the latest k2 (from the master branch), and kaldifeat)

cd /path/to/sherpa
mkdir build
cd build
cmake ..
make -j 10

cd ..

GIT_LFS_SKIP_SMUDGE=1  git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09

cd icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09
git lfs pull --include "exp/cpu_jit.pt"
git lfs pull --include "data/lang_bpe_500/tokens.txt"
git lfs pull --include "data/lang_bpe_500/words.txt"
git lfs pull --include "data/lang_bpe_500/HLG.pt"
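The `tokens.txt` and `words.txt` files pulled above are Kaldi-style symbol tables: one `<symbol> <integer-id>` pair per line. A small parsing sketch (the function name and the demo entries are ours, not from sherpa):

```python
def read_symbol_table(lines):
    """Parse 'symbol id' lines into (sym2id, id2sym) dicts."""
    sym2id, id2sym = {}, {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        sym, idx = fields[0], int(fields[1])
        sym2id[sym] = idx
        id2sym[idx] = sym
    return sym2id, id2sym

# Made-up entries in the same two-column format:
demo = ["<blk> 0", "<sos/eos> 1", "HE 2", "LLO 3"]
sym2id, id2sym = read_symbol_table(demo)
print(id2sym[2])  # HE
```

Decoding with `tokens.txt` maps network output ids to BPE tokens; decoding with `HLG.pt` instead produces word ids, which `words.txt` maps back to words.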

Decode with a ctc_topo (i.e., with H)

cd /path/to/sherpa/build

nn_model=../icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt
tokens=../icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/tokens.txt

wave1=../icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav
wave2=../icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav
wave3=../icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav

./bin/sherpa-offline \
  --nn-model=$nn_model \
  --tokens=$tokens \
  --use-gpu=false \
  $wave1 \
  $wave2 \
  $wave3
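At its simplest, decoding with a `ctc_topo` (H) reduces to the CTC collapse rule: take the best token per frame, merge consecutive repeats, and drop blanks. sherpa implements this with k2 FSA operations; this pure-Python sketch only illustrates the rule itself (blank id 0 is an assumption for the demo).

```python
def ctc_greedy_collapse(frame_ids, blank=0):
    """Merge consecutive repeats, then remove the blank symbol."""
    out = []
    prev = None
    for i in frame_ids:
        if i != prev and i != blank:
            out.append(i)
        prev = i  # repeats are only merged when adjacent
    return out

# Per-frame argmax ids for a toy utterance: 0=<blk>, 2 and 3 are tokens.
print(ctc_greedy_collapse([0, 2, 2, 0, 3, 3, 3, 0]))  # [2, 3]
```

Note that a blank between two identical ids keeps them distinct, which is exactly why CTC needs the blank symbol.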

Decode with an HLG graph

nn_model=../icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt
tokens=../icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt
hlg=../icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt

wave1=../icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav
wave2=../icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav
wave3=../icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav

./bin/sherpa-offline \
  --nn-model=$nn_model \
  --tokens=$tokens \
  --hlg=$hlg \
  --use-gpu=false \
  $wave1 \
  $wave2 \
  $wave3

@csukuangfj
Collaborator Author

csukuangfj commented Nov 10, 2022

CTC decoding for models from wenet

Pre-trained models for English

First, build sherpa and download the pre-trained model
(We assume you have installed PyTorch, the latest k2 (from the master branch), and kaldifeat)

cd /path/to/sherpa
mkdir build
cd build
cmake ..
make -j 10

cd ..

GIT_LFS_SKIP_SMUDGE=1  git clone https://huggingface.co/csukuangfj/wenet-english-model

cd wenet-english-model
git lfs pull --include "final.zip"

Decode with a ctc_topo (i.e., with H)

cd /path/to/sherpa/build

nn_model=../wenet-english-model/final.zip
tokens=../wenet-english-model/units.txt

wave1=../wenet-english-model/test_wavs/1089-134686-0001.wav
wave2=../wenet-english-model/test_wavs/1221-135766-0001.wav
wave3=../wenet-english-model/test_wavs/1221-135766-0002.wav

./bin/sherpa-offline \
  --nn-model=$nn_model \
  --tokens=$tokens \
  --use-gpu=false \
  --normalize-samples=false \
  $wave1 \
  $wave2 \
  $wave3
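The extra `--normalize-samples=false` flag reflects a difference in sample conventions: wenet models expect features computed from raw int16-range samples, whereas models like icefall's expect floats in [-1, 1). A hedged sketch of the two conventions (the helper is illustrative, not sherpa code):

```python
def normalize(int16_samples):
    """Convert int16-range samples to floats in [-1, 1)."""
    return [s / 32768.0 for s in int16_samples]

raw = [-32768, 0, 16384, 32767]  # int16-range samples
print(normalize(raw))            # [-1.0, 0.0, 0.5, 0.999969482421875]
```

Passing `--normalize-samples=false` tells sherpa to keep samples in the int16 range instead of applying this division, matching what the wenet model saw during training.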

@csukuangfj
Collaborator Author

csukuangfj commented Nov 10, 2022

CTC decoding for Wav2Vec 2.0 models from torchaudio

First, build sherpa and download the pre-trained model.
(We assume you have installed PyTorch, the latest k2 (from the master branch), and kaldifeat)

(Requires torch >= 1.8.1)

cd /path/to/sherpa
mkdir build
cd build
cmake ..
make -j 10

cd ..

GIT_LFS_SKIP_SMUDGE=1  git clone https://huggingface.co/csukuangfj/wav2vec2.0-torchaudio
cd wav2vec2.0-torchaudio
git lfs pull --include "wav2vec2_asr_base_10m.pt"

Decode with a ctc_topo (i.e., with H)

cd /path/to/sherpa/build

nn_model=../wav2vec2.0-torchaudio/wav2vec2_asr_base_10m.pt
tokens=../wav2vec2.0-torchaudio/tokens.txt

wave1=../wav2vec2.0-torchaudio/test_wavs/1089-134686-0001.wav
wave2=../wav2vec2.0-torchaudio/test_wavs/1221-135766-0001.wav
wave3=../wav2vec2.0-torchaudio/test_wavs/1221-135766-0002.wav

./bin/sherpa-offline \
  --nn-model=$nn_model \
  --tokens=$tokens \
  --use-gpu=false \
  $wave1 \
  $wave2 \
  $wave3
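Unlike the BPE-based icefall models, torchaudio's wav2vec2 CTC models emit character-level tokens, with "|" conventionally marking word boundaries (the exact label set here is illustrative, not the `tokens.txt` from the repo above). After the usual CTC collapse, turning characters into words is a simple split:

```python
def chars_to_words(chars, delimiter="|"):
    """Join collapsed character tokens and split on the word delimiter."""
    words = "".join(chars).split(delimiter)
    return [w for w in words if w]  # drop empties from trailing delimiters

print(chars_to_words(list("HELLO|WORLD|")))  # ['HELLO', 'WORLD']
```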

@csukuangfj
Collaborator Author

csukuangfj commented Nov 10, 2022

The following example shows how to decode a pre-trained model fine-tuned on the VoxPopuli dataset (German):

cd /path/to/sherpa
mkdir build
cd build
cmake ..
make -j 10

cd ..

GIT_LFS_SKIP_SMUDGE=1  git clone https://huggingface.co/csukuangfj/wav2vec2.0-torchaudio
cd wav2vec2.0-torchaudio
git lfs pull --include "voxpopuli_asr_base_10k_de.pt"

cd ../build

nn_model=../wav2vec2.0-torchaudio/voxpopuli_asr_base_10k_de.pt
tokens=../wav2vec2.0-torchaudio/tokens-de.txt

wave1=../wav2vec2.0-torchaudio/test_wavs/20120315-0900-PLENARY-14-de_20120315.wav
wave2=../wav2vec2.0-torchaudio/test_wavs/20170517-0900-PLENARY-16-de_20170517.wav

./bin/sherpa-offline \
  --nn-model=$nn_model \
  --tokens=$tokens \
  --use-gpu=false \
  $wave1 \
  $wave2

It prints:

[I] /root/fangjun/open-source/sherpa/sherpa/csrc/parse-options.cc:495:int sherpa::ParseOptions::Read(int, const char* const*) 2022-11-10 21:10:30 ./bin/sherpa-offline --nn-model=../wav2vec2.0-torchaudio/voxpopuli_asr_base_10k_de.pt --tokens=../wav2vec2.0-torchaudio/tokens-de.txt --use-gpu=false ../wav2vec2.0-torchaudio/test_wavs/20120315-0900-PLENARY-14-de_20120315.wav ../wav2vec2.0-torchaudio/test_wavs/20170517-0900-PLENARY-16-de_20170517.wav

[I] /root/fangjun/open-source/sherpa/sherpa/cpp_api/bin/offline-recognizer.cc:330:int main(int, char**) 2022-11-10 21:10:43
filename: ../wav2vec2.0-torchaudio/test_wavs/20120315-0900-PLENARY-14-de_20120315.wav
result: natürlich gibt es übergriffe des herrscherhauses od des sicherheitsaparats und der saudis die sich dahinter verstecken und die gernbache ein eigentlich in ihren staat integrieren würden

filename: ../wav2vec2.0-torchaudio/test_wavs/20170517-0900-PLENARY-16-de_20170517.wav
result:  heute hat italien und seit neuestemau frankreich mehr absolute schulden als deutschland obwohl dennen wirtschaften spürüber kleiner sind

@csukuangfj csukuangfj merged commit 48e9c68 into k2-fsa:master Nov 13, 2022
@csukuangfj csukuangfj deleted the support-ctc-decoding-1109 branch November 13, 2022 05:36