Add ctc_decode.py for the model trained with rnnt-loss and ctc-loss (#12)

* Support running icefall outside of a git tracked directory. (k2-fsa#470)

* Support running icefall outside of a git tracked directory.

* Minor fixes.

* Rand combine update result (k2-fsa#467)

* update RESULTS.md

* fix test code in pruned_transducer_stateless5/conformer.py

* minor fix

* delete doc

* fix style

* Simplified memory bank for Emformer (k2-fsa#440)

* init files

* use average value as memory vector for each chunk

* change tail padding length from right_context_length to chunk_length

* correct the files, ln -> cp

* fix bug in conv_emformer_transducer_stateless2/emformer.py

* fix doc in conv_emformer_transducer_stateless/emformer.py

* refactor init states for stream

* modify .flake8

* fix bug about memory mask when memory_size==0

* add @torch.jit.export for init_states function

* update RESULTS.md

* minor change

* update README.md

* modify doc

* replace torch.div() with <<

* fix bug, >> -> <<

* use i&i-1 to judge if it is a power of 2

* minor fix

* fix error in RESULTS.md
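
The "use i&i-1 to judge if it is a power of 2" item in the list above refers to a common bit trick: a positive integer n is a power of two exactly when `n & (n - 1)` is zero. A minimal shell sketch of that check follows; the function name `is_power_of_two` is hypothetical and is not taken from emformer.py.

```bash
#!/usr/bin/env bash
# Hypothetical helper illustrating the n & (n - 1) trick; not code from icefall.
is_power_of_two() {
  local n=$1
  # A power of two has exactly one bit set, so clearing the lowest set bit
  # with n & (n - 1) leaves zero. Zero and negative values are rejected first.
  [ "$n" -gt 0 ] && [ $(( n & (n - 1) )) -eq 0 ]
}

for n in 1 6 8 12 16; do
  if is_power_of_two "$n"; then
    echo "$n: power of two"
  else
    echo "$n: not a power of two"
  fi
done
```

The explicit `-gt 0` guard matters because `0 & -1` is also zero, so the bit test alone would wrongly accept 0.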

* update multi_quantization installation (k2-fsa#469)

* update multi_quantization installation

* Update egs/librispeech/ASR/pruned_transducer_stateless6/train.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* [Ready] [Recipes] add aishell2 (k2-fsa#465)

* add aishell2

* fix aishell2

* add manifest stats

* update prepare char dict

* fix lint

* setting max duration

* lint

* change context size to 1

* update result

* update hf link

* fix decoding comment

* add more decoding methods

* update result

* change context-size 2 default

* [WIP] Rnn-T LM nbest rescoring (k2-fsa#471)

* add compile_lg.py for aishell2 recipe (k2-fsa#481)

* Add RNN-LM rescoring in fast beam search (k2-fsa#475)

* fix for case of None stats

* Update conformer.py for aishell4 (k2-fsa#484)

* update conformer.py for aishell4

* update conformer.py

* add strict=False when model.load_state_dict

* CTC attention model with reworked Conformer encoder and reworked Transformer decoder (k2-fsa#462)

* ctc attention model with reworked conformer encoder and reworked transformer decoder

* remove unnecessary func

* resolve flake8 conflicts

* fix typos and modify the expr of ScaledEmbedding

* use original beam size

* minor changes to the scripts

* add rnn lm decoding

* minor changes

* check whether q k v weight is None

* check whether q k v weight is None

* check whether q k v weight is None

* style correction

* update results

* update results

* upload the decoding results of rnn-lm to the RESULTS

* upload the decoding results of rnn-lm to the RESULTS

* Update egs/librispeech/ASR/RESULTS.md

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/librispeech/ASR/RESULTS.md

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/librispeech/ASR/RESULTS.md

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update doc to add a link to Nadira Povey's YouTube channel. (k2-fsa#492)

* Update doc to add a link to Nadira Povey's YouTube channel.

* fix a typo

* Add stats about duration and padding proportion (k2-fsa#485)

* add stats about duration and padding proportion

* add  for utt_duration

* add stats for other recipes

* add stats for other 2 recipes

* modify doc

* minor change

* Add modified_beam_search for streaming decode (k2-fsa#489)

* Add modified_beam_search for pruned_transducer_stateless/streaming_decode.py

* refactor

* modified beam search for stateless3,4

* Fix comments

* Add real streaming ci

* Fix using G before assignment in pruned_transducer_stateless/decode.py (k2-fsa#494)

* Support using aidatatang_200zh optionally in aishell training (k2-fsa#495)

* Use aidatatang_200zh optionally in aishell training.

* Fix get_transducer_model() for aishell. (k2-fsa#497)

PR k2-fsa#495 introduces an error. This commit fixes it.

* [WIP] Pruned-transducer-stateless5-for-WenetSpeech (offline and streaming) (k2-fsa#447)

* pruned-rnnt5-for-wenetspeech

* style check

* style check

* add streaming conformer

* add streaming decode

* changes codes for fast_beam_search and export cpu jit

* add modified-beam-search for streaming decoding

* add modified-beam-search for streaming decoding

* change for streaming_beam_search.py

* add README.md and RESULTS.md

* change for style_check.yml

* do some changes

* do some changes for export.py

* add some decode commands for usage

* add streaming results on README.md

* [debug] raise remind when git-lfs not available (k2-fsa#504)

* [debug] raise remind when git-lfs not available

* modify comment

* correction for prepare.sh (k2-fsa#506)

* Set overwrite=True when extracting features in batches. (k2-fsa#487)

* correction for get rank id. (k2-fsa#507)

* Fix no attribute 'data' error.

* minor fixes

* correction for get rank id.

* Add other decoding methods (nbest, nbest oracle, nbest LG) for wenetspeech pruned rnnt2 (k2-fsa#482)

* add other decoding methods for wenetspeech

* changes for RESULTS.md

* add ngram-lm-scale=0.35 results

* set ngram-lm-scale=0.35 as default

* Update README.md

* add nbest-scale for file name

* Support dynamic chunk streaming training in pruned_transducer_stateless5 (k2-fsa#454)

* support dynamic chunk streaming training

* Add simulate streaming decoding

* Support streaming decoding

* fix causal

* Minor fixes

* fix streaming decode; add results

* linear_fst_with_self_loops (k2-fsa#512)

* Support exporting to ONNX format (k2-fsa#501)

* WIP: Support exporting to ONNX format

* Minor fixes.

* Combine encoder/decoder/joiner into a single file.

* Revert merging three onnx models into a single one.

It's quite time consuming to extract a sub-graph from the combined
model. For instance, it takes more than one hour to extract
the encoder model.

* Update CI to test ONNX models.

* Decode with exported models.

* Fix typos.

* Add more doc.

* Remove ncnn as it is not fully tested yet.

* Fix as_strided for streaming conformer.

* Convert ScaledEmbedding to nn.Embedding for inference. (k2-fsa#517)

* Convert ScaledEmbedding to nn.Embedding for inference.

* Fix CI style issues.

* Fix preparing char based lang and add multiprocessing for wenetspeech text segmentation (k2-fsa#513)

* add multiprocessing for wenetspeech text segmentation

* Fix preparing char based lang for wenetspeech

* fix style

Co-authored-by: WeijiZhuang <zhuangweiji@xiaomi.com>

* change for pruned rnnt5 train.py (k2-fsa#519)

* fix about tensorboard (k2-fsa#516)

* fix metricstracker

* fix style

* Merging onnx models (k2-fsa#518)

* add export function of onnx-all-in-one to export.py

* add onnx_check script for all-in-one onnx model

* minor fix

* remove unused arguments

* add onnx-all-in-one test

* fix style

* fix style

* fix requirements

* fix input/output names

* fix installing onnx_graphsurgeon

* fix installing onnx_graphsurgeon

* revert to previous requirements.txt

* fix minor

* Fix loading sampler state dict. (k2-fsa#421)

* Fix loading sampler state dict.

* skip scan_pessimistic_batches_for_oom if params.start_batch > 0

* fix torchaudio version (k2-fsa#524)

* fix torchaudio version

* fix torchaudio version

* Fix computing averaged loss in the aishell recipe. (k2-fsa#523)

* Fix computing averaged loss in the aishell recipe.

* Set find_unused_parameters optionally.

* Sort results to make it more convenient to compare decoding results (k2-fsa#522)

* Sort result to make it more convenient to compare decoding results

* Add cut_id to recognition results

* add cut_id to results for all recipes

* Fix torch.jit.script

* Fix comments

* Minor fixes

* Fix torch.jit.tracing for PyTorch version before v1.9.0

* Add function display_and_save_batch in wenetspeech/pruned_transducer_stateless2/train.py (k2-fsa#528)

* Add function display_and_save_batch in egs/wenetspeech/ASR/pruned_transducer_stateless2/train.py

* Modify function: display_and_save_batch

* Delete empty line in pruned_transducer_stateless2/train.py

* Modify code format

* Filter non-finite losses (k2-fsa#525)

* Filter non-finite losses

* Fixes after review

* propagate changes from k2-fsa#525 to other librispeech recipes (k2-fsa#531)

* propagate changes from k2-fsa#525 to other librispeech recipes

* refactor display_and_save_batch to utils

* fixed typo

* reformat code style

* Fix not enough values to unpack error. (k2-fsa#533)

* Use ScaledLSTM as streaming encoder (k2-fsa#479)

* add ScaledLSTM

* add RNNEncoderLayer and RNNEncoder classes in lstm.py

* add RNN and Conv2dSubsampling classes in lstm.py

* hardcode bidirectional=False

* link from pruned_transducer_stateless2

* link scaling.py pruned_transducer_stateless2

* copy from pruned_transducer_stateless2

* modify decode.py pretrained.py test_model.py train.py

* copy streaming decoding files from pruned_transducer_stateless2

* modify streaming decoding files

* simplified code in ScaledLSTM

* flat weights after scaling

* pruned2 -> pruned4

* link __init__.py

* fix style

* remove add_model_arguments

* modify .flake8

* fix style

* fix scale value in scaling.py

* add random combiner for training deeper model

* add using proj_size

* add scaling converter for ScaledLSTM

* support jit trace

* add using averaged model in export.py

* modify test_model.py, test if the model can be successfully exported by jit.trace

* modify pretrained.py

* support streaming decoding

* fix model.py

* Add cut_id to recognition results

* Add cut_id to recognition results

* do not pad in Conv subsampling module; add tail padding during decoding.

* update RESULTS.md

* minor fix

* fix doc

* update README.md

* minor change, filter infinite loss

* remove the condition of raise error

* modify type hint for the return value in model.py

* minor change

* modify RESULTS.md

Co-authored-by: pkufool <wkang.pku@gmail.com>

* Update asr_datamodule.py (k2-fsa#538)

minor file names correction

* minor fixes to LSTM streaming model (k2-fsa#537)

* Pruned transducer stateless2 for AISHELL-1 (k2-fsa#536)

* Fix not enough values to unpack error.

* [WIP] Pruned transducer stateless2 for AISHELL-1

* fix the style issue

* code format for black

* add pruned-transducer-stateless2 results for AISHELL-1

* simplify result

* consider case of empty tensor (k2-fsa#540)

* fixed import quantization is none (k2-fsa#541)

Signed-off-by: shanguanma <nanr9544@gmail.com>

Signed-off-by: shanguanma <nanr9544@gmail.com>
Co-authored-by: shanguanma <nanr9544@gmail.com>

* fix typo for export jit script (k2-fsa#544)

* some small changes for aidatatang_200zh (k2-fsa#542)

* Update prepare.sh

* Update compute_fbank_aidatatang_200zh.py

* fixed no cut_id error in decode_dataset (k2-fsa#549)

* fixed import quantization is none

Signed-off-by: shanguanma <nanr9544@gmail.com>

* fixed no cut_id error in decode_dataset

Signed-off-by: shanguanma <nanr9544@gmail.com>

* fixed more than one "#"

Signed-off-by: shanguanma <nanr9544@gmail.com>

* fixed code style

Signed-off-by: shanguanma <nanr9544@gmail.com>

Signed-off-by: shanguanma <nanr9544@gmail.com>
Co-authored-by: shanguanma <nanr9544@gmail.com>

* Add clamping operation in Eve optimizer for all scalar weights to avoid unstable training in some scenarios (k2-fsa#550)

The clamping range is set to (-10,2). Note that this change may cause unexpected effects if you resume training from a model that was trained without clamping.

* minor changes for correct path names && import module text2segments.py (k2-fsa#552)

* Update asr_datamodule.py

minor file names correction

* minor changes for correct path names && import module text2segments.py

* fix scaling converter test for decoder(predictor). (k2-fsa#553)

* Disable CUDA_LAUNCH_BLOCKING in wenetspeech recipes. (k2-fsa#554)

* Disable CUDA_LAUNCH_BLOCKING in wenetspeech recipes.

* minor fixes

* Check that read_manifests_if_cached returns a non-empty dict. (k2-fsa#555)

* Modified prepare_transcripts.py and prepare_lexicon.py of tedlium3 recipe (k2-fsa#567)

* Use modified ctc topo when vocab size is > 500 (k2-fsa#568)

* Add LSTM for the multi-dataset setup. (k2-fsa#558)

* Add LSTM for the multi-dataset setup.

* Add results

* fix style issues

* add missing file

* Adding Dockerfile for Ubuntu18.04-pytorch1.12.1-cuda11.3-cudnn8 (k2-fsa#572)

* Changed Dockerfile

* Update Dockerfile

* Dockerfile

* Update README.md

* Add Dockerfiles

* Update README.md

Removed misleading CUDA version, as the Ubuntu18.04-pytorch1.7.1-cuda11.0-cudnn8 Dockerfile can only support CUDA versions >11.0.

* support exporting to ncnn format via PNNX (k2-fsa#571)

* Small fixes to the transducer training doc (k2-fsa#575)

* Update kaldifeat in CI tests (k2-fsa#583)

* padding zeros (k2-fsa#591)

* Gradient filter for training lstm model (k2-fsa#564)

* init files

* add gradient filter module

* refactor getting median value

* add cutoff for grad filter

* delete comments

* apply gradient filter in LSTM module, to filter both input and params

* fix typing and refactor

* filter with soft mask

* rename lstm_transducer_stateless2 to lstm_transducer_stateless3

* fix typos, and update RESULTS.md

* minor fix

* fix return typing

* fix typo

* Modified train.py of tedlium3 models (k2-fsa#597)

* Add dill to requirements.txt (k2-fsa#613)

* Add dill to requirements.txt

* Disable style check for python 3.7

* update docs (k2-fsa#611)

* update docs

Co-authored-by: unknown <mazhihao@jshcbd.cn>
Co-authored-by: KajiMaCN <moonlightshadowmzh@gmail.com>

* exporting projection layers of joiner separately for onnx (k2-fsa#584)

* exporting projection layers of joiner separately for onnx

* Remove all-in-one for onnx export (k2-fsa#614)

* Remove all-in-one for onnx export

* Exit on error for CI

* Modify ActivationBalancer for speed (k2-fsa#612)

* add a probability to apply ActivationBalancer

* minor fix

* minor fix

* Support exporting to ONNX for the wenetspeech recipe (k2-fsa#615)

* Support exporting to ONNX for the wenetspeech recipe

* Add doc about model export (k2-fsa#618)

* Add doc about model export

* fix typos

* Fix links in the doc (k2-fsa#619)

* fix type hints for decode.py (k2-fsa#623)

* Support exporting LSTM with projection to ONNX (k2-fsa#621)

* Support exporting LSTM with projection to ONNX

* Add missing files

* small fixes

* CSJ Data Preparation (k2-fsa#617)

* workspace setup

* csj prepare done

* Change compute_fbank_musan.py to soft link

* add description

* change lhotse prepare csj command

* split train-dev here

* Add header

* remove debug

* save manifest_statistics

* generate transcript in Lhotse

* update comments in config file

* fix number of parameters in RESULTS.md (k2-fsa#627)

* Add Shallow fusion in modified_beam_search (k2-fsa#630)

* Add utility for shallow fusion

* test batch size == 1 without shallow fusion

* Use shallow fusion for modified-beam-search

* Modified beam search with ngram rescoring

* Fix code according to review

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Add kaldifst to requirements.txt (k2-fsa#631)

* Install kaldifst for GitHub actions (k2-fsa#632)

* Install kaldifst for GitHub actions

* Update train.py (k2-fsa#635)

Add the missing step to add the arguments to the parser.

* Fix type hints for decode.py (k2-fsa#638)

* Fix type hints for decode.py

* Fix flake8

* fix typos (k2-fsa#639)

* Remove onnx and onnxruntime from requirements.txt (k2-fsa#640)

* Remove onnx and onnxruntime from requirements.txt

* Checkout the LM for aishell explicitly (k2-fsa#642)

* Get timestamps during decoding (k2-fsa#598)

* print out timestamps during decoding

* add word-level alignments

* support to compute mean symbol delay with word-level alignments

* print variance of symbol delay

* update doc

* support to compute delay for pruned_transducer_stateless4

* fix bug

* add doc

* remove tail padding for non-streaming models (k2-fsa#625)

* support RNNLM shallow fusion for LSTM transducer

* support RNNLM shallow fusion in stateless5

* update results

* update decoding commands

* update author info

* update

* include previous added decoding method

* minor fixes

* remove redundant test lines

* Update egs/librispeech/ASR/lstm_transducer_stateless2/decode.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update tdnn_lstm_ctc.rst (k2-fsa#647)

* Update README.md (k2-fsa#649)

* Update tdnn_lstm_ctc.rst (k2-fsa#648)

* fix torchaudio version in dockerfile (k2-fsa#653)

* fix torchaudio version in dockerfile

* remove kaldiio

* update docs

* Add fast_beam_search_LG (k2-fsa#622)

* Add fast_beam_search_LG

* add fast_beam_search_LG to commonly used recipes

* fix ci

* fix ci

* Fix error

* Fix LG log file name (k2-fsa#657)

* resolve conflict with timestamp feature

* resolve conflicts

* minor fixes

* remove testing file

* Apply delay penalty on transducer (k2-fsa#654)

* add delay penalty

* fix CI

* fix CI

* Refactor getting timestamps in fsa-based decoding (k2-fsa#660)

* refactor getting timestamps for fsa-based decoding

* fix doc

* fix bug

* add ctc_decode.py

* fix doc

Signed-off-by: shanguanma <nanr9544@gmail.com>
Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
Co-authored-by: LIyong.Guo <839019390@qq.com>
Co-authored-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Co-authored-by: ezerhouni <61225408+ezerhouni@users.noreply.github.com>
Co-authored-by: Mingshuang Luo <37799481+luomingshuang@users.noreply.github.com>
Co-authored-by: Daniel Povey <dpovey@gmail.com>
Co-authored-by: Quandwang <quandwang@hotmail.com>
Co-authored-by: Wei Kang <wkang.pku@gmail.com>
Co-authored-by: boji123 <boji123@aliyun.com>
Co-authored-by: Lucky Wong <lekai.huang@gmail.com>
Co-authored-by: LIyong.Guo <guonwpu@qq.com>
Co-authored-by: Weiji Zhuang <zhuangweiji@foxmail.com>
Co-authored-by: WeijiZhuang <zhuangweiji@xiaomi.com>
Co-authored-by: Yunusemre <yunusemreozkose@gmail.com>
Co-authored-by: FNLPprojects <linxinzhulxz@gmail.com>
Co-authored-by: yangsuxia <34536059+yangsuxia@users.noreply.github.com>
Co-authored-by: marcoyang1998 <45973641+marcoyang1998@users.noreply.github.com>
Co-authored-by: rickychanhoyin <ricky.hoyin.chan@gmail.com>
Co-authored-by: Duo Ma <39255927+shanguanma@users.noreply.github.com>
Co-authored-by: shanguanma <nanr9544@gmail.com>
Co-authored-by: rxhmdia <41623136+rxhmdia@users.noreply.github.com>
Co-authored-by: kobenaxie <572745565@qq.com>
Co-authored-by: shcxlee <113081290+shcxlee@users.noreply.github.com>
Co-authored-by: Teo Wen Shen <36886809+teowenshen@users.noreply.github.com>
Co-authored-by: KajiMaCN <827272056@qq.com>
Co-authored-by: unknown <mazhihao@jshcbd.cn>
Co-authored-by: KajiMaCN <moonlightshadowmzh@gmail.com>
Co-authored-by: Yunusemre <yunusemre.ozkose@sestek.com>
Co-authored-by: Nagendra Goel <nagendra.goel@gmail.com>
Co-authored-by: marcoyang <marcoyang1998@gmail.com>
Co-authored-by: zr_jin <60612200+JinZr@users.noreply.github.com>
1 parent f061cac commit 89ce554
Showing 407 changed files with 54,330 additions and 3,129 deletions.
15 changes: 13 additions & 2 deletions .flake8
@@ -4,12 +4,15 @@ statistics=true
max-line-length = 80
per-file-ignores =
# line too long
-icefall/diagnostics.py: E501
+icefall/diagnostics.py: E501,
egs/*/ASR/*/conformer.py: E501,
egs/*/ASR/pruned_transducer_stateless*/*.py: E501,
egs/*/ASR/*/optim.py: E501,
egs/*/ASR/*/scaling.py: E501,
-egs/librispeech/ASR/conv_emformer_transducer_stateless/*.py: E501, E203
+egs/librispeech/ASR/lstm_transducer_stateless*/*.py: E501, E203
+egs/librispeech/ASR/conv_emformer_transducer_stateless*/*.py: E501, E203
+egs/librispeech/ASR/conformer_ctc2/*py: E501,
+egs/librispeech/ASR/RESULTS.md: E999,

# invalid escape sequence (cause by tex formular), W605
icefall/utils.py: E501, W605
@@ -19,3 +22,11 @@ exclude =
**/data/**,
icefall/shared/make_kn_lm.py,
icefall/__init__.py

ignore =
# E203 white space before ":"
E203,
# W503 line break before binary operator
W503,
# E226 missing whitespace around arithmetic operator
E226,
Original file line number Diff line number Diff line change
@@ -4,6 +4,8 @@
# The computed features are saved to ~/tmp/fbank-libri and are
# cached for later runs

set -e

export PYTHONPATH=$PWD:$PYTHONPATH
echo $PYTHONPATH
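
Several CI scripts in this commit gain a `set -e` line, which matches the "Exit on error for CI" entry in the commit message. The sketch below illustrates the effect under simple assumptions: each case runs in its own `bash -c` child process so the demonstration itself keeps going, and the variable names are made up for illustration.

```bash
#!/usr/bin/env bash
# Illustration only: compare a failing step with and without `set -e`.
# Each case runs in a separate bash process so its exit does not stop us.

# With `set -e`, the failing `false` ends the child shell immediately,
# so the echo is never reached and nothing is captured.
with_errexit=$(bash -c 'set -e; false; echo "reached end"' || true)

# Without it, the failure is ignored and the script carries on.
without_errexit=$(bash -c 'false; echo "reached end"')

echo "with set -e:    '${with_errexit}'"
echo "without set -e: '${without_errexit}'"
```

In a CI job this means a broken download or export step fails the run at that point instead of letting later steps operate on missing files.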

2 changes: 2 additions & 0 deletions .github/scripts/download-gigaspeech-dev-test-dataset.sh
@@ -6,6 +8,8 @@
# You will find directories `~/tmp/giga-dev-dataset-fbank` after running
# this script.

set -e

mkdir -p ~/tmp
cd ~/tmp

Original file line number Diff line number Diff line change
@@ -7,6 +7,8 @@
# You will find directories ~/tmp/download/LibriSpeech after running
# this script.

set -e

mkdir ~/tmp/download
cd egs/librispeech/ASR
ln -s ~/tmp/download .
2 changes: 2 additions & 0 deletions .github/scripts/install-kaldifeat.sh
@@ -3,6 +3,8 @@
# This script installs kaldifeat into the directory ~/tmp/kaldifeat
# which is cached by GitHub actions for later runs.

set -e

mkdir -p ~/tmp
cd ~/tmp
git clone https://github.com/csukuangfj/kaldifeat
Original file line number Diff line number Diff line change
@@ -4,6 +4,8 @@
# to egs/librispeech/ASR/download/LibriSpeech and generates manifest
# files in egs/librispeech/ASR/data/manifests

set -e

cd egs/librispeech/ASR
[ ! -e download ] && ln -s ~/tmp/download .
mkdir -p data/manifests
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
#!/usr/bin/env bash

set -e

log() {
# This function is from espnet
local fname=${BASH_SOURCE[1]##*/}
@@ -40,7 +42,7 @@ for sym in 1 2 3; do
--lang-dir $repo/data/lang_char \
$repo/test_wavs/BAC009S0764W0121.wav \
$repo/test_wavs/BAC009S0764W0122.wav \
-$rep/test_wavs/BAC009S0764W0123.wav
+$repo/test_wavs/BAC009S0764W0123.wav
done

for method in modified_beam_search beam_search fast_beam_search; do
@@ -53,7 +55,7 @@ for method in modified_beam_search beam_search fast_beam_search; do
--lang-dir $repo/data/lang_char \
$repo/test_wavs/BAC009S0764W0121.wav \
$repo/test_wavs/BAC009S0764W0122.wav \
-$rep/test_wavs/BAC009S0764W0123.wav
+$repo/test_wavs/BAC009S0764W0123.wav
done

echo "GITHUB_EVENT_NAME: ${GITHUB_EVENT_NAME}"
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
#!/usr/bin/env bash

set -e

log() {
# This function is from espnet
local fname=${BASH_SOURCE[1]##*/}
Original file line number Diff line number Diff line change
@@ -0,0 +1,203 @@
#!/usr/bin/env bash
#
set -e

log() {
# This function is from espnet
local fname=${BASH_SOURCE[1]##*/}
echo -e "$(date '+%Y-%m-%d %H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}

cd egs/librispeech/ASR

repo_url=https://huggingface.co/csukuangfj/icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03

log "Downloading pre-trained model from $repo_url"
git lfs install
git clone $repo_url
repo=$(basename $repo_url)

log "Display test files"
tree $repo/
soxi $repo/test_wavs/*.wav
ls -lh $repo/test_wavs/*.wav

pushd $repo/exp
ln -s pretrained-iter-468000-avg-16.pt pretrained.pt
ln -s pretrained-iter-468000-avg-16.pt epoch-99.pt
popd

log "Install ncnn and pnnx"

# We are using a modified ncnn here. Will try to merge it to the official repo
# of ncnn
git clone https://github.com/csukuangfj/ncnn
pushd ncnn
git submodule init
git submodule update python/pybind11
python3 setup.py bdist_wheel
ls -lh dist/
pip install dist/*.whl
cd tools/pnnx
mkdir build
cd build
cmake ..
make -j4 pnnx

./src/pnnx || echo "pass"

popd

log "Test exporting to pnnx format"

./lstm_transducer_stateless2/export.py \
--exp-dir $repo/exp \
--bpe-model $repo/data/lang_bpe_500/bpe.model \
--epoch 99 \
--avg 1 \
--use-averaged-model 0 \
--pnnx 1

./ncnn/tools/pnnx/build/src/pnnx $repo/exp/encoder_jit_trace-pnnx.pt
./ncnn/tools/pnnx/build/src/pnnx $repo/exp/decoder_jit_trace-pnnx.pt
./ncnn/tools/pnnx/build/src/pnnx $repo/exp/joiner_jit_trace-pnnx.pt

./lstm_transducer_stateless2/ncnn-decode.py \
--bpe-model-filename $repo/data/lang_bpe_500/bpe.model \
--encoder-param-filename $repo/exp/encoder_jit_trace-pnnx.ncnn.param \
--encoder-bin-filename $repo/exp/encoder_jit_trace-pnnx.ncnn.bin \
--decoder-param-filename $repo/exp/decoder_jit_trace-pnnx.ncnn.param \
--decoder-bin-filename $repo/exp/decoder_jit_trace-pnnx.ncnn.bin \
--joiner-param-filename $repo/exp/joiner_jit_trace-pnnx.ncnn.param \
--joiner-bin-filename $repo/exp/joiner_jit_trace-pnnx.ncnn.bin \
$repo/test_wavs/1089-134686-0001.wav

./lstm_transducer_stateless2/streaming-ncnn-decode.py \
--bpe-model-filename $repo/data/lang_bpe_500/bpe.model \
--encoder-param-filename $repo/exp/encoder_jit_trace-pnnx.ncnn.param \
--encoder-bin-filename $repo/exp/encoder_jit_trace-pnnx.ncnn.bin \
--decoder-param-filename $repo/exp/decoder_jit_trace-pnnx.ncnn.param \
--decoder-bin-filename $repo/exp/decoder_jit_trace-pnnx.ncnn.bin \
--joiner-param-filename $repo/exp/joiner_jit_trace-pnnx.ncnn.param \
--joiner-bin-filename $repo/exp/joiner_jit_trace-pnnx.ncnn.bin \
$repo/test_wavs/1089-134686-0001.wav



log "Test exporting with torch.jit.trace()"

./lstm_transducer_stateless2/export.py \
--exp-dir $repo/exp \
--bpe-model $repo/data/lang_bpe_500/bpe.model \
--epoch 99 \
--avg 1 \
--use-averaged-model 0 \
--jit-trace 1

log "Decode with models exported by torch.jit.trace()"

./lstm_transducer_stateless2/jit_pretrained.py \
--bpe-model $repo/data/lang_bpe_500/bpe.model \
--encoder-model-filename $repo/exp/encoder_jit_trace.pt \
--decoder-model-filename $repo/exp/decoder_jit_trace.pt \
--joiner-model-filename $repo/exp/joiner_jit_trace.pt \
$repo/test_wavs/1089-134686-0001.wav \
$repo/test_wavs/1221-135766-0001.wav \
$repo/test_wavs/1221-135766-0002.wav

log "Test exporting to ONNX"

./lstm_transducer_stateless2/export.py \
--exp-dir $repo/exp \
--bpe-model $repo/data/lang_bpe_500/bpe.model \
--epoch 99 \
--avg 1 \
--use-averaged-model 0 \
--onnx 1

log "Decode with ONNX models "

./lstm_transducer_stateless2/streaming-onnx-decode.py \
--bpe-model-filename $repo/data/lang_bpe_500/bpe.model \
--encoder-model-filename $repo//exp/encoder.onnx \
--decoder-model-filename $repo/exp/decoder.onnx \
--joiner-model-filename $repo/exp/joiner.onnx \
--joiner-encoder-proj-model-filename $repo/exp/joiner_encoder_proj.onnx \
--joiner-decoder-proj-model-filename $repo/exp/joiner_decoder_proj.onnx \
$repo/test_wavs/1089-134686-0001.wav

./lstm_transducer_stateless2/streaming-onnx-decode.py \
--bpe-model-filename $repo/data/lang_bpe_500/bpe.model \
--encoder-model-filename $repo//exp/encoder.onnx \
--decoder-model-filename $repo/exp/decoder.onnx \
--joiner-model-filename $repo/exp/joiner.onnx \
--joiner-encoder-proj-model-filename $repo/exp/joiner_encoder_proj.onnx \
--joiner-decoder-proj-model-filename $repo/exp/joiner_decoder_proj.onnx \
$repo/test_wavs/1221-135766-0001.wav

./lstm_transducer_stateless2/streaming-onnx-decode.py \
--bpe-model-filename $repo/data/lang_bpe_500/bpe.model \
--encoder-model-filename $repo//exp/encoder.onnx \
--decoder-model-filename $repo/exp/decoder.onnx \
--joiner-model-filename $repo/exp/joiner.onnx \
--joiner-encoder-proj-model-filename $repo/exp/joiner_encoder_proj.onnx \
--joiner-decoder-proj-model-filename $repo/exp/joiner_decoder_proj.onnx \
$repo/test_wavs/1221-135766-0002.wav



for sym in 1 2 3; do
log "Greedy search with --max-sym-per-frame $sym"

./lstm_transducer_stateless2/pretrained.py \
--method greedy_search \
--max-sym-per-frame $sym \
--checkpoint $repo/exp/pretrained.pt \
--bpe-model $repo/data/lang_bpe_500/bpe.model \
$repo/test_wavs/1089-134686-0001.wav \
$repo/test_wavs/1221-135766-0001.wav \
$repo/test_wavs/1221-135766-0002.wav
done

for method in modified_beam_search beam_search fast_beam_search; do
log "$method"

./lstm_transducer_stateless2/pretrained.py \
--method $method \
--beam-size 4 \
--checkpoint $repo/exp/pretrained.pt \
--bpe-model $repo/data/lang_bpe_500/bpe.model \
$repo/test_wavs/1089-134686-0001.wav \
$repo/test_wavs/1221-135766-0001.wav \
$repo/test_wavs/1221-135766-0002.wav
done

echo "GITHUB_EVENT_NAME: ${GITHUB_EVENT_NAME}"
echo "GITHUB_EVENT_LABEL_NAME: ${GITHUB_EVENT_LABEL_NAME}"
if [[ x"${GITHUB_EVENT_NAME}" == x"schedule" ]]; then
mkdir -p lstm_transducer_stateless2/exp
ln -s $PWD/$repo/exp/pretrained.pt lstm_transducer_stateless2/exp/epoch-999.pt
ln -s $PWD/$repo/data/lang_bpe_500 data/

ls -lh data
ls -lh lstm_transducer_stateless2/exp

log "Decoding test-clean and test-other"

# use a small value for decoding with CPU
max_duration=100

for method in greedy_search fast_beam_search modified_beam_search; do
log "Decoding with $method"

./lstm_transducer_stateless2/decode.py \
--decoding-method $method \
--epoch 999 \
--avg 1 \
--use-averaged-model 0 \
--max-duration $max_duration \
--exp-dir lstm_transducer_stateless2/exp
done

rm lstm_transducer_stateless2/exp/*.pt
fi
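
The script above derives the local checkout directory from the model URL with `basename`, and it only runs the full test-clean and test-other decode when the workflow was triggered by a schedule. A minimal sketch of those two idioms follows, assuming a placeholder URL and a hypothetical function name `is_scheduled_run`. It uses POSIX `[ ... = ... ]` rather than the script's `[[ ... == ... ]]` so that it also runs under plain `sh`.

```bash
#!/usr/bin/env bash
# Placeholder URL for illustration; only its shape matters here.
repo_url=https://example.com/some-user/some-model-repo

# `git clone <url>` creates a directory named after the last URL component,
# so basename recovers the directory name without hard-coding it.
repo=$(basename "$repo_url")
echo "clone directory: $repo"

# Hypothetical helper: succeeds only for scheduled runs.
# The x"..." prefix keeps the comparison well-formed when the variable is
# empty or unset, which is the usual case outside GitHub Actions.
is_scheduled_run() {
  [ x"$1" = x"schedule" ]
}

if is_scheduled_run "${GITHUB_EVENT_NAME:-}"; then
  echo "would run the full decoding pass"
else
  echo "skipping the full decoding pass"
fi
```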
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
#!/usr/bin/env bash

set -e

log() {
# This function is from espnet
local fname=${BASH_SOURCE[1]##*/}
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
#!/usr/bin/env bash

set -e

log() {
# This function is from espnet
local fname=${BASH_SOURCE[1]##*/}
@@ -11,10 +13,14 @@ cd egs/librispeech/ASR
repo_url=https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless2-2022-04-29

log "Downloading pre-trained model from $repo_url"
-git lfs install
-git clone $repo_url
+GIT_LFS_SKIP_SMUDGE=1 git clone $repo_url
repo=$(basename $repo_url)

pushd $repo
git lfs pull --include "data/lang_bpe_500/bpe.model"
git lfs pull --include "exp/pretrained-epoch-38-avg-10.pt"
popd

log "Display test files"
tree $repo/
soxi $repo/test_wavs/*.wav
@@ -77,4 +83,5 @@ if [[ x"${GITHUB_EVENT_NAME}" == x"schedule" || x"${GITHUB_EVENT_LABEL_NAME}" ==
done

rm pruned_transducer_stateless2/exp/*.pt
rm -r data/lang_bpe_500
fi
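
The script above clones with `GIT_LFS_SKIP_SMUDGE=1` and then fetches two files with `git lfs pull --include`, so only the files the test needs are downloaded. A hedged sketch of that pattern follows. The names `DRY_RUN`, `run` and `fetch_selected` and the example URL are assumptions for illustration, and `git -C` is used in place of the script's `pushd`/`popd`. With `DRY_RUN=1` (the default here) the commands are printed rather than executed, so no network access is needed.

```bash
#!/usr/bin/env bash
# Sketch: clone without downloading LFS objects, then fetch chosen files only.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: fetch_selected <repo_url> <lfs_path>...
fetch_selected() {
  local repo_url=$1
  shift
  local repo
  repo=$(basename "$repo_url")
  # GIT_LFS_SKIP_SMUDGE=1 leaves LFS-tracked files as small pointer files.
  run env GIT_LFS_SKIP_SMUDGE=1 git clone "$repo_url"
  local f
  for f in "$@"; do
    # Replace the pointer with the real content for this path only.
    run git -C "$repo" lfs pull --include "$f"
  done
}

fetch_selected https://example.com/user/model-repo \
  "data/lang_bpe_500/bpe.model" \
  "exp/pretrained.pt"
```

Setting `DRY_RUN=0` would run the same commands for real, which requires git-lfs to be installed and the URL to point at an actual repository.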
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
#!/usr/bin/env bash

set -e

log() {
# This function is from espnet
local fname=${BASH_SOURCE[1]##*/}
@@ -11,9 +13,12 @@ cd egs/librispeech/ASR
repo_url=https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-04-29

log "Downloading pre-trained model from $repo_url"
-git lfs install
-git clone $repo_url
+GIT_LFS_SKIP_SMUDGE=1 git clone $repo_url
repo=$(basename $repo_url)
pushd $repo
git lfs pull --include "data/lang_bpe_500/bpe.model"
git lfs pull --include "exp/pretrained-epoch-25-avg-6.pt"
popd

log "Display test files"
tree $repo/
@@ -77,4 +82,5 @@ if [[ x"${GITHUB_EVENT_NAME}" == x"schedule" || x"${GITHUB_EVENT_LABEL_NAME}" ==
done

rm pruned_transducer_stateless3/exp/*.pt
rm -r data/lang_bpe_500
fi