Add doc about model export #618

Merged · 2 commits · Oct 14, 2022
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -21,6 +21,7 @@ speech recognition recipes using `k2 <https://github.com/k2-fsa/k2>`_.
:caption: Contents:

installation/index
model-export/index
recipes/index
contributing/index
huggingface/index
21 changes: 21 additions & 0 deletions docs/source/model-export/code/export-model-state-dict-pretrained-out.txt
@@ -0,0 +1,21 @@
2022-10-13 19:09:02,233 INFO [pretrained.py:265] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'encoder_dim': 512, 'nhead': 8, 'dim_feedforward': 2048, 'num_encoder_layers': 12, 'decoder_dim': 512, 'joiner_dim': 512, 'model_warm_step': 3000, 'env_info': {'k2-version': '1.21', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '4810e00d8738f1a21278b0156a42ff396a2d40ac', 'k2-git-date': 'Fri Oct 7 19:35:03 2022', 'lhotse-version': '1.3.0.dev+missing.version.file', 'torch-version': '1.10.0+cu102', 'torch-cuda-available': False, 'torch-cuda-version': '10.2', 'python-version': '3.8', 'icefall-git-branch': 'onnx-doc-1013', 'icefall-git-sha1': 'c39cba5-dirty', 'icefall-git-date': 'Thu Oct 13 15:17:20 2022', 'icefall-path': '/k2-dev/fangjun/open-source/icefall-master', 'k2-path': '/k2-dev/fangjun/open-source/k2-master/k2/python/k2/__init__.py', 'lhotse-path': '/ceph-fj/fangjun/open-source-2/lhotse-jsonl/lhotse/__init__.py', 'hostname': 'de-74279-k2-test-4-0324160024-65bfd8b584-jjlbn', 'IP address': '10.177.74.203'}, 'checkpoint': './icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp/pretrained-iter-1224000-avg-14.pt', 'bpe_model': './icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/data/lang_bpe_500/bpe.model', 'method': 'greedy_search', 'sound_files': ['./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1089-134686-0001.wav', './icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0001.wav', './icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0002.wav'], 'sample_rate': 16000, 'beam_size': 4, 'beam': 4, 'max_contexts': 4, 'max_states': 8, 'context_size': 2, 'max_sym_per_frame': 1, 'simulate_streaming': False, 'decode_chunk_size': 16, 'left_context': 64, 'dynamic_chunk_training': False, 'causal_convolution': False, 'short_chunk_size': 25, 'num_left_chunks': 4, 'blank_id': 0, 'unk_id': 2, 'vocab_size': 500}
2022-10-13 19:09:02,233 INFO [pretrained.py:271] device: cpu
2022-10-13 19:09:02,233 INFO [pretrained.py:273] Creating model
2022-10-13 19:09:02,612 INFO [train.py:458] Disable giga
2022-10-13 19:09:02,623 INFO [pretrained.py:277] Number of model parameters: 78648040
2022-10-13 19:09:02,951 INFO [pretrained.py:285] Constructing Fbank computer
2022-10-13 19:09:02,952 INFO [pretrained.py:295] Reading sound files: ['./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1089-134686-0001.wav', './icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0001.wav', './icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0002.wav']
2022-10-13 19:09:02,957 INFO [pretrained.py:301] Decoding started
2022-10-13 19:09:06,700 INFO [pretrained.py:329] Using greedy_search
2022-10-13 19:09:06,912 INFO [pretrained.py:388]
./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1089-134686-0001.wav:
AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS

./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0001.wav:
GOD AS A DIRECT CONSEQUENCE OF THE SIN WHICH MAN THUS PUNISHED HAD GIVEN HER A LOVELY CHILD WHOSE PLACE WAS ON THAT SAME DISHONORED BOSOM TO CONNECT HER PARENT FOREVER WITH THE RACE AND DESCENT OF MORTALS AND TO BE FINALLY A BLESSED SOUL IN HEAVEN

./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0002.wav:
YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION


2022-10-13 19:09:06,912 INFO [pretrained.py:390] Decoding Done
135 changes: 135 additions & 0 deletions docs/source/model-export/export-model-state-dict.rst
@@ -0,0 +1,135 @@
Export model.state_dict()
=========================

When to use it
--------------

During model training, we save checkpoints periodically to disk.

A checkpoint contains the following information:

- ``model.state_dict()``
- ``optimizer.state_dict()``
- and some other information related to training

A checkpoint is needed when we want to resume training from some point.
However, if we want to publish the model for inference, only
``model.state_dict()`` is needed. In that case, we strip everything except
``model.state_dict()`` from the checkpoint to reduce the file size of the
published model.
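
As a rough illustration, the stripping step boils down to the sketch
below. The file names here are hypothetical; the real work is done by
``export.py``, which also supports averaging several checkpoints.

.. code-block:: python

import torch

# A minimal sketch of stripping a training checkpoint down to
# ``model.state_dict()``. File names here are hypothetical.
checkpoint = torch.load("epoch-20.pt", map_location="cpu")

# Keep only the model weights; drop the optimizer state and other
# training-related entries.
torch.save({"model": checkpoint["model"]}, "pretrained.pt")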

How to export
-------------

Every recipe contains a file ``export.py`` that you can use to
export ``model.state_dict()`` by taking some checkpoints as inputs.

.. hint::

Each ``export.py`` contains well-documented usage information.

In the following, we use
`<https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless3/export.py>`_
as an example.

.. note::

The steps for other recipes are almost the same.

.. code-block:: bash

cd egs/librispeech/ASR

./pruned_transducer_stateless3/export.py \
--exp-dir ./pruned_transducer_stateless3/exp \
--bpe-model data/lang_bpe_500/bpe.model \
--epoch 20 \
--avg 10

The above command generates a file ``pruned_transducer_stateless3/exp/pretrained.pt``,
which is a dict containing ``{"model": model.state_dict()}`` saved by ``torch.save()``.
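
To load it back, here is a hedged sketch; it assumes ``model`` has
already been constructed by the recipe's own code with hyperparameters
matching the exported checkpoint:

.. code-block:: python

import torch

# ``model`` is assumed to be built elsewhere; only the weight-loading
# step is shown here.
state = torch.load("pruned_transducer_stateless3/exp/pretrained.pt", map_location="cpu")
model.load_state_dict(state["model"])
model.eval()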

How to use the exported model
-----------------------------

For each recipe, we provide pretrained models hosted on Hugging Face.
You can find links to the pretrained models in the ``RESULTS.md`` of each dataset.

In the following, we demonstrate how to use the pretrained model from
`<https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13>`_.

.. code-block:: bash

cd egs/librispeech/ASR

git lfs install
git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13

After cloning the repo with ``git lfs``, you will find several files in the folder
``icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp``
that have a prefix ``pretrained-``. Those files contain ``model.state_dict()``
exported by the above ``export.py``.

In each recipe, there is also a file ``pretrained.py``, which can load
``pretrained-xxx.pt`` and use it to decode sound files. The following is an example:

.. code-block:: bash

cd egs/librispeech/ASR

./pruned_transducer_stateless3/pretrained.py \
--checkpoint ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp/pretrained-iter-1224000-avg-14.pt \
--bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/data/lang_bpe_500/bpe.model \
--method greedy_search \
./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1089-134686-0001.wav \
./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0001.wav \
./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0002.wav

The above commands show how to use the exported model with ``pretrained.py`` to
decode multiple sound files. Its output is given as follows for reference:

.. literalinclude:: ./code/export-model-state-dict-pretrained-out.txt

Use the exported model to run decode.py
---------------------------------------

When we publish a model, we always note down its WERs on some test
datasets in ``RESULTS.md``. This section describes how to use the
pretrained model to reproduce those WERs.

.. code-block:: bash

cd egs/librispeech/ASR
git lfs install
git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13

cd icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp
ln -s pretrained-iter-1224000-avg-14.pt epoch-9999.pt
cd ../..

We create a symlink with name ``epoch-9999.pt`` to ``pretrained-iter-1224000-avg-14.pt``,
so that we can pass ``--epoch 9999 --avg 1`` to ``decode.py`` in the following
command:

.. code-block:: bash

./pruned_transducer_stateless3/decode.py \
--epoch 9999 \
--avg 1 \
--exp-dir ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp \
--lang-dir ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/data/lang_bpe_500 \
--max-duration 600 \
--decoding-method greedy_search

You will find the decoding results in
``./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp/greedy_search``.

.. caution::

For some recipes, you also need to pass ``--use-averaged-model False``
to ``decode.py``. The reason is that the exported pretrained model is already
the averaged one.

.. hint::

Before running ``decode.py``, we assume that you have already run
``prepare.sh`` to prepare the test dataset.
12 changes: 12 additions & 0 deletions docs/source/model-export/export-ncnn.rst
@@ -0,0 +1,12 @@
Export to ncnn
==============

We support exporting LSTM transducer models to `ncnn <https://github.com/tencent/ncnn>`_.

Please refer to :ref:`export-model-for-ncnn` for details.

We also provide `<https://github.com/k2-fsa/sherpa-ncnn>`_, which performs
speech recognition using ``ncnn`` with the exported models.
It has been tested on Linux, macOS, Windows, and Raspberry Pi. The project is
self-contained and can be statically linked to produce a binary containing
everything needed.
69 changes: 69 additions & 0 deletions docs/source/model-export/export-onnx.rst
@@ -0,0 +1,69 @@
Export to ONNX
==============

In this section, we describe how to export models to ONNX.

.. hint::

Only non-streaming conformer transducer models are tested.


When to use it
--------------

Use this export method if you want to run the pretrained model with an
inference framework that supports ONNX.


How to export
-------------

We use
`<https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless3>`_
as an example in the following.

.. code-block:: bash

cd egs/librispeech/ASR
epoch=14
avg=2

./pruned_transducer_stateless3/export.py \
--exp-dir ./pruned_transducer_stateless3/exp \
--bpe-model data/lang_bpe_500/bpe.model \
--epoch $epoch \
--avg $avg \
--onnx 1

It will generate the following files inside ``pruned_transducer_stateless3/exp``:

- ``encoder.onnx``
- ``decoder.onnx``
- ``joiner.onnx``
- ``joiner_encoder_proj.onnx``
- ``joiner_decoder_proj.onnx``
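
Before running them with ``onnx_pretrained.py``, you can inspect the
exported models; below is a small sketch using
`onnxruntime <https://github.com/microsoft/onnxruntime>`_ (assumed to be
installed), since input and output names may vary between recipes:

.. code-block:: python

import onnxruntime as ort

# Print the input/output names and shapes of the exported encoder
# rather than assuming them.
session = ort.InferenceSession("pruned_transducer_stateless3/exp/encoder.onnx")
for node in session.get_inputs():
    print("input:", node.name, node.shape, node.type)
for node in session.get_outputs():
    print("output:", node.name, node.shape, node.type)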

You can use ``./pruned_transducer_stateless3/onnx_pretrained.py`` to decode
sound files with the exported models:

.. code-block:: bash

./pruned_transducer_stateless3/onnx_pretrained.py \
--bpe-model ./data/lang_bpe_500/bpe.model \
--encoder-model-filename ./pruned_transducer_stateless3/exp/encoder.onnx \
--decoder-model-filename ./pruned_transducer_stateless3/exp/decoder.onnx \
--joiner-model-filename ./pruned_transducer_stateless3/exp/joiner.onnx \
--joiner-encoder-proj-model-filename ./pruned_transducer_stateless3/exp/joiner_encoder_proj.onnx \
--joiner-decoder-proj-model-filename ./pruned_transducer_stateless3/exp/joiner_decoder_proj.onnx \
/path/to/foo.wav \
/path/to/bar.wav \
/path/to/baz.wav


How to use the exported model
-----------------------------

We also provide `<https://github.com/k2-fsa/sherpa-onnx>`_, which performs
speech recognition with the exported models using
`onnxruntime <https://github.com/microsoft/onnxruntime>`_.
It has been tested on Linux, macOS, and Windows.
58 changes: 58 additions & 0 deletions docs/source/model-export/export-with-torch-jit-script.rst
@@ -0,0 +1,58 @@
.. _export-model-with-torch-jit-script:

Export model with torch.jit.script()
====================================

In this section, we describe how to export a model via
``torch.jit.script()``.

When to use it
--------------

If we want to use our trained model with TorchScript,
we can use ``torch.jit.script()``.

.. hint::

See :ref:`export-model-with-torch-jit-trace`
if you want to use ``torch.jit.trace()``.
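
For background, here is a toy ``torch.jit.script()`` sketch, unrelated
to any icefall recipe:

.. code-block:: python

import torch

class Toy(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scripting compiles the Python code itself, so this
        # data-dependent branch is preserved (unlike tracing).
        if x.sum() > 0:
            return torch.relu(x)
        return -x

scripted = torch.jit.script(Toy())
scripted.save("toy_jit_script.pt")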

How to export
-------------

We use
`<https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless3>`_
as an example in the following.

.. code-block:: bash

cd egs/librispeech/ASR
epoch=14
avg=1

./pruned_transducer_stateless3/export.py \
--exp-dir ./pruned_transducer_stateless3/exp \
--bpe-model data/lang_bpe_500/bpe.model \
--epoch $epoch \
--avg $avg \
--jit 1

It will generate a file ``cpu_jit.pt`` in ``pruned_transducer_stateless3/exp``.

.. caution::

Don't be confused by ``cpu`` in ``cpu_jit.pt``. We move all parameters
to CPU before saving it into a ``pt`` file; that's why we use ``cpu``
in the filename.
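
Loading the exported file requires only plain PyTorch, not icefall; a
minimal sketch (feature extraction and decoding are recipe specific and
omitted here):

.. code-block:: python

import torch

model = torch.jit.load("pruned_transducer_stateless3/exp/cpu_jit.pt")
model.eval()

# The parameters were saved on CPU; move the model afterwards if you
# want to run it on a GPU, e.g., model.to("cuda").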

How to use the exported model
-----------------------------

Please refer to the following pages for usage:

- `<https://k2-fsa.github.io/sherpa/python/streaming_asr/emformer/index.html>`_
- `<https://k2-fsa.github.io/sherpa/python/streaming_asr/conv_emformer/index.html>`_
- `<https://k2-fsa.github.io/sherpa/python/streaming_asr/conformer/index.html>`_
- `<https://k2-fsa.github.io/sherpa/python/offline_asr/conformer/index.html>`_
- `<https://k2-fsa.github.io/sherpa/cpp/offline_asr/gigaspeech.html>`_
- `<https://k2-fsa.github.io/sherpa/cpp/offline_asr/wenetspeech.html>`_
69 changes: 69 additions & 0 deletions docs/source/model-export/export-with-torch-jit-trace.rst
@@ -0,0 +1,69 @@
.. _export-model-with-torch-jit-trace:

Export model with torch.jit.trace()
===================================

In this section, we describe how to export a model via
``torch.jit.trace()``.

When to use it
--------------

If we want to use our trained model with TorchScript,
we can use ``torch.jit.trace()``.

.. hint::

See :ref:`export-model-with-torch-jit-script`
if you want to use ``torch.jit.script()``.
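
For background, here is a toy ``torch.jit.trace()`` sketch, unrelated
to any icefall recipe:

.. code-block:: python

import torch

class Toy(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x) * 2

# Tracing records the operations executed on an example input;
# data-dependent control flow would NOT be captured.
traced = torch.jit.trace(Toy(), torch.randn(2, 3))
traced.save("toy_jit_trace.pt")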

How to export
-------------

We use
`<https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/lstm_transducer_stateless2>`_
as an example in the following.

.. code-block:: bash

iter=468000
avg=16

cd egs/librispeech/ASR

./lstm_transducer_stateless2/export.py \
--exp-dir ./lstm_transducer_stateless2/exp \
--bpe-model data/lang_bpe_500/bpe.model \
--iter $iter \
--avg $avg \
--jit-trace 1

It will generate three files inside ``lstm_transducer_stateless2/exp``:

- ``encoder_jit_trace.pt``
- ``decoder_jit_trace.pt``
- ``joiner_jit_trace.pt``

You can use
`<https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/lstm_transducer_stateless2/jit_pretrained.py>`_
to decode sound files with the following commands:

.. code-block:: bash

cd egs/librispeech/ASR
./lstm_transducer_stateless2/jit_pretrained.py \
--bpe-model ./data/lang_bpe_500/bpe.model \
--encoder-model-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace.pt \
--decoder-model-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace.pt \
--joiner-model-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace.pt \
/path/to/foo.wav \
/path/to/bar.wav \
/path/to/baz.wav

How to use the exported models
------------------------------

Please refer to
`<https://k2-fsa.github.io/sherpa/python/streaming_asr/lstm/index.html>`_
for its usage in ``sherpa``.
You can also find pretrained models there.
14 changes: 14 additions & 0 deletions docs/source/model-export/index.rst
@@ -0,0 +1,14 @@
Model export
============

In this section, we describe various ways to export models.



.. toctree::

export-model-state-dict
export-with-torch-jit-trace
export-with-torch-jit-script
export-onnx
export-ncnn