Add doc about model export #618

Merged · 2 commits · Oct 14, 2022
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -21,6 +21,7 @@ speech recognition recipes using `k2 <https://github.com/k2-fsa/k2>`_.
:caption: Contents:

installation/index
model-export/index
recipes/index
contributing/index
huggingface/index
21 changes: 21 additions & 0 deletions docs/source/model-export/code/export-model-state-dict-pretrained-out.txt
@@ -0,0 +1,21 @@
2022-10-13 19:09:02,233 INFO [pretrained.py:265] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'encoder_dim': 512, 'nhead': 8, 'dim_feedforward': 2048, 'num_encoder_layers': 12, 'decoder_dim': 512, 'joiner_dim': 512, 'model_warm_step': 3000, 'env_info': {'k2-version': '1.21', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '4810e00d8738f1a21278b0156a42ff396a2d40ac', 'k2-git-date': 'Fri Oct 7 19:35:03 2022', 'lhotse-version': '1.3.0.dev+missing.version.file', 'torch-version': '1.10.0+cu102', 'torch-cuda-available': False, 'torch-cuda-version': '10.2', 'python-version': '3.8', 'icefall-git-branch': 'onnx-doc-1013', 'icefall-git-sha1': 'c39cba5-dirty', 'icefall-git-date': 'Thu Oct 13 15:17:20 2022', 'icefall-path': '/k2-dev/fangjun/open-source/icefall-master', 'k2-path': '/k2-dev/fangjun/open-source/k2-master/k2/python/k2/__init__.py', 'lhotse-path': '/ceph-fj/fangjun/open-source-2/lhotse-jsonl/lhotse/__init__.py', 'hostname': 'de-74279-k2-test-4-0324160024-65bfd8b584-jjlbn', 'IP address': '10.177.74.203'}, 'checkpoint': './icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp/pretrained-iter-1224000-avg-14.pt', 'bpe_model': './icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/data/lang_bpe_500/bpe.model', 'method': 'greedy_search', 'sound_files': ['./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1089-134686-0001.wav', './icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0001.wav', './icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0002.wav'], 'sample_rate': 16000, 'beam_size': 4, 'beam': 4, 'max_contexts': 4, 'max_states': 8, 'context_size': 2, 'max_sym_per_frame': 1, 'simulate_streaming': False, 'decode_chunk_size': 16, 'left_context': 64, 'dynamic_chunk_training': False, 'causal_convolution': False, 'short_chunk_size': 25, 'num_left_chunks': 4, 'blank_id': 0, 'unk_id': 2, 'vocab_size': 500}
2022-10-13 19:09:02,233 INFO [pretrained.py:271] device: cpu
2022-10-13 19:09:02,233 INFO [pretrained.py:273] Creating model
2022-10-13 19:09:02,612 INFO [train.py:458] Disable giga
2022-10-13 19:09:02,623 INFO [pretrained.py:277] Number of model parameters: 78648040
2022-10-13 19:09:02,951 INFO [pretrained.py:285] Constructing Fbank computer
2022-10-13 19:09:02,952 INFO [pretrained.py:295] Reading sound files: ['./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1089-134686-0001.wav', './icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0001.wav', './icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0002.wav']
2022-10-13 19:09:02,957 INFO [pretrained.py:301] Decoding started
2022-10-13 19:09:06,700 INFO [pretrained.py:329] Using greedy_search
2022-10-13 19:09:06,912 INFO [pretrained.py:388]
./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1089-134686-0001.wav:
AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS

./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0001.wav:
GOD AS A DIRECT CONSEQUENCE OF THE SIN WHICH MAN THUS PUNISHED HAD GIVEN HER A LOVELY CHILD WHOSE PLACE WAS ON THAT SAME DISHONORED BOSOM TO CONNECT HER PARENT FOREVER WITH THE RACE AND DESCENT OF MORTALS AND TO BE FINALLY A BLESSED SOUL IN HEAVEN

./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0002.wav:
YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION


2022-10-13 19:09:06,912 INFO [pretrained.py:390] Decoding Done
135 changes: 135 additions & 0 deletions docs/source/model-export/export-model-state-dict.rst
@@ -0,0 +1,135 @@
Export model.state_dict()
=========================

When to use it
--------------

During model training, we save checkpoints periodically to disk.

A checkpoint contains the following information:

- ``model.state_dict()``
- ``optimizer.state_dict()``
- and some other information related to training

A checkpoint is needed when we want to resume training from some point.
However, if we want to publish the model for inference, only
``model.state_dict()`` is needed. In that case, we strip everything except
``model.state_dict()`` from the checkpoint to reduce the file size of the
published model.
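
As a rough illustration, the stripping step boils down to the sketch
below. The file names here are hypothetical; the real work is done by
``export.py``, which also supports averaging several checkpoints.

.. code-block:: python

import torch

# A minimal sketch of stripping a training checkpoint down to
# ``model.state_dict()``. File names here are hypothetical.
checkpoint = torch.load("epoch-20.pt", map_location="cpu")

# Keep only the model weights; drop the optimizer state and other
# training-related entries.
torch.save({"model": checkpoint["model"]}, "pretrained.pt")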

How to export
-------------

Every recipe contains a file ``export.py`` that you can use to
export ``model.state_dict()`` by taking some checkpoints as inputs.

.. hint::

Each ``export.py`` contains well-documented usage information.

In the following, we use
`<https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless3/export.py>`_
as an example.

.. note::

The steps for other recipes are almost the same.

.. code-block:: bash

cd egs/librispeech/ASR

./pruned_transducer_stateless3/export.py \
--exp-dir ./pruned_transducer_stateless3/exp \
--bpe-model data/lang_bpe_500/bpe.model \
--epoch 20 \
--avg 10

The above command generates a file ``pruned_transducer_stateless3/exp/pretrained.pt``,
which is a dict containing ``{"model": model.state_dict()}`` saved by ``torch.save()``.
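
To load it back, here is a hedged sketch; it assumes ``model`` has
already been constructed by the recipe's own code with hyperparameters
matching the exported checkpoint:

.. code-block:: python

import torch

# ``model`` is assumed to be built elsewhere; only the weight-loading
# step is shown here.
state = torch.load("pruned_transducer_stateless3/exp/pretrained.pt", map_location="cpu")
model.load_state_dict(state["model"])
model.eval()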

How to use the exported model
-----------------------------

For each recipe, we provide pretrained models hosted on Hugging Face.
You can find links to the pretrained models in the ``RESULTS.md`` of each dataset.

In the following, we demonstrate how to use the pretrained model from
`<https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13>`_.

.. code-block:: bash

cd egs/librispeech/ASR

git lfs install
git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13

After cloning the repo with ``git lfs``, you will find several files in the folder
``icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp``
that have a prefix ``pretrained-``. Those files contain ``model.state_dict()``
exported by the above ``export.py``.

In each recipe, there is also a file ``pretrained.py``, which can load
``pretrained-xxx.pt`` and use it to decode sound files. The following is an example:

.. code-block:: bash

cd egs/librispeech/ASR

./pruned_transducer_stateless3/pretrained.py \
--checkpoint ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp/pretrained-iter-1224000-avg-14.pt \
--bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/data/lang_bpe_500/bpe.model \
--method greedy_search \
./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1089-134686-0001.wav \
./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0001.wav \
./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0002.wav

The above commands show how to use the exported model with ``pretrained.py`` to
decode multiple sound files. Its output is given as follows for reference:

.. literalinclude:: ./code/export-model-state-dict-pretrained-out.txt

Use the exported model to run decode.py
---------------------------------------

When we publish a model, we always note down its WERs on some test
datasets in ``RESULTS.md``. This section describes how to use the
pretrained model to reproduce those WERs.

.. code-block:: bash

cd egs/librispeech/ASR
git lfs install
git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13

cd icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp
ln -s pretrained-iter-1224000-avg-14.pt epoch-9999.pt
cd ../..

We create a symlink with name ``epoch-9999.pt`` to ``pretrained-iter-1224000-avg-14.pt``,
so that we can pass ``--epoch 9999 --avg 1`` to ``decode.py`` in the following
command:

.. code-block:: bash

./pruned_transducer_stateless3/decode.py \
--epoch 9999 \
--avg 1 \
--exp-dir ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp \
--lang-dir ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/data/lang_bpe_500 \
--max-duration 600 \
--decoding-method greedy_search

You will find the decoding results in
``./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp/greedy_search``.

.. caution::

For some recipes, you also need to pass ``--use-averaged-model False``
to ``decode.py``. The reason is that the exported pretrained model is already
the averaged one.

.. hint::

Before running ``decode.py``, we assume that you have already run
``prepare.sh`` to prepare the test dataset.
12 changes: 12 additions & 0 deletions docs/source/model-export/export-ncnn.rst
@@ -0,0 +1,12 @@
Export to ncnn
==============

We support exporting LSTM transducer models to `ncnn <https://github.com/tencent/ncnn>`_.

Please refer to :ref:`export-model-for-ncnn` for details.

We also provide `<https://github.com/k2-fsa/sherpa-ncnn>`_, which performs
speech recognition using ``ncnn`` with the exported models.
It has been tested on Linux, macOS, Windows, and Raspberry Pi. The project is
self-contained and can be statically linked to produce a binary containing
everything needed.
69 changes: 69 additions & 0 deletions docs/source/model-export/export-onnx.rst
@@ -0,0 +1,69 @@
Export to ONNX
==============

In this section, we describe how to export models to ONNX.

.. hint::

Only non-streaming conformer transducer models are tested.


When to use it
--------------

Use this export method if you want to run the pretrained model with an
inference framework that supports ONNX.


How to export
-------------

We use
`<https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless3>`_
as an example in the following.

.. code-block:: bash

cd egs/librispeech/ASR
epoch=14
avg=2

./pruned_transducer_stateless3/export.py \
--exp-dir ./pruned_transducer_stateless3/exp \
--bpe-model data/lang_bpe_500/bpe.model \
--epoch $epoch \
--avg $avg \
--onnx 1

It will generate the following files inside ``pruned_transducer_stateless3/exp``:

- ``encoder.onnx``
- ``decoder.onnx``
- ``joiner.onnx``
- ``joiner_encoder_proj.onnx``
- ``joiner_decoder_proj.onnx``
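
Before running them with ``onnx_pretrained.py``, you can inspect the
exported models; below is a small sketch using
`onnxruntime <https://github.com/microsoft/onnxruntime>`_ (assumed to be
installed), since input and output names may vary between recipes:

.. code-block:: python

import onnxruntime as ort

# Print the input/output names and shapes of the exported encoder
# rather than assuming them.
session = ort.InferenceSession("pruned_transducer_stateless3/exp/encoder.onnx")
for node in session.get_inputs():
    print("input:", node.name, node.shape, node.type)
for node in session.get_outputs():
    print("output:", node.name, node.shape, node.type)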

You can use ``./pruned_transducer_stateless3/onnx_pretrained.py`` to decode
sound files with the exported models:

.. code-block:: bash

./pruned_transducer_stateless3/onnx_pretrained.py \
--bpe-model ./data/lang_bpe_500/bpe.model \
--encoder-model-filename ./pruned_transducer_stateless3/exp/encoder.onnx \
--decoder-model-filename ./pruned_transducer_stateless3/exp/decoder.onnx \
--joiner-model-filename ./pruned_transducer_stateless3/exp/joiner.onnx \
--joiner-encoder-proj-model-filename ./pruned_transducer_stateless3/exp/joiner_encoder_proj.onnx \
--joiner-decoder-proj-model-filename ./pruned_transducer_stateless3/exp/joiner_decoder_proj.onnx \
/path/to/foo.wav \
/path/to/bar.wav \
/path/to/baz.wav


How to use the exported model
-----------------------------

We also provide `<https://github.com/k2-fsa/sherpa-onnx>`_, which performs
speech recognition with the exported models using
`onnxruntime <https://github.com/microsoft/onnxruntime>`_.
It has been tested on Linux, macOS, and Windows.
58 changes: 58 additions & 0 deletions docs/source/model-export/export-with-torch-jit-script.rst
@@ -0,0 +1,58 @@
.. _export-model-with-torch-jit-script:

Export model with torch.jit.script()
====================================

In this section, we describe how to export a model via
``torch.jit.script()``.

When to use it
--------------

If we want to use our trained model with TorchScript,
we can use ``torch.jit.script()``.

.. hint::

See :ref:`export-model-with-torch-jit-trace`
if you want to use ``torch.jit.trace()``.
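
For background, here is a toy ``torch.jit.script()`` sketch, unrelated
to any icefall recipe:

.. code-block:: python

import torch

class Toy(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scripting compiles the Python code itself, so this
        # data-dependent branch is preserved (unlike tracing).
        if x.sum() > 0:
            return torch.relu(x)
        return -x

scripted = torch.jit.script(Toy())
scripted.save("toy_jit_script.pt")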

How to export
-------------

We use
`<https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless3>`_
as an example in the following.

.. code-block:: bash

cd egs/librispeech/ASR
epoch=14
avg=1

./pruned_transducer_stateless3/export.py \
--exp-dir ./pruned_transducer_stateless3/exp \
--bpe-model data/lang_bpe_500/bpe.model \
--epoch $epoch \
--avg $avg \
--jit 1

It will generate a file ``cpu_jit.pt`` in ``pruned_transducer_stateless3/exp``.

.. caution::

Don't be confused by ``cpu`` in ``cpu_jit.pt``. We move all parameters
to CPU before saving it into a ``pt`` file; that's why we use ``cpu``
in the filename.
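
Loading the exported file requires only plain PyTorch, not icefall; a
minimal sketch (feature extraction and decoding are recipe specific and
omitted here):

.. code-block:: python

import torch

model = torch.jit.load("pruned_transducer_stateless3/exp/cpu_jit.pt")
model.eval()

# The parameters were saved on CPU; move the model afterwards if you
# want to run it on a GPU, e.g., model.to("cuda").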

How to use the exported model
-----------------------------

Please refer to the following pages for usage:

- `<https://k2-fsa.github.io/sherpa/python/streaming_asr/emformer/index.html>`_
- `<https://k2-fsa.github.io/sherpa/python/streaming_asr/conv_emformer/index.html>`_
- `<https://k2-fsa.github.io/sherpa/python/streaming_asr/conformer/index.html>`_
- `<https://k2-fsa.github.io/sherpa/python/offline_asr/conformer/index.html>`_
- `<https://k2-fsa.github.io/sherpa/cpp/offline_asr/gigaspeech.html>`_
- `<https://k2-fsa.github.io/sherpa/cpp/offline_asr/wenetspeech.html>`_
69 changes: 69 additions & 0 deletions docs/source/model-export/export-with-torch-jit-trace.rst
@@ -0,0 +1,69 @@
.. _export-model-with-torch-jit-trace:

Export model with torch.jit.trace()
===================================

In this section, we describe how to export a model via
``torch.jit.trace()``.

When to use it
--------------

If we want to use our trained model with TorchScript,
we can use ``torch.jit.trace()``.

.. hint::

See :ref:`export-model-with-torch-jit-script`
if you want to use ``torch.jit.script()``.
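
For background, here is a toy ``torch.jit.trace()`` sketch, unrelated
to any icefall recipe:

.. code-block:: python

import torch

class Toy(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x) * 2

# Tracing records the operations executed on an example input;
# data-dependent control flow would NOT be captured.
traced = torch.jit.trace(Toy(), torch.randn(2, 3))
traced.save("toy_jit_trace.pt")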

How to export
-------------

We use
`<https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/lstm_transducer_stateless2>`_
as an example in the following.

.. code-block:: bash

iter=468000
avg=16

cd egs/librispeech/ASR

./lstm_transducer_stateless2/export.py \
--exp-dir ./lstm_transducer_stateless2/exp \
--bpe-model data/lang_bpe_500/bpe.model \
--iter $iter \
--avg $avg \
--jit-trace 1

It will generate three files inside ``lstm_transducer_stateless2/exp``:

- ``encoder_jit_trace.pt``
- ``decoder_jit_trace.pt``
- ``joiner_jit_trace.pt``

You can use
`<https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/lstm_transducer_stateless2/jit_pretrained.py>`_
to decode sound files with the following commands:

.. code-block:: bash

cd egs/librispeech/ASR
./lstm_transducer_stateless2/jit_pretrained.py \
--bpe-model ./data/lang_bpe_500/bpe.model \
--encoder-model-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace.pt \
--decoder-model-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace.pt \
--joiner-model-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace.pt \
/path/to/foo.wav \
/path/to/bar.wav \
/path/to/baz.wav

How to use the exported models
------------------------------

Please refer to
`<https://k2-fsa.github.io/sherpa/python/streaming_asr/lstm/index.html>`_
for its usage in ``sherpa``.
You can also find pretrained models there.
14 changes: 14 additions & 0 deletions docs/source/model-export/index.rst
@@ -0,0 +1,14 @@
Model export
============

In this section, we describe various ways to export models.



.. toctree::

export-model-state-dict
export-with-torch-jit-trace
export-with-torch-jit-script
export-onnx
export-ncnn