Use ScaledLSTM as streaming encoder #479

Merged: 44 commits, merged on Aug 19, 2022
Changes shown are from 39 of the 44 commits.

Commits (44)
0fcdd15
Merge branch 'k2-fsa:master' into master
yaozengwei Jul 12, 2022
9165de5
add ScaledLSTM
yaozengwei Jul 16, 2022
7c9fcfa
add RNNEncoderLayer and RNNEncoder classes in lstm.py
yaozengwei Jul 16, 2022
2d53f2e
add RNN and Conv2dSubsampling classes in lstm.py
yaozengwei Jul 17, 2022
074bd7d
hardcode bidirectional=False
yaozengwei Jul 17, 2022
d16b9ec
link from pruned_transducer_stateless2
yaozengwei Jul 17, 2022
89bfb6b
link scaling.py pruned_transducer_stateless2
yaozengwei Jul 17, 2022
b1be6ea
copy from pruned_transducer_stateless2
yaozengwei Jul 17, 2022
4a0dea2
modify decode.py pretrained.py test_model.py train.py
yaozengwei Jul 17, 2022
822cc78
copy streaming decoding files from pruned_transducer_stateless2
yaozengwei Jul 17, 2022
5c669b7
modify streaming decoding files
yaozengwei Jul 17, 2022
539a9d7
simplified code in ScaledLSTM
yaozengwei Jul 17, 2022
125eac8
flat weights after scaling
yaozengwei Jul 17, 2022
ce2d817
pruned2 -> pruned4
yaozengwei Jul 17, 2022
872d239
link __init__.py
yaozengwei Jul 17, 2022
7c00f92
fix style
yaozengwei Jul 17, 2022
c71788e
remove add_model_arguments
yaozengwei Jul 17, 2022
1b0d2f3
modify .flake8
yaozengwei Jul 17, 2022
fd261ec
Merge remote-tracking branch 'k2-fsa/master' into lstm_new
yaozengwei Jul 17, 2022
3cedbe3
fix style
yaozengwei Jul 17, 2022
8bd700c
fix scale value in scaling.py
yaozengwei Jul 18, 2022
9bb0c79
add random combiner for training deeper model
yaozengwei Jul 18, 2022
6871c96
add using proj_size
yaozengwei Jul 25, 2022
9e4b5bd
Merge remote-tracking branch 'k2-fsa/master' into lstm
yaozengwei Aug 5, 2022
03b056c
add scaling converter for ScaledLSTM
yaozengwei Aug 5, 2022
45c7894
support jit trace
yaozengwei Aug 9, 2022
522a45c
add using averaged model in export.py
yaozengwei Aug 10, 2022
8f3645e
modify test_model.py, test if the model can be successfully exported …
yaozengwei Aug 10, 2022
1138b27
modify pretrained.py
yaozengwei Aug 10, 2022
dc73ff0
support streaming decoding
yaozengwei Aug 10, 2022
f63f855
fix model.py
yaozengwei Aug 11, 2022
dc212ba
Add cut_id to recognition results
pkufool Aug 7, 2022
8cceedf
Add cut_id to recognition results
yaozengwei Aug 11, 2022
7ee3701
do not pad in Conv subsampling module; add tail padding during decoding.
yaozengwei Aug 14, 2022
be18610
update RESULTS.md
yaozengwei Aug 18, 2022
ba09c4a
Merge remote-tracking branch 'k2-fsa/master' into lstm
yaozengwei Aug 18, 2022
ab6f5e3
minor fix
yaozengwei Aug 18, 2022
db3e570
fix doc
yaozengwei Aug 18, 2022
2ee5122
update README.md
yaozengwei Aug 18, 2022
6191c3e
minor change, filter infinite loss
yaozengwei Aug 19, 2022
3b6310c
remove the condition of raise error
yaozengwei Aug 19, 2022
5b62125
modify type hint for the return value in model.py
yaozengwei Aug 19, 2022
e00aa29
minor change
yaozengwei Aug 19, 2022
9b96a14
modify RESULTS.md
yaozengwei Aug 19, 2022

Files changed

3 changes: 2 additions & 1 deletion .flake8
@@ -9,7 +9,8 @@ per-file-ignores =
egs/*/ASR/pruned_transducer_stateless*/*.py: E501,
egs/*/ASR/*/optim.py: E501,
egs/*/ASR/*/scaling.py: E501,
-egs/librispeech/ASR/conv_emformer_transducer_stateless*/*.py: E501, E203,
+egs/librispeech/ASR/lstm_transducer_stateless/*.py: E501, E203
+egs/librispeech/ASR/conv_emformer_transducer_stateless*/*.py: E501, E203
egs/librispeech/ASR/conformer_ctc2/*py: E501,
egs/librispeech/ASR/RESULTS.md: E999,

1 change: 1 addition & 0 deletions egs/librispeech/ASR/README.md
@@ -25,6 +25,7 @@ The following table lists the differences among them.
| `pruned_stateless_emformer_rnnt2` | Emformer(from torchaudio) | Embedding + Conv1d | Using Emformer from torchaudio for streaming ASR|
| `conv_emformer_transducer_stateless` | ConvEmformer | Embedding + Conv1d | Using ConvEmformer for streaming ASR + mechanisms in reworked model |
| `conv_emformer_transducer_stateless2` | ConvEmformer | Embedding + Conv1d | Using ConvEmformer with simplified memory for streaming ASR + mechanisms in reworked model |
| `lstm_transducer_stateless` | LSTM | Embedding + Conv1d | Using LSTM with mechanisms in reworked model |

The decoder in `transducer_stateless` is modified from the paper
[Rnn-Transducer with Stateless Prediction Network](https://ieeexplore.ieee.org/document/9054419/).
132 changes: 132 additions & 0 deletions egs/librispeech/ASR/RESULTS.md
@@ -1,5 +1,137 @@
## Results

#### LibriSpeech BPE training results (Pruned Stateless LSTM RNN-T)

[lstm_transducer_stateless](./lstm_transducer_stateless)

It implements an LSTM-based model with mechanisms from the reworked model for streaming ASR.

See <https://github.com/k2-fsa/icefall/pull/479> for more details.

#### Training on full librispeech

This model contains 12 encoder layers, each consisting of an LSTM module followed by a feedforward module. The number of model parameters is 84,689,496.
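
To make the layer structure concrete, here is a minimal PyTorch sketch of one such encoder layer (an LSTM module plus a feedforward module, with the LSTM kept unidirectional so it can run in streaming mode). All dimensions and the exact wiring here are assumptions for illustration only; the actual `RNNEncoderLayer` in `lstm_transducer_stateless/lstm.py` uses `ScaledLSTM`, `ScaledLinear`, and the other reworked-model components and differs in detail.

```python
# Illustrative sketch only; not the code added in this PR.
from typing import Optional, Tuple

import torch
import torch.nn as nn


class SimpleRNNEncoderLayer(nn.Module):
    def __init__(self, d_model: int = 512, rnn_hidden_size: int = 1024) -> None:
        super().__init__()
        # Unidirectional LSTM (bidirectional=False) so no future frames are
        # needed; proj_size keeps the output dimension equal to d_model.
        self.lstm = nn.LSTM(
            input_size=d_model,
            hidden_size=rnn_hidden_size,
            proj_size=d_model,
        )
        self.feed_forward = nn.Sequential(
            nn.Linear(d_model, 2048),
            nn.ReLU(),
            nn.Linear(2048, d_model),
        )

    def forward(
        self,
        x: torch.Tensor,  # (seq_len, batch, d_model)
        states: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
    ) -> Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
        # `states` is the (h, c) pair carried over from the previous chunk,
        # which is what makes chunk-by-chunk (streaming) decoding possible.
        lstm_out, new_states = self.lstm(x, states)
        x = x + lstm_out              # residual over the LSTM module
        x = x + self.feed_forward(x)  # residual over the feedforward module
        return x, new_states


# Toy usage: process two consecutive chunks, carrying the LSTM states.
layer = SimpleRNNEncoderLayer()
out1, states = layer(torch.randn(16, 2, 512))
out2, states = layer(torch.randn(16, 2, 512), states)
```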

The WERs are:

| | test-clean | test-other | comment | decoding mode |
|-------------------------------------|------------|------------|----------------------|----------------------|
| greedy search (max sym per frame 1) | 3.81 | 9.73 | --epoch 35 --avg 15 | simulated streaming |
| greedy search (max sym per frame 1) | 3.78 | 9.79 | --epoch 35 --avg 15 | streaming |
| fast beam search | 3.74 | 9.59 | --epoch 35 --avg 15 | simulated streaming |
| fast beam search | 3.73 | 9.61 | --epoch 35 --avg 15 | streaming |
| modified beam search | 3.64 | 9.55 | --epoch 35 --avg 15 | simulated streaming |
| modified beam search | 3.65 | 9.51 | --epoch 35 --avg 15 | streaming |

The training command is:

```bash
./lstm_transducer_stateless/train.py \
--world-size 4 \
--num-epochs 35 \
--start-epoch 1 \
--exp-dir lstm_transducer_stateless/exp \
--full-libri 1 \
--max-duration 500 \
--master-port 12321 \
--num-encoder-layers 12 \
--rnn-hidden-size 1024
```
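
As a sanity check, the parameter count quoted above can be reproduced with a few lines of PyTorch. The snippet below is only a sketch; it assumes `model` is the transducer built by this recipe's `get_transducer_model()` in `train.py` with the options above (train.py normally logs this figure itself at start-up).

```python
# Hedged sketch: count the parameters of an already-constructed model.
import torch


def count_parameters(model: torch.nn.Module) -> int:
    """Total number of parameters, matching the figure quoted above."""
    return sum(p.numel() for p in model.parameters())


# print(count_parameters(model))  # expected: 84689496 for this configuration
```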

The tensorboard log can be found at
<https://tensorboard.dev/experiment/FWrM20mjTeWo6dTpFYOsYQ/>

The simulated streaming decoding command using greedy search is:
```bash
./lstm_transducer_stateless/decode.py \
--epoch 35 \
--avg 15 \
--exp-dir lstm_transducer_stateless/exp \
--max-duration 600 \
--num-encoder-layers 12 \
--rnn-hidden-size 1024 \
--decoding-method greedy_search \
--use-averaged-model True
```

The simulated streaming decoding command using fast beam search is:
```bash
./lstm_transducer_stateless/decode.py \
--epoch 35 \
--avg 15 \
--exp-dir lstm_transducer_stateless/exp \
--max-duration 600 \
--num-encoder-layers 12 \
--rnn-hidden-size 1024 \
--decoding-method fast_beam_search \
--use-averaged-model True \
--beam 4 \
--max-contexts 4 \
--max-states 8
```

The simulated streaming decoding command using modified beam search is:
```bash
./lstm_transducer_stateless/decode.py \
--epoch 35 \
--avg 15 \
--exp-dir lstm_transducer_stateless/exp \
--max-duration 600 \
--num-encoder-layers 12 \
--rnn-hidden-size 1024 \
--decoding-method modified_beam_search \
--use-averaged-model True \
--beam-size 4
```

The streaming decoding command using greedy search is:
```bash
./lstm_transducer_stateless/streaming_decode.py \
--epoch 35 \
--avg 15 \
--exp-dir lstm_transducer_stateless/exp \
--max-duration 600 \
--num-encoder-layers 12 \
--rnn-hidden-size 1024 \
--decoding-method greedy_search \
--use-averaged-model True
```

The streaming decoding command using fast beam search is:
```bash
./lstm_transducer_stateless/streaming_decode.py \
--epoch 35 \
--avg 15 \
--exp-dir lstm_transducer_stateless/exp \
--max-duration 600 \
--num-encoder-layers 12 \
--rnn-hidden-size 1024 \
--decoding-method fast_beam_search \
--use-averaged-model True \
--beam 4 \
--max-contexts 4 \
--max-states 8
```

The streaming decoding command using modified beam search is:
```bash
./lstm_transducer_stateless/streaming_decode.py \
--epoch 35 \
--avg 15 \
--exp-dir lstm_transducer_stateless/exp \
--max-duration 600 \
--num-encoder-layers 12 \
--rnn-hidden-size 1024 \
--decoding-method modified_beam_search \
--use-averaged-model True \
--beam-size 4
```

Pretrained models, training logs, decoding logs, and decoding results
are available at
<https://huggingface.co/Zengwei/icefall-asr-librispeech-lstm-transducer-stateless-2022-08-18>
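
If a programmatic download is preferred over a `git lfs` clone, the sketch below uses the `huggingface_hub` library; this is not part of the PR, just one convenient way to fetch the files.

```python
# Hedged sketch (not part of this PR): fetch the pretrained model snapshot
# with the huggingface_hub library.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="Zengwei/icefall-asr-librispeech-lstm-transducer-stateless-2022-08-18"
)
print(f"Pretrained files downloaded to: {local_dir}")
```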


#### LibriSpeech BPE training results (Pruned Stateless Conv-Emformer RNN-T 2)

[conv_emformer_transducer_stateless2](./conv_emformer_transducer_stateless2)
1 change: 1 addition & 0 deletions egs/librispeech/ASR/lstm_transducer_stateless/__init__.py