
[recipe] LibriSpeech zipformer_ctc #941

Merged (14 commits) on Oct 27, 2023
1 change: 1 addition & 0 deletions egs/librispeech/ASR/README.md
@@ -47,6 +47,7 @@ We place an additional Conv1d layer right after the input embedding layer.
| `conformer-ctc` | Conformer | Use auxiliary attention head |
| `conformer-ctc2` | Reworked Conformer | Use auxiliary attention head |
| `conformer-ctc3` | Reworked Conformer | Streaming version + delay penalty |
| `zipformer-ctc` | Zipformer | Use auxiliary attention head |
| `zipformer` | Upgraded Zipformer | Use auxiliary transducer head | The latest recipe |

# MMI
51 changes: 50 additions & 1 deletion egs/librispeech/ASR/RESULTS.md
@@ -375,6 +375,55 @@ for m in greedy_search modified_beam_search fast_beam_search; do
done
```

### Zipformer CTC

#### [zipformer_ctc](./zipformer_ctc)

See <https://github.com/k2-fsa/icefall/pull/941> for more details.

You can find a pretrained model, training logs, decoding logs, and decoding
results at:
<https://huggingface.co/desh2608/icefall-asr-librispeech-zipformer-ctc>

Number of model parameters: 86083707, i.e., 86.08 M

| decoding method         | test-clean | test-other | comment             |
|-------------------------|------------|------------|---------------------|
| ctc-decoding            | 2.50       | 5.86       | --epoch 30 --avg 9  |
Collaborator:

Could you also post the result for HLG decoding, i.e., one-best decoding?

Collaborator Author:

I am getting the following WERs for 1best:

| 1best                   | 2.01       | 4.61       | --epoch 30 --avg 9  |

This seems much better than other decoding methods. Is it expected?

Collaborator:

I think it is strange that 1best (HLG) is better than whole-lattice-rescoring (HLG + 4-gram G).
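
For context, a rough sketch of the two decoding paths being compared, written against the helpers in icefall's `icefall/decode.py` (`get_lattice`, `one_best_decoding`, `rescore_with_whole_lattice`); `nnet_output`, `supervision_segments`, `HLG`, and `G` are placeholders, and the beam settings are typical values rather than the recipe's exact ones:

```python
from icefall.decode import (
    get_lattice,
    one_best_decoding,
    rescore_with_whole_lattice,
)

# `nnet_output` is the (N, T, C) log-probability output of the CTC head;
# `HLG` and `G` are k2.Fsa graphs built from data/lang_bpe_500 and data/lm.
lattice = get_lattice(
    nnet_output=nnet_output,
    decoding_graph=HLG,
    supervision_segments=supervision_segments,
    search_beam=20,
    output_beam=8,
    min_active_states=30,
    max_active_states=10000,
    subsampling_factor=4,
)

# 1best: take the shortest path through the HLG lattice; no external LM is used.
best_path = one_best_decoding(lattice=lattice, use_double_scores=True)

# whole-lattice-rescoring: compose the same lattice with the 4-gram G and take
# the shortest path for each LM scale; the result is a dict keyed by LM scale.
rescored_paths = rescore_with_whole_lattice(
    lattice=lattice,
    G_with_epsilon_loops=G,
    lm_scale_list=[0.6, 0.7, 0.8],
)
```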

Collaborator Author:

Yeah, I was thinking the same. I'll verify the numbers again.

Collaborator:

@desh2608 It seems that you don't have a parameter to adjust the scale of the HLG decoding graph. Could you please add this parameter like here:

```python
parser.add_argument(
    "--hlg-scale",
    type=float,
    default=0.8,
    help="""The scale to be applied to `hlg.scores`.""",
)
```
I tested your model and I got 2.46/5.36 with hlg_scale=0.5 for 1best decoding.
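
For reference, a minimal sketch of how such a scale is typically applied before decoding; the `HLG.pt` path and the value 0.5 are illustrative, and `scores` is the `k2.Fsa` attribute holding the arc scores:

```python
import torch
import k2

hlg_scale = 0.5  # the reviewer reports 2.46/5.36 for 1best with this value

# Load the HLG decoding graph and scale its arc scores in place before building
# lattices. A smaller scale down-weights the graph (lexicon + LM) relative to
# the acoustic scores produced by the CTC head.
HLG = k2.Fsa.from_dict(torch.load("data/lang_bpe_500/HLG.pt", map_location="cpu"))
HLG.scores *= hlg_scale
```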

Collaborator:

> Yeah, I was thinking the same. I'll verify the numbers again.

Are you able to reproduce it, i.e., the 2.01 WER on test-clean? @desh2608

Collaborator Author:

Sorry, I did not find time to check it. Let me try to do it this week.
@MarcoYang, thanks for the pointer. I'll add it.

Collaborator Author:

BTW, something else that differs in this recipe compared to other LibriSpeech recipes is that I keep cuts shorter than 25 s (instead of 20 s), so that less data is thrown away. With the quadratic_duration option in DynamicBucketingSampler, this seems to work fine (I could train on a V100 with batch size 800).
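
A rough sketch of that data pipeline, assuming lhotse's `CutSet` and `DynamicBucketingSampler`; the manifest path, `num_buckets`, and the exact `max_duration` / `quadratic_duration` values are illustrative rather than the recipe's actual settings:

```python
from lhotse import CutSet
from lhotse.dataset import DynamicBucketingSampler

# LibriSpeech training cuts (illustrative manifest path).
cuts_train = CutSet.from_file("data/fbank/librispeech_cuts_train-all-shuf.jsonl.gz")

# Keep utterances up to 25 s (instead of the usual 20 s cap) so that less
# training data is thrown away.
cuts_train = cuts_train.filter(lambda c: 1.0 <= c.duration <= 25.0)

# quadratic_duration adds a quadratic penalty for long cuts when measuring how
# "full" a batch is, so batches containing 20-25 s utterances still fit in memory.
train_sampler = DynamicBucketingSampler(
    cuts_train,
    max_duration=800.0,  # seconds of audio per batch (cf. --max-duration in train.py)
    shuffle=True,
    num_buckets=30,
    quadratic_duration=25.0,
)
```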

| whole-lattice-rescoring | 2.44       | 5.38       | --epoch 30 --avg 9  |
| attention-rescoring     | 2.35       | 5.16       | --epoch 30 --avg 9  |
| 1best                   | 2.01       | 4.61       | --epoch 30 --avg 9  |

The training commands are:
```bash
export CUDA_VISIBLE_DEVICES="0,1,2,3"

./zipformer_ctc/train.py \
  --world-size 4 \
  --num-epochs 30 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir zipformer_ctc/exp \
  --full-libri 1 \
  --max-duration 1000 \
  --master-port 12345
```

The tensorboard log can be found at:
<https://tensorboard.dev/experiment/IjPSJjHOQFKPYA5Z0Vf8wg>

The decoding command is:

```bash
./zipformer_ctc/decode.py \
  --epoch 30 \
  --avg 9 \
  --use-averaged-model True \
  --exp-dir zipformer_ctc/exp \
  --lang-dir data/lang_bpe_500 \
  --lm-dir data/lm \
  --method ctc-decoding
```

### pruned_transducer_stateless7 (Fine-tune with mux)

See <https://github.com/k2-fsa/icefall/pull/1059> for more details.
@@ -616,7 +665,6 @@ for m in greedy_search modified_beam_search fast_beam_search; do
done
```


#### Smaller model

We also provide a very small version (only 6.1M parameters) of this setup. The training command for the small model is:
@@ -663,6 +711,7 @@ This small model achieves the following WERs on GigaSpeech test and dev sets:

You can find the tensorboard logs at <https://tensorboard.dev/experiment/tAc5iXxTQrCQxky5O5OLyw/#scalars>.


### Streaming Zipformer-Transducer (Pruned Stateless Transducer + Streaming Zipformer)

#### [pruned_transducer_stateless7_streaming](./pruned_transducer_stateless7_streaming)
1 change: 1 addition & 0 deletions egs/librispeech/ASR/zipformer_ctc/asr_datamodule.py