Add recipe for the yes_no dataset. #16

csukuangfj · 2021-08-21T09:34:54Z

The dataset contains 60 sound files: 30 are used for training and the other 30 for testing.

The decoding log is below:

$ ./tdnn/decode.py --epoch 49
2021-08-21 17:20:27,047 INFO [decode.py:321] Decoding started
2021-08-21 17:20:27,047 INFO [decode.py:322] {'exp_dir': PosixPath('tdnn/exp'), 'lang_dir': PosixPath('data/lang_phone'), 'lm_dir': PosixPath('data/lm'), 'feature_dim': 23, 'subsampling_factor': 1, 'search_beam': 20, 'output_beam': 5, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'method': '1best', 'num_paths': 30, 'epoch': 49, 'avg': 15, 'feature_dir': PosixPath('data/fbank'), 'max_duration': 30.0, 'bucketing_sampler': False, 'num_buckets': 10, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'return_cuts': True, 'num_workers': 2}
2021-08-21 17:20:27,048 INFO [lexicon.py:96] Loading pre-compiled data/lang_phone/Linv.pt
2021-08-21 17:20:27,109 INFO [decode.py:331] device: cuda:0
2021-08-21 17:20:31,515 INFO [decode.py:390] averaging ['tdnn/exp/epoch-35.pt', 'tdnn/exp/epoch-36.pt', 'tdnn/exp/epoch-37.pt', 'tdnn/exp/epoch-38.pt', 'tdnn/exp/epoch-39.pt', 'tdnn/exp/epoch-40.pt', 'tdnn/exp/epoch-41.pt', 'tdnn/exp/epoch-42.pt', 'tdnn/exp/epoch-43.pt', 'tdnn/exp/epoch-44.pt', 'tdnn/exp/epoch-45.pt', 'tdnn/exp/epoch-46.pt', 'tdnn/exp/epoch-47.pt', 'tdnn/exp/epoch-48.pt', 'tdnn/exp/epoch-49.pt']
2021-08-21 17:20:31,540 INFO [asr_datamodule.py:216] About to get test cuts
2021-08-21 17:20:31,540 INFO [asr_datamodule.py:243] About to get test cuts
2021-08-21 17:20:31,846 INFO [decode.py:270] batch 0/8, cuts processed until now is 4
2021-08-21 17:20:33,255 INFO [decode.py:285] The transcripts are stored in tdnn/exp/recogs-test-no_rescore.txt
2021-08-21 17:20:33,256 INFO [utils.py:300] [test-no_rescore] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ]
2021-08-21 17:20:33,258 INFO [decode.py:294] Wrote detailed error stats to tdnn/exp/errs-test-no_rescore.txt
2021-08-21 17:20:33,258 INFO [decode.py:308]
For test, WER of different settings are:
no_rescore      0.42    best for test

2021-08-21 17:20:33,258 INFO [decode.py:418] Done!

As the log shows, there is only 1 deletion error out of 240 words.
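
The WER in the log follows directly from the error counts: insertions plus deletions plus substitutions, divided by the number of reference words. Here is a minimal sketch that parses the summary line from the log above and recomputes the percentage. The regular expression and the name `parse_wer_line` are illustrative and are not part of icefall:

```python
import re

# Summary line copied from the decoding log above.
LOG_LINE = "[test-no_rescore] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ]"

# Hypothetical pattern (not an icefall API): capture the error counts.
PATTERN = re.compile(
    r"\[(?P<errs>\d+) / (?P<ref>\d+), (?P<ins>\d+) ins, "
    r"(?P<dele>\d+) del, (?P<sub>\d+) sub \]"
)


def parse_wer_line(line: str) -> dict:
    """Return the integer counts found in a WER summary line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"no WER summary found in: {line!r}")
    return {name: int(value) for name, value in match.groupdict().items()}


counts = parse_wer_line(LOG_LINE)
total_errors = counts["ins"] + counts["dele"] + counts["sub"]
wer_percent = 100.0 * total_errors / counts["ref"]
print(f"{wer_percent:.2f}%")  # 0.42%
```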

The dataset is so small that training and decoding can run on a CPU.

It is useful for education and demonstration purposes because it involves almost all of the concepts used in training and decoding, namely:

data preparation
lexicon preparation
LM preparation
HLG construction
CTC training
1best decoding

(It does not include LM rescoring.)

Requires lhotse-speech/lhotse#380

--

TODOs:

Refactor the training and decoding code and remove the parts that are not needed
Add GitHub actions to run it
~~Use a colab notebook to run it~~ See
Support inference with a pre-trained model

csukuangfj · 2021-08-21T09:42:55Z

The code for selecting the training set and test set can be found in lhotse-speech/lhotse#380

See https://github.com/lhotse-speech/lhotse/blob/ba534a08fc17196f4caf27433587a54779991826/lhotse/recipes/yesno.py#L138-L143

    wave_files = list(corpus_dir.glob("*.wav"))
    assert len(wave_files) == 60

    wave_files.sort()
    train_set = wave_files[::2]
    test_set = wave_files[1::2]

    assert len(train_set) == 30
    assert len(test_set) == 30
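
To see why this gives a deterministic 30/30 split, here is a self-contained sketch of the same slicing applied to made-up file names. The names below are placeholders, not the real files found in `corpus_dir`:

```python
# Placeholder names standing in for the 60 .wav files (not the real names).
wave_files = [f"utt_{i:02d}.wav" for i in range(60)]

# Sorting first makes the split independent of the order glob() returns.
wave_files.sort()
train_set = wave_files[::2]   # positions 0, 2, 4, ...
test_set = wave_files[1::2]   # positions 1, 3, 5, ...

assert len(train_set) == 30
assert len(test_set) == 30
assert not set(train_set) & set(test_set)  # no file lands in both sets
```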

danpovey · 2021-08-21T10:13:20Z

Cool!

pkufool · 2021-08-22T14:32:21Z

egs/yesno/ASR/local/compile_hlg.py

+
+    first_token_disambig_id = lexicon.token_table["#0"]
+    first_word_disambig_id = lexicon.word_table["#0"]
+


Do we need to make the following k2 operations run on GPU if there are devices available?

For the yesno dataset, the graphs are tiny. It's ok to run them on CPU.

For the librispeech dataset, I think it's worthwhile to have some benchmarks. If GPU is faster, we can switch to it.
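
One way to collect such numbers is a small timing harness. The sketch below is generic Python with a stand-in workload. `benchmark` and `fake_graph_compile` are hypothetical names, and for a real GPU measurement the timed function would also need to synchronize the device (for example with `torch.cuda.synchronize()`) before the clock is read:

```python
import time
from typing import Callable


def benchmark(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the best wall-clock time in seconds over `repeats` runs."""
    fn()  # warm-up run, not timed
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def fake_graph_compile() -> int:
    # Stand-in workload; replace with the graph operation to be measured.
    return sum(i * i for i in range(100_000))


elapsed = benchmark(fake_graph_compile)
print(f"best of 5 runs: {elapsed:.6f} s")
```

Taking the best of several runs after a warm-up reduces the influence of one-off costs such as lazy initialization.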

pkufool · 2021-08-22T14:33:58Z

egs/yesno/ASR/local/compute_fbank_yesno.py

+
+"""
+This file computes fbank features of the yesno dataset.
+Its looks for manifests in the directory data/manifests.


Its -> It ?

pkufool · 2021-08-22T14:42:25Z

egs/yesno/ASR/tdnn/asr_datamodule.py

+                shuffle=self.args.shuffle,
+                num_buckets=self.args.num_buckets,
+                bucket_method="equal_duration",
+                drop_last=True,


Do we need to make these two arguments configurable?

Yes, will make it configurable.
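
A minimal sketch of how the two hard-coded values could become command-line options with `argparse`. The flag names `--bucket-method` and `--drop-last` and the local `str2bool` helper are assumptions made for illustration, not necessarily what the recipe ends up using:

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse common textual booleans (helper defined here for the sketch)."""
    if value.lower() in ("yes", "true", "t", "y", "1"):
        return True
    if value.lower() in ("no", "false", "f", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"boolean value expected, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Hypothetical flags; the defaults mirror the values hard-coded in the diff.
    parser.add_argument("--bucket-method", type=str, default="equal_duration")
    parser.add_argument("--drop-last", type=str2bool, default=True)
    return parser


args = build_parser().parse_args(["--drop-last", "false"])
print(args.bucket_method, args.drop_last)  # equal_duration False
```

The parsed values could then be passed on as `bucket_method=args.bucket_method` and `drop_last=args.drop_last`.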

csukuangfj · 2021-08-22T15:47:44Z

I just wrote a Colab notebook that runs the yesno recipe on CPU.

Training for 50 epochs takes less than 2 minutes on CPU.

See

The Colab notebook covers the following steps:

Environment setup (Install torch, torchaudio, k2, lhotse, icefall)
Data preparation
Training
Decoding

Part of the training log is given below:

2021-08-22 15:18:20,422 INFO [train.py:460] Training started
2021-08-22 15:18:20,423 INFO [train.py:461] {'exp_dir': PosixPath('tdnn/exp'), 'lang_dir': PosixPath('data/lang_phone'), 'lr': 0.001, 'feature_dim': 23, 'weight_decay': 1e-06, 'start_epoch': 0, 'num_epochs': 50, 'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 10, 'valid_interval': 10, 'beam_size': 10, 'reduction': 'sum', 'use_double_scores': True, 'world_size': 1, 'master_port': 12354, 'tensorboard': True, 'feature_dir': PosixPath('data/fbank'), 'max_duration': 30.0, 'bucketing_sampler': False, 'num_buckets': 10, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'return_cuts': True, 'num_workers': 2}
2021-08-22 15:18:22,039 INFO [lexicon.py:96] Loading pre-compiled data/lang_phone/Linv.pt
2021-08-22 15:18:22,187 INFO [asr_datamodule.py:132] About to get train cuts
2021-08-22 15:18:22,188 INFO [asr_datamodule.py:237] About to get train cuts
2021-08-22 15:18:22,191 INFO [asr_datamodule.py:135] About to create train dataset
2021-08-22 15:18:22,192 INFO [asr_datamodule.py:197] Using SingleCutSampler.
2021-08-22 15:18:22,192 INFO [asr_datamodule.py:203] About to create train dataloader
2021-08-22 15:18:22,192 INFO [asr_datamodule.py:216] About to get test cuts
2021-08-22 15:18:22,192 INFO [asr_datamodule.py:243] About to get test cuts
/usr/local/lib/python3.7/dist-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  /pytorch/aten/src/ATen/native/BinaryOps.cpp:467.)
  return torch.floor_divide(self, other)
2021-08-22 15:18:22,584 INFO [train.py:412] Epoch 0, batch 0, batch avg loss 1.0879, total avg loss: 1.0879, batch size: 4
2021-08-22 15:18:23,268 INFO [train.py:412] Epoch 0, batch 10, batch avg loss 0.5386, total avg loss: 0.7594, batch size: 4
2021-08-22 15:18:23,728 INFO [train.py:428] Epoch 0, valid loss 0.9149, best valid loss: 0.9149 best valid epoch: 0
2021-08-22 15:18:24,213 INFO [train.py:412] Epoch 0, batch 20, batch avg loss 0.3465, total avg loss: 0.6211, batch size: 3
2021-08-22 15:18:24,614 INFO [train.py:428] Epoch 0, valid loss 0.3521, best valid loss: 0.3521 best valid epoch: 0
2021-08-22 15:18:24,628 INFO [checkpoint.py:45] Saving checkpoint to tdnn/exp/epoch-0.pt
2021-08-22 15:18:24,804 INFO [train.py:412] Epoch 1, batch 0, batch avg loss 0.4360, total avg loss: 0.4360, batch size: 5
2021-08-22 15:18:25,460 INFO [train.py:412] Epoch 1, batch 10, batch avg loss 0.2444, total avg loss: 0.3159, batch size: 5
2021-08-22 15:18:25,756 INFO [train.py:428] Epoch 1, valid loss 0.1264, best valid loss: 0.1264 best valid epoch: 1
2021-08-22 15:18:26,288 INFO [train.py:412] Epoch 1, batch 20, batch avg loss 0.2659, total avg loss: 0.2966, batch size: 3
2021-08-22 15:18:26,617 INFO [train.py:428] Epoch 1, valid loss 0.1510, best valid loss: 0.1264 best valid epoch: 1
2021-08-22 15:18:26,635 INFO [checkpoint.py:45] Saving checkpoint to tdnn/exp/epoch-1.pt
2021-08-22 15:18:26,796 INFO [train.py:412] Epoch 2, batch 0, batch avg loss 0.1710, total avg loss: 0.1710, batch size: 4
2021-08-22 15:18:27,411 INFO [train.py:412] Epoch 2, batch 10, batch avg loss 0.2394, total avg loss: 0.2257, batch size: 5
2021-08-22 15:18:27,650 INFO [train.py:428] Epoch 2, valid loss 0.1196, best valid loss: 0.1196 best valid epoch: 2
2021-08-22 15:18:28,214 INFO [train.py:412] Epoch 2, batch 20, batch avg loss 0.2267, total avg loss: 0.2257, batch size: 3
2021-08-22 15:18:28,482 INFO [train.py:428] Epoch 2, valid loss 0.0662, best valid loss: 0.0662 best valid epoch: 2
2021-08-22 15:18:28,496 INFO [checkpoint.py:45] Saving checkpoint to tdnn/exp/epoch-2.pt

...

2021-08-22 15:20:03,495 INFO [checkpoint.py:45] Saving checkpoint to tdnn/exp/epoch-47.pt
2021-08-22 15:20:03,656 INFO [train.py:412] Epoch 48, batch 0, batch avg loss 0.0124, total avg loss: 0.0124, batch size: 4
2021-08-22 15:20:04,250 INFO [train.py:412] Epoch 48, batch 10, batch avg loss 0.0127, total avg loss: 0.0174, batch size: 4
2021-08-22 15:20:04,547 INFO [train.py:428] Epoch 48, valid loss 0.0108, best valid loss: 0.0108 best valid epoch: 48
2021-08-22 15:20:05,095 INFO [train.py:412] Epoch 48, batch 20, batch avg loss 0.0191, total avg loss: 0.0188, batch size: 4
2021-08-22 15:20:05,432 INFO [train.py:428] Epoch 48, valid loss 0.0106, best valid loss: 0.0106 best valid epoch: 48
2021-08-22 15:20:05,487 INFO [checkpoint.py:45] Saving checkpoint to tdnn/exp/epoch-48.pt
2021-08-22 15:20:05,686 INFO [train.py:412] Epoch 49, batch 0, batch avg loss 0.0168, total avg loss: 0.0168, batch size: 4
2021-08-22 15:20:06,361 INFO [train.py:412] Epoch 49, batch 10, batch avg loss 0.0193, total avg loss: 0.0228, batch size: 4
2021-08-22 15:20:06,733 INFO [train.py:428] Epoch 49, valid loss 0.0113, best valid loss: 0.0106 best valid epoch: 48
2021-08-22 15:20:07,312 INFO [train.py:412] Epoch 49, batch 20, batch avg loss 0.0193, total avg loss: 0.0206, batch size: 3
2021-08-22 15:20:07,680 INFO [train.py:428] Epoch 49, valid loss 0.0109, best valid loss: 0.0106 best valid epoch: 48
2021-08-22 15:20:07,707 INFO [checkpoint.py:45] Saving checkpoint to tdnn/exp/epoch-49.pt
2021-08-22 15:20:07,710 INFO [train.py:532] Done!
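
The wall-clock training time can be read off the first and last timestamps of this log. A small sketch, with the two timestamps copied from the "Training started" and "Done!" lines above:

```python
from datetime import datetime

# Timestamps copied from the first and last lines of the training log.
FORMAT = "%Y-%m-%d %H:%M:%S,%f"
started = datetime.strptime("2021-08-22 15:18:20,422", FORMAT)
finished = datetime.strptime("2021-08-22 15:20:07,710", FORMAT)

elapsed = (finished - started).total_seconds()
print(f"{elapsed:.3f} s for 50 epochs, {elapsed / 50:.2f} s per epoch")
```

This comes to roughly 107 seconds in total.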

The decoding log is:

2021-08-22 15:21:07,711 INFO [decode.py:261] Decoding started
2021-08-22 15:21:07,711 INFO [decode.py:262] {'exp_dir': PosixPath('tdnn/exp'), 'lang_dir': PosixPath('data/lang_phone'), 'lm_dir': PosixPath('data/lm'), 'feature_dim': 23, 'search_beam': 20, 'output_beam': 8, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'epoch': 49, 'avg': 15, 'feature_dir': PosixPath('data/fbank'), 'max_duration': 30.0, 'bucketing_sampler': False, 'num_buckets': 10, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'return_cuts': True, 'num_workers': 2}
2021-08-22 15:21:07,712 INFO [lexicon.py:96] Loading pre-compiled data/lang_phone/Linv.pt
2021-08-22 15:21:07,727 INFO [decode.py:271] device: cpu
2021-08-22 15:21:07,731 INFO [decode.py:291] averaging ['tdnn/exp/epoch-35.pt', 'tdnn/exp/epoch-36.pt', 'tdnn/exp/epoch-37.pt', 'tdnn/exp/epoch-38.pt', 'tdnn/exp/epoch-39.pt', 'tdnn/exp/epoch-40.pt', 'tdnn/exp/epoch-41.pt', 'tdnn/exp/epoch-42.pt', 'tdnn/exp/epoch-43.pt', 'tdnn/exp/epoch-44.pt', 'tdnn/exp/epoch-45.pt', 'tdnn/exp/epoch-46.pt', 'tdnn/exp/epoch-47.pt', 'tdnn/exp/epoch-48.pt', 'tdnn/exp/epoch-49.pt']
/content/icefall/icefall/checkpoint.py:129: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  /pytorch/aten/src/ATen/native/BinaryOps.cpp:450.)
  avg[k] //= n
2021-08-22 15:21:07,755 INFO [asr_datamodule.py:216] About to get test cuts
2021-08-22 15:21:07,755 INFO [asr_datamodule.py:243] About to get test cuts
2021-08-22 15:21:07,891 INFO [decode.py:203] batch 0/8, cuts processed until now is 4
2021-08-22 15:21:08,111 INFO [decode.py:240] The transcripts are stored in tdnn/exp/recogs-test_set.txt
2021-08-22 15:21:08,112 INFO [utils.py:301] [test_set] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ]
2021-08-22 15:21:08,113 INFO [decode.py:248] Wrote detailed error stats to tdnn/exp/errs-test_set.txt
2021-08-22 15:21:08,113 INFO [decode.py:311] Done!
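
The list of averaged checkpoints follows from two options in the log: with `epoch` 49 and `avg` 15, the last 15 epochs (35 through 49) are used. Below is a sketch of that selection together with a plain-Python average. `checkpoints_to_average` and `average_state_dicts` are hypothetical names, floats stand in for tensors, and this is not the icefall implementation:

```python
from typing import Dict, List


def checkpoints_to_average(exp_dir: str, epoch: int, avg: int) -> List[str]:
    """Paths of the `avg` checkpoints ending at `epoch` (inclusive)."""
    start = epoch - avg + 1
    if start < 0:
        raise ValueError("avg is larger than the number of saved epochs")
    return [f"{exp_dir}/epoch-{i}.pt" for i in range(start, epoch + 1)]


def average_state_dicts(states: List[Dict[str, float]]) -> Dict[str, float]:
    """Element-wise mean of the parameters, using true division."""
    n = len(states)
    return {key: sum(s[key] for s in states) / n for key in states[0]}


filenames = checkpoints_to_average("tdnn/exp", epoch=49, avg=15)
print(filenames[0], filenames[-1], len(filenames))

averaged = average_state_dicts([{"w": 1.0}, {"w": 2.0}, {"w": 6.0}])
print(averaged)  # {'w': 3.0}
```

The sketch divides with `/`, whereas the warning in the log is triggered by the in-place `avg[k] //= n` in `checkpoint.py`.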

csukuangfj · 2021-08-22T15:50:35Z

@pzelasko

Could you have a look at the above Colab notebook about the installation of lhotse?

The shebang is changed from #!/usr/bin/env python3 to #!python after installation, and I have to correct it manually.

[EDITED]: If I don't, it throws the following error while running egs/yesno/ASR/prepare.sh:

2021-08-22 15:55:43 (prepare.sh:24:main) dl_dir: /content/icefall/egs/yesno/ASR/download
2021-08-22 15:55:43 (prepare.sh:27:main) stage 0: Download data
./prepare.sh: /usr/local/bin/lhotse: python: bad interpreter: No such file or directory
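
The manual correction can also be scripted. Here is a sketch that rewrites a `#!python` first line to `#!/usr/bin/env python3`. The function name `fix_shebang` is made up for this example, and the demo runs on a temporary file rather than on `/usr/local/bin/lhotse`:

```python
import tempfile
from pathlib import Path

BROKEN = "#!python"
FIXED = "#!/usr/bin/env python3"


def fix_shebang(script: Path) -> bool:
    """Replace a broken first line in place; return True if it was changed."""
    lines = script.read_text().splitlines(keepends=True)
    if not lines or lines[0].rstrip("\r\n") != BROKEN:
        return False
    lines[0] = FIXED + "\n"
    script.write_text("".join(lines))
    return True


# Demo on a throwaway file that mimics the broken entry point.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "lhotse"
    demo.write_text("#!python\nprint('hello')\n")
    changed = fix_shebang(demo)
    first_line = demo.read_text().splitlines()[0]
    print(changed, first_line)
```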

pzelasko · 2021-08-23T02:46:15Z

Yes, I’ll have a look tomorrow.

Add recipe for the yes_no dataset.

f246f0c

csukuangfj mentioned this pull request Aug 21, 2021

Add recipe for the yes_no dataset. lhotse-speech/lhotse#380

Merged

pzelasko approved these changes Aug 21, 2021

Refactoring: Remove unused code.

09587d1

pkufool reviewed Aug 22, 2021

Add Colab notebook for the yesno dataset.

88166c5

csukuangfj added 7 commits August 23, 2021 07:50

Add GitHub actions to run yesno.

f65525d

Fix a typo.

1bdfcb6

Minor fixes.

3ffcd95

Train more epochs for GitHub actions.

6617d58

Minor fixes.

22dc936

Minor fixes.

7edc0c6

Merge remote-tracking branch 'dan/master' into yesno

b06f4cb

Fix style issues.

c6e3e10

csukuangfj merged commit 6c2c9b9 into k2-fsa:master Aug 23, 2021

csukuangfj deleted the yesno branch August 23, 2021 03:36

Lzhang-hub mentioned this pull request Oct 20, 2021

CUDA out of memory in decoding #70

Open

danpovey mentioned this pull request Nov 27, 2021

Decoding error 'Fsa' object doesn't support assignment. #133

Open

ahazned mentioned this pull request Apr 13, 2022

Illegal memory error when training with multi-GPU #247

Open
