About using random combiner to train a narrower and deeper conformer #431

Open
yaozengwei opened this issue Jun 17, 2022 · 5 comments
@yaozengwei
Collaborator

@csukuangfj Following @danpovey's advice, I did some experiments on pruned_transducer_stateless5, with the Medium model as in https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md#medium.

Here are some results with modifications of the RandomCombine class (final_weight=0.5, pure_prob=0.333).

  • train on full LibriSpeech, decode with epoch-32-avg-10, use averaged model
  1. random-combine from layer-0, 2.97 & 7.3
  2. random-combine from layer-0, no linear layers in RandomCombine class, 3.02 & 7.29
  3. random-combine from layer-4, 2.98 & 7.23
  4. random-combine from layer-4, no linear layers in RandomCombine class, 2.88 & 6.89
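
For context, here is a minimal sketch (PyTorch, not the actual icefall implementation) of the "no linear layers" RandomCombine variant compared above: during training the outputs of the selected encoder layers are combined either by picking a single "pure" layer (with probability pure_prob, choosing the final layer with probability final_weight) or by a random convex mixture that gives the final layer weight final_weight; at test time only the final layer's output is used. Apart from final_weight=0.5 and pure_prob=0.333, the class name, shapes, and sampling details are illustrative assumptions.

```python
import torch
import torch.nn as nn


class RandomCombineSketch(nn.Module):
    """Simplified stand-in for a RandomCombine-style module, without the
    per-layer linear projections (the "no linear layers" variant above)."""

    def __init__(self, num_inputs: int, final_weight: float = 0.5,
                 pure_prob: float = 0.333):
        super().__init__()
        assert num_inputs >= 2
        self.num_inputs = num_inputs
        self.final_weight = final_weight  # weight/probability given to the final layer
        self.pure_prob = pure_prob        # probability of using a single "pure" layer

    def forward(self, inputs):
        # inputs: list of (T, N, C) tensors from the chosen encoder layers,
        # ordered so that the final encoder layer comes last.
        if not self.training:
            # Inference: just return the final layer's output.
            return inputs[-1]
        stacked = torch.stack(inputs, dim=0)  # (num_inputs, T, N, C)
        if torch.rand(()) < self.pure_prob:
            # Use a single layer: the final layer with prob. final_weight,
            # otherwise a uniformly chosen earlier layer.
            if torch.rand(()) < self.final_weight:
                idx = self.num_inputs - 1
            else:
                idx = int(torch.randint(0, self.num_inputs - 1, ()))
            return stacked[idx]
        # Otherwise mix: the final layer gets final_weight, and the remaining
        # mass is spread randomly over the earlier layers, so weights sum to 1.
        rest = torch.softmax(torch.randn(self.num_inputs - 1), dim=0)
        weights = torch.cat([rest * (1.0 - self.final_weight),
                             torch.tensor([self.final_weight])])
        return (stacked * weights.view(-1, 1, 1, 1)).sum(dim=0)
```

Row 4 above (layer-4, no linear layers) would correspond to feeding such a module only the outputs from layer 4 onward and dropping the per-layer projections.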
@csukuangfj
Collaborator

I think you can try a larger model with the 4th setting.

@yaozengwei
Collaborator Author

@danpovey @csukuangfj
For the Baseline model as in https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md#baseline-2, trained on full LibriSpeech, using the averaged model:

  • decode with epoch-30-avg-10
  1. random-combine from layer-0, 2.49 & 5.75
  2. random-combine from layer-4, no linear layers in RandomCombine class, 2.54 & 5.72
  • decode with epoch-30-avg-17
  1. random-combine from layer-0, 2.51 & 5.73
  2. random-combine from layer-4, no linear layers in RandomCombine class, 2.52 & 5.7

@danpovey
Collaborator

OK, well even though it's not better, it should at least be a little faster. It is also less likely to cause problems for fixed-point operation: the linear layer can potentially lead to large activations.
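
A quick way to see the fixed-point point: with convex weights (non-negative, summing to 1) and no linear projection, the combined activation is bounded by the largest input activation, whereas an extra linear layer has no such bound. A tiny self-contained illustration (not icefall code, values made up):

```python
import torch

# Three fake "layer outputs" with increasing scale.
layer_outputs = torch.stack([torch.randn(4, 8) * s for s in (1.0, 2.0, 3.0)])

# Convex weights (non-negative, summing to 1), as in the no-linear-layer combine.
w = torch.softmax(torch.randn(3), dim=0)
combined = (layer_outputs * w.view(-1, 1, 1)).sum(dim=0)

# The convex combination never exceeds the largest input activation.
print(combined.abs().max() <= layer_outputs.abs().max())  # True
```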

@yaozengwei
Collaborator Author

> OK, well even though it's not better, it should at least be a little faster. It is also less likely to cause problems for fixed-point operation: the linear layer can potentially lead to large activations.

I see. Do I need to create a PR for the above modification?

@danpovey
Collaborator

Yes, please.
