# HuBERT Fine-tuning Example

## Usage

After finishing the pre-training step, the model can be validated by fine-tuning it on the supervised subset of the Libri-Light dataset, with an extra feed-forward layer added on top of the transformer layers.

During fine-tuning, the feature extraction layers are frozen (i.e., no gradients are backpropagated to them). For the first 10k fine-tuning iterations the transformer layers are also frozen and only the CTC layer is trained; after 10k iterations, the transformer layers are fine-tuned along with the CTC layer.
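
This freezing schedule can be expressed directly with `requires_grad`. Below is a minimal sketch of that logic; the submodule names `feature_extractor`, `encoder`, and `aux` are hypothetical stand-ins for whatever the actual model definition exposes.
```
def apply_freezing_schedule(model, step, freeze_encoder_updates=10_000):
    # The convolutional feature extractor stays frozen for the entire run.
    # (Attribute names are illustrative, not taken from this recipe.)
    for p in model.feature_extractor.parameters():
        p.requires_grad = False
    # Transformer layers are frozen for the first 10k updates, then unfrozen.
    for p in model.encoder.parameters():
        p.requires_grad = step >= freeze_encoder_updates
    # The CTC head is always trained.
    for p in model.aux.parameters():
        p.requires_grad = True
```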

Sample SLURM command for fine-tuning on the `10h` subset of the `LibriLightLimited` dataset:
```
srun --gpus-per-node=1 -N 1 --ntasks-per-node=1 --cpus-per-task=10 \
python finetune.py --root-path /root/datasets/ --exp-dir ./exp_finetune \
--checkpoint /exp_iter2/checkpoints_librispeech_hubert_pretrain_base/epoch=361-step=399999.ckpt \
--gpus 1 --debug --warmup-updates 2000 --hold-updates 8000 --decay-updates 10000 --max-updates 20000 --learning-rate 5e-5
```
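
The `--warmup-updates`, `--hold-updates`, and `--decay-updates` flags suggest a tri-stage learning-rate schedule: linear warmup to the peak rate, a hold phase, then decay. A minimal sketch of such a schedule follows; the `init_scale` and `final_scale` values are illustrative assumptions, not values taken from this recipe.
```
import math

def tri_stage_lr(step, peak_lr=5e-5, warmup=2000, hold=8000, decay=10000,
                 init_scale=0.01, final_scale=0.05):
    if step < warmup:                     # stage 1: linear warmup
        return peak_lr * (init_scale + (1 - init_scale) * step / warmup)
    if step < warmup + hold:              # stage 2: hold at the peak rate
        return peak_lr
    t = min(step - warmup - hold, decay)  # stage 3: exponential decay
    return peak_lr * math.exp(math.log(final_scale) * t / decay)
```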

## Decoding

### Viterbi Decoding
The output of the CTC layer contains repeated letters, the blank symbol ("-"), and the silence symbol ("|"). Viterbi decoding collapses repeated letters into a single letter, removes the blank symbol, and splits the string into a list of words at the silence symbol.
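
For CTC, this best-path decoding reduces to collapsing the frame-wise argmax. A minimal sketch, assuming `emission` is a `(time, num_labels)` tensor of per-frame scores and `labels` maps indices to characters with "-" as the blank:
```
import torch

def greedy_ctc_decode(emission, labels):
    indices = torch.argmax(emission, dim=-1).tolist()
    chars, prev = [], None
    for i in indices:
        if i != prev:                 # merge repeated labels
            chars.append(labels[i])
        prev = i
    text = "".join(c for c in chars if c != "-")  # drop the blank symbol
    return [w for w in text.split("|") if w]      # split on the silence symbol
```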

Sample SLURM command for evaluation with Viterbi decoding:
```
srun python evaluate.py --librispeech_path /root/datasets/ --checkpoint ./exp_finetune/checkpoints_hubert_pretrain_base/epoch\=109-step\=19999.ckpt --split test-clean
```

### CTC Decoding with language model
torchaudio provides a `CTCDecoder` feature based on Flashlight. The decoder supports a KenLM language model. Use `--use-lm` to enable CTC decoding with a KenLM 4-gram language model.
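
As a rough sketch of what `--use-lm` sets up, the decoder can be built with torchaudio's `ctc_decoder` API; the pretrained LibriSpeech 4-gram files are used here for illustration, and the tuning values mirror the flags in the command below.
```
from torchaudio.models.decoder import ctc_decoder, download_pretrained_files

files = download_pretrained_files("librispeech-4-gram")
decoder = ctc_decoder(
    lexicon=files.lexicon,   # word-to-spelling lexicon
    tokens=files.tokens,     # acoustic-model label set
    lm=files.lm,             # KenLM 4-gram language model
    beam_size=1500,
    lm_weight=2.46,
    word_score=-0.59,
)
# hypotheses = decoder(emission)  # emission: (batch, time, num_labels) CTC output
```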

Sample SLURM command for evaluation with the KenLM language model:
```
srun python evaluate.py --librispeech_path /root/datasets/ --checkpoint ./exp_finetune/checkpoints_hubert_pretrain_base/epoch\=109-step\=19999.ckpt --split test-clean --use-lm --beam-size 1500 --lm-weight 2.46 --word-score -0.59
```

### WER results
The table below shows WER results for fine-tuning the HuBERT Base model on the `10h` subset of the `LibriLightLimited` dataset.

|            | WER% (Viterbi) | WER% (KenLM) |
|:----------:|---------------:|-------------:|
| dev-clean  |           10.7 |          4.4 |
| dev-other  |           18.3 |          9.7 |
| test-clean |           10.8 |          4.4 |
| test-other |           18.5 |         10.1 |