Add Unispeech & Unispeech-SAT #13963
Conversation
Wait until #13877 is merged
…into add_unispeech
src/transformers/models/unispeech_sat/modeling_unispeech_sat.py
PR is good for review IMO:
I think we can merge the pretrained models now. To make them "promotable" we should still do 2 things:
Thanks a lot for adding those two models!
Looks good, thanks a lot for debugging the original models!
# quantize all (unmasked) extracted features and project to final vq dim
extract_features = self.dropout_features(outputs[1])
quantized_features, codevector_perplexity = self.quantizer(extract_features)
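For context, the quantize-and-project pattern in the snippet above can be sketched roughly as follows. This is a minimal NumPy sketch under stated assumptions: the `quantize_features` helper, the nearest-codevector lookup, and all shapes are illustrative, not the actual Gumbel-softmax quantizer used in the modeling code.

```python
import numpy as np

def quantize_features(extract_features, codebook, proj, rng, dropout_p=0.1):
    """Sketch: feature dropout, nearest-codevector quantization,
    codevector perplexity, and projection to the final vq dim."""
    # train-time feature dropout (inverted dropout scaling)
    keep = rng.random(extract_features.shape) >= dropout_p
    feats = extract_features * keep / (1.0 - dropout_p)
    # nearest codevector per frame (toy stand-in for a learned quantizer)
    dists = ((feats[:, :, None, :] - codebook[None, None, :, :]) ** 2).sum(-1)
    idx = dists.argmin(-1)                      # (batch, time)
    quantized = codebook[idx]                   # (batch, time, codevector_dim)
    # codevector perplexity = exp(entropy) of the usage distribution,
    # a common diagnostic for codebook collapse
    counts = np.bincount(idx.ravel(), minlength=len(codebook))
    probs = counts / counts.sum()
    nz = probs[probs > 0]
    perplexity = np.exp(-(nz * np.log(nz)).sum())
    # project quantized features to the final vq dimension
    return quantized @ proj, perplexity
```

Perplexity ranges from 1 (all frames mapped to one codevector) up to the codebook size (uniform usage), which is why it is tracked alongside the quantized features.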
Doesn't UniSpeech use the same masking strategy for quantization as Wav2Vec? Or did you remove masking just for debugging purposes?
Pretraining is quite different and not really implemented yet; this code should not be used yet.
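For reference, the Wav2Vec2-style span masking asked about above can be sketched like this. This is a toy NumPy version: `compute_span_mask` and its default parameters are hypothetical and only illustrate the idea, not the library's actual mask-computation utility.

```python
import numpy as np

def compute_span_mask(batch, seq_len, mask_prob=0.065, span=10, rng=None):
    """Sketch of span masking: sample start indices so that roughly
    mask_prob of each sequence is covered, then expand each start
    into a contiguous span of masked time steps."""
    if rng is None:
        rng = np.random.default_rng()
    mask = np.zeros((batch, seq_len), dtype=bool)
    n_starts = max(1, int(mask_prob * seq_len / span))
    for b in range(batch):
        starts = rng.choice(seq_len - span, size=n_starts, replace=False)
        for s in starts:
            mask[b, s : s + span] = True  # spans may overlap in real impls
    return mask
```

During pretraining, the masked positions are the ones whose quantized targets the contrastive loss is computed against, which is why whether quantization sees masked or unmasked features matters.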
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
…into add_unispeech
* unispeech
* add copy from
* remove hubert copy from
* finish for today
* add unispeech-sat
* adapt more
* up
* up
* up
* up
* add modeling
* add tests
* up
* up
* finish
* up
* Apply suggestions from code review
* up
* up
* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* up
* up

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
What does this PR do?
This PR adds UniSpeech from Microsoft: https://github.com/microsoft/UniSpeech
TODOS:
Run UniSpeech models and verify that HF forward pass yields same output
Add UniSpeech checkpoints: https://huggingface.co/microsoft/unispeech-large-1500h-cv
Run UniSpeech-SAT and verify that HF forward pass yields same output (blocked by: Access Required for UniSpeech-SAT models microsoft/UniSpeech#4)
Add UniSpeech-SAT checkpoints
Add UniSpeech vocab and preprocessing (verify with Microsoft)
Verify naming with Microsoft & make the README.md files pretty
Clean PR and add tests
Verify fine-tuning works
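The "verify that HF forward pass yields same output" items above typically boil down to an elementwise comparison between the ported model's logits and the original implementation's. This is a generic NumPy sketch; the `outputs_match` helper and the tolerance are illustrative assumptions, not part of the PR.

```python
import numpy as np

def outputs_match(hf_out, orig_out, atol=1e-3):
    """Check that two forward passes agree within an absolute tolerance,
    as is commonly done when porting model checkpoints."""
    hf_out = np.asarray(hf_out, dtype=np.float64)
    orig_out = np.asarray(orig_out, dtype=np.float64)
    if hf_out.shape != orig_out.shape:
        return False
    return bool(np.allclose(hf_out, orig_out, atol=atol))
```

A tolerance on the order of 1e-3 is common for such checks, since framework differences in ops and accumulation order make bitwise equality unrealistic.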
Future PR: