Add Unispeech & Unispeech-SAT #13963
Conversation
Wait until #13877 is merged
…into add_unispeech
src/transformers/models/unispeech_sat/modeling_unispeech_sat.py
PR is good for review IMO:
I think we can merge the pretrained models now. To make them "promotable" we should still do 2 things:
Thanks a lot for adding those two models!
Looks good, thanks a lot for debugging the original models!
# quantize all (unmasked) extracted features and project to final vq dim
extract_features = self.dropout_features(outputs[1])
quantized_features, codevector_perplexity = self.quantizer(extract_features)
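For context, the quantize-and-project pattern in the snippet above can be sketched roughly as follows. This is a minimal NumPy sketch under stated assumptions: the `quantize_features` helper, the nearest-codevector lookup, and all shapes are illustrative, not the actual Gumbel-softmax quantizer used in the modeling code.

```python
import numpy as np

def quantize_features(extract_features, codebook, proj, rng, dropout_p=0.1):
    """Sketch: feature dropout, nearest-codevector quantization,
    codevector perplexity, and projection to the final vq dim."""
    # train-time feature dropout (inverted dropout scaling)
    keep = rng.random(extract_features.shape) >= dropout_p
    feats = extract_features * keep / (1.0 - dropout_p)
    # nearest codevector per frame (toy stand-in for a learned quantizer)
    dists = ((feats[:, :, None, :] - codebook[None, None, :, :]) ** 2).sum(-1)
    idx = dists.argmin(-1)                      # (batch, time)
    quantized = codebook[idx]                   # (batch, time, codevector_dim)
    # codevector perplexity = exp(entropy) of the usage distribution,
    # a common diagnostic for codebook collapse
    counts = np.bincount(idx.ravel(), minlength=len(codebook))
    probs = counts / counts.sum()
    nz = probs[probs > 0]
    perplexity = np.exp(-(nz * np.log(nz)).sum())
    # project quantized features to the final vq dimension
    return quantized @ proj, perplexity
```

Perplexity ranges from 1 (all frames mapped to one codevector) up to the codebook size (uniform usage), which is why it is tracked alongside the quantized features.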
Doesn't UniSpeech use the same masking strategy for quantization as Wav2Vec? Or did you remove masking just for debugging purposes?
Pretraining is quite different and not really implemented yet; this code should not be used yet.
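For reference, the Wav2Vec2-style span masking asked about above can be sketched like this. This is a toy NumPy version: `compute_span_mask` and its default parameters are hypothetical and only illustrate the idea, not the library's actual mask-computation utility.

```python
import numpy as np

def compute_span_mask(batch, seq_len, mask_prob=0.065, span=10, rng=None):
    """Sketch of span masking: sample start indices so that roughly
    mask_prob of each sequence is covered, then expand each start
    into a contiguous span of masked time steps."""
    if rng is None:
        rng = np.random.default_rng()
    mask = np.zeros((batch, seq_len), dtype=bool)
    n_starts = max(1, int(mask_prob * seq_len / span))
    for b in range(batch):
        starts = rng.choice(seq_len - span, size=n_starts, replace=False)
        for s in starts:
            mask[b, s : s + span] = True  # spans may overlap in real impls
    return mask
```

During pretraining, the masked positions are the ones whose quantized targets the contrastive loss is computed against, which is why whether quantization sees masked or unmasked features matters.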
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
…into add_unispeech
* unispeech
* add copy from
* remove hubert copy from
* finish for today
* add unispeech-sat
* adapt more
* up
* up
* up
* up
* add modeling
* add tests
* up
* up
* finish
* up
* Apply suggestions from code review
* up
* up
* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* up
* up

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
What does this PR do?
This PR adds UniSpeech from Microsoft: https://github.com/microsoft/UniSpeech
TODOS:
Run UniSpeech models and verify that HF forward pass yields same output
Add UniSpeech checkpoints: https://huggingface.co/microsoft/unispeech-large-1500h-cv
Run UniSpeech-SAT and verify that HF forward pass yields same output (blocked by: Access Required for UniSpeech-SAT models microsoft/UniSpeech#4)
Add UniSpeech-SAT checkpoints
Add UniSpeech vocab and preprocessing (verify with Microsoft)
Verify naming with Microsoft & make the README.md files pretty
Clean PR and add tests
Verify fine-tuning works
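The "verify that HF forward pass yields same output" items above typically boil down to an elementwise comparison between the ported model's logits and the original implementation's. This is a generic NumPy sketch; the `outputs_match` helper and the tolerance are illustrative assumptions, not part of the PR.

```python
import numpy as np

def outputs_match(hf_out, orig_out, atol=1e-3):
    """Check that two forward passes agree within an absolute tolerance,
    as is commonly done when porting model checkpoints."""
    hf_out = np.asarray(hf_out, dtype=np.float64)
    orig_out = np.asarray(orig_out, dtype=np.float64)
    if hf_out.shape != orig_out.shape:
        return False
    return bool(np.allclose(hf_out, orig_out, atol=atol))
```

A tolerance on the order of 1e-3 is common for such checks, since framework differences in ops and accumulation order make bitwise equality unrealistic.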
Future PR: