Add FastSpeech2Conformer #23439
Conversation
(Force-pushed from 4a19759 to 9b052b4)
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
(Force-pushed from c17b488 to e48eb17)
Hey Connor, apologies for the delay in doing the review. Aside from the handful of small things below, I think this is an excellent PR already.
I can understand some of the design decisions you made, but the Transformers team is pretty strict about how the code should be structured, and so some of this code will have to change in order to be accepted. I don't think it's going to be too much work, though, since what you have here is pretty solid. 😄
(Resolved review threads on configuration_fastspeech2_conformer.py and modeling_fastspeech2_conformer.py)
(Force-pushed from 30fe94b to 4bd5efa)
Thanks for the review @hollance! Addressed the comments above; the only part that might need follow-up discussion is making the […]
Hey Connor, just a few more comments on the code. It's looking very good already!
If you're happy with the code so far, feel free to request a review from @sanchit-gandhi and/or @ArthurZucker.
(Review threads on configuration_fastspeech2_conformer.py, modeling_fastspeech2_conformer.py, and test_modeling_fastspeech2_conformer.py)
(Force-pushed from 2b1f0f7 to 5f8ff7d)
Appreciate the comments @hollance. @ArthurZucker @sanchit-gandhi, this should be ready for your review now.
Looks great already @connor-henderson! Mainly just refactoring / styling suggestions on my side. It would be great to bring the attention / transformer layers into alignment a bit more with the rest of the library (see suggestions below), and I think we can simplify a lot of the `__init__` logic by always passing the `config` as an attribute and avoiding nested configs.
Happy to clarify any of the points below or answer any questions! Also cc'ing @ylacombe
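For illustration, a minimal sketch of the config-as-attribute pattern being suggested here; the class and attribute names are hypothetical, not the actual PR code:

```python
from torch import nn


class ExampleConformerFeedForward(nn.Module):
    # Hypothetical module: every size is read off the single top-level config,
    # rather than a nested sub-config or loose keyword arguments.
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.intermediate = nn.Linear(config.hidden_size, config.intermediate_size)
        self.output = nn.Linear(config.intermediate_size, config.hidden_size)
        self.dropout = nn.Dropout(config.hidden_dropout)

    def forward(self, hidden_states):
        hidden_states = nn.functional.relu(self.intermediate(hidden_states))
        return self.output(self.dropout(hidden_states))
```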
(Review threads on configuration_fastspeech2_conformer.py, modeling_fastspeech2_conformer.py, tokenization_fastspeech2_conformer.py, test_modeling_fastspeech2_conformer.py, and test_tokenization_fastspeech2_conformer.py)
(Force-pushed from 8349819 to 7222ad6)
Thank you for the review @sanchit-gandhi, the comments should be addressed now. Centralizing a note here on passing the config instead of args, since there were a few comments on that: the other modules mentioned are all instantiated twice with different arg values, so they can't solely be passed the config. Let me know if you think there's something I missed, or if you'd prefer something else like duplicating the modules in order to pass just the config.
(Review threads on modeling_fastspeech2_conformer.py)
def __init__(
    self,
    config: FastSpeech2ConformerConfig,
    attention_heads=4,
Note to reviewer: the reason that we specify these arguments in addition to the config is that the number of attention heads is configurable depending on whether this is the encoder (`config.encoder_attention_heads`) or the decoder (`config.decoder_attention_heads`), so we pass it as an additional argument depending on whether the module belongs to the encoder or decoder respectively.
This is in keeping with other models in the library, e.g. T5:
transformers/src/transformers/models/t5/configuration_t5.py (lines 55 to 58 in 6232c38):
num_layers (`int`, *optional*, defaults to 6):
    Number of hidden layers in the Transformer encoder.
num_decoder_layers (`int`, *optional*):
    Number of hidden layers in the Transformer decoder. Will use the same value as `num_layers` if not set.
But we could simplify it and assume that the encoder and decoder have the same number of heads and ffn dim, in which case this would collapse to just taking the config as input.
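A rough sketch of the pattern under discussion; the module and attribute names are illustrative only, assuming config fields like `encoder_attention_heads` / `decoder_attention_heads`:

```python
from torch import nn


class ExampleAttention(nn.Module):
    # Hypothetical attention stub: the head count is an explicit argument because it may
    # differ between encoder and decoder; everything else is read off the shared config.
    def __init__(self, config, attention_heads=4):
        super().__init__()
        self.num_heads = attention_heads
        self.head_dim = config.hidden_size // attention_heads
        self.qkv = nn.Linear(config.hidden_size, 3 * config.hidden_size)


# Instantiated twice with different values, as described in the note above:
# encoder_attention = ExampleAttention(config, attention_heads=config.encoder_attention_heads)
# decoder_attention = ExampleAttention(config, attention_heads=config.decoder_attention_heads)
# If the encoder and decoder were assumed to always share a head count, the extra argument
# would collapse away and the module could take only the config.
```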
(Review threads on modeling_fastspeech2_conformer.py, tokenization_fastspeech2_conformer.py, and test_modeling_fastspeech2_conformer.py)
Thanks for iterating here @connor-henderson, the PR is looking really good! Just a few small suggestions, then I think we can get a final review here and move the checkpoints to the official org.
(Review threads on test_modeling_fastspeech2_conformer.py)
(Force-pushed from 0cf0059 to 858d7a9)
This is looking super clean - thanks for iterating @connor-henderson! I especially like the new "WithHiFiGanHead" class and, overall, the FS2 API. I think we've got as many modules as possible initialised from the config; the remainder need more flexibility (as mentioned in the comments above).
Would love a second opinion from @ArthurZucker on the tokenizer, and more generally on the PR integration, before we get this merged!
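For context, a rough usage sketch of the combined spectrogram-plus-vocoder API from the user's side; the class and checkpoint names here are assumptions based on the discussion, not necessarily the final ones:

```python
import torch
from transformers import FastSpeech2ConformerTokenizer, FastSpeech2ConformerWithHifiGan

# Assumed names for the tokenizer, the combined model-plus-vocoder class, and the checkpoint id.
tokenizer = FastSpeech2ConformerTokenizer.from_pretrained("espnet/fastspeech2_conformer_with_hifigan")
model = FastSpeech2ConformerWithHifiGan.from_pretrained("espnet/fastspeech2_conformer_with_hifigan")

inputs = tokenizer("Hello, my dog is cute.", return_tensors="pt")
with torch.no_grad():
    output = model(input_ids=inputs["input_ids"])

# The combined head runs spectrogram generation and HiFi-GAN vocoding in a single forward
# pass, so the output already carries a waveform rather than a mel spectrogram.
waveform = output.waveform
```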
(Review thread on modeling_fastspeech2_conformer.py)
Reviewing now!
It already looks nice - great work on a hard model to add! Will do another pass once you address the current comments. Mostly nits on the modeling for readability, and more comments on the tokenizer to remove the hard dependency.
Very nice tests 🚀 🤗
(Review threads on modeling_fastspeech2_conformer.py)
(Force-pushed from 0c2148f to 91cf792)
It's been a long ride, but merging now! Thanks again for the great work and your patience!
What does this PR do?
Adds the TTS (text-to-speech) conformer version of the FastSpeech2 model. The closest related issue is #15166, though this implements ESPnet's conformer version instead of Fairseq's version, as suggested in #15773 (comment).
FastSpeech2 paper (Microsoft)
Conformer version paper (ESPnet)
Conformer version code implementation: https://github.com/espnet/espnet/tree/master/espnet2/tts/fastspeech2
Additional conformer version code implementation: https://github.com/DigitalPhonetics/IMS-Toucan/blob/ToucanTTS/TrainingInterfaces/Text_to_Spectrogram/FastSpeech2
The paper abstracts say most of this, but the main points of what makes this model an interesting addition are:
- It is a fast, high-quality TTS model that improves on its predecessor (FastSpeech)
- It would be one of the few TTS models available in transformers (alongside SpeechT5ForTextToSpeech)

To do
- Move the checkpoints to fastspeech2_conformer when ready

Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@hollance @sanchit-gandhi