the motivation for inserting blank IDs between the input IPA-ids? #94
Hello, that is a great question! Modern neural network-based speech synthesisers are much more powerful approximators. The idea behind adding an extra blank state is to give MAS a placeholder in which to learn dynamic variation and transitions between sounds. Two states per input symbol are a nice compromise: the model can learn these dynamic variations when it needs to, or jump directly to the next sound when it doesn't (some transitions don't need a gap between them), and it keeps fewer tensors on the GPU than the three states used in HMM-based synthesisers. Hope this helps :)
Hello, could you please help me understand the motivation for inserting blank IDs between the input IPA IDs? The implementation can be found in text_mel_datamodule.py, line 216:
```python
def get_text(self, text, add_blank=True):
    text_norm, cleaned_text = text_to_sequence(text, self.cleaners)
    if self.add_blank:  # True
        text_norm = intersperse(text_norm, 0)
    text_norm = torch.IntTensor(text_norm)
    return text_norm, cleaned_text
```
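For reference, the interleaving step can be sketched as a small helper. This is a hedged, minimal version of what an `intersperse`-style function typically does (it may not match the repo's exact implementation): it places the blank ID before, between, and after every token ID.

```python
def intersperse(lst, item):
    """Interleave `item` between (and around) the elements of `lst`.

    [a, b, c] -> [item, a, item, b, item, c, item]
    """
    # Output has one extra slot per element plus one trailing slot.
    result = [item] * (len(lst) * 2 + 1)
    # Fill the odd positions with the original token IDs.
    result[1::2] = lst
    return result

# Example: phoneme IDs 5, 3, 8 with blank ID 0 interleaved
print(intersperse([5, 3, 8], 0))  # -> [0, 5, 0, 3, 0, 8, 0]
```

During alignment search, each blank position can absorb transition frames between two sounds, or be skipped when no such gap is needed.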
Thanks.