Generating pause length and speech rate for emotional and engaging speech
The congan branch is a first attempt at this problem. It uses a conditional GAN and IMLE (Implicit Maximum Likelihood Estimation), trained on CREMA-D and conditioned on emotion labels.
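
For readers unfamiliar with IMLE, the following is a minimal sketch of a conditional IMLE objective for a single training example. It is not the code from the congan branch. The function names, the two-value feature vector (pause length, speech rate), the toy generator and the two emotion labels are all assumptions made for illustration. The idea is to draw several candidates under the same emotion condition and penalise only the distance from the real example to its nearest candidate.

```python
import random


def imle_loss(real, condition, generator, num_samples=8, rng=None):
    """Conditional IMLE objective for one training example (illustrative).

    Draws num_samples latent codes, generates one candidate per code under
    the same condition, and returns the squared distance from the real
    example to its nearest candidate.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    best = float("inf")
    for _ in range(num_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(len(real))]
        fake = generator(z, condition)
        dist = sum((f - r) ** 2 for f, r in zip(fake, real))
        best = min(best, dist)
    return best


# Hypothetical stand-in for a trained generator: scales the noise and
# shifts it by a per-emotion offset.
EMOTION_OFFSETS = {"neutral": 0.0, "happy": 1.0}


def toy_generator(z, emotion):
    return [0.1 * v + EMOTION_OFFSETS[emotion] for v in z]


# real = [pause length, speech rate], both in arbitrary normalised units
loss_matching = imle_loss([1.0, 1.0], "happy", toy_generator)
loss_mismatched = imle_loss([1.0, 1.0], "neutral", toy_generator)
```

In a real training loop the returned distance would be minimised with respect to the generator's parameters. Here it only shows that a matching condition gives a smaller nearest-sample distance than a mismatched one.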
The seq2seq branch contains a sequence labelling model trained on TED-LIUM 3. This attempt targets engaging speech rather than emotional speech.
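
As a rough illustration of what per-token targets for such a sequence labelling model could look like, the sketch below derives a pause length and a speech rate for each word from word-level timings. It is an assumption rather than the seq2seq branch's actual preprocessing. The function name, the input format (which would have to come from something like forced alignment) and the characters-per-second rate measure are hypothetical.

```python
def pause_and_rate_labels(words):
    """Build per-token labels from word timings (illustrative).

    words: list of (token, start_sec, end_sec) tuples in spoken order.
    Returns one (token, pause_after_sec, chars_per_sec) tuple per word.
    """
    labels = []
    for i, (token, start, end) in enumerate(words):
        # Pause is the gap until the next word starts; the last word gets 0.
        next_start = words[i + 1][1] if i + 1 < len(words) else end
        pause = max(0.0, next_start - end)
        # Guard against zero-length words when computing the rate.
        duration = max(end - start, 1e-6)
        rate = len(token) / duration
        labels.append((token, round(pause, 3), round(rate, 2)))
    return labels


example = [("hello", 0.0, 0.5), ("world", 1.0, 1.5)]
labels = pause_and_rate_labels(example)
```

A model would then be trained to predict the pause and rate values for each token from the text alone.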