-
"Implement a way to do long-form generation. E.g. one possibility is to chunk long input text into smaller pieces and then synthesize the codes each in turn, concatenating them, and vocoding the final result." Given a sentence and source audio, is there any way to roughly determine the duration of the generated audio?
-
Also, the possibility to train in other languages.
-
I can look into this. I think the best approach would be to split the input into multiple chunks, ideally at sentence boundaries so that the speech sounds continuous, and then always feed the previous chunk as the reference for the model. There is also the problem of leading silences, which do not always seem to be filtered out with the
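A minimal sketch of the chunking idea described above, under stated assumptions: `synthesize` is a hypothetical callable (it is not the MARS5 API) that takes a text chunk, a reference audio, and the reference transcript, and returns audio samples. The sentence splitting is a simple regex heuristic, and `max_chars` is an arbitrary illustrative limit.

```python
import re
from typing import Callable, List, Sequence

# Hypothetical signature: (text, reference_audio, reference_transcript) -> audio samples.
SynthFn = Callable[[str, Sequence[float], str], List[float]]


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Split text at sentence boundaries and pack sentences into chunks.

    Chunks stay at or below max_chars, except that a single sentence
    longer than max_chars is kept whole rather than cut mid-sentence.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def synthesize_long(
    text: str,
    ref_audio: Sequence[float],
    ref_transcript: str,
    synthesize: SynthFn,
    max_chars: int = 200,
) -> List[float]:
    """Synthesize each chunk in turn, using the previous chunk as the next reference."""
    audio: List[float] = []
    for chunk in split_into_chunks(text, max_chars):
        piece = synthesize(chunk, ref_audio, ref_transcript)
        audio.extend(piece)
        # Rolling reference: the chunk just generated conditions the next one.
        ref_audio, ref_transcript = piece, chunk
    return audio
```

One trade-off of the rolling reference is that any artifact in one chunk can carry over into the next, so mixing in the original reference may be worth trying as well.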
-
I am quite interested in the task "Make demo/ user-interface program to rapidly collect human preference ratings between two audio samples, one generated by the model, and one ground truth." Are you looking to compare MARS5 TTS outputs against the outputs of other TTS systems (similar to the lmsys chatbot arena)?
-
Would it be possible to support streaming, similar to what Coqui AI XTTSV2 can do? Here is an example from their docs: https://docs.coqui.ai/en/latest/models/xtts.html#streaming-manually This is very effective with LLM chatbots for real-time conversations.
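To make the chatbot use case concrete, here is a rough sketch of sentence-level streaming: text fragments arriving from an LLM are buffered, and each complete sentence is synthesized as soon as it is available. This is coarser than the chunked inference streaming in the linked XTTS docs, and `synthesize` is a hypothetical placeholder callable, not an existing MARS5 function.

```python
import re
from typing import Callable, Iterable, Iterator, List

# Sentence boundary heuristic: whitespace that follows ., ! or ?
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_stream(fragments: Iterable[str]) -> Iterator[str]:
    """Buffer incoming text fragments and yield each sentence once it is complete."""
    buffer = ""
    for fragment in fragments:
        buffer += fragment
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence; the last may still grow.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    # Flush whatever remains when the stream ends.
    if buffer.strip():
        yield buffer.strip()


def stream_speech(
    fragments: Iterable[str],
    synthesize: Callable[[str], List[float]],
) -> Iterator[List[float]]:
    """Yield audio per sentence so playback can begin before the full reply exists."""
    for sentence in sentences_from_stream(fragments):
        yield synthesize(sentence)
```

With this shape, latency to first audio is roughly the time to receive and synthesize the first sentence rather than the whole reply.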
-
Some areas where we expect the community to help in growing MARS5's capabilities are:
---> Improving inference stability and consistency
---> Speed/performance optimizations
---> Improving reference audio selection when given long references.
---> Benchmark performance numbers for MARS5 on standard speech datasets.
Specific tasks
OPEN FOR CONTRIBUTIONS