How to synthesize a large file into audio files
The Microsoft Cognitive Services text-to-speech service has some limits on a single real-time request, e.g.
- The SSML in one request cannot produce more than 10 minutes of audio
- One request can contain at most 50 voice elements
In some scenarios we want to synthesize a long text into a single audio file. The Speech SDK can be used to solve this problem.
The Batch synthesis API is the recommended solution for generating large audio files. For details, see Batch synthesis API for text to speech.
You can find sample code here.
First, create an audioConfig using AudioConfig.FromWavFileOutput, and create a synthesizer based on it.
Then call the speak method repeatedly with shorter pieces of text. The audio generated by all of these calls is saved in a single audio file.
The example below works as follows:
- Split the text file into paragraphs on \n or \r. This is necessary because the real-time endpoint has a limit of 10 minutes of audio per request.
- Call the SDK to synthesize the paragraphs one by one into the same MP3 file. Each paragraph gets up to three attempts if synthesis fails.
    using System;
    using Microsoft.CognitiveServices.Speech;
    using Microsoft.CognitiveServices.Speech.Audio;

    public static void SynthesisSsmlToMp3File(string voiceName, string style, string[] paragraphs, string file)
    {
        var config = SpeechConfig.FromSubscription("Your key", "your region");

        // Sets the synthesis output format.
        // The full list of supported formats can be found here:
        // https://docs.microsoft.com/azure/cognitive-services/speech-service/rest-text-to-speech#audio-outputs
        // config.SetSpeechSynthesisOutputFormat((SpeechSynthesisOutputFormat)Enum.Parse(typeof(SpeechSynthesisOutputFormat), codec));
        config.SpeechSynthesisVoiceName = voiceName;
        config.SetSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat.Audio24Khz96KBitRateMonoMp3);

        var sw = System.Diagnostics.Stopwatch.StartNew();

        // Creates a speech synthesizer using a file as audio output.
        // Every successful speak call appends its audio to the same file.
        using (var fileOutput = AudioConfig.FromWavFileOutput(file))
        using (var synthesizer = new SpeechSynthesizer(config, fileOutput))
        {
            foreach (string paragraph in paragraphs)
            {
                var ssml = GenerateSsml(voiceName, paragraph, style);

                // Up to three attempts per paragraph.
                int retry = 3;
                while (retry > 0)
                {
                    using (var result = synthesizer.SpeakSsmlAsync(ssml).Result)
                    {
                        if (result.Reason == ResultReason.SynthesizingAudioCompleted)
                        {
                            Console.WriteLine($"success on {voiceName}{ssml} {result.ResultId} in {sw.ElapsedMilliseconds} msec");
                            break;
                        }
                        else if (result.Reason == ResultReason.Canceled)
                        {
                            Console.WriteLine($"failed on {voiceName}{ssml} {result.ResultId}");
                            var cancellation = SpeechSynthesisCancellationDetails.FromResult(result);
                            Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");
                            if (cancellation.Reason == CancellationReason.Error)
                            {
                                Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                                Console.WriteLine($"CANCELED: ErrorDetails=[{cancellation.ErrorDetails}]");
                                Console.WriteLine($"CANCELED: Did you update the subscription info?");
                            }
                        }

                        retry--;
                        if (retry > 0)
                        {
                            Console.WriteLine("retrying again...");
                        }
                    }
                }
            }
        }
    }
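The example above receives its paragraphs as an array and calls a GenerateSsml helper that is not shown on this page. The following is a minimal sketch of what those two pieces might look like, not the sample's actual code. The class name SynthesisHelpers, the method SplitIntoParagraphs, the body of GenerateSsml, and its locale parameter (defaulting to en-US) are all assumptions made for illustration.

```csharp
using System;
using System.Linq;
using System.Security;

public static class SynthesisHelpers
{
    // Hypothetical helper: splits raw text on \r and \n, trims each piece,
    // and drops empty lines so that no empty request is sent to the service.
    public static string[] SplitIntoParagraphs(string text)
    {
        return text
            .Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(p => p.Trim())
            .Where(p => p.Length > 0)
            .ToArray();
    }

    // Assumed shape of GenerateSsml: wraps one paragraph in a single voice
    // element, with an optional mstts:express-as style. The locale parameter
    // is an addition for this sketch and is not part of the original call.
    public static string GenerateSsml(string voiceName, string paragraph, string style, string locale = "en-US")
    {
        // Escape XML special characters so the text cannot break the SSML.
        string text = SecurityElement.Escape(paragraph);

        string body = string.IsNullOrEmpty(style)
            ? text
            : $"<mstts:express-as style=\"{SecurityElement.Escape(style)}\">{text}</mstts:express-as>";

        return "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" " +
               "xmlns:mstts=\"https://www.w3.org/2001/mstts\" " +
               $"xml:lang=\"{locale}\">" +
               $"<voice name=\"{SecurityElement.Escape(voiceName)}\">{body}</voice>" +
               "</speak>";
    }
}
```

With these helpers, the paragraphs argument could be produced by calling SynthesisHelpers.SplitIntoParagraphs on the contents of the input file. Splitting on line breaks alone does not guarantee that a paragraph stays under the 10 minute limit, so very long paragraphs may still need to be split further, for example at sentence boundaries.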