Missing chunkSize in speech audio file WAV format #513

mudssrali · 2022-03-29T10:50:23Z

We're using the Microsoft Cognitive Services text-to-speech service to generate speech audio. However, we need the audio in WAV format with the following output format specification:

Codec: PCM S16 LE (araw)
Channels: Mono
Sample rate: 8000 Hz
Bits per sample: 16
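The four parameters above determine every field of the `fmt ` sub-chunk in a canonical 44-byte WAV header: BlockAlign = Channels × BitsPerSample / 8 = 2 and ByteRate = SampleRate × BlockAlign = 16000. Here is a minimal sketch using Python's `struct` module. It assumes the canonical layout (RIFF header, a 16-byte `fmt ` chunk, then `data`). `build_wav_header` is an illustrative name, not part of any SDK.

```python
import struct

def build_wav_header(data_size: int, sample_rate: int = 8000,
                     channels: int = 1, bits_per_sample: int = 16) -> bytes:
    """Build a canonical 44-byte PCM WAV header (all fields little-endian)."""
    block_align = channels * bits_per_sample // 8   # bytes per sample frame
    byte_rate = sample_rate * block_align           # bytes per second
    # RIFF chunkSize = 4 ("WAVE") + 24 (fmt chunk) + 8 (data chunk header) + data
    riff_chunk_size = 36 + data_size
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", riff_chunk_size, b"WAVE",
        b"fmt ", 16,          # fmt sub-chunk size for plain PCM
        1,                    # AudioFormat 1 = PCM
        channels, sample_rate, byte_rate, block_align, bits_per_sample,
        b"data", data_size,
    )

header = build_wav_header(data_size=0x00031BBE)
print(header.hex(" ").upper())
```

With `data_size=0x00031BBE` (the data chunk size in both dumps), the printed bytes match the first 44 bytes of converted_success.wav. This includes the RIFF chunkSize E2 1B 03 00.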

Once we generate the speech audio, we pass it to an IVR (Interactive Voice Response) service. When we upload a speech audio file generated by the Azure service, we get an error because of the audio file format. We dug further but had no clue until we inspected the speech audio file metadata, especially the raw header. More information can be found here: Azure-Samples/cognitive-services-speech-sdk#1450

After a detailed analysis of the raw headers of different files, we found that Azure-generated speech audio files in WAV format (Riff8Khz16BitMonoPcm) don't include the RIFF chunkSize: the field is left as 00 00 00 00. We verified this by converting the speech audio with an online tool provided by 3CX and then uploading the result to the IVR, which works fine.
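A reader that validates the RIFF header strictly can reject a file whose chunkSize is zero, even if every other field is correct. The sketch below shows one plausible check: it compares the RIFF chunkSize with the actual file length, since chunkSize should equal the file length minus 8. We don't know which checks the IVR service actually performs. `riff_size_problem` is a hypothetical helper for illustration only.

```python
import struct
from typing import Optional

def riff_size_problem(wav: bytes) -> Optional[str]:
    """Describe a RIFF chunkSize problem, or return None if the size is consistent.

    Assumes `wav` is the complete file: a single RIFF chunk with nothing after it.
    """
    if len(wav) < 12 or wav[0:4] != b"RIFF" or wav[8:12] != b"WAVE":
        return "not a RIFF/WAVE file"
    (chunk_size,) = struct.unpack_from("<I", wav, 4)  # little-endian uint32 at offset 4
    expected = len(wav) - 8  # chunkSize counts every byte after the 8-byte RIFF header
    if chunk_size == 0:
        return "RIFF chunkSize is 0 (size never filled in)"
    if chunk_size != expected:
        return f"RIFF chunkSize is {chunk_size}, expected {expected}"
    return None

print(riff_size_problem(b"RIFF" + (0).to_bytes(4, "little") + b"WAVE"))  # RIFF chunkSize is 0 (size never filled in)
print(riff_size_problem(b"RIFF" + (4).to_bytes(4, "little") + b"WAVE"))  # None
```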

Raw Header - Azure Speech API - success.wav

52 49 46 46 00 00 00 00 57 41 56 45 66 6D 74 20
10 00 00 00 01 00 01 00 40 1F 00 00 80 3E 00 00
02 00 10 00 64 61 74 61 BE 1B 03 00 00 00 FF FF
00 00 01 00 FF FF 00 00 00 00 00 00 00 00 FF FF
00 00 FE FF FF FF 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00
01 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
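Decoding the first 44 bytes with the canonical header layout makes the problem visible. Every `fmt ` and `data` field matches the requested format, and only the RIFF chunkSize at offset 4 is zero. Here is a minimal decoding sketch. It assumes the canonical layout, with no extra chunks between `fmt ` and `data`. `parse_canonical_header` is an illustrative name.

```python
import struct

# Field names follow the usual canonical WAV header description.
FIELDS = ("riff", "chunk_size", "wave", "fmt_id", "fmt_size", "audio_format",
          "channels", "sample_rate", "byte_rate", "block_align",
          "bits_per_sample", "data_id", "data_size")

def parse_canonical_header(wav: bytes) -> dict:
    """Decode the 44-byte canonical PCM WAV header into named fields."""
    if len(wav) < 44:
        raise ValueError("need at least 44 bytes")
    values = struct.unpack_from("<4sI4s4sIHHIIHH4sI", wav, 0)
    return dict(zip(FIELDS, values))

# First 44 bytes of the Azure-generated success.wav dump.
azure_header = bytes.fromhex(
    "52 49 46 46 00 00 00 00 57 41 56 45 66 6D 74 20"
    " 10 00 00 00 01 00 01 00 40 1F 00 00 80 3E 00 00"
    " 02 00 10 00 64 61 74 61 BE 1B 03 00"
)
for name, value in parse_canonical_header(azure_header).items():
    print(f"{name:>15}: {value}")
```

The output shows chunk_size: 0 next to sample_rate: 8000, byte_rate: 16000, block_align: 2, bits_per_sample: 16 and data_size: 203710.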


Raw Header - Converted - converted_success.wav

52 49 46 46 E2 1B 03 00 57 41 56 45 66 6D 74 20
10 00 00 00 01 00 01 00 40 1F 00 00 80 3E 00 00
02 00 10 00 64 61 74 61 BE 1B 03 00 00 00 FF FF
00 00 01 00 FF FF 00 00 00 00 00 00 00 00 FF FF
00 00 FE FF FF FF 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00
01 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00


The tool we used for conversion changes only the four chunkSize bytes at offsets 4 to 7, from 00 00 00 00 to E2 1B 03 00.
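For a canonical 44-byte header, the correct value is chunkSize = 4 + (8 + 16) + (8 + data size) = 36 + data size. Here the data chunk size is BE 1B 03 00 = 0x00031BBE = 203,710 bytes, so chunkSize = 203,746 = 0x00031BE2. That is exactly the E2 1B 03 00 written by the conversion tool.

Below is a minimal sketch of patching the header in place instead of re-converting the audio. It assumes the `fmt ` chunk is immediately followed by `data`, as in both dumps. `fix_riff_chunk_size` is a hypothetical helper, not part of the Speech SDK.

```python
import struct

def fix_riff_chunk_size(wav: bytearray) -> None:
    """Rewrite bytes 4-7 (RIFF chunkSize) as 36 + data size, in place.

    Assumes the canonical layout: "RIFF", size, "WAVE", a 16-byte "fmt " chunk,
    then the "data" chunk header at offset 36.
    """
    if wav[0:4] != b"RIFF" or wav[8:12] != b"WAVE" or wav[36:40] != b"data":
        raise ValueError("not a canonical 44-byte-header WAV file")
    (data_size,) = struct.unpack_from("<I", wav, 40)   # data chunk size at offset 40
    struct.pack_into("<I", wav, 4, 36 + data_size)      # overwrite RIFF chunkSize

header = bytearray.fromhex(
    "52 49 46 46 00 00 00 00 57 41 56 45 66 6D 74 20"
    " 10 00 00 00 01 00 01 00 40 1F 00 00 80 3E 00 00"
    " 02 00 10 00 64 61 74 61 BE 1B 03 00"
)
fix_riff_chunk_size(header)
print(header[4:8].hex(" ").upper())  # E2 1B 03 00
```

The sketch computes the size from the data chunk rather than from `len(wav) - 8`. This keeps it correct when only the header is available, as with the 44-byte excerpt here.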

More information about the WAVE PCM soundfile format:

WAVE PCM soundfile format


mudssrali · 2022-04-01T16:55:47Z

@glharper can you please suggest a release date for this fix? Thanks

glharper · 2022-04-01T16:57:21Z

@mudssrali mid-April, 2-3 weeks.

dargilco · 2022-04-21T15:36:57Z

Speech SDK 1.21 was released. See the release notes: https://docs.microsoft.com/azure/cognitive-services/speech-service/releasenotes?tabs=speech-sdk#speech-sdk-1210-april-2022-release

mudssrali · 2022-04-24T20:01:05Z

Thanks @glharper for fixing this issue and releasing the fix. It will definitely save money for a non-profit organization, since we have been paying a third-party service to convert WAV into WAV just to fix the headers.

Thank you @dargilco for the release notes link. I was about to ask about the release.

mudssrali changed the title ~~Missing chunkSize in speech audio file in WAV format~~ Missing chunkSize in speech audio file WAV format Mar 29, 2022

glharper self-assigned this Mar 30, 2022

glharper mentioned this issue Mar 31, 2022

Glharper/wav file size #516

Merged

yulin-li mentioned this issue Apr 1, 2022

Compatibility issue with generated speech audio in WAV format Azure-Samples/cognitive-services-speech-sdk#1450

Closed

glharper closed this as completed in #516 Apr 1, 2022
