Missing chunkSize in speech audio file WAV format #513

Closed
mudssrali opened this issue Mar 29, 2022 · 4 comments · Fixed by #516

mudssrali commented Mar 29, 2022

We're using Microsoft Cognitive Services text-to-speech to generate speech audio. However, we need the output in WAV format with the following specifications:

Codec: PCM S16 LE (araw)
Channels: Mono
Sample Rate: 8000 Hz
Bits per Sample: 16

Once we generate the speech audio, we pass it to an IVR (Interactive Voice Response) service. When we upload an audio file generated by the Azure service, we get an error because of the audio file format. We dug into it further and had no clue until we inspected the file's metadata, especially the raw header. More information can be found here: Azure-Samples/cognitive-services-speech-sdk#1450

After a detailed comparison of the raw headers of different files, we found that WAV files generated by Azure (Riff8Khz16BitMonoPcm) don't set chunkSize: the four bytes right after "RIFF" are 00 00 00 00. We confirmed this by converting the speech audio with an online tool provided by 3CX and then uploading the result to the IVR, where it works fine.

Raw Header - Azure Speech API - success.wav

52 49 46 46 00 00 00 00 57 41 56 45 66 6D 74 20 10 00 00 00 01 00 01 00 40 1F 00 00 80
3E 00 00 02 00 10 00 64 61 74 61 BE 1B 03 00 00 00 FF FF 00 00 01 00 FF FF 00 00 00 00
00 00 00 00 FF FF 00 00 FE FF FF FF 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 01 00 00 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00

success.wav
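
To make the header above easier to read, here is a minimal sketch that decodes its first 44 bytes with Python's `struct` module. It assumes the canonical 44-byte PCM layout (RIFF descriptor, a 16-byte `fmt ` chunk, then the `data` chunk header); the function name `parse_wav_header` and the dictionary keys are ours for illustration and are not part of the Speech SDK.

```python
import struct

# First 44 bytes of the Azure-generated header quoted above.
HEADER_HEX = (
    "52 49 46 46 00 00 00 00 57 41 56 45 66 6D 74 20 "
    "10 00 00 00 01 00 01 00 40 1F 00 00 80 3E 00 00 "
    "02 00 10 00 64 61 74 61 BE 1B 03 00"
)


def parse_wav_header(header: bytes) -> dict:
    """Decode a canonical 44-byte PCM WAV header (all fields little-endian)."""
    (riff, chunk_size, wave_id, fmt_id, fmt_size, audio_format, channels,
     sample_rate, byte_rate, block_align, bits_per_sample,
     data_id, data_size) = struct.unpack("<4sI4s4sIHHIIHH4sI", header[:44])
    # Assumption: no extra chunks sit between "fmt " and "data".
    if (riff, wave_id, fmt_id, data_id) != (b"RIFF", b"WAVE", b"fmt ", b"data"):
        raise ValueError("not a canonical 44-byte PCM WAV header")
    return {
        "chunk_size": chunk_size,
        "fmt_size": fmt_size,
        "audio_format": audio_format,  # 1 means uncompressed PCM
        "channels": channels,
        "sample_rate": sample_rate,
        "byte_rate": byte_rate,
        "block_align": block_align,
        "bits_per_sample": bits_per_sample,
        "data_size": data_size,
    }


fields = parse_wav_header(bytes.fromhex(HEADER_HEX))
for name, value in fields.items():
    print(f"{name}: {value}")
```

For this header the format fields match the required specification (PCM, mono, 8000 Hz, 16 bits per sample) and `data_size` is 203710, while `chunk_size` decodes to 0.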

Raw Header - Converted - converted_success.wav

52 49 46 46 E2 1B 03 00 57 41 56 45 66 6D 74 20 10 00 00 00 01 00 01 00 40 1F 00 00 80 3E
00 00 02 00 10 00 64 61 74 61 BE 1B 03 00 00 00 FF FF 00 00 01 00 FF FF 00 00 00 00 00 00
00 00 FF FF 00 00 FE FF FF FF 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 01 00 00 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00

converted_success.wav

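The tool we used for conversion changes only the chunkSize bytes, from 00 00 00 00 to E2 1B 03 00. Read as little-endian, that is 0x00031BE2 = 203746, which equals the data size (BE 1B 03 00 = 203710) plus the 36 header bytes that follow the chunkSize field.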
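
Until a fixed SDK is available, the same change can be applied locally instead of through a conversion service. The sketch below is only a workaround under stated assumptions, not the SDK's fix: it assumes the complete file is in memory and that chunkSize should equal the total file size minus the 8 bytes of the "RIFF" tag and the chunkSize field itself. The helper name `fix_riff_chunk_size` is hypothetical.

```python
import struct


def fix_riff_chunk_size(wav: bytes) -> bytes:
    """Return a copy of `wav` whose RIFF chunkSize equals len(wav) - 8."""
    if len(wav) < 12 or wav[:4] != b"RIFF" or wav[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    # chunkSize is a 32-bit little-endian integer stored at bytes 4..7.
    return wav[:4] + struct.pack("<I", len(wav) - 8) + wav[8:]


# Rebuild a file shaped like success.wav: the 44-byte header from the issue
# followed by 203710 bytes of sample data (silence here, as a stand-in).
azure_header = bytes.fromhex(
    "524946460000000057415645666D7420100000000100010"
    "0401F0000803E0000020010006461746ABE1B0300".replace("6461746A", "64617461")
)
broken = azure_header + bytes(203710)

fixed = fix_riff_chunk_size(broken)
print(fixed[4:8].hex(" ").upper())
```

To patch a file on disk, read it with `open(path, "rb")`, pass the bytes through the helper, and write the result back with `open(path, "wb")`.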

More information about the WAVE PCM soundfile format:
[image: WAVE PCM soundfile format]

@mudssrali mudssrali changed the title Missing chunkSize in speech audio file in WAV format Missing chunkSize in speech audio file WAV format Mar 29, 2022
@glharper glharper self-assigned this Mar 30, 2022
@mudssrali
Author

@glharper can you please suggest a release date for this fix? Thanks

@glharper
Member

glharper commented Apr 1, 2022

@mudssrali mid-April, 2-3 weeks.

@dargilco
Member

Speech SDK 1.21 was released. See the release notes: https://docs.microsoft.com/azure/cognitive-services/speech-service/releasenotes?tabs=speech-sdk#speech-sdk-1210-april-2022-release

@mudssrali
Author

Thanks @glharper for fixing this issue and releasing it. It will definitely save money for a non-profit organization, since we have been paying a third-party service to convert WAV into WAV just to fix the headers.

Thank you @dargilco for the release notes link. I was about to ask about the release.
