Conversation

@ctate
Collaborator

@ctate ctate commented Dec 2, 2025

No description provided.

@vercel
Contributor

vercel bot commented Dec 2, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Preview | Comments | Updated (UTC)
workflow-builder | Ready | Preview | Comment | Dec 2, 2025 9:04am

const audioBuffer = await response.arrayBuffer();
const audioBase64 = Buffer.from(audioBuffer).toString("base64");

const contentType = outputFormat.startsWith("mp3") ? "audio/mpeg" : "audio/wav";
Contributor

Suggested change
- const contentType = outputFormat.startsWith("mp3") ? "audio/mpeg" : "audio/wav";
+ let contentType: string;
+ if (outputFormat.startsWith("mp3")) {
+   contentType = "audio/mpeg";
+ } else if (outputFormat.startsWith("pcm")) {
+   contentType = "audio/L16";
+ } else {
+   contentType = "audio/mpeg"; // fallback
+ }

The content type mapping for audio formats is incorrect. PCM formats (pcm_16000, pcm_22050, pcm_24000, pcm_44100) are being labeled as "audio/wav" when they should use a different MIME type like "audio/L16" or "audio/x-raw".

Analysis

Incorrect MIME type mapping for ElevenLabs PCM audio formats

What fails: textToSpeechStep() in plugins/elevenlabs/steps/text-to-speech.ts maps the PCM output formats (pcm_16000, pcm_22050, pcm_24000, pcm_44100) to the MIME type audio/wav, which is wrong for raw PCM data.

How to reproduce:

// Call textToSpeechStep with a PCM output format
const result = await textToSpeechStep({
  voiceId: "your-voice-id",
  text: "Hello world",
  outputFormat: "pcm_16000" // or pcm_22050, pcm_24000, pcm_44100
});

// result.contentType will be "audio/wav" but should be "audio/L16"

Result: The function returns contentType: "audio/wav" for all PCM formats. However, ElevenLabs' PCM formats return raw S16LE (16-bit signed little-endian) PCM audio data without WAV container headers.
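
One way to confirm that a payload really is headerless is to check for the RIFF/WAVE signature that every WAV file starts with. The sketch below is a hypothetical helper (the names `hasTag` and `looksLikeWav` are not part of the plugin) and only inspects the first 12 bytes:

```typescript
// Hypothetical helper, not part of the plugin.
// Returns true when `bytes` contains the ASCII `tag` starting at `offset`.
function hasTag(bytes: Uint8Array, offset: number, tag: string): boolean {
  for (let i = 0; i < tag.length; i++) {
    if (bytes[offset + i] !== tag.charCodeAt(i)) return false;
  }
  return true;
}

// A WAV file begins with "RIFF", a 4-byte chunk size, then "WAVE".
// Raw PCM samples carry no such signature.
function looksLikeWav(bytes: Uint8Array): boolean {
  return (
    bytes.length >= 12 && hasTag(bytes, 0, "RIFF") && hasTag(bytes, 8, "WAVE")
  );
}
```

Passing `new Uint8Array(audioBuffer)` from the step to `looksLikeWav` should return false for the PCM formats if the response is raw S16LE as described.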

Expected:

  • MP3 formats should use audio/mpeg ✓ (already correct)
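  • PCM formats should use audio/L16, the MIME type RFC 2586 registers for raw 16-bit linear PCM. RFC 2586 defines L16 in network (big-endian) byte order and requires a rate parameter, so consumers still need to know that ElevenLabs sends little-endian samples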
  • The MIME type audio/wav is incorrect because WAV is a container format with RIFF headers, while ElevenLabs PCM returns headerless raw PCM bytes
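
The expected mapping above could be sketched as a small helper. This is a hypothetical function, not the plugin's code: it assumes format names look like `mp3_44100_128` or `pcm_16000`, assumes the PCM output is mono, and unlike the suggested change it falls back to `application/octet-stream` instead of guessing `audio/mpeg`.

```typescript
// Hypothetical helper, not part of the plugin.
// Assumes PCM format names have the shape "pcm_<sampleRate>" and mono audio.
function contentTypeFor(outputFormat: string): string {
  if (outputFormat.startsWith("mp3")) {
    return "audio/mpeg";
  }
  const pcm = /^pcm_(\d+)$/.exec(outputFormat);
  if (pcm) {
    // RFC 2586 makes `rate` a required parameter of audio/L16.
    return `audio/L16;rate=${pcm[1]};channels=1`;
  }
  // Unknown format: do not claim a specific audio type.
  return "application/octet-stream";
}
```

Including the sample rate in the MIME type lets a consumer decode the raw samples without parsing the format name again.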

Impact: Downstream systems that validate or process the audio based on the MIME type will incorrectly treat the raw PCM data as a WAV file with container headers, potentially causing decoding failures or incorrect audio processing.
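
If downstream consumers genuinely need audio/wav, an alternative to changing the MIME type (not what the suggested change does) would be to prepend a standard 44-byte RIFF/WAVE header to the raw samples. A minimal sketch, assuming 16-bit little-endian input and using a hypothetical `pcmToWav` helper:

```typescript
// Hypothetical helper, not part of the plugin.
// Wraps raw S16LE PCM bytes in a minimal 44-byte RIFF/WAVE header.
function pcmToWav(pcm: Uint8Array, sampleRate: number, channels = 1): Uint8Array {
  const bytesPerSample = 2; // 16-bit samples
  const blockAlign = channels * bytesPerSample;
  const out = new Uint8Array(44 + pcm.length);
  const view = new DataView(out.buffer);
  const writeTag = (offset: number, tag: string): void => {
    for (let i = 0; i < tag.length; i++) {
      view.setUint8(offset + i, tag.charCodeAt(i));
    }
  };

  writeTag(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // RIFF chunk size
  writeTag(8, "WAVE");
  writeTag(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for linear PCM
  view.setUint16(20, 1, true); // audio format 1 = linear PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true); // byte rate
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, 16, true); // bits per sample
  writeTag(36, "data");
  view.setUint32(40, pcm.length, true); // data chunk size
  out.set(pcm, 44);
  return out;
}
```

With this approach the step could keep returning audio/wav for PCM formats, at the cost of no longer passing the ElevenLabs bytes through unchanged.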

2 participants