Choppy audio with default output_frame_size #204


Closed
vvolhejn opened this issue Mar 21, 2025 · 3 comments · Fixed by #210

Comments

@vvolhejn
Contributor

With the default output_frame_size of 960, I can't get seamless audio playback from a file. This is the example I created:

import argparse
import asyncio
from pathlib import Path

import numpy as np
import sphn
from fastrtc import AsyncStreamHandler, Stream, wait_for_item

SAMPLE_RATE = 24000
# 480 works but 960 or higher does not!
OUTPUT_FRAME_SIZE = 960


class FilePlaybackHandler(AsyncStreamHandler):
    def __init__(self, audio_path: Path) -> None:
        super().__init__(
            input_sample_rate=SAMPLE_RATE,
            output_sample_rate=SAMPLE_RATE,
            output_frame_size=OUTPUT_FRAME_SIZE,
        )
        self.output_queue = asyncio.Queue()
        self.audio_path = audio_path

    async def receive(self, frame: tuple[int, np.ndarray]) -> None:
        pass

    async def emit(self) -> tuple[int, np.ndarray]:
        return await wait_for_item(self.output_queue)

    def copy(self):
        return FilePlaybackHandler(self.audio_path)

    async def start_up(self) -> None:
        data, _sr = sphn.read(self.audio_path, sample_rate=SAMPLE_RATE)
        data = data[0]  # Take first channel to make it mono

        simulated_ratio = 1.5

        for i in range(0, len(data), OUTPUT_FRAME_SIZE):
            await self.output_queue.put((SAMPLE_RATE, data[i : i + OUTPUT_FRAME_SIZE]))
            # Optional - delay to simulate streaming. Works the same either way.
            await asyncio.sleep(OUTPUT_FRAME_SIZE / SAMPLE_RATE / simulated_ratio)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("file", type=Path)
    args = parser.parse_args()

    stream = Stream(
        handler=FilePlaybackHandler(args.file),
        modality="audio",
        mode="send-receive",
    )

    stream.ui.launch(debug=True)

I run this on a .wav file. This is what I get:

(screen recording attachment: Screen.Recording.2025-03-21.at.14.52.07.mov)

After a hair-pulling amount of debugging, I had another look at the OpenAI example and saw that it sets output_frame_size=480. And voilà, using this value magically makes playback work.

Why does this happen? Is this expected behavior? In "Advanced Configuration" I found the innocent-looking note

In general it is best to leave these settings untouched. In some cases, lowering the output_frame_size can yield smoother audio playback.

but why is 480 not the default then? I think this should be more prominent, or there should be some kind of warning if choppy audio is detected that could be caused by the frame size...
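For what it's worth, the arithmetic lines up with WebRTC's standard 20 ms audio frames: 480 samples at 24 kHz is exactly 20 ms, while 960 is 40 ms. A quick sanity check (plain Python, nothing fastrtc-specific):

```python
SAMPLE_RATE = 24000  # Hz, matches the example above


def frame_duration_ms(frame_size: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Duration of one audio frame in milliseconds."""
    return frame_size / sample_rate * 1000


print(frame_duration_ms(480))  # 20.0 -> matches WebRTC's 20 ms frames
print(frame_duration_ms(960))  # 40.0
```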

Thank you!

@freddyaboulton
Collaborator

Sorry about this @vvolhejn . The output_frame_size should not be a parameter - it should just be 0.2 * output_sample_rate. Would you like to open a PR?

I think we can just ignore the parameter and always set it to 0.2 * the output_frame_rate.

@vvolhejn
Contributor Author

I'm happy to open a PR for this - do you mean output_frame_size should be 0.02 * output_sample_rate? And a deprecation warning when somebody tries to override the parameter?

@freddyaboulton
Collaborator

Yes, that's what I'm thinking. Thanks @vvolhejn!
