NoiseReduce Service Added #324

Closed
wants to merge 2 commits into from
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -128,6 +128,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
processing metrics indicate the time a processor needs to generate all its
output. Note that not all processors generate these kind of metrics.

- `noisereduce.py`, which lets you run noisereduce to reduce background noise
on calls. Important for calls run through Twilio. Added an example of noisereduce
with DeepgramSTT:
`examples/foundational/07c-i-interruptible-deepgram-noisereduce.py`

Contributor review comment:

I think we will need to move this much higher in the file now :-D.

### Changed

- `WhisperSTTService` model can now also be a string.
101 changes: 101 additions & 0 deletions examples/foundational/07c-i-interruptible-deepgram-noisereduce.py
@@ -0,0 +1,101 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import asyncio
import aiohttp
import os
import sys

from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
    LLMAssistantResponseAggregator, LLMUserResponseAggregator)
from pipecat.services.deepgram import DeepgramSTTService, DeepgramTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from pipecat.services.noisereduce import NoiseReduce

from runner import configure

from loguru import logger

from dotenv import load_dotenv
load_dotenv(override=True)

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")


async def main(room_url: str, token):
    async with aiohttp.ClientSession() as session:
        transport = DailyTransport(
            room_url,
            token,
            "Respond bot",
            DailyParams(
                audio_out_enabled=True,
                vad_enabled=True,
                vad_analyzer=SileroVADAnalyzer(),
                vad_audio_passthrough=True
            )
        )

        stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

        nr = NoiseReduce()

        tts = DeepgramTTSService(
            aiohttp_session=session,
            api_key=os.getenv("DEEPGRAM_API_KEY"),
            voice="aura-helios-en"
        )

        llm = OpenAILLMService(
            api_key=os.getenv("OPENAI_API_KEY"),
            model="gpt-4o")

        messages = [
            {
                "role": "system",
                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
            },
        ]

        tma_in = LLMUserResponseAggregator(messages)
        tma_out = LLMAssistantResponseAggregator(messages)

        pipeline = Pipeline([
            transport.input(),   # Transport user input
            nr,                  # Noise reducer
            stt,                 # STT
            tma_in,              # User responses
            llm,                 # LLM
            tts,                 # TTS
            transport.output(),  # Transport bot output
            tma_out              # Assistant spoken responses
        ])

        task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
            transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            messages.append(
                {"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([LLMMessagesFrame(messages)])

        runner = PipelineRunner()

        await runner.run(task)


if __name__ == "__main__":
    (url, token) = configure()
    asyncio.run(main(url, token))
2 changes: 2 additions & 0 deletions linux-py3.10-requirements.txt
@@ -490,6 +490,8 @@ werkzeug==3.0.3
# via flask
yarl==1.9.4
# via aiohttp
noisereduce==3.0.2
# via noisereduce
Contributor review comment:

We don't have this file anymore.


# The following packages are considered to be unsafe in a requirements file:
# setuptools
4 changes: 3 additions & 1 deletion macos-py3.10-requirements.txt
@@ -454,6 +454,8 @@ werkzeug==3.0.3
# via flask
yarl==1.9.4
# via aiohttp

noisereduce==3.0.2
# via noisereduce

Contributor review comment:

We don't have these files anymore.

# The following packages are considered to be unsafe in a requirements file:
# setuptools
1 change: 1 addition & 0 deletions pyproject.toml
@@ -54,6 +54,7 @@ silero = [ "torch~=2.3.1", "torchaudio~=2.3.1" ]
websocket = [ "websockets~=12.0", "fastapi~=0.111.0" ]
whisper = [ "faster-whisper~=1.0.3" ]
xtts = [ "resampy~=0.4.3" ]
noisereduce = [ "noisereduce~=3.0.2" ]

[tool.setuptools.packages.find]
# All the following settings are optional:
36 changes: 36 additions & 0 deletions src/pipecat/services/noisereduce.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@

Contributor review comment:

Can we move the file to `processors/audio`, please? This is not actually a third-party service, just a processor.

import noisereduce as nr
Contributor review comment:

We should add a try/except here and advise the user to run `pip install pipecat[noisereduce]`.

Contributor review comment:

We do this in other services.
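
The guarded import requested above could look roughly like the following. This is a minimal sketch, not the project's actual code: `import_optional` is a hypothetical helper, the standard `logging` module stands in for loguru so the block runs on its own, and the `pipecat[<extra>]` install hint is copied from the review comment rather than verified against the published package name.

```python
import importlib
import logging

logger = logging.getLogger(__name__)


def import_optional(module_name: str, extra: str):
    """Import an optional dependency, or fail with an install hint.

    Hypothetical helper: the name and the hint wording are assumptions
    made for illustration only.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as e:
        hint = (
            f"In order to use {module_name}, you need to "
            f"`pip install pipecat[{extra}]`."
        )
        # Log the original error and the hint, then re-raise with the hint
        # attached so the message survives even if logging is disabled.
        logger.error(f"Exception: {e}")
        logger.error(hint)
        raise ModuleNotFoundError(f"Missing module: {module_name}. {hint}") from e
```

In the processor module this would replace the bare import with something like `nr = import_optional("noisereduce", "noisereduce")`, so a missing extra produces an actionable message instead of a plain traceback.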

from loguru import logger
import numpy as np
from pipecat.frames.frames import (
    AudioRawFrame,
    Frame,
)
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


class NoiseReduce(FrameProcessor):
    def __init__(self):
        super().__init__()

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, AudioRawFrame):
            self.reduce_noise(frame)
        await self.push_frame(frame, direction)

    def reduce_noise(self, frame: AudioRawFrame):
Contributor review comment:

Prefix private functions with an underscore: `_reduce_noise`.

        if frame.num_channels != 1:
            logger.error(f"Expected 1 channel, got {frame.num_channels}")
            return

        # load data
        data = np.frombuffer(frame.audio, dtype=np.int16)

        # Add a small epsilon to avoid division by zero
        epsilon = 1e-10
        data = data.astype(np.float32) + epsilon

        # perform noise reduction
        reduced_noise = nr.reduce_noise(y=data, sr=frame.sample_rate)
        frame.audio = np.clip(reduced_noise, -32768, 32767).astype(np.int16).tobytes()
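
The byte handling in `reduce_noise` (decode 16-bit PCM, process as floats, clip to the int16 range, re-encode) can be exercised without numpy or noisereduce installed. The sketch below uses only the standard library; `process_pcm16` and its `transform` argument are hypothetical names, with `transform` standing in for the call to `nr.reduce_noise`. Like `np.frombuffer(..., dtype=np.int16)`, it assumes mono samples in native byte order.

```python
import array
from typing import Callable, List

INT16_MIN, INT16_MAX = -32768, 32767


def process_pcm16(audio: bytes, transform: Callable[[List[float]], List[float]]) -> bytes:
    """Decode mono 16-bit PCM, apply `transform`, clip, and re-encode.

    Hypothetical helper for illustration; `transform` is a placeholder
    for the real noise-reduction step.
    """
    # "h" is a signed 16-bit integer in native byte order.
    samples = array.array("h")
    samples.frombytes(audio)

    # Work in floating point, as the processor does before noise reduction.
    processed = transform([float(s) for s in samples])

    # Truncate toward zero and clamp so out-of-range values saturate
    # instead of wrapping around when stored as int16.
    clipped = [max(INT16_MIN, min(INT16_MAX, int(v))) for v in processed]
    return array.array("h", clipped).tobytes()
```

Passing an identity function returns the input bytes unchanged, while a transform that doubles every sample shows the clipping step: values that would exceed the int16 range saturate at 32767 or -32768.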