feat(stt): added Canary-1B STT model support and listing STT models #175

Open · mahimairaja wants to merge 3 commits into main

Conversation

mahimairaja
Contributor

What does this PR do?

This PR addresses #167.

  1. Adds the capability to use NVIDIA's Canary-1B STT model. Canary supports English, German, French, and Spanish: get_stt_model("canary/1b", lang="en").
  2. Adds a list_stt_models function that lists all available Speech-to-Text (STT) models in the FastRTC library.

Implementation

Users have to install Canary as an extra:

$ uv add "fastrtc[stt-canary]"
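
For context, a minimal sketch of how such a wrapper could plug into FastRTC's STT interface is shown below. This is not the PR's actual implementation: the NeMo class name is taken from the Canary-1B model card, while the transcribe() keyword arguments and the stt() signature are assumptions that may differ from the real integration.

import tempfile

import numpy as np
import soundfile as sf
from nemo.collections.asr.models import EncDecMultiTaskModel  # model class used by Canary-1B


class CanarySTT:
    """Illustrative wrapper exposing Canary-1B through a single stt(audio) method."""

    def __init__(self, lang: str = "en"):
        # Load the Canary-1B checkpoint (weights are downloaded on first use).
        self.model = EncDecMultiTaskModel.from_pretrained("nvidia/canary-1b")
        self.lang = lang

    def stt(self, audio: tuple[int, np.ndarray]) -> str:
        sample_rate, audio_array = audio
        # Write the chunk to a temporary WAV file and let NeMo handle decoding.
        # Canary expects 16 kHz mono audio; resampling is omitted in this sketch.
        with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
            sf.write(tmp.name, audio_array.squeeze(), sample_rate)
            hypotheses = self.model.transcribe(
                [tmp.name],
                batch_size=1,
                # Kwarg names for language/task selection are assumed here;
                # some NeMo versions configure them via an input manifest instead.
                source_lang=self.lang,
                target_lang=self.lang,
                task="asr",
            )
        first = hypotheses[0]
        return first if isinstance(first, str) else first.text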

Sample Code

from fastrtc import (ReplyOnPause, Stream, get_stt_model, get_tts_model, list_stt_models)
from groq import Groq

print(f"Available STT models: {list_stt_models()}")  
# Available STT models: ['moonshine/base', 'moonshine/tiny', 'canary/1b']

client = Groq()

stt_model = get_stt_model(
    "canary/1b",
    lang="en",  # optional, defaults to "en"
)
tts_model = get_tts_model()

def echo(audio):
    prompt = stt_model.stt(audio)

    response = (
        client.chat.completions.create(
            model="llama-3.1-8b-instant",
            max_tokens=200,
            messages=[
                    {"role": "system", "content": "You are a helpful assistant."},
                    {"role": "user", "content": prompt}
                ]
        )
        .choices[0]
        .message.content
    )

    for audio_chunk in tts_model.stream_tts_sync(response):
        yield audio_chunk

stream = Stream(ReplyOnPause(echo), modality="audio", mode="send-receive")

stream.ui.launch()

@freddyaboulton
Collaborator

Awesome! Can you publish it as an independent package (maybe fastrtc-canary)? Then we can add it to a (to-be-created) Speech-to-Text gallery in the docs.

@freddyaboulton
Collaborator

This is great work. I just don't want to have to add a different optional dependency for each possible model.

@mahimairaja
Contributor Author

Sure! Will do.

@mahimairaja
Contributor Author

@freddyaboulton So are we making an entirely new package, which is not in this repo?

Technically, we would import from an entirely new package?

from fastrtc_canary import get_stt_model

. . .

Is this what you mean?
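
For illustration, end-to-end usage from such a package might look like the sketch below; the fastrtc-canary package and its get_stt_model are hypothetical at this point.

from fastrtc import ReplyOnPause, Stream, get_tts_model
from fastrtc_canary import get_stt_model  # hypothetical package, not yet published

stt_model = get_stt_model("canary/1b", lang="en")
tts_model = get_tts_model()

def echo(audio):
    # Transcribe with Canary, then speak the same text back.
    text = stt_model.stt(audio)
    for audio_chunk in tts_model.stream_tts_sync(text):
        yield audio_chunk

stream = Stream(ReplyOnPause(echo), modality="audio", mode="send-receive")
stream.ui.launch()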

@freddyaboulton
Collaborator

Yes, that is what I mean.

@mahimairaja
Contributor Author

Okay!

@mahimairaja
Contributor Author

Will make a new repo for this package later this week.

@freddyaboulton
Collaborator

Awesome!
