Installing on either Mac or Linux #3
It works pretty well on a CPU (if you can get it working). You don't need an API key because you're running the server locally and it never connects to OpenAI. I was able to build on Linux from the Dockerfile, but I had issues mounting my audio devices into the container (even though I can run …)
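For what it's worth, here is a minimal sketch of what the no-API-key setup can look like from a client's point of view, assuming the local server exposes an OpenAI-compatible transcription endpoint; the base URL, port, file name, and model name are all placeholders, not the repo's actual configuration:

```python
# Minimal sketch: point the OpenAI Python client at a locally running
# Whisper server. base_url, port, and model name are assumptions --
# adjust them to whatever the local server actually exposes.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # hypothetical local endpoint
    api_key="not-needed",                 # a local server ignores the key
)

with open("clip.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",  # placeholder model name
        file=audio_file,
    )
print(transcript.text)
```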
I'll try it on Ubuntu Linux on my t2.medium EC2 instance. I'm a little confused about the underlying model being used: you're saying it's served locally, so does OpenAI allow Whisper to be downloaded and used? I had assumed all Whisper access was API-only (like the GPT models). To simplify my ask: what's your recommendation for how to spin up this server? Is it Linux + Docker, which should hopefully just work? And does it use faster-whisper by default (with the option to switch to whisper.cpp)?
If you plan to run it on an EC2 instance, your needs are different from mine. I run it locally to provide speech-to-text to a non-Python app. It could be on the same server or the same LAN, but I'm not streaming audio from client to server. You could adapt this to support streaming audio from the client, but you'd also have to make other changes to support multiple users.
Got it, let me clarify more, since it's always helpful to provide as much context as possible to LLMs. I'm working on a web-based (desktop for now) application that does real-time voice capture (user in the browser) and transcription -> response from an LLM -> back to the browser as text. Both the web server and the transcription should run on an EC2 instance. Ideally both can be in Python (and connect to a React app on the client, though that part is likely not relevant). It needs to be close to real time: say 5-20s voice clips from the user, with 5-20s end-to-end latency being acceptable. Happy to follow your guidance on how I should be thinking about this. Here's my current view:
WDYT?
Do you need to implement your own speech-to-text, rather than use the Web Speech API? https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition
Assume yes :)
Can you get the build to work if you delete …?
I'll try that, but do you have a general recommendation on the design I want here? For example, I already have a Python web server; should I be looking into another solution instead?
If you specifically want to use Whisper, this repo would give you a good starting point. You'd probably want to stream audio using WebRTC and run a separate thread/process for each connection. This code demonstrates that you don't have to wait for the full 20/30 seconds of audio before transcription starts.
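As a rough illustration of the thread-per-connection idea, here is a sketch; every name in it is hypothetical, `transcribe_chunk` is a stub standing in for a real faster-whisper call, and real WebRTC handling (e.g. with aiortc) is considerably more involved:

```python
# Sketch: one worker thread per audio connection. The WebRTC handler
# would push raw audio chunks onto the connection's queue as they arrive.
import queue
import threading

def transcribe_chunk(chunk: bytes) -> str:
    # Placeholder: swap in a real faster-whisper call here.
    return f"[{len(chunk)} bytes transcribed]"

def transcription_worker(audio_queue: queue.Queue, on_text) -> None:
    """Consume audio chunks for one connection and emit partial text."""
    while True:
        chunk = audio_queue.get()
        if chunk is None:                 # sentinel: connection closed
            break
        on_text(transcribe_chunk(chunk))  # stub model call

def handle_new_connection(on_text):
    """Start one worker thread per connection; return (queue, thread)."""
    audio_queue: queue.Queue = queue.Queue()
    worker = threading.Thread(
        target=transcription_worker,
        args=(audio_queue, on_text),
        daemon=True,
    )
    worker.start()
    return audio_queue, worker

# Usage: feed a fake chunk, then close the connection.
q, worker = handle_new_connection(print)
q.put(b"\x00" * 3200)  # ~0.1s of 16kHz 16-bit mono audio
q.put(None)
worker.join()
```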
That makes sense; it seems like faster-whisper integration is what I want. One nit: why a separate thread/process for each connection? Is that so audio transcription is non-blocking for the web server? I'm on Flask right now, so it's sync and single-threaded IIRC. Will I get poor performance if I just synchronously handle the WebRTC audio -> OpenAI RPC? It's not clear to me whether Python will automatically release the GIL on the RPC and therefore "just work" without a performance hit. Super curious how you think about this.
TBH, I'm not sure whether your web server would handle each request on a separate thread; from what you say, I'm guessing it's similar to NodeJS? NodeJS will try to handle other requests while one request is waiting on I/O, and maybe Whisper would similarly let other requests use the CPU while GPU operations are in progress. You'd have to do some load testing to ensure the web server can still handle regular API and static content requests while multiple users are having speech-to-text processed.
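To make that load concern concrete, one hedged sketch of keeping a Flask server responsive is to cap concurrent transcriptions with a small thread pool; the endpoint name and the `transcribe()` stub below are hypothetical, not anything this repo ships:

```python
# Sketch: hand blocking transcription work to a bounded thread pool so
# the pool size, not the number of audio requests, caps CPU-heavy work.
from concurrent.futures import ThreadPoolExecutor
from flask import Flask, jsonify, request

app = Flask(__name__)
pool = ThreadPoolExecutor(max_workers=4)  # tune this with load testing

def transcribe(audio_bytes: bytes) -> str:
    # Placeholder for the blocking faster-whisper / local-server call.
    return f"[{len(audio_bytes)} bytes transcribed]"

@app.route("/transcribe", methods=["POST"])
def transcribe_endpoint():
    audio = request.get_data()
    # result() blocks only this request's thread; other request threads
    # keep serving regular API and static content in the meantime.
    future = pool.submit(transcribe, audio)
    return jsonify(text=future.result(timeout=60))

if __name__ == "__main__":
    app.run(threaded=True)
```

The point of the bounded pool is that even if many users stream audio at once, only `max_workers` transcriptions run concurrently, which is exactly the property the load testing above would be checking for.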
A few questions: