
Installing on either Mac or Linux #3

Open
chadexplains opened this issue May 23, 2023 · 11 comments

Comments

@chadexplains

A few questions:

  1. Does this need a GPU, or is performance OK on a CPU?
  2. Where do you set the API key? It would be good to add that to the README.
  3. I can't build on Mac or Linux yet - different errors on each, but I'll keep this thread updated as I go.
@nalbion
Owner

nalbion commented May 23, 2023

It works pretty well on a CPU (if you can get it working).

You don't need to use an API key, because you're running the server locally and it doesn't need to connect to OpenAI.

I was able to build on Linux with the Dockerfile, but I had issues mounting my audio devices into the container (even though I can run whisper.cpp in an Ubuntu Docker container through WSL).
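For anyone else hitting the same wall: on Linux the usual way to expose the host's ALSA devices to a container is the --device flag, e.g.

docker run --device /dev/snd <your-image>

(the image name is a placeholder; under WSL this still may not be enough, since audio passthrough there is more complicated).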

@chadexplains
Author

I'll try Ubuntu on my t2.medium EC2 instance.

A little confused about the underlying model being used - you're saying it's served locally, so did OpenAI allow Whisper to be downloaded and used? I had assumed all Whisper access was API-only (like all the GPT models).

To simplify my ask: what's your recommendation for how to spin up this server? Is it Linux + Docker, and should that hopefully just work?

And does it use faster-whisper by default (with the option to switch to whisper.cpp)?

@nalbion
Owner

nalbion commented May 23, 2023

If you plan to run it on an EC2 instance, your needs are different from mine. I run it locally to provide speech-to-text to a non-Python app. It could be on the same server or the same LAN, but I'm not streaming audio from the client to whisper_server - whisper_server owns the mic.

You could adapt this to support streaming audio from the client, but you'd also have to make other changes to support multiple users (rough sketch after this list):

  • you'd need to run a new run_whisper_loop for each user connection.
  • you may need different DecodingOptions for task, language, context
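For example, something along these lines (a rough sketch, not this repo's actual code - `run_whisper_loop` here is a simplified stand-in, and `audio_queue` is a hypothetical per-client queue of 16 kHz mono float32 audio):

```python
import threading
import whisper

model = whisper.load_model("base")  # shared model, loaded once

def run_whisper_loop(audio_queue, task="transcribe", language="en"):
    # each connection gets its own DecodingOptions (task/language can differ per user)
    options = whisper.DecodingOptions(task=task, language=language, fp16=False)
    while True:
        audio = audio_queue.get()        # e.g. a float32 numpy array at 16 kHz
        if audio is None:                # sentinel: client disconnected
            break
        audio = whisper.pad_or_trim(audio)
        mel = whisper.log_mel_spectrogram(audio).to(model.device)
        result = whisper.decode(model, mel, options)
        print(result.text)               # or send the text back to that client

# one loop per connected user:
# threading.Thread(target=run_whisper_loop, args=(user_queue,), daemon=True).start()
```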

@chadexplains
Author

Got it, let me clarify more since it's always helpful to provide as much context as possible to LLMs.

I'm working on a web-based (desktop for now) application that does real-time voice capture (user in the browser) and transcription -> response from an LLM -> back to the browser as text.

Both the web server and the transcription should run on an EC2 instance. Ideally both are in Python (and connect to a React app on the client, though this part is likely not relevant).

It needs to be close to real time (say 5-20s voice clips from the user, with 5-20s end-to-end latency being acceptable).

Happy to follow your guidance on how I should be thinking about this. Here's my current view:

  • Python web server
  • OpenAI-based LLM (for both the generation and transcription parts - rough sketch below)
  • probably using faster-whisper (for latency?)
  • maybe stream transcription, but even waiting the full 20s for the user's clip and transcribing it whole hog would work
  • multiple users are expected
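Roughly, the transcription + generation steps would look something like this (a rough sketch with the pre-1.0 openai Python client; the file path and model names are placeholders, and this assumes the hosted endpoints rather than a local/faster-whisper model):

```python
import openai  # openai<1.0 client, current as of mid-2023

def clip_to_reply(path_to_wav: str) -> str:
    # 1. transcription via the hosted Whisper endpoint
    with open(path_to_wav, "rb") as f:
        transcript = openai.Audio.transcribe("whisper-1", f)["text"]
    # 2. generation: feed the transcript to a chat model
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": transcript}],
    )
    return response["choices"][0]["message"]["content"]
```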

WDYT?

@nalbion
Owner

nalbion commented May 23, 2023

Do you need to implement your own speech-to-text, rather than use the Web Speech API?

https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition

@chadexplains
Author

Assume yes :)

@nalbion
Owner

nalbion commented May 23, 2023

Can you get the build to work if you delete requirements.txt and run

python -m piptools compile requirements.in --resolver=backtracking

@chadexplains
Author

chadexplains commented May 24, 2023

I'll try that - but did you have a general recommendation on the design I want here?

For example: I have a Python web server already -- should I be looking into another solution instead?

@nalbion
Owner

nalbion commented May 24, 2023

If you specifically want to use Whisper, this repo would give you a good starting point.

You'd probably want to stream audio using WebRTC and run a separate thread/process for each connection. This code demonstrates that you don't have to wait for the full 20/30 seconds.
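For instance, something like this (an illustrative sketch, not this repo's code - it assumes whatever receives the WebRTC stream pushes decoded 16 kHz mono float32 samples onto a per-connection queue):

```python
import numpy as np
import whisper

model = whisper.load_model("base")
SAMPLE_RATE = 16000
CHUNK_SECONDS = 5  # emit partial text every ~5 s instead of waiting 20-30 s

def transcribe_stream(packet_queue):
    buffer = np.zeros(0, dtype=np.float32)
    while True:
        samples = packet_queue.get()      # next batch of decoded audio samples
        if samples is None:               # client disconnected
            break
        buffer = np.concatenate([buffer, samples])
        if len(buffer) >= SAMPLE_RATE * CHUNK_SECONDS:
            text = model.transcribe(buffer, fp16=False)["text"]
            print(text)                   # push the partial transcript to the client
            buffer = np.zeros(0, dtype=np.float32)
```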

@chadexplains
Author

That makes sense - seems like faster-whisper integration is what I'd want.

One nit: why a separate thread/process for each connection -- is that so audio transcription is non-blocking for the web server? I'm on Flask right now, so it's sync and single-threaded IIRC. Will I get poor performance if I just synchronously try to handle the WebRTC audio -> OpenAI RPC?

It's not clear to me whether Python will automatically release the GIL on the RPC and therefore "just work" without a performance hit.

Super curious how you think about this.

@nalbion
Owner

nalbion commented May 24, 2023

TBH, I'm not sure whether your web server handles each request on a separate thread; from what you say, I'm guessing it's similar to NodeJS?

NodeJS will try to handle other requests while one request is waiting on I/O, and maybe Whisper would similarly allow other requests to use the CPU while GPU operations are being performed. You'd have to do some load testing to ensure that the web server can still handle regular API & static content requests while multiple users are having speech-to-text processed.
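Something like this would be a starting point for that test (a minimal sketch, untested against this repo - the endpoint name and model size are placeholders):

```python
import tempfile

import whisper
from flask import Flask, jsonify, request

app = Flask(__name__)
model = whisper.load_model("base")

@app.route("/transcribe", methods=["POST"])
def transcribe():
    # assumes the browser POSTs a finished 5-20 s clip as a file upload
    with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
        request.files["audio"].save(tmp.name)
        result = model.transcribe(tmp.name)
    return jsonify({"text": result["text"]})

if __name__ == "__main__":
    # threaded=True lets the dev server keep answering other requests while one
    # is transcribing (PyTorch releases the GIL for most of the heavy work);
    # for real load, run behind gunicorn/uwsgi and measure.
    app.run(threaded=True)
```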
