This software builds on top of the carter-voice-assistant project and replaces the Carter API with the OpenAI API. With this integration, the assistant can provide more accurate and sophisticated responses to user input.
*gpt-voice-assistant pixel art by JVPC0D3R*
GPT-3.5 is the core of the assistant, but this project uses other AI models to extract more data from the user and their environment (illustrative sketches of each component follow the list):
- The first model implemented is Whisper, which came prebuilt in the original Carter project. Whisper's goal is to listen to the user and transcribe their voice into text.
- To give the assistant vision, I used the Ultralytics YOLOv8 model, which can detect, classify, and track objects in real time.
- To give the assistant access to the Internet, I implemented a SerpAPI-based module.
- So that the assistant knows which action the user wants to perform, I implemented a text classification model that decides whether the user input is a chat message, a vision query, a Google search, or a farewell.
- If the user's command needs a Google search before calling GPT, the assistant also has to extract arguments for the SerpAPI call. To do that, I used a keyword extraction model.
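As a reference point for the core chat step, here is a minimal sketch using the pre-1.0 `openai` Python client; the system prompt is an assumption, and `keys.py` is the file described in the setup section below:

```python
# Minimal GPT-3.5 chat call with the pre-1.0 openai client.
# The system prompt here is illustrative, not the project's actual prompt.
import openai
from keys import OPENAI_API_KEY  # keys.py is created during setup (see below)

openai.api_key = OPENAI_API_KEY

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful voice assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)
print(response["choices"][0]["message"]["content"])
```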
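The transcription step can be sketched with the open-source `openai-whisper` package; the model size and input file below are assumptions, and the real assistant records from the microphone rather than reading a file:

```python
# Speech-to-text sketch with the openai-whisper package.
# "command.wav" is a placeholder; the assistant actually captures mic audio.
import whisper

model = whisper.load_model("base")        # small multilingual model
result = model.transcribe("command.wav")
print(result["text"])                     # the transcribed user command
```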
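The vision module boils down to a YOLOv8 inference call via the `ultralytics` package; the weights file and the input frame here are assumptions:

```python
# Object-detection sketch with Ultralytics YOLOv8.
# yolov8n.pt (the nano model) is downloaded automatically on first use.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
results = model("snapshot.jpg")           # placeholder for a camera frame

# Collect the class names YOLO detected in the frame.
for r in results:
    seen = {model.names[int(box.cls)] for box in r.boxes}
    print("Objects in view:", ", ".join(sorted(seen)))
```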
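The README does not name the text classification model, so as one possible illustration, a zero-shot classifier from Hugging Face `transformers` can route input among the four intents:

```python
# Illustrative intent routing with a zero-shot classifier; this is not
# necessarily the model the project actually uses.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")
intents = ["chat", "vision query", "google search", "farewell"]

result = classifier("what objects can you see right now?", intents)
print(result["labels"][0])                # highest-scoring intent
```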
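Likewise, the keyword extraction model is unspecified; here is a sketch of the search path, assuming KeyBERT for extraction and the `google-search-results` client for SerpAPI:

```python
# Keyword extraction feeding a SerpAPI query. KeyBERT is an assumption;
# the README does not name the project's actual extraction model.
from keybert import KeyBERT
from serpapi import GoogleSearch
from keys import SERP_API_KEY

user_input = "search for the weather in Madrid tomorrow"
keywords = KeyBERT().extract_keywords(user_input, top_n=3)
query = " ".join(word for word, _score in keywords)

search = GoogleSearch({"q": query, "api_key": SERP_API_KEY})
results = search.get_dict()               # parsed JSON response
print(results.get("organic_results", [])[:1])
```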
To run the gpt-voice-assistant, you will need an OpenAI API key and a SerpAPI key. I suggest creating a Python file named `keys.py` to store the API key variables.
To install and run the gpt-voice-assistant, follow these steps:

1. Clone the repository:

        git clone https://github.com/JVPRUGBIER/gpt-voice-assistant

2. Install the required dependencies:

        pip install -r requirements.txt

3. Create a `keys.py` file in the project directory and add your OpenAI and SerpAPI keys:

        OPENAI_API_KEY = "your_api_key"
        SERP_API_KEY = "your_api_key"
Chat using text with GPT:

    python chat.py -t

Chat using text with GPT and let the assistant read the response out loud:

    python chat.py -t -v

Have a full speech chat with the gpt-voice-assistant:

    python chat.py -l -v