voi-to-voi

A small framework for linking different AI models to create a voice-to-voice interface.

The app uses Node.js for the backend, Express for the API, SQLite for the database, and Bootstrap for the frontend.

By default, the app is configured to use OpenAI's Whisper model for speech-to-text, OpenAI's ChatGPT model to generate a response, and Google Cloud for text-to-speech.

To run the app with these models, you'll have to provide your own API keys. Instructions on how to acquire them can be found below.

Beyond these defaults, new models can be added and configured by following the steps in the Adding New Models section.

Setup

If you don't have Node.js installed, install it from the official Node.js website (version 14.6.0 or later is required)
Clone this repository
Navigate into the project directory

$ cd voi-to-voi

Install the requirements

$ npm install

Make a copy of the example environment variables file

On Linux or macOS:

$ cp .env.example .env

On Windows:

$ copy .env.example .env

Add your API keys to the newly created .env file. For Google Cloud, provide the path to the file where the API keys are saved rather than the keys themselves.

If you don't have API keys for OpenAI or Google Cloud but you still want to use their models, you can get their API keys here:
- OpenAI API Keys
- Google Cloud API Keys
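
Before starting the app, it can help to confirm that the expected variables are actually set. The sketch below is a minimal check and is not code from this repository. The variable names OPENAI_API_KEY and GOOGLE_APPLICATION_CREDENTIALS are assumptions, so substitute the names that appear in .env.example. It also assumes the values from .env have already been loaded into process.env.

```javascript
// Hypothetical helper: report which of the expected variables are missing.
// The names below are assumptions; use the ones listed in .env.example.
const REQUIRED_VARS = ["OPENAI_API_KEY", "GOOGLE_APPLICATION_CREDENTIALS"];

function missingEnvVars(env, names = REQUIRED_VARS) {
  // A variable counts as missing when it is unset or contains only whitespace.
  return names.filter((name) => !env[name] || env[name].trim() === "");
}

const missing = missingEnvVars(process.env);
if (missing.length > 0) {
  console.warn(`Missing environment variables: ${missing.join(", ")}`);
}
```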

Run the app

$ npm start

You should now be able to access the app at http://localhost:3000!

Configure

Configuring existing models can be done through the Config page of the app.

Each model's configuration fields are specified in a JSON file inside that model's folder, for example models/ChatGPT/conf.json.

To change which fields appear in the app, edit the conf.json file, following the pattern of the fields already defined there.

So far there are three types of fields that can be specified:

text
checkbox
dropdown
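
The exact layout of conf.json is not shown here, so the following is only a sketch of how field definitions could be checked against the three supported types. The fields array and the name, type, and options keys are hypothetical. Compare them with the conf.json in an existing model folder before relying on them.

```javascript
// Field types listed in this README.
const FIELD_TYPES = ["text", "checkbox", "dropdown"];

// Hypothetical conf.json content; the real key names may differ.
const exampleConf = {
  fields: [
    { name: "systemPrompt", type: "text" },
    { name: "stream", type: "checkbox" },
    { name: "model", type: "dropdown", options: ["model-a", "model-b"] },
  ],
};

// Return a list of human-readable problems; an empty list means the config passed.
function validateFields(conf) {
  const problems = [];
  for (const field of conf.fields || []) {
    if (!FIELD_TYPES.includes(field.type)) {
      problems.push(`${field.name}: unknown type "${field.type}"`);
    }
    // Assumption: a dropdown needs a list of choices to display.
    if (field.type === "dropdown" && !Array.isArray(field.options)) {
      problems.push(`${field.name}: dropdown needs an options array`);
    }
  }
  return problems;
}

console.log(validateFields(exampleConf));
```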

Whenever a new field is added, its value can be used inside the exe.js file of the corresponding model, where it is available through parameters.configuration.
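
As an illustration, an exe.js handler might read a newly added field as follows. Only parameters.configuration comes from the description above. The function name run, the prompt property, and the temperature field are hypothetical placeholders, so check the Template folder for the real structure.

```javascript
// Hypothetical exe.js sketch: read a configured field with a fallback value.
function run(parameters) {
  const configuration = parameters.configuration || {};
  // "temperature" stands in for whatever field was added to conf.json.
  const temperature =
    configuration.temperature !== undefined
      ? Number(configuration.temperature)
      : 0.7; // fallback used when the field is not configured
  return { prompt: parameters.prompt, temperature };
}

console.log(run({ prompt: "Hello", configuration: { temperature: "0.2" } }));
```

Converting with Number guards against the value arriving as a string, which is a common outcome when settings come from a form.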

Chat

A chat interface is available in the app that allows you to type messages or talk using the microphone.

The chat process consists of three stages:

Transcribe (Take speech audio and convert it to text)
Generate (Create a text response from the AI to the given prompt)
Synthesize (Create speech audio from the generated text)

Each of these stages has one or more AI models associated with it, depending on the configuration.
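
The three stages can be pictured as a simple asynchronous pipeline. This sketch uses stub functions in place of real models. The runChat function, the stage signatures, and the choice to skip transcription for typed messages are assumptions for illustration and do not reflect the app's actual internals.

```javascript
// Stub stages standing in for real models (speech-to-text, chat, text-to-speech).
const stages = {
  transcribe: async (audio) => `text from ${audio}`,
  generate: async (prompt) => `reply to: ${prompt}`,
  synthesize: async (text) => ({ format: "audio", source: text }),
};

// Run the stages in order. Assumption: typed messages skip transcription.
async function runChat(input, { isAudio = true } = {}, impl = stages) {
  const prompt = isAudio ? await impl.transcribe(input) : input;
  const reply = await impl.generate(prompt);
  const speech = await impl.synthesize(reply);
  return { prompt, reply, speech };
}

runChat("hello", { isAudio: false }).then((result) => console.log(result.reply));
```

Passing the stage implementations as an argument keeps the flow independent of any particular model, which mirrors the idea of swapping models per stage.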

Adding New Models

Navigate to the models folder inside the project.
Create a copy of the Template folder and name it based on the name of the new model.
Edit the conf.json file inside this folder to fit the specifications of your model.
Edit the exe.js file to specify the operations that should be performed whenever your model is called.
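
The first two steps can also be scripted. The following is a minimal sketch, not part of the project: createModel and copyDir are hypothetical helpers, and "MyModel" is a placeholder name. It uses only file system calls available in Node.js 14.

```javascript
const fs = require("fs");
const path = require("path");

// Recursively copy a folder, creating the target if needed.
function copyDir(source, target) {
  fs.mkdirSync(target, { recursive: true });
  for (const entry of fs.readdirSync(source, { withFileTypes: true })) {
    const from = path.join(source, entry.name);
    const to = path.join(target, entry.name);
    if (entry.isDirectory()) {
      copyDir(from, to);
    } else {
      fs.copyFileSync(from, to);
    }
  }
}

// Create <modelsDir>/<name> from <modelsDir>/Template,
// refusing to overwrite an existing model folder.
function createModel(modelsDir, name) {
  const target = path.join(modelsDir, name);
  if (fs.existsSync(target)) {
    throw new Error(`Model folder already exists: ${target}`);
  }
  copyDir(path.join(modelsDir, "Template"), target);
  return target;
}
```

Run from the project directory, createModel("models", "MyModel") would produce models/MyModel containing copies of conf.json and exe.js, ready to edit.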

DottedAnt-Dooz/voi-to-voi
