GIVA is a vocal assistant that combines speech recognition and text-to-speech with the capabilities of GPT (3.5-turbo or 4). Prompts are engineered so that GPT provides outputs that are short and adapted to be converted to audio.
- Speech Recognition: GIVA employs the
openai/whisper
model for accurate transcription of speech inputs. It's possibile to choose between the tiny, small, medium, and large v2 versions of the mdoel. - GPT Chat Completion: The user can choose between GPT-3.5-turbo and GPT-4 to interact with the vocal assistant.
- Possibility to Operate on GPT Parameters: The user can operate on parameters such as Temperature, Presence Penality, and Frequency Penality.
- Text-to-Speech: With the
microsoft/speecht5_tts
model, GIVA generates an audio output. - Interactive Interface: The application consists of two tabs. The first tab exclusively presents the audio output, while the second tab provides additional information, including the output of Automatic Speech Recognition (ASR) and the responses generated by GPT.
The user can select from different ASR models, such as:
The user can select from different ASR models, such as:
- GPT-3.5-turbo
- GPT-4
The user can operate on:
- Temperature
- Presence Penality
- Frequency Penality
- Maximum Number of Tokens