LIA (Local Intelligent Agent) is a project that implements an AI voice assistant capable of transcribing speech, generating a response, and converting that response back to speech. It uses WhisperCPP for speech recognition, Ollama for text generation, and TTS for speech synthesis.

Features:
- Real-time audio recording
- Speech-to-text transcription using WhisperCPP
- Text generation using Ollama
- Text-to-speech conversion using TTS
- Audio playback of generated responses
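At a high level, each turn runs record → transcribe → generate → speak. The sketch below shows one way those stages could be wired together; the recording, transcription, and playback helpers are hypothetical placeholders, and only the HTTP call to Ollama's `/api/generate` endpoint reflects a real API (the `llama3` model name is an assumption, not the project's default).

```python
# High-level sketch of the pipeline, not the project's actual main.py.
# record_audio(), transcribe(), and speak() are hypothetical placeholders;
# only the Ollama request mirrors the real /api/generate endpoint, and the
# "llama3" model name is an assumption.
import requests


def record_audio() -> str:
    """Placeholder: capture microphone input and return a WAV file path."""
    raise NotImplementedError


def transcribe(wav_path: str) -> str:
    """Placeholder: run WhisperCPP (ggml-base.en.bin) on the recording."""
    raise NotImplementedError


def speak(text: str) -> None:
    """Placeholder: synthesize the reply with TTS and play it back."""
    raise NotImplementedError


def generate_response(prompt: str, model: str = "llama3") -> str:
    """Send the transcript to the local Ollama server and return its reply."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]


if __name__ == "__main__":
    transcript = transcribe(record_audio())
    reply = generate_response(transcript)
    speak(reply)
```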
Before you begin, ensure you have met the following requirements:
- Python 3.7 or higher
- Ollama server running locally on port 11434
- WhisperCPP model file (`ggml-base.en.bin`) in the project directory
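Since the assistant talks to Ollama on port 11434, a small check like the one below can confirm the server is reachable before you start. It uses Ollama's standard `/api/tags` endpoint, which lists the models installed locally.

```python
# Quick check that the local Ollama server is reachable, listing its models.
import requests

try:
    resp = requests.get("http://localhost:11434/api/tags", timeout=5)
    resp.raise_for_status()
    names = [m["name"] for m in resp.json().get("models", [])]
    print("Ollama is running. Models available:", names or "none pulled yet")
except requests.RequestException as exc:
    print("Could not reach Ollama on port 11434:", exc)
```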
To install:

- Clone this repository:

  ```bash
  git clone https://github.com/yourusername/ai-voice-assistant.git
  cd ai-voice-assistant
  ```
- Create a virtual environment (optional but recommended):

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows, use venv\Scripts\activate
  ```
- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```
- Download the WhisperCPP model file (`ggml-base.en.bin`) and place it in the project directory.
To run the assistant:

- Ensure that the Ollama server is running on `http://localhost:11434`.
- Run the main script:

  ```bash
  python main.py
  ```
- The program will start recording audio. Speak into your microphone.
- Press Enter to stop the recording (a sketch of this record-until-Enter loop follows these steps).
- The system will transcribe your speech, generate a response, and play it back through your speakers.
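For reference, the record-until-Enter behavior described above could be implemented roughly as follows. This is an illustrative sketch that assumes the `sounddevice` package for audio capture; the project's actual recording code may use a different backend.

```python
# Sketch of a "record until Enter" loop. Assumes the sounddevice library for
# capture; the project's actual recording code may differ.
import wave

import numpy as np
import sounddevice as sd

SAMPLE_RATE = 16000  # whisper-style models expect 16 kHz mono audio


def record_until_enter(path: str = "input.wav") -> str:
    blocks = []

    def callback(indata, frames, time, status):
        # Copy each incoming block; PortAudio reuses the buffer between calls.
        blocks.append(indata.copy())

    with sd.InputStream(samplerate=SAMPLE_RATE, channels=1, dtype="int16",
                        callback=callback):
        input("Recording... press Enter to stop.")

    audio = np.concatenate(blocks) if blocks else np.zeros((0, 1), dtype=np.int16)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)            # 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(audio.tobytes())
    return path


if __name__ == "__main__":
    print("Saved recording to", record_until_enter())
```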
Configuration:

- To change the Ollama model, modify the `"model"` key in the `generate_response` function.
- To use a different TTS model, update the model path in the `TTS` initialization.
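The `"model"` key lives in the JSON payload sent to Ollama's `/api/generate` endpoint (as in the pipeline sketch earlier in this README). If the TTS dependency is the Coqui TTS package, swapping the voice looks roughly like this; the model id below is an example, not the project's default:

```python
# Assumes the Coqui TTS package; the model id is an example, not the project's default.
from TTS.api import TTS

tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")  # <-- change the TTS model here
tts.tts_to_file(text="Hello from LIA.", file_path="response.wav")
```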
Contributions to this project are welcome. Please fork the repository and create a pull request with your changes.
Acknowledgments:

- WhisperCPP for speech recognition
- Ollama for text generation
- TTS for text-to-speech conversion
If you have any questions or feedback, please open an issue in this repository.