BUD-E (Buddy) is an open-source voice assistant framework designed to facilitate seamless interaction with AI models and APIs. It enables the creation and integration of diverse skills for educational and research applications.
BUD-E V1.0 operates on a client-server architecture, allowing users to communicate with the assistant on edge devices while the main computation runs on a server, which can be either cloud-based or a local machine equipped with a powerful GPU.

The responsibilities are split as follows:
- Server: Handles main computation (speech recognition, language processing, text-to-speech, vision processing).
- Client: Manages user interactions (audio recording, playback, clipboard management).
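As a rough illustration of this split, the snippet below shows a client sending recorded audio to the server and receiving synthesized speech back. This is a minimal sketch only: the endpoint path, payload format, and transport are assumptions for illustration, not the actual BUD-E protocol.

```python
# Minimal sketch of the client/server split. The URL, route, and payload below are
# hypothetical; the real BUD-E client and server define their own protocol.
import requests

SERVER_URL = "http://192.168.1.50:8000"  # assumed address of the BUD-E server

def send_audio_to_server(wav_bytes: bytes) -> bytes:
    """Send recorded audio to the server and return the synthesized reply audio."""
    response = requests.post(
        f"{SERVER_URL}/process_audio",  # hypothetical route
        files={"audio": ("input.wav", wav_bytes, "audio/wav")},
        timeout=60,
    )
    response.raise_for_status()
    return response.content  # assumed: server replies with TTS audio bytes

if __name__ == "__main__":
    with open("recording.wav", "rb") as f:
        reply = send_audio_to_server(f.read())
    with open("reply.wav", "wb") as f:
        f.write(reply)
```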
The server integrates four core components:
- Automatic Speech Recognition (ASR)
- Language Model (LLM)
- Text-to-Speech (TTS)
- Vision Processing (Image Captioning and OCR)
Two clients are currently available:
- Python Desktop Client (Windows and Linux)
- School BUD-E Web Interface

Note: macOS support for the desktop client is waiting for you to build it. :)
To set up the server:

- Clone the repository:

  ```bash
  git clone https://github.com/LAION-AI/BUD-E_V1.0.git
  cd BUD-E_V1.0/server
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Configure the components in their respective files (an illustrative adapter sketch follows these steps):
  - ASR: `bud_e_transcribe.py`
  - LLM: `bud_e_llm.py`
  - TTS: `bud_e_tts.py`
  - Vision: `bud_e_captioning_with_ocr.py`

- Start the server:

  ```bash
  python bud_e-server.py
  ```
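To give a sense of what goes into these configuration files, the sketch below shows an LLM adapter pointed at an OpenAI-compatible endpoint (such as a self-hosted vLLM or Ollama server). The function name, variable names, and endpoint are illustrative assumptions, not the actual contents of `bud_e_llm.py`.

```python
# Hypothetical LLM adapter sketch; bud_e_llm.py defines its own names and settings.
import requests

API_BASE = "http://localhost:8000/v1"  # assumed OpenAI-compatible endpoint (e.g. vLLM, Ollama)
API_KEY = "YOUR_API_KEY"               # placeholder key
MODEL = "your-model-name"              # placeholder model identifier

def generate_reply(system_prompt: str, user_message: str) -> str:
    """Send a chat-completion request and return the assistant's reply text."""
    response = requests.post(
        f"{API_BASE}/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": MODEL,
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_message},
            ],
        },
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]
```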
To set up the client:

- Navigate to the client directory:

  ```bash
  cd ../client
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Configure the client (illustrative values are sketched after these steps):
  - Edit `bud_e_client.py` to set the server IP and port.
  - Obtain a Porcupine API key for wake word detection.

- Run the client:

  ```bash
  python bud_e_client.py
  ```
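For orientation, the snippet below shows the kind of values this configuration involves and a quick check that a Porcupine key works. The variable names are illustrative assumptions; check `bud_e_client.py` for the settings it actually expects.

```python
# Illustrative client settings (hypothetical names; bud_e_client.py defines its own).
SERVER_IP = "192.168.1.50"   # machine running bud_e-server.py
SERVER_PORT = 8000           # port the server listens on
PORCUPINE_ACCESS_KEY = "YOUR_PORCUPINE_API_KEY"  # from the Picovoice console

# Quick sanity check that the Porcupine key is valid (uses the pvporcupine package).
import pvporcupine

porcupine = pvporcupine.create(
    access_key=PORCUPINE_ACCESS_KEY,
    keywords=["computer"],   # one of Porcupine's built-in wake words
)
print("Porcupine ready, expected frame length:", porcupine.frame_length)
porcupine.delete()
```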
BUD-E's functionality can be extended through a skill system. Skills are Python functions that can be activated in two ways:
- Keyword Activation
- Language Model (LM) Activation
To create a new skill:

- Create a Python file in the `client/skills` folder.
- Define the skill function with this structure (a complete keyword-activated example follows this list):

  ```python
  def skill_name(transcription_response, client_session, LMGeneratedParameters=""):
      # Skill logic
      return skill_response, client_session
  ```

- Add a skill description comment above the function.

  For keyword-activated skills:

  ```python
  # KEYWORD ACTIVATED SKILL: [["keyword1"], ["keyword2", "keyword3"], ["phrase1"]]
  ```

  For LM-activated skills:

  ```python
  # LM ACTIVATED SKILL: SKILL TITLE: Skill Name DESCRIPTION: What the skill does. USAGE INSTRUCTIONS: How to use the skill.
  ```
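For example, a minimal keyword-activated skill following this structure could look like the sketch below. This particular skill (telling the current time) is a hypothetical illustration and not part of the repository.

```python
# KEYWORD ACTIVATED SKILL: [["what time is it"], ["current time"]]
from datetime import datetime

def tell_time(transcription_response, client_session, LMGeneratedParameters=""):
    # Build a spoken reply containing the current local time.
    now = datetime.now().strftime("%H:%M")
    skill_response = f"It is currently {now}."
    return skill_response, client_session
```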
Here's an example of an LM-activated skill that changes the assistant's voice:
```python
# LM ACTIVATED SKILL: SKILL TITLE: Change Voice DESCRIPTION: This skill changes the text-to-speech voice for the assistant's responses. USAGE INSTRUCTIONS: To change the voice, use the following format: <change_voice>voice_name</change_voice>. Replace 'voice_name' with one of the available voices: Stella, Stefanie, Florian, or Thorsten. For example, to change the voice to Stefanie, you would use: <change_voice>Stefanie</change_voice>. The assistant will confirm the voice change or provide an error message if an invalid voice is specified.
def server_side_execution_change_voice(user_input, client_session, params):
    voice_name = params.strip('()')
    valid_voices = ['Stella', 'Stefanie', 'Florian', 'Thorsten']
    if voice_name not in valid_voices:
        return f"Invalid voice. Please choose from: {', '.join(valid_voices)}.", client_session
    if 'TTS_Config' not in client_session:
        client_session['TTS_Config'] = {}
    client_session['TTS_Config']['voice'] = voice_name
    print(f"Voice changed to {voice_name}")
    return f"Voice successfully changed to {voice_name}.", client_session
```
This skill demonstrates how LM-activated skills work:
- The skill description provides instructions for the language model on how to use the skill.
- The skill expects the LM to generate a parameter enclosed in specific tags (e.g., `<change_voice>Stefanie</change_voice>`).
- The `params` argument in the function receives the content within these tags.
- The skill processes this input and updates the client session accordingly.
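To make the tag mechanism concrete, the parameter can be pulled out of the LM's output with a simple regular expression. The snippet below is an illustration of the idea only, not the actual dispatch code in the BUD-E server:

```python
import re

def extract_skill_parameter(lm_output, tag):
    """Return the text between <tag>...</tag> in the LM output, or None if absent."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", lm_output, re.DOTALL)
    return match.group(1).strip() if match else None

# Example: the LM decided to invoke the Change Voice skill in its reply.
lm_output = "Sure, switching now. <change_voice>Stefanie</change_voice>"
print(extract_skill_parameter(lm_output, "change_voice"))  # -> "Stefanie"
```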
BUD-E supports integration with various AI model providers:
- ASR: Local Whisper models or cloud services (e.g., Deepgram)
- LLM: Commercial APIs (e.g., Groq, OpenAI) or self-hosted models (e.g., VLLM, Ollama)
- TTS: Cloud services or local solutions (e.g., FishTTS, StyleTTS 2)
- Vision: Custom models or cloud APIs
Refer to the configuration files for integration examples.
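As one concrete example of such an integration, a local Whisper backend using the `openai-whisper` package can be as simple as the sketch below. This is illustrative only; `bud_e_transcribe.py` defines its own interface around whichever ASR backend you configure.

```python
# Sketch of a local Whisper transcription backend (assumes `pip install openai-whisper`
# and ffmpeg available on PATH); illustrative, not the actual bud_e_transcribe.py.
import whisper

model = whisper.load_model("base")  # larger models trade speed for accuracy

def transcribe(audio_path):
    """Transcribe an audio file and return the recognized text."""
    result = model.transcribe(audio_path)
    return result["text"].strip()

if __name__ == "__main__":
    print(transcribe("recording.wav"))
```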
Common issues and potential solutions:
- Dependency installation failures: Try using `conda` for problematic packages.
- API connection errors: Verify API keys, endpoint URLs, and network connectivity (a quick reachability check is sketched below).
- Wake word detection issues: Ensure correct Porcupine API key configuration.
- Performance issues: For local setups, ensure adequate GPU capabilities or optimize model sizes.
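For connection errors, a quick reachability check from the machine running the client or server can help separate network problems from credential problems. The URL below is a placeholder; substitute whichever endpoint you configured:

```python
# Generic endpoint reachability check (placeholder URL; not part of BUD-E itself).
import requests

endpoint = "http://192.168.1.50:8000"  # replace with your configured server or API endpoint
try:
    response = requests.get(endpoint, timeout=5)
    print("Reachable, HTTP status:", response.status_code)
except requests.RequestException as exc:
    print("Not reachable:", exc)
```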
The best place to get help and discuss BUD-E is our Discord community: https://discord.gg/pCPJJXP7Qx
License: Apache 2.0
BUD-E builds on the following projects and services:
- Porcupine for wake word detection
- Whisper for speech recognition
- FishTTS and StyleTTS 2 for text-to-speech capabilities
- Groq, Hyperlab, and other API providers for AI model access