Enhance configuration management and expand documentation #55

Open · wants to merge 18 commits into main
108 changes: 108 additions & 0 deletions .config/sample-config.yaml
@@ -0,0 +1,108 @@
# For a detailed overview of each configuration option, see docs/configuration-options.md

user:
name: Owl
client_token: <client-token>
voice_sample_filepath:

web:
base_url: http://localhost
port: 3000
environment: development
api:
base_url: http://localhost
port: 8000

llm:
model: ollama/mistral:instruct
base_url: http://localhost
port: 11434
api_key:

async_transcription:
provider: deepgram

streaming_transcription:
provider: deepgram

deepgram:
api_key: <your-deepgram-api-key>
model: nova-2
language: en-US

async_whisper:
host: localhost
port: 8010
hf_token: <your-hf-token>
device: cpu
compute_type: int8
batch_size: 16
model: tiny
verification_threshold: 0.1
verification_model_source: speechbrain/spkrec-ecapa-voxceleb
verification_model_directory: .models/spkrec-ecapa-voxceleb

streaming_whisper:
host: localhost
port: 8009
model: small
language: en
silero_sensitivity: 0.4
webrtc_sensitivity: 2
post_speech_silence_duration: 0.5

database:
url: sqlite:///./db.sqlite3

captures:
directory: captures

vad:
directory: .models/vad

conversation_endpointing:
timeout_seconds: 300
min_utterances: 2

notification:
apn_team_id:

udp:
enabled: false
host: 0.0.0.0
port: 8001

google_maps:
api_key:

bing:
subscription_key:

prompt:
suggest_links_system_message: >
You are the world's most advanced AI assistant. You are given the transcript of an interaction.
One of the participants is your client. Their name is {config.user.name}. Your task is to generate a
rich search query based on the summary of the interaction. You want to optimize the search query
to get maximally interesting and relevant links for {config.user.name}. IMPORTANT: Try to make your search
query about a single subject that is most relevant to the interaction. Make it as specific as
possible and only pick one subject. Don't include {config.user.name}'s name in the output, just output
the query and nothing else. VERY IMPORTANT: You must just output the search engine query without
any prefix and nothing else!
summarization_system_message: >
You are the world's most advanced AI assistant. You are given the transcript of an interaction.
One of the participants is your client. Their name is {config.user.name}. The transcript includes
speaker ids, but unfortunately sometimes we don't know the specific person's name and sometimes they
can be mislabeled. Do your best to infer the participants based on the context, but never refer
to the speaker ids in the summary because they alone are not useful. Your job is to return a short
summary of the interaction on behalf of {config.user.name} so they can remember what was happening.
This is for {config.user.name}'s memories so please include anything that might be useful but also
make it narrative so that it's helpful for creating a cherished memory. Format your summary with
the following sections: Summary, Atmosphere, Key Takeaways (bullet points)
short_summarization_system_message: >
You are the world's most advanced AI assistant. You are given the transcript of an interaction. One
of the participants is your client. Their name is {config.user.name}. The transcript includes
speaker ids, but unfortunately sometimes we don't know the specific person's name and sometimes they
can be mislabeled. Do your best to infer the participants based on the context, but never refer
to the speaker ids in the summary because they alone are not useful. Your job is to return a one
sentence summary of the interaction on behalf of {config.user.name}. It should capture the overall
significance of the interaction but not exceed one sentence.
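
The prompt templates above interpolate values such as `{config.user.name}` at render time. A minimal sketch of how such a template could be filled, assuming plain `str.format` with attribute access (the `Config` structure here is illustrative only; the project may use Pydantic or another settings library):

```python
# Sketch: filling a prompt template like the ones in sample-config.yaml.
# The config object below is a stand-in, not this project's actual class.
from types import SimpleNamespace

config = SimpleNamespace(user=SimpleNamespace(name="Owl"))

template = (
    "One of the participants is your client. "
    "Their name is {config.user.name}."
)

# str.format resolves dotted attribute access inside replacement fields,
# so "{config.user.name}" pulls config.user.name from the keyword argument.
prompt = template.format(config=config)
print(prompt)
```

Because `str.format` supports attribute lookups in replacement fields, the YAML prompts can reference nested config values without any custom templating engine.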
115 changes: 115 additions & 0 deletions .config/sample-env.env
@@ -0,0 +1,115 @@
# For a detailed overview of each configuration option, see docs/configuration-options.md

# User Configuration
# ------------------
# OWL_USER_CLIENT_TOKEN=
# OWL_USER_NAME=
# OWL_USER_VOICE_SAMPLE_FILEPATH=


# Web Configuration
# -----------------
# OWL_WEB_BASE_URL=
# OWL_WEB_PORT=
# OWL_WEB_ENVIRONMENT=
# OWL_WEB_BACKEND_BASE_URL=
# OWL_WEB_BACKEND_PORT=


# LLM Configuration
# -----------------
# OWL_LLM_MODEL=
# OWL_LLM_BASE_URL=
# OWL_LLM_PORT=
# OWL_LLM_API_KEY=


# Async Transcription Configuration
# ---------------------------------
# OWL_ASYNC_TRANSCRIPTION_PROVIDER=


# Streaming Transcription Configuration
# --------------------------------------
# OWL_STREAMING_TRANSCRIPTION_PROVIDER=


# Deepgram Configuration
# ----------------------
# OWL_DEEPGRAM_API_KEY=
# OWL_DEEPGRAM_MODEL=
# OWL_DEEPGRAM_LANGUAGE=


# Async Whisper Configuration
# ---------------------------
# OWL_ASYNC_WHISPER_HOST=
# OWL_ASYNC_WHISPER_PORT=
# OWL_ASYNC_WHISPER_HF_TOKEN=
# OWL_ASYNC_WHISPER_DEVICE=
# OWL_ASYNC_WHISPER_COMPUTE_TYPE=
# OWL_ASYNC_WHISPER_BATCH_SIZE=
# OWL_ASYNC_WHISPER_MODEL=
# OWL_ASYNC_WHISPER_VERIFICATION_THRESHOLD=
# OWL_ASYNC_WHISPER_VERIFICATION_MODEL_SOURCE=
# OWL_ASYNC_WHISPER_VERIFICATION_MODEL_DIRECTORY=


# Streaming Whisper Configuration
# -------------------------------
# OWL_STREAMING_WHISPER_HOST=
# OWL_STREAMING_WHISPER_PORT=
# OWL_STREAMING_WHISPER_MODEL=
# OWL_STREAMING_WHISPER_LANGUAGE=
# OWL_STREAMING_WHISPER_SILERO_SENSITIVITY=
# OWL_STREAMING_WHISPER_WEBRTC_SENSITIVITY=
# OWL_STREAMING_WHISPER_POST_SPEECH_SILENCE_DURATION=


# Database Configuration
# ----------------------
# OWL_DATABASE_URL=


# Captures Configuration
# ----------------------
# OWL_CAPTURES_DIRECTORY=

# VAD Configuration
# -----------------
# OWL_VAD_DIRECTORY=


# Conversation Endpointing Configuration
# --------------------------------------
# OWL_CONVERSATION_ENDPOINTING_TIMEOUT_SECONDS=
# OWL_CONVERSATION_ENDPOINTING_MIN_UTTERANCES=


# Notification Configuration
# --------------------------
# OWL_NOTIFICATION_APN_TEAM_ID=


# UDP Configuration
# -----------------
# OWL_UDP_ENABLED=
# OWL_UDP_HOST=
# OWL_UDP_PORT=


# Google Maps Configuration
# -------------------------
# OWL_GOOGLE_MAPS_API_KEY=


# Bing Configuration
# ------------------
# OWL_BING_SUBSCRIPTION_KEY=


# Prompt Configuration
# --------------------
# OWL_PROMPT_SUGGEST_LINKS_SYSTEM_MESSAGE=
# OWL_PROMPT_SUMMARIZATION_SYSTEM_MESSAGE=
# OWL_PROMPT_SHORT_SUMMARIZATION_SYSTEM_MESSAGE=
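
The variable names above follow an `OWL_<SECTION>_<KEY>` pattern that mirrors the nested keys in sample-config.yaml. A sketch of how such variables could be overlaid onto a loaded config dict (the mapping rule is inferred from the names above, not taken from this PR's code):

```python
# Sketch: overlay OWL_<SECTION>_<KEY> environment variables onto a nested
# config dict. Illustrative only; the project's actual loader may differ.
import os


def env_override(config: dict, environ=None, prefix: str = "OWL_") -> dict:
    """Apply OWL_* environment variables on top of a nested config dict."""
    environ = os.environ if environ is None else environ
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue
        section_key = name[len(prefix):].lower()
        # Match the longest known section first, so that e.g.
        # OWL_ASYNC_WHISPER_HOST maps to section "async_whisper", key "host",
        # rather than to a hypothetical section "async".
        for section in sorted(config, key=len, reverse=True):
            if section_key.startswith(section + "_"):
                key = section_key[len(section) + 1:]
                config[section][key] = value
                break
    return config


cfg = {"llm": {"port": 11434}, "async_whisper": {"host": "localhost"}}
cfg = env_override(cfg, environ={"OWL_ASYNC_WHISPER_HOST": "0.0.0.0"})
print(cfg["async_whisper"]["host"])
```

Note that environment variables always arrive as strings, so a real loader would also need to coerce values like ports and thresholds back to their YAML types.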
2 changes: 1 addition & 1 deletion .dockerignore
@@ -135,4 +135,4 @@ venv.bak/

# Exclude the Dockerfile and .dockerignore to prevent potential leaks
Dockerfile
.dockerignore
.dockerignore
8 changes: 4 additions & 4 deletions .gitignore
@@ -1,5 +1,6 @@
# Local configuration files
/*.yaml
.env
config.yaml

# Local database
*.sqlite3
@@ -15,12 +16,11 @@ audio_cache/
# vscode
.vscode/launch.json


# Local voice samples
voice_samples/

# Local models
pretrained_models/
.models/

# Captures
captures/
@@ -103,4 +103,4 @@ inject_derived
.AppleDesktop
Network Trash Folder
Temporary Items
.apdisk
.apdisk