A lightweight Discord bot interface for interacting with locally hosted language models. Supports conversation history, streaming responses, and custom configuration.
- DM-Only Interactions: Restrict bot usage to private messages
- Context-Aware Chat: Maintains limited conversation history per user
- Ping & Clear Commands: `ping` displays bot latency; `clear` resets user history (see the command sketch after this list)
- Message Chunking: Automatically splits long responses (>2000 characters) into multiple messages, as sketched below
- GPU Acceleration: Configure offloading layers for performance
- Streaming Mode: Real-time token delivery with typing simulation
- Custom Prompts: Modify system behavior via `SYSTEM_PROMPT`
- Thread Safety: Prevents race conditions with user-level locks (see the lock sketch below)
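
A minimal sketch of how the two commands might be wired up with discord.py's `commands` extension. The decorators, `bot.latency`, and the intents setup are standard discord.py 2.x; the `histories` dict is a hypothetical stand-in for the bot's actual history store:

```python
import discord
from discord.ext import commands

intents = discord.Intents.default()
intents.message_content = True  # prefix commands need this intent in discord.py 2.x

bot = commands.Bot(command_prefix="!", intents=intents)
histories: dict[int, list[dict]] = {}  # hypothetical per-user history store

@bot.command()
async def ping(ctx: commands.Context):
    # bot.latency is the gateway heartbeat latency, in seconds
    await ctx.send(f"Pong! {bot.latency * 1000:.0f} ms")

@bot.command()
async def clear(ctx: commands.Context):
    histories.pop(ctx.author.id, None)  # drop this user's stored conversation
    await ctx.send("History cleared.")
```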
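
Message chunking can be as simple as slicing the reply at Discord's 2000-character limit. A sketch of the idea (the function name is illustrative, not necessarily the repo's helper):

```python
def chunk_message(text: str, limit: int = 2000) -> list[str]:
    """Slice a reply into pieces Discord will accept (max 2000 chars each)."""
    return [text[i:i + limit] for i in range(0, len(text), limit)]
```

Each chunk is then sent as its own message, e.g. `for part in chunk_message(reply): await ctx.send(part)`.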
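
The user-level locking is presumably a per-user `asyncio.Lock`, so two messages from the same user cannot interleave history updates. A sketch under that assumption (`handle_user_message` is a hypothetical handler name):

```python
import asyncio
from collections import defaultdict

user_locks: defaultdict[int, asyncio.Lock] = defaultdict(asyncio.Lock)

async def handle_user_message(user_id: int, content: str) -> None:
    # Serialize work per user: concurrent messages from the same user
    # queue up here instead of racing on the shared history.
    async with user_locks[user_id]:
        ...  # read history, run inference, append the reply
```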
- Install Requirements:

  ```bash
  pip install discord.py llama-cpp-python python-dotenv
  ```
- Create `.env` (see the loading sketch after these steps):

  ```env
  DISCORD_TOKEN=TOKEN
  MODEL_PATH=Llama-3.1-8B-Q4_K_L.gguf
  # Required parameters above. Optional below:
  COMMAND_PREFIX=!
  FULL_LOG=FALSE
  MODEL_N_CTX=1024
  MAX_TOKENS=256
  TOP_K=40
  TOP_P=0.95
  TEMPERATURE=0.7
  REPEAT_PENALTY=1.1
  GPU_LAYERS=7
  ONLY_DM=TRUE
  HISTORY_LIMIT=3
  STREAM_MODE=FALSE
  SYSTEM_PROMPT=You are a helpful assistant. Answer as concisely as possible.
  ```
- Run Bot:

  ```bash
  python main.py
  ```
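
These values are read at startup via python-dotenv. A minimal sketch of the loading pattern (the exact wiring in `main.py` may differ; variable names match the table below):

```python
import os
from dotenv import load_dotenv

load_dotenv()  # pulls .env into the process environment

TOKEN = os.environ["DISCORD_TOKEN"]            # required
MODEL_PATH = os.environ["MODEL_PATH"]          # required
N_CTX = int(os.getenv("MODEL_N_CTX", "1024"))  # optional, falls back to default
STREAM_MODE = os.getenv("STREAM_MODE", "FALSE").upper() == "TRUE"
```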
| Parameter | Type | Description | Default |
|---|---|---|---|
| `DISCORD_TOKEN` | String | Required. Discord bot token | - |
| `COMMAND_PREFIX` | String | Bot command prefix | `!` |
| `FULL_LOG` | Boolean | Enable verbose logging | `FALSE` |
| `MODEL_PATH` | String | Required. Path to GGUF model file | - |
| `MODEL_N_CTX` | Integer | Context window size | `1024` |
| `MAX_TOKENS` | Integer | Maximum tokens per response | `256` |
| `TOP_K` | Integer | Top-k sampling | `40` |
| `TOP_P` | Float | Top-p sampling | `0.95` |
| `TEMPERATURE` | Float | Response randomness (0.1-2.0) | `0.7` |
| `REPEAT_PENALTY` | Float | Penalize repeated phrases | `1.1` |
| `GPU_LAYERS` | Integer | GPU offloading layers (0 = CPU-only) | `0` |
| `ONLY_DM` | Boolean | Bot responds only in DMs | `TRUE` |
| `HISTORY_LIMIT` | Integer | Max stored message pairs (user + assistant) | `3` |
| `STREAM_MODE` | Boolean | Enable real-time token streaming | `FALSE` |
| `SYSTEM_PROMPT` | String | Initial assistant behavior prompt | You are a helpful... |
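
For reference, this is roughly how the table's parameters map onto llama-cpp-python. The mapping to the bot's internals is an assumption, but the `Llama` constructor and `create_chat_completion` arguments shown are real llama-cpp-python parameters:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="Llama-3.1-8B-Q4_K_L.gguf",  # MODEL_PATH
    n_ctx=1024,                             # MODEL_N_CTX
    n_gpu_layers=7,                         # GPU_LAYERS
)
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant. Answer as concisely as possible."},
        {"role": "user", "content": "Hello!"},
    ],
    max_tokens=256,      # MAX_TOKENS
    top_k=40,            # TOP_K
    top_p=0.95,          # TOP_P
    temperature=0.7,     # TEMPERATURE
    repeat_penalty=1.1,  # REPEAT_PENALTY
)
print(response["choices"][0]["message"]["content"])
```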
- Stream fix:
  - Fix generation interruption caused by Discord API rate limits
  - Implement adaptive delay between token sends (see the sketch below)
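
A sketch of the adaptive-delay idea, assuming streaming is rendered by repeatedly editing one Discord message. The backoff constants and error handling are illustrative, not the shipped fix (discord.py also waits out most 429s internally; this only backs off further when an HTTP error still surfaces):

```python
import asyncio
import discord

async def stream_reply(message: discord.Message, tokens, base_delay=0.05, max_delay=2.0):
    delay = base_delay
    buffer = ""
    for token in tokens:
        buffer += token
        try:
            await message.edit(content=buffer)
            delay = max(base_delay, delay * 0.9)  # recover speed on success
        except discord.HTTPException:
            delay = min(max_delay, delay * 2)     # back off on surfaced rate limits
        await asyncio.sleep(delay)
```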