Chat tools powered by the llama-cpp-python library, built for a PyCon LT 2024 talk. This repository provides:
- a Command Line Chat: a chat bot that runs in the command-line interface. Outputs are streamed so you can watch the "typing in real time" experience.
- a Telegram Bot: A chat bot on Telegram.
Both applications need the same model and similar dependencies. Use the command line to download the model and install the libraries.
- On a Linux machine, run

  ```shell
  pip3 install -r requirements.txt
  ```
- Download an individual model file from Hugging Face. You do not need an HF account for this:

  ```shell
  huggingface-cli download TheBloke/CapybaraHermes-2.5-Mistral-7B-GGUF capybarahermes-2.5-mistral-7b.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
  ```
Note: offloading to the GPU requires installing llama-cpp-python with BLAS support.
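As a sketch of what such an install might look like, this reinstalls the library with a GPU backend. The exact CMake flag depends on your llama-cpp-python version and hardware, so treat the flag below as an assumption to check against the library's installation docs:

```shell
# Reinstall llama-cpp-python with CUDA (cuBLAS) support.
# Recent releases use -DGGML_CUDA=on; older ones used -DLLAMA_CUBLAS=on.
CMAKE_ARGS="-DGGML_CUDA=on" pip3 install --force-reinstall --no-cache-dir llama-cpp-python
```

With a BLAS-enabled build, layers can then be offloaded via the library's `n_gpu_layers` parameter.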
In the root directory of the repo, run

```shell
python run_chat.py
```
To enable verbose output and set the number of threads, use the flags:

```shell
python run_chat.py --verbose --n_threads 12
```
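The flags above suggest standard `argparse` handling. A minimal sketch of how such flags might be parsed follows; only the flag names come from this README, and how `run_chat.py` actually implements them is an assumption:

```python
import argparse

# Hypothetical sketch of the flag parsing implied by the README;
# the actual run_chat.py argument handling may differ.
parser = argparse.ArgumentParser(description="Command-line chat")
parser.add_argument("--verbose", action="store_true",
                    help="print verbose llama.cpp output")
parser.add_argument("--n_threads", type=int, default=None,
                    help="number of CPU threads for inference")

# Parse the example invocation from above.
args = parser.parse_args(["--verbose", "--n_threads", "12"])
```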
Once it is up and running, start chatting by typing your prompt directly and hitting Enter. Chat state is managed under the hood.
- `bye`: ends the chat and exits the program.
- `clear`: clears the chat history.
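The two commands above could be handled by a small dispatch step in the chat loop. This is an illustrative sketch, not the actual `run_chat.py` implementation; the function name and history format are made up:

```python
# Illustrative command handling for the chat loop described above.
def handle_input(user_input, history):
    """Return (should_exit, history) after applying chat commands."""
    text = user_input.strip()
    if text == "bye":    # ends the chat and exits the program
        return True, history
    if text == "clear":  # wipes the chat history
        return False, []
    # Otherwise, record the prompt so the model sees the full conversation.
    history = history + [{"role": "user", "content": text}]
    return False, history

history = []
_, history = handle_input("Hello!", history)   # history now holds one message
_, history = handle_input("clear", history)    # history is emptied
done, _ = handle_input("bye", history)         # done is True
```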
A Telegram bot powered by llama-cpp-python.
- Follow the steps to obtain an API token from @BotFather.
- Set the token as an environment variable:

  ```shell
  export TELEGRAM_BOT_TOKEN=YOUR_TOKEN
  ```
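On the Python side, the bot would read this variable from the environment. A sketch of how that might look, assuming nothing about `start_telegram_bot.py` beyond the variable name; failing fast with a clear message is an illustrative choice:

```python
import os

def get_token(env=os.environ):
    """Fetch the token set via `export TELEGRAM_BOT_TOKEN=...`.

    Raises SystemExit with a hint when the variable is missing;
    the real start_telegram_bot.py may handle this differently.
    """
    token = env.get("TELEGRAM_BOT_TOKEN")
    if token is None:
        raise SystemExit("TELEGRAM_BOT_TOKEN is not set; see the export step above.")
    return token
```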
In the root directory of the repo, run

```shell
python start_telegram_bot.py
```
Currently, the server from the Python bindings supports neither batched inference nor concurrent requests.
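Given that limitation, a bot typically serializes inference so that overlapping Telegram updates are answered one after another. A minimal sketch using a lock; `generate` is a stand-in for the real model call, not llama-cpp-python's API:

```python
import threading

# Guard the model with a lock so only one inference runs at a time,
# since the bindings do not support concurrent requests.
_model_lock = threading.Lock()

def generate(prompt):
    # Placeholder for the actual Llama inference call.
    return f"echo: {prompt}"

def answer(prompt):
    with _model_lock:  # concurrent callers queue up here
        return generate(prompt)

# Simulate several overlapping requests.
threads = [threading.Thread(target=answer, args=(f"q{i}",)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

An alternative design is a work queue consumed by a single inference thread, which additionally preserves request order.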