Lily-Bot is an n-gram language model based on my own Discord messages.
It is built to use as few pre-made libraries as possible. This README also guides you through setting up your own n-gram model using your preferred Discord package.
Install dependencies:
```bash
pip install -r requirements.txt
```
Run the Flask app:
```bash
flask run
```
Or run with Gunicorn (for production):
```bash
gunicorn -w 4 app:app
```
This repository contains an n-gram language model trained on my own Discord messages.
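An n-gram is a sequence of n words, and an n-gram model predicts the next word based on the previous n − 1 words:
- Unigram (n = 1): each word is predicted on its own, with no preceding context
- Bigram (n = 2): the next word is predicted from the 1 previous word
- Trigram (n = 3): the next word is predicted from the 2 previous words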
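Written as a formula (following the standard textbook notation in Jurafsky and Martin, not necessarily the exact form used in this repository's code), an n-gram model approximates the full history of previous words by only the last n − 1 of them. For a bigram model (n = 2), only the immediately preceding word is kept:

$$
P(w_i \mid w_1, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-n+1}, \ldots, w_{i-1})
$$

$$
P(w_i \mid w_1, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-1}) \quad (n = 2)
$$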
The model looks at the previous word(s) and predicts the next word based on probabilities derived from the training data.
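As a minimal sketch of how such probabilities can be derived from training data (standard library only, and not necessarily how Lily-Bot itself is implemented), a bigram model can count word pairs and use the maximum likelihood estimate `P(word | prev) = count(prev, word) / count(prev, any word)`. The whitespace tokenizer, the `<s>`/`</s>` boundary markers, the function names `train_bigrams`, `bigram_probability` and `generate`, and the example messages are all hypothetical illustrations.

```python
import random
from collections import Counter, defaultdict


def train_bigrams(messages):
    """Count how often each word follows each previous word (hypothetical helper)."""
    counts = defaultdict(Counter)
    for message in messages:
        # Assumed tokenization: lowercase, split on whitespace, add boundary markers.
        tokens = ["<s>"] + message.lower().split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            counts[prev][word] += 1
    return counts


def bigram_probability(counts, prev, word):
    """Maximum likelihood estimate of P(word | prev) from bigram counts."""
    # dict.get avoids inserting unseen words into the defaultdict.
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return followers[word] / total if total else 0.0


def generate(counts, max_words=20, rng=random):
    """Sample one word at a time from P(next | previous) until </s> is drawn."""
    word, output = "<s>", []
    for _ in range(max_words):
        followers = counts.get(word)
        if not followers:
            break  # no continuation was seen in training
        candidates = list(followers)
        # Each candidate is weighted by how often it followed the previous word.
        word = rng.choices(candidates, weights=[followers[c] for c in candidates])[0]
        if word == "</s>":
            break
        output.append(word)
    return " ".join(output)


messages = ["good morning everyone", "good night everyone", "morning coffee time"]
model = train_bigrams(messages)
print(bigram_probability(model, "good", "morning"))  # 0.5
print(generate(model, rng=random.Random(0)))
```

Because `random.choices` weights each candidate by its count, frequent continuations are picked more often; passing a seeded `random.Random` makes the generated text reproducible.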
My understanding of n-gram models is largely derived from:
Daniel Jurafsky and James H. Martin (2024). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models, 3rd edition. Online manuscript released August 20, 2024. https://web.stanford.edu/~jurafsky/slp3
The model and original messages have been removed for privacy. You can try an implementation here: Google Cloud Demo