The demo contains two Python applications running on the same VM for simplicity.
Applications:
- Embeddings - an application for easily adding embeddings to the vector database
- Chatbot - a LLaMA 2-powered Streamlit application providing chat capabilities
Install conda, then create and activate the environment and install the dependencies:

```bash
conda create -n llm
conda activate llm
pip install -r requirements.txt
pip uninstall ctransformers
pip install ctransformers --upgrade --force-reinstall
```
Run the embeddings application:

```bash
streamlit run embedded-ui.py --server.port 9900
```

Then open your VM's IP address on port 9900 in a browser.
Run the Chatbot application:

```bash
streamlit run app.py --server.port 8800
```

Then open your VM's IP address on port 8800 in a browser.
Parameter | Description | Default value |
---|---|---|
CHUNK_SIZE | Size of the chunks used to ingest the data into the vector database | 500 |
CHUNK_OVERLAP | Overlap between consecutive chunks when ingesting the data into the vector database | 50 |
DATA_PATH | Path to the directory containing the documents to ingest | — |
DB_FAISS_PATH | Path to the FAISS vector store on disk | — |
MODEL_TYPE | Type of LLM model used, passed to CTransformers | llama |
MODEL_BIN_PATH | Path to the LLM model on disk | models/llama-2-7b-chat.ggmlv3.q2_K.bin |
MAX_NEW_TOKENS | Maximum number of new tokens generated by the LLM on each call | 81920 |
RETURN_SOURCE_DOCUMENTS | Whether to return the source documents used to find the answer | True |
VECTOR_COUNT | Maximum number of vectors retrieved as context from the vector database | 2 |
TEMPERATURE | The LLM model temperature. Value between 0 and 1. | 0.01 |
MODEL_BATCH_SIZE | The LLM model batch size for token processing | 4096 |
USE_GPU | Whether to use the GPU | True |
EMBEDDINGS_MODEL_NAME | The model name used for embeddings, downloaded from Hugging Face | sentence-transformers/all-MiniLM-L6-v2 |
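To see how `CHUNK_SIZE` and `CHUNK_OVERLAP` interact during ingestion: a window of `CHUNK_SIZE` characters advances by `CHUNK_SIZE - CHUNK_OVERLAP` each step, so consecutive chunks share `CHUNK_OVERLAP` characters of context. A minimal sketch (the `split_text` helper below is illustrative only, not the demo's actual ingestion code):

```python
CHUNK_SIZE = 500
CHUNK_OVERLAP = 50

def split_text(text: str, size: int = CHUNK_SIZE, overlap: int = CHUNK_OVERLAP) -> list[str]:
    """Split text into fixed-size chunks that overlap by `overlap` characters."""
    step = size - overlap  # how far the window advances each step
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

# 1200 characters with a 450-character step -> chunks starting at 0, 450 and 900
chunks = split_text("x" * 1200)
```

Larger overlaps reduce the chance that an answer is cut in half at a chunk boundary, at the cost of storing more near-duplicate vectors in the database.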