- General fixes and improvements.
- Added sample setup to use GPU
amithkoujalgi committed Dec 17, 2023
1 parent 70b8530 commit e97290f
Showing 9 changed files with 159 additions and 73 deletions.
13 changes: 13 additions & 0 deletions Makefile
@@ -0,0 +1,13 @@
start:
	docker-compose -f ./docker-compose.yml down -v; \
	docker-compose -f ./docker-compose.yml rm -fsv; \
	docker-compose -f ./docker-compose.yml up --remove-orphans;

start-gpu:
	docker-compose -f ./docker-compose-gpu.yml down -v; \
	docker-compose -f ./docker-compose-gpu.yml rm -fsv; \
	docker-compose -f ./docker-compose-gpu.yml up --remove-orphans;

stop:
	docker-compose -f ./docker-compose.yml down -v; \
	docker-compose -f ./docker-compose.yml rm -fsv;
102 changes: 61 additions & 41 deletions README.md
@@ -14,72 +14,92 @@ The LLMs are downloaded and served via [Ollama](https://github.com/jmorganca/oll
- [How to run](#how-to-run)
- [Demo](#demo)
- [Improvements](#improvements)
- [Contributing](#contributing)
- [Credits](#credits)

#### Requirements
### Requirements

[![][shield]][site]

[![][maketool-shield]][maketool-site]

[site]: https://docs.docker.com/compose/

[shield]: https://img.shields.io/badge/Docker_Compose-Installation-blue.svg?style=for-the-badge&labelColor=gray

[maketool-site]: https://www.gnu.org/software/make/

- Docker (with docker-compose)
- Python (for development only)

#### How to run

Define a `docker-compose.yml` by adding the following contents into the file.

```yaml
services:

  ollama:
    image: ollama/ollama
    ports:
      - 11434:11434
    volumes:
      - ~/ollama:/root/.ollama
    networks:
      - net

  app:
    image: amithkoujalgi/pdf-bot:1.0.0
    ports:
      - 8501:8501
    environment:
      - OLLAMA_API_BASE_URL=http://ollama:11434
      - MODEL=orca-mini
    networks:
      - net

networks:
  net:
[maketool-shield]: https://img.shields.io/badge/Make-Tool-blue.svg?style=for-the-badge&labelColor=gray

### How to run

#### CPU version

```shell
make start
```

Then run:
#### GPU version

```shell
docker-compose up
make start-gpu
```

When the server is up and running, access the app at: http://localhost:8501

Note:
**Note:**

- The first startup takes a while because the specified model has to be downloaded.
- If your hardware does not have a GPU and you run on CPU only, expect slow responses from the bot.
- Only Nvidia GPUs are supported, as mentioned in Ollama's documentation. Others, such as AMD, aren't supported yet. Read how to use GPU on [Ollama container](https://hub.docker.com/r/ollama/ollama) and [docker-compose](https://docs.docker.com/compose/gpu-support/#:~:text=GPUs%20are%20referenced%20in%20a,capabilities%20.).
- Only Nvidia GPUs are supported, as mentioned in Ollama's documentation. Others, such as AMD, aren't supported yet. Read how to
  use GPU on [Ollama container](https://hub.docker.com/r/ollama/ollama)
  and [docker-compose](https://docs.docker.com/compose/gpu-support/#:~:text=GPUs%20are%20referenced%20in%20a,capabilities%20.).
- Make sure Nvidia drivers are set up in your execution environment for the best results.

Image on DockerHub: https://hub.docker.com/r/amithkoujalgi/pdf-bot

#### [Demo](https://www.youtube.com/watch?v=jJyFslR-oNQ):
### [Demo](https://www.youtube.com/watch?v=jJyFslR-oNQ)

https://github.com/amithkoujalgi/ollama-pdf-bot/assets/1876165/40dc70e6-9d35-4171-9ae6-d82247dbaa17

Sample PDFs:
#### Sample PDFs

[Hl-L2351DW v0522.pdf](https://github.com/amithkoujalgi/ollama-pdf-bot/files/13323209/Hl-L2351DW.v0522.pdf)

[HL-B2080DW v0522.pdf](https://github.com/amithkoujalgi/ollama-pdf-bot/files/13323208/HL-B2080DW.v0522.pdf)

#### Improvements
### Improvements

- [ ] Expose model params such as `temperature`, `top_k`, `top_p` as configurable env vars
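
One way this item could be approached, as a sketch only: read each sampling parameter from an environment variable and fall back to a default when it is missing or malformed. The variable names (`TEMPERATURE`, `TOP_K`, `TOP_P`), the default values, and the helper `read_model_params` are assumptions made for illustration; none of them exist in this commit.

```python
import os

# Hypothetical defaults; tune them for the model in use.
DEFAULTS = {"temperature": 0.8, "top_k": 40, "top_p": 0.9}
CASTS = {"temperature": float, "top_k": int, "top_p": float}


def read_model_params(env=os.environ):
    """Return sampling parameters, preferring env vars over the defaults."""
    params = {}
    for key, default in DEFAULTS.items():
        raw = env.get(key.upper())  # e.g. TEMPERATURE, TOP_K, TOP_P
        try:
            params[key] = CASTS[key](raw) if raw else default
        except ValueError:
            params[key] = default  # ignore malformed values
    return params
```

The resulting dictionary could then be handed to whatever constructs the LLM client, keeping the parsing in one place next to the other settings.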

#### Credits
### Benchmarks

- The sample PDFs provided above were used for benchmarking.

LLAMA2: Download model - ~6-8 minutes

#### Devices used

- PC: Intel i9 (9th gen), Nvidia RTX 2080, 32 GB memory
- Laptop: Intel i7 MacBook Pro (2017)

| Model | Device | Operation | Time Taken |
|--------|--------|-------------------------------------------|------------------|
| LLAMA2 | PC | Load embedding model | ~3-4 minutes |
| LLAMA2 | PC | Answer the questions on the uploaded PDFs | ~5-10 seconds |
| LLAMA2 | Laptop | Load embedding model | ~8 minutes |
| LLAMA2 | Laptop | Answer the questions on the uploaded PDFs | ~100-130 seconds |

### Contributing

Contributions are most welcome! Whether it's reporting a bug, proposing an enhancement, or helping
with code - any sort of contribution is much appreciated.

#### Requirements

![Python](https://img.shields.io/badge/python-3.8_+-green.svg)

### Credits

Thanks to the incredible [Ollama](https://github.com/jmorganca/ollama), [Langchain](https://www.langchain.com/) and [Streamlit](https://streamlit.io/) projects.
Thanks to the incredible [Ollama](https://github.com/jmorganca/ollama), [Langchain](https://www.langchain.com/)
and [Streamlit](https://streamlit.io/) projects.
30 changes: 30 additions & 0 deletions docker-compose-gpu.yml
@@ -0,0 +1,30 @@
services:

  ollama:
    image: ollama/ollama:latest
    ports:
      - 11434:11434
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [ gpu ]
    volumes:
      - ~/ollama:/root/.ollama
    networks:
      - net

  app:
    image: amithkoujalgi/pdf-bot:1.0.0
    ports:
      - 8501:8501
    environment:
      - OLLAMA_API_BASE_URL=http://ollama:11434
      - MODEL=llama2
    networks:
      - net

networks:
  net:
2 changes: 1 addition & 1 deletion docker-compose.yml
@@ -15,7 +15,7 @@ services:
- 8501:8501
environment:
- OLLAMA_API_BASE_URL=http://ollama:11434
- MODEL=orca-mini
- MODEL=llama2
networks:
- net

9 changes: 6 additions & 3 deletions pdf_bot/app.py
@@ -4,13 +4,16 @@

import streamlit as st

from pdf_helper import PDFHelper
from config import Config
from pdf_helper import PDFHelper, load_embedding_model

load_embedding_model(model_name=Config.EMBEDDING_MODEL_NAME)

title = "PDF Bot"

model_name = os.environ.get('MODEL', "orca-mini")
model_name = Config.MODEL

ollama_api_base_url = os.environ.get('OLLAMA_API_BASE_URL', "http://localhost:11434")
ollama_api_base_url = Config.OLLAMA_API_BASE_URL
pdfs_directory = os.path.join(str(Path.home()), 'langchain-store', 'uploads', 'pdfs')
os.makedirs(pdfs_directory, exist_ok=True)

9 changes: 9 additions & 0 deletions pdf_bot/config.py
@@ -0,0 +1,9 @@
import os


class Config:
    MODEL = os.environ.get('MODEL', "llama2")
    EMBEDDING_MODEL_NAME = os.environ.get('EMBEDDING_MODEL_NAME', "all-MiniLM-L6-v2")
    OLLAMA_API_BASE_URL = os.environ.get('OLLAMA_API_BASE_URL', "http://localhost:11434")
    HUGGING_FACE_EMBEDDINGS_DEVICE_TYPE = os.environ.get('HUGGING_FACE_EMBEDDINGS_DEVICE_TYPE',
                                                         "cpu")
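
Since `HUGGING_FACE_EMBEDDINGS_DEVICE_TYPE` is now read from the environment, a value such as `cuda` could be requested on a machine without a usable GPU. The following is a minimal sketch of a guard for that case; `resolve_device` is a hypothetical helper that is not part of this commit, and it assumes PyTorch is importable (it is normally pulled in by `sentence-transformers`).

```python
def resolve_device(requested="cpu"):
    """Return the requested device, or 'cpu' if CUDA was asked for but is unavailable."""
    requested = requested.strip().lower()
    if not requested.startswith("cuda"):
        return requested  # 'cpu', 'mps', etc. are passed through unchanged
    try:
        import torch  # normally installed as a dependency of sentence-transformers
    except ImportError:
        return "cpu"
    return requested if torch.cuda.is_available() else "cpu"
```

A caller could wrap the configured value, for example `resolve_device(Config.HUGGING_FACE_EMBEDDINGS_DEVICE_TYPE)`, before passing it to the embedding model.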
28 changes: 16 additions & 12 deletions pdf_bot/pdf_helper.py
@@ -12,6 +12,8 @@
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

from config import Config


# This loads the PDF file
def load_pdf_data(file_path):
@@ -42,10 +44,10 @@ def split_docs(documents, chunk_size=1000, chunk_overlap=20):


# function for loading the embedding model
def load_embedding_model(model_path, normalize_embedding=True):
def load_embedding_model(model_name, normalize_embedding=True):
return HuggingFaceEmbeddings(
model_name=model_path,
model_kwargs={'device': 'cpu'}, # here we will run the model with CPU only
model_name=model_name,
model_kwargs={'device': Config.HUGGING_FACE_EMBEDDINGS_DEVICE_TYPE}, # device is configurable via env var; defaults to 'cpu'
encode_kwargs={
'normalize_embeddings': normalize_embedding # keep True to compute cosine similarity
}
@@ -78,19 +80,19 @@ def load_qa_chain(retriever, llm, prompt):
def get_response(query, chain) -> str:
# Get response from chain
response = chain({'query': query})

# Wrap the text for better output in Jupyter Notebook
wrapped_text = textwrap.fill(response['result'], width=100)
return wrapped_text
res = response['result']
# wrapped_text = textwrap.fill(res, width=100)
return res


class PDFHelper:

def __init__(self, ollama_api_base_url: str, model_name: str = "orca-mini",
embedding_model_path: str = "all-MiniLM-L6-v2"):
def __init__(self, ollama_api_base_url: str, model_name: str = Config.MODEL,
embedding_model_name: str = Config.EMBEDDING_MODEL_NAME):
self._ollama_api_base_url = ollama_api_base_url
self._model_name = model_name
self._embedding_model_path = embedding_model_path
self._embedding_model_name = embedding_model_name

def ask(self, pdf_file_path: str, question: str) -> str:
vector_store_directory = os.path.join(str(Path.home()), 'langchain-store', 'vectorstore',
@@ -116,7 +118,7 @@ def ask(self, pdf_file_path: str, question: str) -> str:
)

# Load the Embedding Model
embed = load_embedding_model(model_path=self._embedding_model_path)
embed = load_embedding_model(model_name=self._embedding_model_name)

# load and split the documents
docs = load_pdf_data(file_path=pdf_file_path)
@@ -130,8 +132,9 @@ def ask(self, pdf_file_path: str, question: str) -> str:

template = """
### System:
You are an respectful and honest assistant. You have to answer the user's questions using only the context \
provided to you. If you don't know the answer, just say you don't know. Don't try to make up an answer.
You are an honest assistant.
You will accept PDF files and you will answer the question asked by the user appropriately.
If you don't know the answer, just say you don't know. Don't try to make up an answer.
### Context:
{context}
@@ -151,4 +154,5 @@ def ask(self, pdf_file_path: str, question: str) -> str:
end_time = time.time()

print(f"Response time: {end_time - start_time} seconds.\n")

return response.strip()
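
The response-time measurement above could also be factored into a reusable helper. A minimal sketch, assuming a context manager is acceptable here; `timed` and its `sink` parameter are hypothetical names, and `time.perf_counter()` is swapped in for `time.time()` because it is monotonic and better suited to measuring durations.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, sink=print):
    """Report how long the wrapped block took, using a monotonic clock."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the wrapped block raises, so slow failures are reported too.
        sink(f"{label}: {time.perf_counter() - start:.2f} seconds.")
```

Usage would look like `with timed("Response time"): response = get_response(question, chain)`.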
23 changes: 15 additions & 8 deletions pdf_bot/pull_model.py
@@ -1,22 +1,29 @@
import json
import os

import requests

model_name = os.environ.get('MODEL', "orca-mini")
ollama_api_base_url = os.environ.get('OLLAMA_API_BASE_URL', "http://localhost:11434")
from config import Config

model_name = Config.MODEL
ollama_api_base_url = Config.OLLAMA_API_BASE_URL
print(f"Using model: {model_name}")
print(f"Using Ollama base URL: {ollama_api_base_url}")


def pull_model(model_name_):
print(f"pulling model '{model_name_}'...")
print(f"Pulling model '{model_name_}'...")
url = f"{ollama_api_base_url}/api/pull"
data = json.dumps(dict(name=model_name_))
print(data)
headers = {'Content-Type': 'application/json'}
_response = requests.post(url, data=data, headers=headers)
print(_response.text)

    # Use stream=True and read the response line by line; unlike fixed-size
    # chunks, a complete line never ends in the middle of a multi-byte character
    with requests.post(url, data=data, headers=headers, stream=True) as response:
        if response.status_code == 200:
            for line in response.iter_lines():
                if line:
                    print(line.decode('utf-8', errors='replace'))
        else:
            print(f"Error: {response.status_code} - {response.text}")


pull_model(model_name_=model_name)
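
Ollama's pull endpoint reports progress as newline-delimited JSON objects, each carrying a `status` field; download steps may also include `total` and `completed` byte counts. A minimal sketch of turning those lines into readable progress messages, assuming that format; `format_pull_progress` is a hypothetical helper and not part of this commit.

```python
import json


def format_pull_progress(lines):
    """Yield one human-readable message per line of a streamed pull response."""
    for raw in lines:
        if not raw:
            continue  # skip keep-alive blank lines
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8", errors="replace")
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            yield raw  # pass through anything that is not JSON
            continue
        if not isinstance(event, dict):
            yield raw
            continue
        status = event.get("status", "")
        total, completed = event.get("total"), event.get("completed")
        if total and completed is not None:
            yield f"{status}: {100 * completed / total:.1f}%"
        else:
            yield status
```

Inside `pull_model`, this could be used as `for message in format_pull_progress(response.iter_lines()): print(message)`.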
16 changes: 8 additions & 8 deletions requirements.txt
@@ -1,8 +1,8 @@
langchain
streamlit
replicate
pymupdf
huggingface-hub
faiss-cpu
sentence-transformers
requests
langchain==0.0.334
streamlit==1.28.1
replicate==0.18.1
pymupdf==1.23.6
huggingface-hub==0.17.3
faiss-cpu==1.7.4
sentence-transformers==2.2.2
requests==2.31.0
