This is a simple RAG (Retrieval-Augmented Generation) implementation, written mostly from scratch. The simple-rag package contains four modules:
- Retrieval: A retriever that retrieves the most relevant documents from a given corpus.
- Rerank: A reranker that re-orders the retrieved documents by relevance.
- LLM: A language model that generates the answer.
- Data Helper: A helper that loads the PDF data.
The simple RAG pipeline is shown in the following figure:
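In addition to the figure, the per-query flow can be sketched in code roughly as follows. This is only an illustration, not the package's actual source; the method names (`retrieve`, `rerank`, `generate`) and the prompt format are assumptions rather than the real API:

```python
def rag_run(query, retrieval, rerank, llm, top_k=3):
    # Sketch of what a RAG pipeline does per query (method names are hypothetical):
    # 1) fetch candidate chunks, 2) rerank them, 3) let the LLM answer from the best ones.
    candidates = retrieval.retrieve(query)            # keyword/BM25 candidates
    best = rerank.rerank(query, candidates)[:top_k]   # cross-encoder ordering
    context = "\n\n".join(best)
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return llm.generate(prompt)                       # answer grounded in the context
```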
Pre-requisites:
- Python 3.6 or later
- Ollama (for self-hosting the LLM)
- Poppler (for PDF processing)
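If you self-host with Ollama, the model tag used in the example below must be available locally. Assuming a default Ollama setup:

```bash
# One-time download of the model tag used in the example below
ollama pull llama3:instruct

# Start the server if it is not already running
# (desktop installs usually run it as a background service)
ollama serve
```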
To install Poppler, run the command appropriate for your OS:
# Debian/Ubuntu
sudo apt install build-essential libpoppler-cpp-dev pkg-config python3-dev
# Fedora/RHEL
sudo yum install gcc-c++ pkgconfig poppler-cpp-devel python3-devel
# macOS
brew install pkg-config poppler python
# Windows (using conda)
conda install -c conda-forge poppler
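Optionally, you can check that the Poppler development files are visible before installing the Python package. This assumes pkg-config is on your PATH and the poppler-cpp `.pc` file was installed (true for the Debian/Fedora/Homebrew packages above; the conda route on Windows may differ):

```bash
# Should print the Poppler C++ version if the development files are discoverable
pkg-config --modversion poppler-cpp
```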
Then, install the package using the following commands:
git clone https://github.com/behitek/simple-rag/
cd simple-rag
pip install -e .
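A quick way to confirm the editable install worked is to import one of the package's entry points:

```bash
python -c "from rag.pipeline import SimpleRAGPipeline; print('simple-rag installed OK')"
```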
Here is an example of how to use the simple-rag package:
import os
from rag.data_helper import PDFReader
from rag.llm import OllamaLLM
from rag.pipeline import Answer, SimpleRAGPipeline
from rag.rerank import CrossEncoderRerank
from rag.retrieval import BM25Retrieval
from rag.text_utils import text2chunk
# Set your PDF path here
sample_pdf = os.path.join(os.path.dirname(__file__), "sample.pdf")
contents = PDFReader(pdf_paths=[sample_pdf]).read()
text = " ".join(contents)
chunks = text2chunk(text, chunk_size=200, overlap=50)
print(f"Number of chunks: {len(chunks)}")
retrieval = BM25Retrieval(documents=chunks)
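# OllamaLLM talks to a local Ollama server; the model tag must already be pulled (see Pre-requisites).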
llm = OllamaLLM(model_name="llama3:instruct")
rerank = CrossEncoderRerank(model_name="cross-encoder/ms-marco-MiniLM-L-6-v2")
pipeline = SimpleRAGPipeline(retrieval=retrieval, llm=llm, rerank=rerank)
def run(query: str) -> Answer:
    return pipeline.run(query)


if __name__ == "__main__":
    query = "What can Ollama do?"
    print("Sample query:", query)
    response: Answer = pipeline.run(query)
    print(response.answer)

    print("Now, please ask your own questions!")
    while True:
        query = input("Your question: ")
        response: Answer = run(query)
        print(response.answer)
        print()
Result:
$ python examples/simple_rag_bm25_ollama.py
Number of chunks: 10
Sample query: What can Ollama do?
Based on the provided context, what can Ollama do?
According to the text, Ollama can:
1. Self-host a lot of "top" open-source LLMs, including LLAMA2 (by Facebook), Mistral, Phi (from Microsoft), Gemma (by Google), and more.
2. Deploy a model with custom parameters.
3. Deploy a custom model from .GGUF format.
4. Support 4-bit quantization to save memory.
5. Handle several GPU types: NVIDIA, AMD, and Apple GPU.
6. Provide an OpenAI-compatible API.
Additionally, Ollama can also:
1. Run on multiple platforms: Windows (preview), MacOS, and Linux.
2. Deploy LLM without a GPU, although this is not explicitly tested in the context.
Please refer to the examples directory for more usage examples.