RagXOllama

A minimal Node.js RESTful server for a Retrieval-Augmented Generation (RAG) system using ChromaDB and Ollama!

Setup

Method 1 (using Docker)

  1. Clone this repo.

    git clone https://github.com/vteam27/RagXOllama
    cd RagXOllama

  2. Add your documents to the Docs/ folder.

  3. Build and start the Docker containers:

    docker compose up

  4. Wait for the data to be ingested into the DB and for the LLM model to download.

  5. Go to http://localhost:3000 (or run the quick check below).
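To confirm everything came up before opening the browser, the following commands work against the default setup described above (they rely only on the ports mentioned in this README, not on any particular compose service names):

    # follow ingestion and model-download progress
    docker compose logs -f

    # the demo UI should respond once the app container is ready
    curl http://localhost:3000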

Method 2 (build locally)

  1. Ollama: to serve open-source LLMs locally (one-time setup)

    ollama serve
    ollama run llama3
    
  2. ChromaDB Backend: Spin up the ChromaDB core

    docker pull chromadb/chroma
    docker run -p 8000:8000 chromadb/chroma
    
  3. Add your PDF documents to Docs/

  4. Install dependencies and run!

    git clone https://github.com/vteam27/RagXOllama
    cd RagXOllama
    npm install
    node app.js
    
  5. Go to http://localhost:3000 to chat using a web demo.

  6. Alternatively, run node pipeline.js to interact from the terminal, which is handy for development and testing.
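Because the app is a RESTful server, you can also query it without the UI. The endpoint and payload below are illustrative guesses only; check app.js for the actual route names and request shapes:

    # hypothetical route and body -- see app.js for the real API
    curl -X POST http://localhost:3000/query \
      -H "Content-Type: application/json" \
      -d '{"question": "What do my documents say about pricing?"}'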

Features

  • Chat with private documents securely!
  • Search and query your data using natural language alone!
  • Keep all your LLMs up to date with the latest data.
  • Use any open source LLM of your choice. Browse LLMs
  • Use this repo as a template to integrate LLMs with RAG functionality into your MERN (or any other Node.js) stack apps; a minimal sketch of that wiring follows this list.
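As a taste of what that integration looks like, here is a hedged sketch of an Express route fronting a RAG pipeline. The route path /api/rag and the ragPipeline helper are hypothetical placeholders, not the routes defined in app.js:

    // Hypothetical integration sketch -- the real routes live in app.js.
    import express from "express";

    // Placeholder for a retrieve -> augment -> generate pipeline.
    async function ragPipeline(question) {
      return `(answer grounded in your documents for: ${question})`;
    }

    const app = express();
    app.use(express.json());

    // POST a question, get back a RAG-generated answer.
    app.post("/api/rag", async (req, res) => {
      res.json({ answer: await ragPipeline(req.body.question) });
    });

    app.listen(3000, () => console.log("RAG server on :3000"));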

Example

  1. Ingestion: Feed relevant information into the ChromaDB vector store collection.
  2. Retrieve: Retrieve the top-n most relevant chunks of factual information stored in ChromaDB, ranked by an algorithm that computes each chunk's similarity score against the user query.
  3. Augment: Append this retrieved context to the prompt.
  4. Generate: Feed the prompt to an 8B Llama 3 model (4-bit quantized) served by Ollama to generate the desired response.

View the logs.txt file for the full output of my code.
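To make the four steps concrete, here is a minimal sketch of the retrieve-augment-generate loop using the chromadb and ollama npm clients. It is not a copy of pipeline.js: the collection name "docs", the model choices, and the prompt wording are all assumptions, ingestion is assumed to have happened already, and client constructor details can vary between chromadb versions.

    // Minimal RAG sketch (not the repo's pipeline.js). Assumes ChromaDB on
    // :8000, Ollama on its default port, and a pre-populated collection.
    // Run as an ES module, e.g. `node sketch.mjs`.
    import { ChromaClient } from "chromadb";
    import ollama from "ollama";

    const chroma = new ChromaClient({ path: "http://localhost:8000" });
    const collection = await chroma.getCollection({ name: "docs" }); // name is an assumption

    const question = "What do my documents say about renewal terms?";

    // Retrieve: embed the query, then pull the top-3 most similar chunks.
    const { embedding } = await ollama.embeddings({ model: "llama3", prompt: question });
    const hits = await collection.query({ queryEmbeddings: [embedding], nResults: 3 });
    const context = hits.documents[0].join("\n---\n");

    // Augment: splice the retrieved chunks into the prompt.
    const prompt = `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;

    // Generate: llama3 (served 4-bit quantized by Ollama) answers.
    const { response } = await ollama.generate({ model: "llama3", prompt });
    console.log(response);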

About RAG

With the help of Retrieval-Augmented Generation (RAG) we can achieve:

  • Improved Factual Accuracy: By relying on retrieved information, RAG systems can ensure answers are grounded in real-world data.
  • Domain Specificity: RAG allows you to integrate domain-specific knowledge bases, making the system more knowledgeable in a particular area.
  • Adaptability to New Information: By using external knowledge sources, RAG systems can stay up-to-date with the latest information, even if the LLM itself wasn't specifically trained on it.


Milestones

  • Set up ChromaDB and Ollama
  • Build a basic RAG pipeline
  • Build a data loader to chunk and ingest data into ChromaDB
  • Add support for text, PDF, and DOCX files
  • Implement a REST API architecture
  • Build a demo UI for easy interaction
  • Containerize and publish the image to Docker Hub
  • All done for now :)