
LocalRQA: A Private, Local Question-Answering System


LocalRQA is a toolkit for building and running your own private "ChatGPT" that is an expert on your specific documents. It uses the Retrieval-Augmented Generation (RAG) technique to provide answers that are factually grounded in a knowledge base you provide, ensuring all processing is handled locally to maintain data privacy.


Features

  • Retrieval-Augmented Generation (RAG): Ensures answers are based on provided documents, reducing factual errors and hallucinations.
  • 100% Local & Private: Your documents, questions, and the AI model's processing never leave your machine.
  • GPU Accelerated: Designed to run on local NVIDIA GPUs for high-performance inference.
  • Interactive UI: Comes with a Gradio-based web interface for easy interaction and demonstration.

Installation Guide for Windows with NVIDIA GPU (e.g., RTX 3050/4060)

The upstream installation process does not work out of the box on modern Windows environments. The following guide walks through the complete setup, including all the fixes needed to run on a local NVIDIA GPU.

1. Prerequisites

Before you begin, ensure you have installed the following:

  1. Git for Windows: Download here.
  2. Python 3.10 (64-bit): Download here.
    • CRITICAL: During installation, you MUST check the box "Add Python 3.10 to PATH".
  3. Visual Studio Build Tools: Download here.
    • CRITICAL: During installation, you MUST select the "Desktop development with C++" workload.

2. Environment Setup

Open your terminal (Windows PowerShell or Command Prompt).

# Clone the project from GitHub
git clone https://github.com/jasonyux/LocalRQA.git

# Navigate into the project folder
cd LocalRQA

# Create a Python 3.10 virtual environment
py -3.10 -m venv rtx_env

# Activate the environment
.\rtx_env\Scripts\activate
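
To confirm that the environment is active and running Python 3.10, you can paste this short check into a python prompt inside rtx_env (a sanity check only, not part of the project):

# Confirm the active interpreter is the Python 3.10 inside rtx_env
import sys
assert sys.version_info[:2] == (3, 10), f"expected Python 3.10, got {sys.version}"
print("Using Python", sys.version.split()[0], "from", sys.prefix)  # sys.prefix should point at rtx_env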

3. Manual Code & Data Fixes

You must make these changes before installing dependencies.

A. Edit setup.py:

  • Open the setup.py file.
  • Inside the install_requires=[ ... ] list, find and delete the entire lines for 'deepspeed', 'faiss-gpu', and 'flash_attn'.
  • Save the file.
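
If you want to double-check the edit, the following throwaway script (not part of the project) prints any lines in setup.py that still mention the three removed packages. Run it from the LocalRQA folder:

    # Throwaway check: list any setup.py lines that still mention the removed packages
    removed = ("deepspeed", "faiss-gpu", "flash_attn")
    with open("setup.py", encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if any(pkg in line for pkg in removed):
                print(f"line {lineno}: {line.rstrip()}  <- delete this line")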

B. Check huggingface.py:

  • Open the file local_rqa\qa_llms\huggingface.py.
  • Ensure the line model = model.cuda() (around line 58) is active (it should NOT have a # in front of it).
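
For reference, the relevant line should look like this once it is active (it is the line the repository already contains, just uncommented):

    model = model.cuda()  # moves the loaded model onto the NVIDIA GPU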

C. Create Runner Scripts:

  • In the main LocalRQA folder, create the following three new files:

    • run_controller.py:

      import subprocess
      import sys
      subprocess.run([sys.executable, 'local_rqa/serve/controller.py'] + sys.argv[1:])
    • run_worker.py:

      import subprocess
      import sys
      subprocess.run([sys.executable, 'local_rqa/serve/model_worker.py'] + sys.argv[1:])
    • run_web.py:

      import subprocess
      import sys
      subprocess.run([sys.executable, 'local_rqa/serve/gradio_web_server.py'] + sys.argv[1:])

D. Prepare the Database:

  • Run these commands in your activated terminal:
    # Create the folder the code expects
    mkdir example\databricks\database_fixed
    
    # Copy and rename the database file
    copy example\databricks\database\databricks.pkl example\databricks\database_fixed\documents.pkl
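
To confirm the copy landed where the code expects it, you can run this quick check (a sanity check only) from the LocalRQA folder:

    # Confirm the renamed database file exists at the expected path
    from pathlib import Path
    target = Path("example/databricks/database_fixed/documents.pkl")
    print(target.resolve(), "| exists:", target.exists(), "| bytes:", target.stat().st_size if target.exists() else 0)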

4. Install Dependencies

Run these commands one by one in your activated (rtx_env) terminal.

# 1. Install the GPU version of PyTorch for CUDA 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# 2. Install the stable CPU version of FAISS for Windows
pip install faiss-cpu

# 3. Install missing dependencies
pip install -U langchain-community uvicorn

# 4. Install the project itself and all other requirements
pip install -e .
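
Before moving on, you can verify the key installs from a python prompt inside rtx_env (an optional check, not part of the project):

# Verify the GPU build of PyTorch and the CPU build of FAISS are importable
import torch
import faiss  # the faiss-cpu wheel is imported as plain "faiss"
print(torch.__version__)            # a CUDA 11.8 build reports a version ending in +cu118
print(torch.cuda.is_available())    # should print True once the NVIDIA driver is set up
print(faiss.IndexFlatL2(8).ntotal)  # 0 for an empty index; confirms FAISS works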

Usage: Running the Application

After a successful installation, open three separate terminals. In each one, navigate to your LocalRQA folder and activate the environment (.\rtx_env\Scripts\activate).

  • In Terminal 1 (Launch the Controller):

    python run_controller.py
  • In Terminal 2 (Launch the Model Worker): Note: The first time you run this, it will download the language model (approx. 2 GB). This command is tailored for GPUs with low VRAM (e.g., RTX 3050).

    python run_worker.py --qa_model_name_or_path TinyLlama/TinyLlama-1.1B-Chat-v1.0 --database_path example/databricks/database_fixed --load_8bit

    Wait for this to finish loading. It will say "Uvicorn running on..."

  • In Terminal 3 (Launch the Web UI): Wait for Terminal 2 to be ready, then run this.

    python run_web.py --example "What is LocalRQA?"

Finally, open your web browser and navigate to http://localhost:7860 to use the application.


How It Works

The system uses a two-step "Open-Book Exam" process:

  1. Retrieval: When you ask a question, the system first searches its knowledge base (the documents.pkl file) to find the most relevant text chunks.
  2. Generation: It then gives your question and these retrieved chunks to the language model (TinyLlama), which generates an answer based only on the provided facts.
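
The sketch below illustrates that flow with deliberate stand-ins: keyword-overlap scoring in place of the FAISS embedding search, and a printed prompt in place of the TinyLlama call. It is for intuition only and does not use the project's actual API.

# Toy retrieve-then-generate loop (stand-ins only; not the LocalRQA implementation)
def retrieve(question, chunks, k=2):
    # Step 1 (Retrieval): rank chunks by crude word overlap with the question
    q_words = set(question.lower().split())
    return sorted(chunks, key=lambda c: len(q_words & set(c.lower().split())), reverse=True)[:k]

def generate(question, context):
    # Step 2 (Generation): the real system sends a prompt like this to the language model;
    # here we simply return the grounded prompt to show what the model would see
    return ("Answer using only the context below.\n\n"
            + "\n\n".join(context)
            + f"\n\nQuestion: {question}\nAnswer:")

knowledge_base = [
    "LocalRQA is a toolkit for building retrieval-augmented QA systems.",
    "All processing happens locally, so documents never leave the machine.",
]
print(generate("What is LocalRQA?", retrieve("What is LocalRQA?", knowledge_base)))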

About

The repository is named after the Sibyl System from Psycho-Pass, the all-knowing hive-mind AI that runs society by constantly analyzing data.
