
Logos RL


LogosRL is a research and engineering toolkit for fine-tuning language models on complex reasoning tasks using reinforcement learning. This project provides a production-grade, end-to-end MLOps pipeline for implementing, comparing, and analyzing RL algorithms (PPO, A2C, SAC) for applications in mathematical reasoning and de novo protein generation.
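As context for the algorithms compared here, PPO optimizes a clipped surrogate objective that keeps each policy update close to the old policy. A minimal, framework-free sketch of that objective for a single sample (an illustration only, not the project's implementation):

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate objective for a single sample.

    ratio: pi_new(a|s) / pi_old(a|s)
    advantage: estimated advantage A(s, a)
    The clip keeps the effective policy ratio within [1 - eps, 1 + eps],
    and taking the min makes the objective a pessimistic bound.
    """
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1 + eps), 1 - eps) * advantage
    return min(unclipped, clipped)
```

In a real trainer this quantity is averaged over a batch and maximized with gradient ascent; the `eps` value of 0.2 is the common default from the PPO paper.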

Architectural Goals

This project is built with a professional MLOps philosophy, emphasizing:

- Reproducibility: A fully containerized environment with versioned data and models ensures that any experiment can be perfectly reproduced.

- Scalability: The architecture is designed for multi-GPU, distributed training and can be deployed on professional HPC clusters using Slurm.

- Modularity: A clean separation of concerns between the pipeline, data management, training logic, and algorithmic strategies makes the system easy to extend and maintain.
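The "algorithmic strategies" separation described above is commonly realized as a strategy pattern: each RL algorithm sits behind a shared interface so the training loop can swap them freely. A hypothetical sketch (the names `RLStrategy` and `make_strategy` are illustrative, not LogosRL's actual API):

```python
from abc import ABC, abstractmethod

class RLStrategy(ABC):
    """Common interface every RL algorithm implements."""

    @abstractmethod
    def update(self, batch):
        """Run one optimization step on a batch of rollouts."""

class PPOStrategy(RLStrategy):
    def update(self, batch):
        return {"algo": "ppo", "n": len(batch)}

class A2CStrategy(RLStrategy):
    def update(self, batch):
        return {"algo": "a2c", "n": len(batch)}

# Registry lets a config string select the algorithm at runtime.
STRATEGIES = {"ppo": PPOStrategy, "a2c": A2CStrategy}

def make_strategy(name):
    return STRATEGIES[name.lower()]()
```

With this shape, adding a new algorithm (e.g. SAC) means adding one class and one registry entry, without touching the pipeline or data-management code.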

Usage

This project uses Docker and DVC to ensure a completely reproducible environment.

Prerequisites

  • Docker
  • DVC
  • GIT
  • NVIDIA GPU with drivers compatible with CUDA 12.1 or higher.

Setup & Installation

  1. Clone the Repository

    git clone https://github.com/zyannick/LogosRL.git
    cd LogosRL
  2. Create a Virtual Environment and Install Dependencies

# Create a Python env using uv or anaconda (Python 3.12)
    # conda create -n moe python=3.12
    # conda activate moe
    # pip install -r requirements.txt
    uv venv -p 3.12
    source .venv/bin/activate
    uv pip install -r requirements.txt
  3. Create Local Directories: Before the first run, create the necessary directories for outputs and caches. This ensures the Docker container has the correct permissions.

    mkdir -p data moe_outputs mlruns hf_cache mpl_config
  4. Run the Full Pipeline: This single command executes the entire DVC pipeline, from data preparation to training and evaluation.

    # Build the Docker image
    dvc repro build_docker
# Launch the full pipeline inside a Docker container (DVC runs inside the container)
    ./run_local.sh repro run_pipeline

    Or, to run the pipeline directly on the host:

    # Launch the full pipeline
    torchrun --nproc_per_node=auto src/run_pipeline.py --pipeline_stage full_pipeline

    The final model, checkpoints, and metrics will be available in the moe_outputs/ directory, and experiment results can be viewed via the MLflow UI.

  5. Launch MLflow UI (Optional): To view the experiment results, run the MLflow UI server:

    mlflow server --backend-store-uri sqlite:///moe_outputs/mlflow.db --port 5000

    Then navigate to http://localhost:5000 in your browser.
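Before launching the pipeline, it can be useful to verify that the local directories from step 3 exist, since the Docker container relies on them for outputs and caches. A short stdlib-only helper (hypothetical, not part of the repo):

```python
from pathlib import Path

# Directories the pipeline expects, per the setup instructions above.
REQUIRED_DIRS = ["data", "moe_outputs", "mlruns", "hf_cache", "mpl_config"]

def missing_dirs(root=".", required=REQUIRED_DIRS):
    """Return the required output/cache directories that do not yet exist."""
    root = Path(root)
    return [d for d in required if not (root / d).is_dir()]

if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Create these first:", ", ".join(missing))
    else:
        print("All required directories are present.")
```

If any directories are reported missing, re-run the `mkdir -p` command from step 3 before starting the pipeline.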

Running on a Slurm Cluster

A submission script is also provided for training the models on a Slurm cluster (update the email address in the script to receive job notifications):

mkdir -p data moe_outputs mlruns hf_cache mpl_config
./build_docker.sh
# First, create the directory for Slurm logs
mkdir -p slurm_logs
# Submit the job
sbatch submit_dvc_slurm.sbatch

Results

PPO

A2C

SAC
