
Logos RL


LogosRL is a research and engineering toolkit for fine-tuning language models on complex reasoning tasks using reinforcement learning. This project provides a production-grade, end-to-end MLOps pipeline for implementing, comparing, and analyzing RL algorithms (PPO, A2C, SAC) for applications in mathematical reasoning and de novo protein generation.
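As context for the algorithms compared here, PPO optimizes a clipped surrogate objective that keeps each policy update close to the old policy. A minimal, framework-free sketch of that objective for a single sample (an illustration only, not the project's implementation):

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate objective for a single sample.

    ratio: pi_new(a|s) / pi_old(a|s)
    advantage: estimated advantage A(s, a)
    The clip keeps the effective policy ratio within [1 - eps, 1 + eps],
    and taking the min makes the objective a pessimistic bound.
    """
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1 + eps), 1 - eps) * advantage
    return min(unclipped, clipped)
```

In a real trainer this quantity is averaged over a batch and maximized with gradient ascent; the `eps` value of 0.2 is the common default from the PPO paper.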

Architectural Goals

This project is built with a professional MLOps philosophy, emphasizing:

- Reproducibility: A fully containerized environment with versioned data and models ensures that any experiment can be perfectly reproduced.

- Scalability: The architecture is designed for multi-GPU, distributed training and can be deployed on professional HPC clusters using Slurm.

- Modularity: A clean separation of concerns between the pipeline, data management, training logic, and algorithmic strategies makes the system easy to extend and maintain.
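The "algorithmic strategies" separation described above is commonly realized as a strategy pattern: each RL algorithm sits behind a shared interface so the training loop can swap them freely. A hypothetical sketch (the names `RLStrategy` and `make_strategy` are illustrative, not LogosRL's actual API):

```python
from abc import ABC, abstractmethod

class RLStrategy(ABC):
    """Common interface every RL algorithm implements."""

    @abstractmethod
    def update(self, batch):
        """Run one optimization step on a batch of rollouts."""

class PPOStrategy(RLStrategy):
    def update(self, batch):
        return {"algo": "ppo", "n": len(batch)}

class A2CStrategy(RLStrategy):
    def update(self, batch):
        return {"algo": "a2c", "n": len(batch)}

# Registry lets a config string select the algorithm at runtime.
STRATEGIES = {"ppo": PPOStrategy, "a2c": A2CStrategy}

def make_strategy(name):
    return STRATEGIES[name.lower()]()
```

With this shape, adding a new algorithm (e.g. SAC) means adding one class and one registry entry, without touching the pipeline or data-management code.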

Usage

This project uses Docker and DVC to ensure a completely reproducible environment.

Prerequisites

  • Docker
  • DVC
  • GIT
  • NVIDIA GPU with drivers compatible with CUDA 12.1 or higher.

Setup & Installation

  1. Clone the Repository

    git clone https://github.com/zyannick/LogosRL.git
    cd LogosRL
  2. Create a Virtual Environment and Install Dependencies

# Create a Python env using uv or anaconda (Python 3.12)
    # conda create -n moe python=3.12
    # conda activate moe
    # pip install -r requirements.txt
    uv venv -p 3.12
    source .venv/bin/activate
    uv pip install -r requirements.txt
  3. Create Local Directories: Before the first run, create the necessary directories for outputs and caches. This ensures the Docker container has the correct permissions.

    mkdir -p data moe_outputs mlruns hf_cache mpl_config
  4. Run the Full Pipeline: This single command executes the entire DVC pipeline, from data preparation to training and evaluation.

    # Build the Docker image
    dvc repro build_docker
# Launch the full pipeline inside a Docker container (DVC runs inside the container)
    ./run_local.sh repro run_pipeline

    Or, to run the pipeline directly on the host:

    # Launch the full pipeline
    torchrun --nproc_per_node=auto src/run_pipeline.py --pipeline_stage full_pipeline

    The final model, checkpoints, and metrics will be available in the moe_outputs/ directory, and experiment results can be viewed via the MLflow UI.

  5. Launch MLflow UI (Optional): To view the experiment results, run the MLflow UI server:

    mlflow server --backend-store-uri sqlite:///moe_outputs/mlflow.db --port 5000

    Then navigate to http://localhost:5000 in your browser.
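Before launching the pipeline, it can be useful to verify that the local directories from step 3 exist, since the Docker container relies on them for outputs and caches. A short stdlib-only helper (hypothetical, not part of the repo):

```python
from pathlib import Path

# Directories the pipeline expects, per the setup instructions above.
REQUIRED_DIRS = ["data", "moe_outputs", "mlruns", "hf_cache", "mpl_config"]

def missing_dirs(root=".", required=REQUIRED_DIRS):
    """Return the required output/cache directories that do not yet exist."""
    root = Path(root)
    return [d for d in required if not (root / d).is_dir()]

if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Create these first:", ", ".join(missing))
    else:
        print("All required directories are present.")
```

If any directories are reported missing, re-run the `mkdir -p` command from step 3 before starting the pipeline.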

Running on a Slurm Cluster

A submission script is also provided for training the models on a Slurm cluster (update the email address in the script to receive job notifications):

mkdir -p data moe_outputs mlruns hf_cache mpl_config
./build_docker.sh
# First, create the directory for Slurm logs
mkdir -p slurm_logs
# Submit the job
sbatch submit_dvc_slurm.sbatch

Results

PPO

A2C

SAC
