LogosRL is a research and engineering toolkit for fine-tuning language models on complex reasoning tasks using reinforcement learning. This project provides a production-grade, end-to-end MLOps pipeline for implementing, comparing, and analyzing RL algorithms (PPO, A2C, SAC) for applications in mathematical reasoning and de novo protein generation.
This project is built with a professional MLOps philosophy, emphasizing:
- Reproducibility: A fully containerized environment with versioned data and models ensures that any experiment can be perfectly reproduced.
- Scalability: The architecture is designed for multi-GPU, distributed training and can be deployed on professional HPC clusters using Slurm.
- Modularity: A clean separation of concerns between the pipeline, data management, training logic, and algorithmic strategies makes the system easy to extend and maintain.
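As a concrete illustration of this last point, the RL algorithms could sit behind a common strategy interface that the trainer consumes. The sketch below is purely illustrative; the class and function names are assumptions for exposition, not LogosRL's actual API.

```python
# Illustrative sketch only: class and function names are hypothetical,
# not LogosRL's actual API.
from abc import ABC, abstractmethod


class RLStrategy(ABC):
    """Common interface behind which PPO, A2C, and SAC can be swapped."""

    @abstractmethod
    def compute_loss(self, batch):
        """Return the algorithm-specific training loss for a batch."""


class PPOStrategy(RLStrategy):
    def compute_loss(self, batch):
        # The PPO clipped surrogate objective would be computed here.
        raise NotImplementedError


class SACStrategy(RLStrategy):
    def compute_loss(self, batch):
        # The SAC actor and critic losses would be computed here.
        raise NotImplementedError


def make_strategy(name: str) -> RLStrategy:
    # Hypothetical factory mapping a config value to a strategy object,
    # so the training loop never depends on a specific algorithm.
    return {"ppo": PPOStrategy, "sac": SACStrategy}[name]()
```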
This project uses Docker and DVC to ensure a completely reproducible environment.
- **Clone the Repository**

  ```bash
  git clone https://github.com/zyannick/LogosRL.git
  cd LogosRL
  ```

- **Create a Virtual Environment and Install Dependencies**

  ```bash
  # Create a Python env using uv or Anaconda (Python 3.12)
  # conda create -n moe python=3.12
  # conda activate moe
  # pip install -r requirements.txt
  uv venv -p 3.12
  source .venv/bin/activate
  uv pip install -r requirements.txt
  ```
- **Create Local Directories**: Before the first run, create the necessary directories for outputs and caches. This ensures the Docker container has the correct permissions.

  ```bash
  mkdir -p data moe_outputs mlruns hf_cache mpl_config
  ```
- **Run the Full Pipeline**: This single command executes the entire DVC pipeline, from data preparation to training and evaluation.

  ```bash
  # Build the Docker image
  dvc repro build_docker
  # Launch the full pipeline inside Docker (this runs DVC inside the container)
  ./run_local.sh repro run_pipeline
  ```

  or

  ```bash
  # Launch the full pipeline directly
  torchrun --nproc_per_node=auto src/run_pipeline.py --pipeline_stage full_pipeline
  ```

  The final model, checkpoints, and metrics will be available in the `moe_outputs/` directory, and experiment results can be viewed via the MLflow UI. (For context on what the `torchrun` launch does, see the first sketch after this list.)
- **Launch MLflow UI (Optional)**: To view the experiment results, run the MLflow UI server:

  ```bash
  mlflow server --backend-store-uri sqlite:///moe_outputs/mlflow.db --port 5000
  ```

  Then navigate to `http://localhost:5000` in your browser. (Runs can also be queried programmatically; see the second sketch after this list.)
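For context on the `torchrun` command in the pipeline step above: `torchrun` launches one worker process per device and passes rank information through environment variables, which the training script uses to join the process group. Below is the standard PyTorch initialization pattern such a launcher expects; it is a generic sketch, not the actual contents of `src/run_pipeline.py`.

```python
# Generic sketch of the initialization pattern torchrun expects.
# This is NOT LogosRL's actual src/run_pipeline.py.
import os

import torch
import torch.distributed as dist


def init_distributed() -> int:
    """Initialize the default process group from torchrun's env vars."""
    # torchrun sets RANK, WORLD_SIZE, and LOCAL_RANK for every process;
    # init_process_group's default "env://" method reads them.
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    if torch.cuda.is_available():
        torch.cuda.set_device(local_rank)
        dist.init_process_group(backend="nccl")
    else:
        dist.init_process_group(backend="gloo")
    return local_rank


if __name__ == "__main__":
    local_rank = init_distributed()
    print(f"rank {dist.get_rank()} of {dist.get_world_size()}")
    dist.destroy_process_group()
```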
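Runs stored in the SQLite backend can also be inspected programmatically with MLflow's Python API, which is handy for comparing experiments without the UI. A minimal sketch follows; the tracking URI matches the server command above, and the rest is generic MLflow usage, not a LogosRL-specific helper.

```python
# Minimal sketch: query runs from the pipeline's MLflow backend.
import mlflow

# Same backend the UI server above points at.
mlflow.set_tracking_uri("sqlite:///moe_outputs/mlflow.db")

# Returns a pandas DataFrame of runs, newest first.
runs = mlflow.search_runs(search_all_experiments=True,
                          order_by=["start_time DESC"])
print(runs[["run_id", "status", "start_time"]].head())
```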
I have also provided a script to train the models on a Slurm cluster (update the email address in the script to receive job status notifications):
```bash
# Create local directories and build the Docker image
mkdir -p data moe_outputs mlruns hf_cache mpl_config
./build_docker.sh
# First, create the directory for Slurm logs
mkdir -p slurm_logs
# Submit the job
sbatch submit_dvc_slurm.sbatch
```