Official implementation of our NeurIPS-W paper:
You Need Reasoning to Learn Reasoning: The Limitations of Label-Free RL in Weak Base Models
Shuvendu Roy, Hossein Hajimirsadeghi, Mengyao Zhai, and Golnoosh Samei
NeurIPS 2025 Workshop on Mathematical Reasoning and AI
In this work, we systematically investigate the performance of label-free RL methods across model sizes and reasoning strengths, from 0.5B to 7B parameters. Our empirical analysis reveals critical limitations: label-free RL is highly dependent on the base model's pre-existing reasoning capability, with performance often degrading below baseline levels for weaker models. We find that smaller models fail to generate sufficiently long or diverse chain-of-thought reasoning to enable effective self-reflection, and that training data difficulty plays a crucial role in determining success. To address these challenges, we propose a simple yet effective label-free RL method that uses curriculum learning to progressively introduce harder problems and masks no-majority rollouts during training. Additionally, we introduce a data curation pipeline to generate samples of predefined difficulty. Our approach demonstrates consistent improvements across all model sizes and reasoning capabilities, providing a path toward more robust unsupervised RL that can bootstrap reasoning abilities in resource-constrained models.
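For intuition, here is a minimal sketch of the two ingredients. This is our illustrative pseudocode, not the training code in this repo; the function names, the 0.5 majority threshold, and the linear curriculum schedule are assumptions:

```python
from collections import Counter

def majority_mask(rollout_answers, threshold=0.5):
    """Keep a prompt only if the most common extracted answer across its
    rollouts reaches `threshold`; no-majority prompts are masked out of
    the policy loss."""
    counts = Counter(a for a in rollout_answers if a is not None)
    if not counts:
        return False  # no parseable answer at all -> mask
    _, top_count = counts.most_common(1)[0]
    return top_count / len(rollout_answers) >= threshold

def curriculum_pool(samples_easy_to_hard, step, total_steps, start_frac=0.2):
    """Linearly widen the training pool from the easiest `start_frac`
    of problems to the full dataset as training progresses."""
    frac = min(1.0, start_frac + (1.0 - start_frac) * step / total_steps)
    return samples_easy_to_hard[: max(1, int(frac * len(samples_easy_to_hard)))]

# 8 rollouts for one prompt; answers extracted from each chain of thought
answers = ["42", "42", "41", "42", None, "42", "40", "42"]
print(majority_mask(answers))  # True: 5/8 rollouts agree on "42"
```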
The project is built upon Open-R1. Please follow the Open-R1 installation guidelines; we summarize the main steps below.
> [!CAUTION]
> Libraries rely on CUDA 12.4. If you see errors related to segmentation faults, double check the version your system is running with `nvcc --version`.
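A quick way to check:

```shell
nvcc --version | grep release   # expect something like "release 12.4"
```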
To run the code in this project, first create a Python virtual environment, e.g. with uv.
To install uv, follow the UV Installation Guide.
> [!NOTE]
> As a shortcut, run `make install` to set up the development libraries (spelled out below). Afterwards, if everything is set up correctly, you can try out the Open-R1 models.
```shell
uv venv openr1 --python 3.11 && source openr1/bin/activate && uv pip install --upgrade pip
```

> [!TIP]
> For Hugging Face cluster users, add `export UV_LINK_MODE=copy` to your `.bashrc` to suppress cache warnings from `uv`.
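For example:

```shell
echo 'export UV_LINK_MODE=copy' >> ~/.bashrc && source ~/.bashrc
```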
Next, install vLLM and FlashAttention (use Flash Attention v2.7.4.post1 to avoid ABI mismatches):
```shell
uv pip install vllm==0.8.4
uv pip install setuptools && uv pip install flash-attn==2.7.4.post1 --no-build-isolation
```

This will also install PyTorch v2.6.0, and it is very important to use this version, since the vLLM binaries are compiled for it. You can then install the remaining dependencies for your specific use case via `pip install -e .[LIST OF MODES]`. For most contributors, we recommend:
```shell
GIT_LFS_SKIP_SMUDGE=1 uv pip install -e ".[dev]"
```
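If you want to confirm that the pinned versions landed correctly (an optional sanity check):

```shell
python -c "import torch, vllm; print(torch.__version__, vllm.__version__)"
# expect: 2.6.0 0.8.4
```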
To run evaluations, also install lighteval:

```shell
pip install lighteval
```

Launching a training run takes two steps: serve rollouts with vLLM on one GPU, then run training on the remaining GPUs. For the Intuitor baseline:

```shell
export ACCELERATE_LOG_LEVEL=info
MODEL=Qwen/Qwen2.5-1.5B
RECIPE=recipes/Qwen2.5-1.5B/intuitor/config_demo.yaml
FILE_NAME=intuitor.py
DIST=fsdp
# Run vllm-serve in the background with nohup
nohup env CUDA_VISIBLE_DEVICES=0 trl vllm-serve --model $MODEL > vllm-serve.log 2>&1 &
VLLM_PID=$!
echo "vLLM server started with PID: $VLLM_PID"
# Run accelerate launch in the background with nohup as well;
# $! only captures the PID of a background job
nohup env CUDA_VISIBLE_DEVICES=1,2,3 ACCELERATE_LOG_LEVEL=info \
    accelerate launch --config_file recipes/accelerate_configs/$DIST.yaml --num_processes=3 \
    src/open_r1/$FILE_NAME --config $RECIPE > train.log 2>&1 &
TRAINING_PID=$!
echo "Training process started with PID: $TRAINING_PID"
```
To train with our proposed method, swap in the corresponding recipe and entry point:

```shell
export ACCELERATE_LOG_LEVEL=info
MODEL=Qwen/Qwen2.5-1.5B
RECIPE=recipes/Qwen2.5-1.5B/cuma/config.yaml
FILE_NAME=cuma.py
DIST=fsdp
# Run vllm-serve in the background with nohup
nohup env CUDA_VISIBLE_DEVICES=0 trl vllm-serve --model $MODEL > vllm-serve.log 2>&1 &
VLLM_PID=$!
echo "vLLM server started with PID: $VLLM_PID"
# Run accelerate launch in the background with nohup
nohup env CUDA_VISIBLE_DEVICES=1,2,3 ACCELERATE_LOG_LEVEL=info \
    accelerate launch --config_file recipes/accelerate_configs/$DIST.yaml --num_processes=3 \
    src/open_r1/$FILE_NAME --config $RECIPE > train.log 2>&1 &
TRAINING_PID=$!
echo "Training process started with PID: $TRAINING_PID"
```
Once training finishes, evaluate the resulting checkpoint with lighteval:

```shell
MODEL=TRAINED-CHECKPOINT-PATH
MODEL_ARGS="model_name=$MODEL,dtype=bfloat16,max_model_length=32768,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:32768,temperature:0.6,top_p:0.95}"
OUTPUT_DIR=evals/$MODEL
TASK=math_500
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m lighteval.__main__ vllm $MODEL_ARGS "lighteval|$TASK|0|0" \
--use-chat-template \
--output-dir $OUTPUT_DIR
TASK=gsm8k
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m lighteval.__main__ vllm $MODEL_ARGS "leaderboard|$TASK|5|0" \
--use-chat-template \
--output-dir $OUTPUT_DIR
TASK=aime24
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m lighteval.__main__ vllm $MODEL_ARGS "lighteval|$TASK|0|0" \
--use-chat-template \
--output-dir $OUTPUT_DIR
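# LiveCodeBench code generation (task name is passed inline)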
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m lighteval.__main__ vllm $MODEL_ARGS "extended|lcb:codegeneration|0|0" \
--use-chat-template \
--output-dir $OUTPUT_DIR
TASK=gpqa:diamond
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m lighteval.__main__ vllm $MODEL_ARGS "lighteval|$TASK|0|0" \
--use-chat-template \
--output-dir $OUTPUT_DIR
```
We provide a script for curating additional training data at a predefined difficulty, which we found important for good performance; see `data_curation.py`.
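The actual pipeline lives in `data_curation.py`; as a rough illustration of the idea (our own simplification, not the script's API; names and thresholds are assumptions), difficulty can be estimated from the base model's empirical solve rate over several rollouts, keeping only problems inside a target band:

```python
def solve_rate(sampled_answers, reference):
    """Fraction of the base model's sampled answers matching the
    reference; a lower rate indicates a harder problem."""
    return sum(a == reference for a in sampled_answers) / len(sampled_answers)

def filter_by_difficulty(samples, lo=0.2, hi=0.8):
    """Keep problems that are neither trivial nor hopeless for the
    base model, i.e. with a solve rate within [lo, hi]."""
    return [s for s in samples if lo <= s["solve_rate"] <= hi]

pool = [
    {"problem": "2 + 2 = ?",              "solve_rate": 1.0},  # too easy
    {"problem": "mid-level algebra",      "solve_rate": 0.5},  # kept
    {"problem": "olympiad combinatorics", "solve_rate": 0.0},  # too hard
]
print([s["problem"] for s in filter_by_difficulty(pool)])
# ['mid-level algebra']
```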
We thank the authors of Intuitor for releasing their code.
```bibtex
@inproceedings{roy2025,
  title={You Need Reasoning to Learn Reasoning: The Limitations of Label-Free RL in Weak Base Models},
  author={Roy, Shuvendu and Hajimirsadeghi, Hossein and Zhai, Mengyao and Samei, Golnoosh},
  booktitle={NeurIPS 2025 Workshop on Mathematical Reasoning and AI},
  year={2025}
}
```