This is a repository for the paper: Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning.
The paper is available at `paper/SCDPO.pdf`.
Model | Checkpoint | GSM8k | MATH |
---|---|---|---|
MathGenie/InternLM2-SFT-SCDPO | 🤗 HF Link | 88.5 | 58.1 |
MathGenie/Mistral-7B-Ours-SFT | 🤗 HF Link | 76.8 | 43.2 |
MathGenie/Mistral-7B-Ours-SFT-SCDPO | 🤗 HF Link | 80.1 | 47.7 |
Dataset | Link |
---|---|
MathGenie/SCDPO-Data-Mistral-Ours | 🤗 HF Link |
The code for DPO and SCDPO data generation is in `src`. The directories `src/positive_negative_lce_gen`, `src/positive_negative_lce_gen_internlm_ape`, and `src/positive_negative_lce_gen_mathcode_mistral_addsys` are for the generation of DPO data, while `src/step_controled_dpo_lce_internlm`, `src/step_controled_dpo_lce_mathcoder`, and `src/step_controled_dpo_lce` are for the generation of SCDPO data.
API inference of the InternLM models is deployed using vLLM, with an example script at `src/positive_negative_lce_gen_internlm_ape/scripts_1/deploy_vllm.sh`. Other models are deployed using an image from huggingface/text-generation-inference, with an example deploy script at `src/deploy-tgi.sh`.
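The repo's `src/deploy-tgi.sh` is the reference for TGI deployment; a typical launch with the official TGI Docker image looks roughly like the sketch below (model path, volume mount, port, and image tag are placeholders to adapt to your setup):

```shell
# Sketch of a TGI deployment; adjust paths, port, GPU flags, and image tag to your setup.
docker run --gpus all --shm-size 1g -p 8080:80 \
    -v /path/to/models:/data \
    ghcr.io/huggingface/text-generation-inference:1.4 \
    --model-id /data/Mistral-7B-Ours-SFT \
    --max-input-length 2048 \
    --max-total-tokens 4096
```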
The training code is based on the repository huggingface/alignment-handbook. The code for SFT training is at `alignment-handbook/scripts/run_sft_lce.py`, which is adapted for code-integrated training. The code for DPO and SCDPO training is at `alignment-handbook/scripts/run_dpo.py`. The config yaml files and training shell scripts are at `alignment-handbook/recipes`. You can modify the paths in the `.yaml` files and `.sh` files to train the models.
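For reference, the fields you typically need to repoint are the model and dataset paths; an illustrative excerpt is shown below (key names assume the standard alignment-handbook config format, so check the actual `config_full.yaml` files in `recipes/` for the exact keys used here):

```yaml
# Illustrative excerpt only; key names assume the alignment-handbook config convention.
model_name_or_path: /path/to/pretrained-or-sft-model    # base model (SFT) or SFT checkpoint (DPO/SCDPO)
dataset_mixer:
  /path/to/SCDPO-Data-Mistral-Ours: 1.0                  # local path to the downloaded dataset
output_dir: /path/to/output-checkpoints
```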
First, create a Python virtual environment using e.g. Conda:
```shell
conda create -n handbook python=3.10 && conda activate handbook
```
Next, install PyTorch v2.1.2 - the precise version is important for reproducibility! Since this is hardware-dependent, we direct you to the [PyTorch Installation Page](https://pytorch.org/get-started/locally/).
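For instance, on a machine with CUDA 12.1 the matching wheel can be installed as follows (pick the variant for your own hardware from the installation page):

```shell
# Example only: assumes CUDA 12.1 wheels; choose the index URL that matches your hardware.
python -m pip install torch==2.1.2 --index-url https://download.pytorch.org/whl/cu121
```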
You can then install the remaining package dependencies as follows:
```shell
cd alignment-handbook/
python -m pip install .
```
You will also need Flash Attention 2 installed, which can be done by running:
```shell
python -m pip install flash-attn --no-build-isolation
```
To run SFT, let's take training Mistral-7B-Ours-SFT as an example. First, download the dataset MathGenie/SCDPO-Data-Mistral-Ours from the 🤗 HF Link above. Then, modify the path to the pretrained model and dataset in `alignment-handbook/recipes/mistral-7b-lce/sft/config_full.yaml`. Execute the command:
```shell
bash alignment-handbook/recipes/mistral-7b-lce/sft/sft_4gpu.sh alignment-handbook/recipes/mistral-7b-lce/sft/config_full.yaml
```
This finetunes the model on 4 GPUs for 3 epochs.
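Under the hood, `sft_4gpu.sh` is expected to wrap an `accelerate launch` call in the usual alignment-handbook style; a rough sketch of such a launch is shown below (the accelerate config path is an assumption - see the actual script for the real invocation):

```shell
# Rough sketch of a 4-GPU launch in the alignment-handbook style;
# the DeepSpeed accelerate config path is an assumption - check sft_4gpu.sh for the real one.
ACCELERATE_LOG_LEVEL=info accelerate launch \
    --config_file alignment-handbook/recipes/accelerate_configs/deepspeed_zero3.yaml \
    --num_processes 4 \
    alignment-handbook/scripts/run_sft_lce.py \
    alignment-handbook/recipes/mistral-7b-lce/sft/config_full.yaml
```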
To run DPO, first download the dataset MathGenie/DPO-Data-Mistral-Ours from its 🤗 HF Link. You can download MathGenie/Mistral-7B-Ours-SFT or use the SFT model you trained before. Then, modify the path to the pretrained model and dataset in `alignment-handbook/recipes/mistral-7b-lce/dpo/config_full.yaml`. Execute the command:
```shell
bash alignment-handbook/recipes/mistral-7b-lce/dpo/dpo_4gpu.sh alignment-handbook/recipes/mistral-7b-lce/dpo/config_full.yaml
```
This finetunes the model on 4 GPUs for 2 epochs.
To run SCDPO, first download the dataset MathGenie/SCDPO-Data-Mistral-Ours from its 🤗 HF Link. You can download MathGenie/Mistral-7B-Ours-SFT or use the SFT model you trained before. Then, modify the path to the pretrained model and dataset in `alignment-handbook/recipes/mistral-7b-lce/dpo/config_full_sc.yaml`. Execute the command:
```shell
bash alignment-handbook/recipes/mistral-7b-lce/dpo/dpo_4gpu.sh alignment-handbook/recipes/mistral-7b-lce/dpo/config_full_sc.yaml
```
This finetunes the model on 4 GPUs for 2 epochs.
The inference code is at `alignment-handbook/src/inference`. `inference_g.py`, `inference_m1.py`, and `inference_m2.py` are for generating the solutions, while `compute_acc.py` is for computing the accuracy of the generated solutions.
Install the inference environment:
```shell
pip install -r alignment-handbook/src/inference/requirements.txt
```
Use Hugging Face TGI from https://github.com/huggingface/text-generation-inference. Deploy the model using `deploy.sh`, for example:
```shell
bash alignment-handbook/src/inference/mistral-7b-lce/deploy.sh MODEL_PATH
```
Then put the model name in `config.json` to specify the directory name the inference results are saved into. Start inference by running `infer.sh`, for example:
```shell
bash alignment-handbook/src/inference/mistral-7b-lce/infer.sh g 3epoch
bash alignment-handbook/src/inference/mistral-7b-lce/infer.sh m1 3epoch
bash alignment-handbook/src/inference/mistral-7b-lce/infer.sh m2 3epoch
```
Dividing the inference into three parts saves time. `3epoch` identifies the checkpoint; you can replace it with your own name, such as `2epoch`, `500step`, etc.
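If your deployed TGI endpoint can handle concurrent requests, the three parts can also be launched in parallel, roughly like this (sequential runs work too, just slower):

```shell
# Optional: run the three inference shards concurrently against the deployed model.
bash alignment-handbook/src/inference/mistral-7b-lce/infer.sh g 3epoch &
bash alignment-handbook/src/inference/mistral-7b-lce/infer.sh m1 3epoch &
bash alignment-handbook/src/inference/mistral-7b-lce/infer.sh m2 3epoch &
wait
```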
Finally, after the inference has finished, compute accuracy using `compute_acc.py`. For example:
```shell
python alignment-handbook/src/inference/mistral-7b-lce/compute_acc.py 3epoch
```