Adding orpo training #1210

Open

wants to merge 20 commits into main

Conversation

Goekdeniz-Guelmez
Contributor

Training:

python -m mlx_lm.lora \
    --model mlx-community/Josiefied-Qwen2.5-0.5B-Instruct-abliterated-v1-4bit \
    --train \
    --data /Users/gokdenizgulmez/Desktop/dpo_test_data \
    --iters 10 \
    --batch-size 1 \
    --num-layers 1 \
    --val-batches 2 \
    --steps-per-report 1 \
    --adapter-path /Users/gokdenizgulmez/Desktop/test-dpo \
    --max-seq-length 1024 \
    --grad-checkpoint \
    --training-mode orpo \
    --fine-tune-type lora \
    --dpo-loss-type sigmoid \
    --beta 0.1 \
    --steps-per-eval 50

Output:

Loading pretrained model
Fetching 9 files: 100%|███████████████████████████████████████| 9/9 [00:00<00:00, 113701.01it/s]
Loading datasets
Training in orpo mode
Trainable parameters: 0.109% (0.541M/494.033M)
Starting ORPO training..., iters: 10
Iter 1: Val loss 3.107, Val chosen reward 2.000, Val rejected reward 0.000, Val took 0.518s
Iter 1: Train loss 2.197, Chosen reward 1.000, Rejected reward 0.000, Learning Rate 1.000e-05, It/sec 1.375, Tokens/sec 743.712, Trained Tokens 541.0, Peak mem 1.284 GB
Iter 2: Train loss 8.681, Chosen reward 1.000, Rejected reward 0.000, Learning Rate 1.000e-05, It/sec 1.348, Tokens/sec 773.850, Trained Tokens 1115.0, Peak mem 1.347 GB
Iter 3: Train loss 0.378, Chosen reward 1.000, Rejected reward 0.000, Learning Rate 1.000e-05, It/sec 1.501, Tokens/sec 797.219, Trained Tokens 1646.0, Peak mem 1.347 GB
Iter 4: Train loss 0.006, Chosen reward 1.000, Rejected reward 0.000, Learning Rate 1.000e-05, It/sec 1.539, Tokens/sec 807.946, Trained Tokens 2171.0, Peak mem 1.347 GB
Iter 5: Train loss 0.005, Chosen reward 1.000, Rejected reward 0.000, Learning Rate 1.000e-05, It/sec 1.516, Tokens/sec 796.148, Trained Tokens 2696.0, Peak mem 1.347 GB
Iter 6: Train loss 0.442, Chosen reward 1.000, Rejected reward 0.000, Learning Rate 1.000e-05, It/sec 1.384, Tokens/sec 748.986, Trained Tokens 3237.0, Peak mem 1.347 GB
Iter 7: Train loss 0.145, Chosen reward 1.000, Rejected reward 0.000, Learning Rate 1.000e-05, It/sec 1.514, Tokens/sec 804.046, Trained Tokens 3768.0, Peak mem 1.347 GB
Iter 8: Train loss 5.233, Chosen reward 1.000, Rejected reward 0.000, Learning Rate 1.000e-05, It/sec 1.362, Tokens/sec 781.894, Trained Tokens 4342.0, Peak mem 1.347 GB
Iter 9: Train loss 4.444, Chosen reward 1.000, Rejected reward 0.000, Learning Rate 1.000e-05, It/sec 1.321, Tokens/sec 758.359, Trained Tokens 4916.0, Peak mem 1.347 GB
Iter 10: Val loss 2.200, Val chosen reward 2.000, Val rejected reward 0.000, Val took 0.467s
Iter 10: Train loss 0.002, Chosen reward 1.000, Rejected reward 0.000, Learning Rate 1.000e-05, It/sec 1.521, Tokens/sec 798.309, Trained Tokens 5441.0, Peak mem 1.347 GB
Saved final weights to /Users/gokdenizgulmez/Desktop/test-dpo/adapters.safetensors.
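
For readers skimming the thread: the actual implementation is in the PR diff, but as a rough orientation, ORPO augments the standard SFT loss on the chosen response with an odds-ratio penalty that pushes the policy's odds of the chosen completion above the rejected one. Below is a minimal sketch of that objective in MLX; the function and argument names are illustrative, not this PR's actual code.

# Illustrative sketch of the ORPO objective, not this PR's implementation.
import mlx.core as mx

def orpo_loss(chosen_logps, rejected_logps, chosen_nll, beta=0.1):
    # chosen_logps / rejected_logps: length-normalized (mean per-token)
    # log-probabilities of the chosen and rejected completions.
    # chosen_nll: standard next-token NLL on the chosen completion.
    # log odds(y|x) = log p - log(1 - p), kept in log space for stability.
    log_odds_chosen = chosen_logps - mx.log1p(-mx.exp(chosen_logps))
    log_odds_rejected = rejected_logps - mx.log1p(-mx.exp(rejected_logps))
    log_odds_ratio = log_odds_chosen - log_odds_rejected
    # -log sigmoid(x) == softplus(-x) == logaddexp(0, -x)
    ratio_term = mx.logaddexp(0.0, -log_odds_ratio)
    # ORPO loss = SFT loss on chosen + beta * odds-ratio penalty
    return mx.mean(chosen_nll + beta * ratio_term)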

@ivanfioravanti
Contributor

Amazing job!!!

@Goekdeniz-Guelmez
Contributor Author

Thanks @ivanfioravanti

@Goekdeniz-Guelmez
Contributor Author

Used model: mlx-community/Qwen2.5-0.5B-Instruct-8bit, used dataset: mlx-community/DPO-test.
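
The preference records themselves aren't shown in this thread; for illustration only, data of this kind is typically a JSONL file where each line pairs a prompt with a preferred and a rejected response, along these lines (the exact field names the PR expects are an assumption here, not confirmed by the diff):

{"prompt": "How are you?", "chosen": "I'm good, thanks! How can I help you today?", "rejected": "I don't know."}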

Answer before:

Hello! How can I assist you today?

Train args:

python -m mlx_lm.lora \
    --model mlx-community/Qwen2.5-0.5B-Instruct-8bit \
    --train \
    --data /Users/gokdenizgulmez/Desktop/orpo_test_data \
    --iters 50 \
    --batch-size 1 \
    --num-layers 8 \
    --val-batches 1 \
    --steps-per-report 10 \
    --adapter-path /Users/gokdenizgulmez/Desktop/test-orpo-full \
    --max-seq-length 254 \
    --grad-checkpoint \
    --training-mode orpo \
    --fine-tune-type lora \
    --beta 0.1 \
    --steps-per-eval 500

Train output:

Loading pretrained model
Fetching 9 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 95808.97it/s]
Loading datasets
Training in orpo mode
Trainable parameters: 0.109% (0.541M/494.033M)
Starting ORPO training..., iters: 50
Iter 1: Val loss 0.11973625, Val chosen reward -0.360, Val rejected reward -0.279, Val accuracy 0.000, Val margin -2.197, Val took 0.238s
[WARNING] Sequences longer than 254 tokens will be truncated.
...
Iter 10: Train loss 0.07418435, Chosen reward -0.267, Rejected reward -0.266, Accuracy 0.400, Margin -1.742, Learning Rate 1.000e-05, It/sec 1.065, Tokens/sec 406.773, Peak mem 2.272 GB
Iter 20: Train loss 0.05426955, Chosen reward -0.383, Rejected reward -0.434, Accuracy 0.700, Margin -1.543, Learning Rate 1.000e-05, It/sec 1.221, Tokens/sec 415.303, Peak mem 3.321 GB
Iter 30: Train loss 0.02641384, Chosen reward -0.417, Rejected reward -0.598, Accuracy 0.900, Margin -1.264, Learning Rate 1.000e-05, It/sec 1.022, Tokens/sec 381.361, Peak mem 3.417 GB
Iter 40: Train loss 0.00709062, Chosen reward -0.447, Rejected reward -0.816, Accuracy 1.000, Margin -1.071, Learning Rate 1.000e-05, It/sec 0.904, Tokens/sec 342.027, Peak mem 3.678 GB
Iter 50: Val loss 0.00316891, Val chosen reward -0.618, Val rejected reward -0.962, Val accuracy 1.000, Val margin -1.032, Val took 0.235s
Iter 50: Train loss 0.00232958, Chosen reward -0.500, Rejected reward -0.899, Accuracy 1.000, Margin -1.023, Learning Rate 1.000e-05, It/sec 14.488, Tokens/sec 4520.141, Peak mem 3.678 GB
Saved final weights to /Users/gokdenizgulmez/Desktop/test-orpo-full/adapters.safetensors.

Answer after:

I'm good, how can I help you today? 😄 😄
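
For context, the before/after answers above come from generating with and without the trained adapters; with mlx_lm this can be reproduced via the generate CLI by pointing --adapter-path at the saved weights (the prompt text here is illustrative, since the exact prompt isn't shown in the thread):

python -m mlx_lm.generate \
    --model mlx-community/Qwen2.5-0.5B-Instruct-8bit \
    --adapter-path /Users/gokdenizgulmez/Desktop/test-orpo-full \
    --prompt "How are you?"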

…terator, Updated batch unpacking to match iterator, Added preference score scaling, Simplified reward calculation, Removed redundant rejected_rewards
@chimezie mentioned this pull request on Jan 25, 2025