DeepSpeed with trl #2490
Comments
Thanks for reporting. Please provide minimal code or steps to reproduce this.
pipeline.zip (edit by maintainer: link removed) Thanks @qgallouedec. The attached files constitute a pipeline that uses the DPOTrainer with DeepSpeed.
Sorry, but we don't use zip files. The easiest way to produce an MRE is to go line by line: if the error remains when you remove a line, you can discard that line. When there is no line left to remove, you have your MRE.
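For reference, a stripped-down DPOTrainer script usually looks roughly like the following. This is only a sketch, not taken from the report: the small model name, output directory, and the trl-lib/ultrafeedback_binarized dataset are placeholder choices, and it assumes a recent trl release that accepts processing_class.

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Placeholder model; any small causal LM works for a reproduction
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")

# Ready-made preference dataset with prompt/chosen/rejected columns
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

training_args = DPOConfig(output_dir="dpo-mre")
trainer = DPOTrainer(
    model=model,
    args=training_args,
    processing_class=tokenizer,
    train_dataset=train_dataset,
)
trainer.train()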
Sorry @qgallouedec, here is a minimal version of my pipeline:

import pandas as pd
import torch
from copy import deepcopy
from datasets import Dataset
from torch import optim
from transformers import AutoTokenizer, AutoModelForCausalLM
from trl import DPOConfig, DPOTrainer

# Load the policy model and its tokenizer
model_RLRF = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct", torch_dtype=torch.float32)
tokenizer_RLRF = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
tokenizer_RLRF.add_special_tokens({'pad_token': tokenizer_RLRF.eos_token})
tokenizer_RLRF.padding_side = 'left'

DPO_config = DPOConfig(
    report_to='tensorboard',
    logging_first_step=True,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    sync_ref_model=True,
    ref_model_mixup_alpha=0.6,
    ref_model_sync_steps=256,
    bf16=True,
)

# Create the reference model: a frozen copy of the policy model
ref_model = deepcopy(model_RLRF)
for param_name, _ in model_RLRF.named_parameters():
    param = ref_model.get_parameter(param_name)
    param.requires_grad = False
ref_model.eval()

# Set the optimizer for RLRF
optimizer_RLRF = optim.AdamW(filter(lambda param: param.requires_grad, model_RLRF.parameters()),
                             lr=1.41e-5)  # learning rate originally read from a hyperparameter dict

train_dataset = pd.read_csv("perfernces_dataset_from_ranker_train_queries_and_baseline_doc.csv")
train_dataset = Dataset.from_pandas(train_dataset)

dpo_trainer = DPOTrainer(model=model_RLRF, args=DPO_config, processing_class=tokenizer_RLRF, ref_model=ref_model,
                         optimizers=(optimizer_RLRF, None), train_dataset=train_dataset)
dpo_trainer.train()

The loaded data file (train_dataset) is:
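The actual CSV contents are not included above. For context, DPOTrainer expects a preference dataset; in the standard (non-conversational) format each example carries prompt, chosen, and rejected text fields. A minimal sketch of what such a dataset could look like, with made-up rows rather than the reporter's data:

import pandas as pd
from datasets import Dataset

# Illustrative rows only; the real CSV contents are not shown in the issue
df = pd.DataFrame({
    "prompt": ["What is the capital of France?"],
    "chosen": ["The capital of France is Paris."],
    "rejected": ["France is a country in Europe."],
})
train_dataset = Dataset.from_pandas(df)
print(train_dataset.column_names)  # ['prompt', 'chosen', 'rejected']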
System Info
Describe the bug
I am trying to train meta-llama/Llama-3.1-8B-Instruct with the trl DPOTrainer.
After creating the trainer and starting the training loop, I'm getting the following error (in the forward pass):
I tried downgrading transformers, with no success.
System info:
My accelerate config:
trl env output:
Information
Tasks
examples folder
Reproduction
outputs:
Expected behavior
Train my model using the DPOTrainer and DeepSpeed.
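As a side note, DPOConfig inherits from transformers.TrainingArguments, so DeepSpeed is typically enabled either by launching with accelerate using a DeepSpeed-enabled accelerate config, or by pointing the training arguments at a DeepSpeed JSON file. A minimal sketch, where ds_zero2_config.json is a hypothetical path and the other values are placeholders:

from trl import DPOConfig

# DPOConfig inherits transformers.TrainingArguments, so the `deepspeed`
# argument accepts a path to a DeepSpeed JSON config.
# "ds_zero2_config.json" is a hypothetical file, not taken from the report.
training_args = DPOConfig(
    output_dir="dpo-deepspeed",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    bf16=True,
    deepspeed="ds_zero2_config.json",
)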
Checklist