TypeError: type list doesn't define __round__ method - why am I getting this error? #2674

Open
Tarak200 opened this issue Jan 28, 2025 · 2 comments
Labels: 🐛 bug (Something isn't working) · ⏳ needs more info (Additional information or clarification is required to proceed) · 🏋 Reward (Related to Reward modelling)

Comments

@Tarak200

I am getting this error while logging the loss during reward model training, at line 318 of RewardTrainer. Can I get some help on how to proceed?
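For context, this TypeError is what Python raises when round() is called on a list rather than a single number, so somewhere along the logging path a list is reaching code that expects a scalar float. A minimal illustration with made-up values:

>>> round([0.5, 0.7], 4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: type list doesn't define __round__ method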

@github-actions bot added the 🏋 Reward and 🐛 bug labels on Jan 28, 2025
@qgallouedec
Member

Thanks for reporting. Please provide an MRE, system info, etc. Refer to the bug report template to help you with it.

@qgallouedec added the ⏳ needs more info label on Jan 30, 2025
@Tarak200
Author

Issue

Unable to log the metrics and loss while training the reward model.

Sample Code

import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from trl import RewardTrainer, RewardConfig
from peft import PeftModel, LoraConfig, TaskType

# 4-bit quantization for the base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_name = "Qwen/Qwen2.5-Coder-7B-Instruct"
adapter_path = "/raid/ganesh/nagakalyani/nagakalyani/autograding/huggingface_codellama/nithin_zero-shot_2.0/RLHF/qwen/model/final_checkpoint"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=bnb_config,
)

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    model.config.pad_token_id = model.config.eos_token_id

output_dir = "./reward_model"

training_args = RewardConfig(
    center_rewards_coefficient=0.01,
    output_dir=output_dir,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    eval_strategy="steps",
    logging_steps=10,
    num_train_epochs=1,
    report_to="tensorboard",
    max_length=512,
    save_steps=0.2,
    save_strategy="steps",
    gradient_checkpointing=True,
    fp16=True,
    metric_for_best_model="eval_loss",
    optim="paged_adamw_32bit",
    save_safetensors=True,
    # optim="adamw_torch",
    learning_rate=2e-5,
    # report_to="wandb",
    logging_dir="./logs",
)

# LoRA adapters on the attention projections
peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    inference_mode=False,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
)

trainer = RewardTrainer(
    model=model,
    args=training_args,
    tokenizer=tokenizer,
    train_dataset=formatted_dataset["train"],
    eval_dataset=formatted_dataset["test"],
    peft_config=peft_config,
)

trainer.train()
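Note that formatted_dataset is not defined in the snippet above. For reference, RewardTrainer in trl 0.11.x expects the preference pairs to be pre-tokenized; the sketch below shows that preprocessing under the assumption of a dataset with raw "chosen" and "rejected" text columns (the file name and column names are hypothetical):

from datasets import load_dataset

# Hypothetical preference data with raw "chosen"/"rejected" text columns
raw_dataset = load_dataset("json", data_files="preferences.json")["train"]

def preprocess(examples):
    # trl 0.11.x RewardTrainer consumes these four pre-tokenized columns
    chosen = tokenizer(examples["chosen"], truncation=True, max_length=512)
    rejected = tokenizer(examples["rejected"], truncation=True, max_length=512)
    return {
        "input_ids_chosen": chosen["input_ids"],
        "attention_mask_chosen": chosen["attention_mask"],
        "input_ids_rejected": rejected["input_ids"],
        "attention_mask_rejected": rejected["attention_mask"],
    }

formatted_dataset = raw_dataset.map(
    preprocess, batched=True, remove_columns=raw_dataset.column_names
).train_test_split(test_size=0.1)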

Package versions

transformers==4.44.0
trl==0.11.4
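One more note, which may or may not be related to the error: the TRL reward-modelling examples load the base model with a sequence-classification head that emits a single scalar reward, rather than a causal-LM head. A sketch of that variant, reusing the quantization config above:

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=1,  # one scalar reward per sequence
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=bnb_config,
)
model.config.pad_token_id = tokenizer.pad_token_id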
