TypeError: type list doesn't define __round__ method - why am I getting this error? #2674

Open
Tarak200 opened this issue Jan 28, 2025 · 2 comments
Labels: 🐛 bug (Something isn't working) · ⏳ needs more info (Additional information or clarification is required to proceed) · 🏋 Reward (Related to Reward modelling)

Comments

@Tarak200

I am getting this error while logging the loss during reward model training, at line 318 of RewardTrainer. Can I get some help on how to proceed?
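For context, this TypeError is what Python raises when round() is called on a list rather than a single number, so somewhere along the logging path a list is reaching code that expects a scalar float. A minimal illustration with made-up values:

>>> round([0.5, 0.7], 4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: type list doesn't define __round__ method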

@github-actions bot added the 🏋 Reward and 🐛 bug labels on Jan 28, 2025
@qgallouedec
Member

Thanks for reporting. Please provide an MRE, system info, etc. Refer to the bug report template to help you with it.

@qgallouedec added the ⏳ needs more info label on Jan 30, 2025
@Tarak200
Author

Issue

Unable to log the metrics and loss while training the reward model.

Sample Code

import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from trl import RewardTrainer, RewardConfig
from peft import PeftModel, LoraConfig, TaskType

# 4-bit quantization for the base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_name = "Qwen/Qwen2.5-Coder-7B-Instruct"
adapter_path = "/raid/ganesh/nagakalyani/nagakalyani/autograding/huggingface_codellama/nithin_zero-shot_2.0/RLHF/qwen/model/final_checkpoint"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=bnb_config,
)

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    model.config.pad_token_id = model.config.eos_token_id

output_dir = "./reward_model"

training_args = RewardConfig(
    center_rewards_coefficient=0.01,
    output_dir=output_dir,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    eval_strategy="steps",
    logging_steps=10,
    num_train_epochs=1,
    report_to="tensorboard",
    max_length=512,
    save_steps=0.2,
    save_strategy="steps",
    gradient_checkpointing=True,
    fp16=True,
    metric_for_best_model="eval_loss",
    optim="paged_adamw_32bit",
    save_safetensors=True,
    # optim="adamw_torch",
    learning_rate=2e-5,
    # report_to="wandb",
    logging_dir="./logs",
)

# LoRA adapters on the attention projections
peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    inference_mode=False,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
)

trainer = RewardTrainer(
    model=model,
    args=training_args,
    tokenizer=tokenizer,
    train_dataset=formatted_dataset["train"],
    eval_dataset=formatted_dataset["test"],
    peft_config=peft_config,
)

trainer.train()
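Note that formatted_dataset is not defined in the snippet above. For reference, RewardTrainer in trl 0.11.x expects the preference pairs to be pre-tokenized; the sketch below shows that preprocessing under the assumption of a dataset with raw "chosen" and "rejected" text columns (the file name and column names are hypothetical):

from datasets import load_dataset

# Hypothetical preference data with raw "chosen"/"rejected" text columns
raw_dataset = load_dataset("json", data_files="preferences.json")["train"]

def preprocess(examples):
    # trl 0.11.x RewardTrainer consumes these four pre-tokenized columns
    chosen = tokenizer(examples["chosen"], truncation=True, max_length=512)
    rejected = tokenizer(examples["rejected"], truncation=True, max_length=512)
    return {
        "input_ids_chosen": chosen["input_ids"],
        "attention_mask_chosen": chosen["attention_mask"],
        "input_ids_rejected": rejected["input_ids"],
        "attention_mask_rejected": rejected["attention_mask"],
    }

formatted_dataset = raw_dataset.map(
    preprocess, batched=True, remove_columns=raw_dataset.column_names
).train_test_split(test_size=0.1)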

Package versions

transformers==4.44.0
trl==0.11.4
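One more note, which may or may not be related to the error: the TRL reward-modelling examples load the base model with a sequence-classification head that emits a single scalar reward, rather than a causal-LM head. A sketch of that variant, reusing the quantization config above:

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=1,  # one scalar reward per sequence
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=bnb_config,
)
model.config.pad_token_id = tokenizer.pad_token_id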
