
metric_for_best_model set to eval_f1 raises KeyError because metric is not found in evaluation results #40217

@monochandan

Description

System Info

Environment: (Google Colab)

Python 3.11.13
torch==2.6.0+cu124
transformers==4.55.2
bitsandbytes==0.47.0
peft==0.17.0
accelerate==1.10.0
numpy==1.26.4
scipy==1.14.1

GPU
NVIDIA L4
Driver Version: 550.54.15
CUDA Version: 12.4

Model Quantized with QLoRA

Dataset:

Train Dataset
{'text': Value('string'), 'embeddings': List(Value('float64')), 'tfidf_vector': List(Value('float64')), 'roberta_sent_neg': Value('float64'), 'roberta_sent_pos': Value('float64'), 'names': Value('int64'), 'organizations': Value('int64'), 'dates': Value('int64'), 'count_tokens': Value('int64'), 'label': Value('int64'), 'input_ids': List(Value('int32')), 'token_type_ids': List(Value('int8')), 'attention_mask': List(Value('int8'))}

Val Dataset
{'text': Value('string'), 'embeddings': List(Value('float64')), 'tfidf_vector': List(Value('float64')), 'roberta_sent_neg': Value('float64'), 'roberta_sent_pos': Value('float64'), 'names': Value('int64'), 'organizations': Value('int64'), 'dates': Value('int64'), 'count_tokens': Value('int64'), 'label': Value('int64'), 'input_ids': List(Value('int32')), 'token_type_ids': List(Value('int8')), 'attention_mask': List(Value('int8'))}
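
For context, a schema like this is what results from mapping the tokenizer over the text column while keeping the precomputed feature columns. A minimal sketch, assuming standard truncation settings (the max_length value is an assumption, not taken from the report; tokenizer_bert is defined under Code Setup below):

# Hypothetical sketch of how the tokenized columns could have been produced;
# only the resulting column names ('input_ids', 'token_type_ids',
# 'attention_mask') are taken from the schemas above.
def tokenize(batch):
    return tokenizer_bert(batch["text"], truncation=True, max_length=512)

train_dataset = train_dataset.map(tokenize, batched=True)
dev_train_dataset = dev_train_dataset.map(tokenize, batched=True)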

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Code Setup

import numpy as np
import torch
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    BitsAndBytesConfig,
    DataCollatorWithPadding,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training

model_name = "google-bert/bert-base-uncased"
tokenizer_bert = AutoTokenizer.from_pretrained(model_name)
if tokenizer_bert.pad_token is None:
    tokenizer_bert.pad_token = tokenizer_bert.eos_token
tokenizer_bert.padding_side = "right"


compute_dtype = torch.float16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=compute_dtype,
)

original_model_bert = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=2,
    quantization_config=bnb_config,
)
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    task_type=TaskType.SEQ_CLS,
)
kbit_model_bert = prepare_model_for_kbit_training(original_model_bert)
kbit_model_bert.gradient_checkpointing_enable()
peft_model_bert = get_peft_model(kbit_model_bert, lora_config)
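
A quick sanity check after wrapping with PEFT, using the standard PeftModel helper, confirms that only the adapter (and classifier head) weights are trainable:

# With QLoRA, the 4-bit base weights stay frozen; only the LoRA adapters
# and the classification head should show up as trainable here.
peft_model_bert.print_trainable_parameters()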




def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, predictions),
        "precision": precision_score(labels, predictions),
        "recall": recall_score(labels, predictions),
        "f1": f1_score(labels, predictions, average="binary"),
    }
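
compute_metrics can be sanity-checked in isolation with dummy values (illustrative only); if this prints an f1 key, the function itself is fine and the KeyError must come from the Trainer never invoking it:

# Standalone check with dummy binary-classification outputs.
dummy_logits = np.array([[0.2, 0.8], [0.9, 0.1], [0.3, 0.7]])
dummy_labels = np.array([1, 0, 1])
print(compute_metrics((dummy_logits, dummy_labels)))
# {'accuracy': 1.0, 'precision': 1.0, 'recall': 1.0, 'f1': 1.0}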


output_dir = "/content/drive/..."  # output path elided in the original report
args = TrainingArguments(
    output_dir=output_dir,
    weight_decay=0.22511642804764023,
    warmup_ratio=0.12890328790683203,
    adam_beta1=0.9348819720458172,
    adam_beta2=0.9285998615546803,
    adam_epsilon=1.9972958061508847e-07,
    max_grad_norm=4.222172817940239,
    gradient_accumulation_steps=2,
    max_steps=712,  # overrides num_train_epochs when both are set
    do_train=True,
    do_eval=True,
    lr_scheduler_type="polynomial",
    warmup_steps=488,  # takes precedence over warmup_ratio when nonzero
    metric_for_best_model="eval_f1",  # the key the Trainer looks up after each evaluation
    optim="paged_adamw_32bit",
    learning_rate=2.1106713456200193e-05,
    num_train_epochs=40,
    logging_dir="./logs/",
    logging_strategy="epoch",
    eval_strategy="epoch",
    save_strategy="epoch",
    label_names=["label"],
    load_best_model_at_end=True,
    save_total_limit=3,
)


trainer = Trainer(
    model=peft_model_bert,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=dev_train_dataset,
    compute_metrics=compute_metrics,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer_bert, padding=True),
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)

trainer.train()
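
For reference, the KeyError comes from the Trainer's best-metric lookup after each evaluation; a paraphrased sketch of that logic (not a verbatim copy of the transformers source):

# Paraphrased from the Trainer's checkpoint logic: the configured metric
# name is prefixed with "eval_" if needed, then looked up directly.
metric_to_check = args.metric_for_best_model        # "eval_f1"
if not metric_to_check.startswith("eval_"):
    metric_to_check = f"eval_{metric_to_check}"
metric_value = metrics[metric_to_check]             # KeyError if compute_metrics never ran

A plausible trigger, assuming nothing else drops the labels: DataCollatorWithPadding renames the "label" column to "labels" when collating a batch, while label_names=["label"] tells the Trainer to look for "label" in the model inputs. Finding no labels at evaluation time, the Trainer skips compute_metrics, the metrics dict ends up containing only eval_loss and runtime entries, and the "eval_f1" lookup fails. Removing label_names (or setting it to ["labels"]) would be the first thing to try.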

Expected behavior

Error Message:

[Screenshot in the original issue: a traceback ending in a KeyError because "eval_f1" is not found in the evaluation metrics.]

Expected Behavior:

Training runs to completion, with eval_f1 computed at each evaluation and used to select the best checkpoint.
