
round epoch only in console #30237

Merged
merged 1 commit into huggingface:main on Apr 15, 2024

Conversation

Contributor

@xdedss xdedss commented Apr 13, 2024

What does this PR do?

This PR fixes the problem that the "epoch" value is rounded to 2 digits before being logged to wandb, resulting in inaccurate plots.

Details:

In the Trainer.log function, logs["epoch"] is rounded to 2 digits. As a result, the plot in wandb is jagged, and some data points are missing from the plot if you select "epoch" as the x-axis.
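
To illustrate the collision (hypothetical step counts, not library code): with logging_steps=1, several consecutive steps collapse onto the same rounded epoch value.

# Illustration only: distinct epoch values collide after rounding to 2 digits
steps_per_epoch = 625  # hypothetical
epochs = [step / steps_per_epoch for step in range(1, 6)]
print([round(e, 2) for e in epochs])
# -> [0.0, 0.0, 0.0, 0.01, 0.01]: several log entries share one x-value,
#    so the wandb curve looks jagged and points overlap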

This is a minimal example to reproduce the issue:
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer
import numpy as np
import evaluate

# Load Dataset
dataset = load_dataset("yelp_review_full")

# Tokenization
tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)

# Data Split
small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(5000))
small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(5000))

# Model
model = AutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased", num_labels=5)

# Training Arguments
training_args = TrainingArguments(
    output_dir="test_trainer",
    evaluation_strategy="epoch",
    logging_steps=1,
    num_train_epochs=1,
    report_to="wandb",
)

# Metrics
metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

# Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
)

# Training
trainer.train()

The commit message that introduced this rounding says it was done to make the console logging message look better, but the value is also sent to wandb for plotting, producing jagged curves.

What this PR does is round the number only in the handler that prints to the console, since we still want the accurate epoch value for other logging & plotting purposes.
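
For illustration only, here is a minimal sketch of the idea (hypothetical class name, not the exact diff): the full-precision logs dict keeps flowing to integrations such as wandb, and rounding happens only in the console-printing callback.

import copy

class ConsolePrinterSketch:
    # Sketch of a console-logging callback: round "epoch" for display only,
    # leaving the original logs dict (the one sent to wandb etc.) untouched.
    def on_log(self, args, state, control, logs=None, **kwargs):
        logs = copy.copy(logs)  # shallow copy; callers keep full precision
        if "epoch" in logs:
            logs["epoch"] = round(logs["epoch"], 2)  # cosmetic rounding
        print(logs)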

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? No.
  • Did you make sure to update the documentation with your changes? (I am not sure where this should go in the documentation.)
  • Did you write any new necessary tests? (I am not sure if this should be tested separately, but pytest tests\trainer\test_trainer_callback.py passes.)

Who can review?

This fix is related to the Trainer: @muellerzr and @pacman100

Collaborator

@amyeroberts amyeroberts left a comment


Thanks for digging into this - looks good to me!

Happy with the changes. Once we have another approval from @muellerzr or @pacman100 we can merge

Contributor

@muellerzr muellerzr left a comment


Thanks! Makes sense to me as well. If we get issues with people noticing increased memory, we'll need to include a to_device call on logs down the road, but from a quick glance at the code I can't tell with 100% certainty whether that happens when logging something like large tensors (for whatever reason).

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@amyeroberts amyeroberts merged commit 7668101 into huggingface:main Apr 15, 2024
21 checks passed
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Apr 18, 2024
ArthurZucker pushed a commit that referenced this pull request Apr 22, 2024
itazap pushed a commit that referenced this pull request May 14, 2024