
Normalize TensorFlow training steps by batch size #8

Closed
wants to merge 1 commit

Conversation

lewtun
Member

@lewtun lewtun commented Feb 11, 2022

This PR addresses a comment on the forums, where a reader was confused why num_train_steps wasn't normalized by the batch size.

Following the official TensorFlow examples in transformers, I've included this normalization in all the TensorFlow sections of the course. The only sections where I didn't make the change are:

because in both those sections we have

num_train_steps = len(tf_train_dataset)

and I wasn't sure if this was intentional.

@lewtun lewtun requested a review from Rocketknight1 February 11, 2022 13:55
@Rocketknight1
Member

Hi @lewtun! The tf.data.Dataset objects here already have a batch() operation applied - they're more like torch DataLoader objects. After a batch operation, their len() is num_samples // batch_size already, so we shouldn't need to do that division twice.
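The point above can be sketched numerically. This is a minimal plain-Python illustration with hypothetical sample and batch counts (no TensorFlow required): after `tf.data.Dataset.batch(batch_size)`, `len()` counts batches rather than samples, so dividing by the batch size a second time undercounts the training steps.

```python
import math

# Hypothetical numbers for illustration.
num_samples = 1000
batch_size = 8
num_epochs = 3

# len(tf_train_dataset) after .batch(batch_size) with drop_remainder=False
# is ceil(num_samples / batch_size), i.e. it already counts batches.
len_batched_dataset = math.ceil(num_samples / batch_size)  # 125 batches

# Correct: the dataset length is already batches per epoch.
num_train_steps = len_batched_dataset * num_epochs  # 375

# The double division this PR would have introduced:
wrong_num_train_steps = (len_batched_dataset // batch_size) * num_epochs  # 45
```

With these numbers the double division shrinks the schedule from 375 steps to 45, which would make a learning-rate schedule decay far too quickly.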

@Rocketknight1
Member

That said, the comments are confusing because it isn't clear the division has already been performed. Maybe I should clarify them?

@lewtun
Member Author

lewtun commented Feb 11, 2022

Ah, thanks for the insight! I'll close this PR; it would be great if you could open a new one with a clarifying remark, because the comment in the code is indeed confusing for TF noobs (myself included!)

As a side note, how come the official examples normalize by batch size, e.g. here? In that example, we have

num_train_steps=int(training_args.num_train_epochs * train_batches_per_epoch)

where

train_batches_per_epoch = len(train_dataset) // total_train_batch_size

It was reading the official examples that convinced me there was an error in the course 😄

@lewtun lewtun closed this Feb 11, 2022
@lewtun lewtun deleted the fix-train-steps branch February 11, 2022 15:47
@Rocketknight1
Member

Agreed! In that example, train_dataset is the HuggingFace Dataset, so its len() is the number of samples. When we convert it to a TF tf.data.Dataset it becomes more like a DataLoader, so its len() is num_samples // batch_size. The confusion between the two types of Dataset object has been a problem for me, so I've tried to be clear and give the variables names like tf_train_dataset when I make the conversion, but I realize it's still quite opaque for people who aren't familiar.
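The distinction above can be sketched numerically in plain Python (hypothetical sample count and batch size; no TensorFlow or datasets imports needed): `len()` on a Hugging Face `Dataset` counts samples, while `len()` on the converted `tf.data.Dataset` counts batches.

```python
# Hypothetical numbers for illustration.
num_samples = 10_000           # stands in for len(train_dataset), the HF Dataset
total_train_batch_size = 32
num_train_epochs = 3

# Official-example style: normalize the *sample* count by the batch size.
train_batches_per_epoch = num_samples // total_train_batch_size  # 312
num_train_steps = int(num_train_epochs * train_batches_per_epoch)  # 936

# Course style: the converted tf.data.Dataset's length is already batches
# per epoch (exactly equal here, assuming drop_remainder=True), so no
# further division is needed.
len_tf_train_dataset = num_samples // total_train_batch_size
assert len_tf_train_dataset == train_batches_per_epoch
```

Both styles land on the same step count; the bug would only arise from mixing them, i.e. dividing a batch count by the batch size again.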

@lewtun
Member Author

lewtun commented Feb 11, 2022

Ah that makes more sense now!

CaterinaBi added a commit to CaterinaBi/hugging-face-course-translation that referenced this pull request Oct 16, 2022