
A question about the conclusion of this paper #6

OleNet opened this issue May 31, 2023 · 1 comment

OleNet commented May 31, 2023

"Scaling Data-Constrained Language Models" is a very nice paper, and I learn a lot from this paper.

However, I have a question about this paper:

In the abstract and Figure 1, it recommends we should train 4 epochs.

But Figure 3 shows that we should choose 59 epochs.

So my question is why the optimal epoch is not 4 epochs in Figure 3.

Thanks in advance.

Muennighoff (Collaborator) commented:

"Scaling Data-Constrained Language Models" is a very nice paper, and I learn a lot from this paper.

However, I have a question about this paper:

In the abstract and Figure 1, it recommends we should train 4 epochs.

But Figure 3 shows that we should choose 59 epochs.

So my question is why the optimal epoch is not 4 epochs in Figure 3.

Thanks in advance.

This is because of immense diminishing returns. While you can still get better loss by training for more than 4 epochs, the returns diminish sharply (Figure 5 / attached). At 59 epochs, you are spending a lot of compute for only a tiny extra reduction in loss.

Meanwhile, at 4 epochs the returns are still very close to what you would get from new data, so your compute is well spent.
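To make the diminishing-returns point concrete, here is a minimal sketch, assuming the paper's effective-data form D' = U_D + U_D · R* · (1 − exp(−R_D / R*)), where U_D is the number of unique tokens and R_D = epochs − 1 is the number of repetitions. The decay constant of 15.4 and the 100B-token corpus below are illustrative stand-ins, not values quoted in this thread:

```python
import math

# Decay constant for the value of repeated data; 15.4 is illustrative here
# (see the paper for the actual fitted constants).
R_STAR = 15.4

def effective_data(unique_tokens: float, epochs: int) -> float:
    """Loss-equivalent amount of fresh data after `epochs` passes over `unique_tokens`."""
    repetitions = epochs - 1
    return unique_tokens + unique_tokens * R_STAR * (1 - math.exp(-repetitions / R_STAR))

unique = 100e9  # hypothetical corpus of 100B unique tokens
for epochs in (1, 2, 4, 10, 59):
    eff = effective_data(unique, epochs)
    raw = unique * epochs  # tokens actually processed, i.e. compute spent
    print(f"{epochs:>2} epochs: effective data {eff / 1e9:7.1f}B tokens "
          f"({eff / raw:.0%} of what the same compute on fresh data would give)")
```

Under those illustrative numbers, data repeated for 4 epochs is still worth roughly 90% of fresh data per token processed, while at 59 epochs each pass contributes only about a quarter as much, which is the sharp drop-off described above.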

Lmk if it's unclear!

[Attached screenshot: the figure referenced above]
