"Scaling Data-Constrained Language Models" is a very nice paper, and I learn a lot from this paper.
However, I have a question about this paper:
In the abstract and Figure 1, it recommends we should train 4 epochs.
But Figure 3 shows that we should choose 59 epochs.
So my question is why the optimal epoch is not 4 epochs in Figure 3.
Thanks in advance.
This is because of immense diminishing returns. While you can still get a better loss by training for more than 4 epochs, the returns diminish sharply (Figure 5 / attached): at 59 epochs, you are spending a lot of compute for only a tiny extra reduction in loss.
Meanwhile, at 4 epochs the returns from repeated data are still very close to what you would get from new data, so your compute is well spent.
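To make the diminishing returns concrete, here is a minimal sketch (not the authors' code) of the effective-data idea as I understand it from the paper: each repetition of the unique data is worth exponentially less, governed by a fitted decay constant `R_D_STAR` (roughly 15 in the paper's fit; treat the exact value here as an assumption), via D' = U_D + U_D * R*_D * (1 - exp(-R_D / R*_D)) with R_D = epochs - 1.

```python
import math

# Sketch only, under the assumptions stated above (not the authors' code).
R_D_STAR = 15.4  # fitted repetition decay constant; treat the exact value as approximate

def effective_tokens(unique_tokens: float, epochs: float) -> float:
    """Effective unique-data equivalent of `epochs` passes over `unique_tokens`,
    per D' = U_D + U_D * R_D* * (1 - exp(-R_D / R_D*)), with R_D = epochs - 1."""
    repetitions = epochs - 1
    return unique_tokens + unique_tokens * R_D_STAR * (1 - math.exp(-repetitions / R_D_STAR))

U = 1.0  # one "unit" of unique data
for epochs in (1, 2, 4, 16, 59):
    d_eff = effective_tokens(U, epochs)
    # per-epoch value: how much effective data each pass contributes on average
    print(f"epochs={epochs:>3}  effective data={d_eff:5.2f}x unique  per-epoch value={d_eff/epochs:.2f}")
```

With these assumed numbers, an epoch at 4 total epochs is still worth about 0.9x fresh data, while at 59 epochs the average per-epoch value falls to roughly 0.3x, which is why 59 epochs can be loss-optimal for a fixed dataset yet a poor use of compute.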
"Scaling Data-Constrained Language Models" is a very nice paper, and I learn a lot from this paper.
However, I have a question about this paper:
In the abstract and Figure 1, it recommends we should train 4 epochs.
But Figure 3 shows that we should choose 59 epochs.
So my question is why the optimal epoch is not 4 epochs in Figure 3.
Thanks in advance.