
Does MetaICL still need to update its parameters during meta-training? #25

Open
SongDark opened this issue Sep 20, 2023 · 3 comments

SongDark commented Sep 20, 2023

In the Abstract of this paper, it says "with no parameter updates or task-specific templates".

I thought this project was a new method for "prompt tuning" through meta-learning, aiming to provide a better prompt/instruction than regular in-context learning.
But in the code, `model.do_train()` updates the model's parameters via backpropagation (`loss.backward()`). Is it still a type of fine-tuning?

If I change my base LM to something huge like GPT-3 (175B), it costs too much.

shmsw25 (Contributor) commented Sep 20, 2023

Hi @SongDark, thank you for the question - let me clarify the claim in the paper.

We think of meta-training as a task-agnostic method, perhaps similar to pre-training: it is done once at the beginning, and once completed, the model does not have to be updated again. In this sense, it is true that MetaICL requires training at the beginning, but after that it does not require any parameter updates in order to be used for downstream tasks. Let me know if this clarifies things!
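
To make the distinction concrete, here is a minimal sketch of the two phases, assuming a Hugging Face causal LM. This is not the actual MetaICL API: the function names, the loss masking, and the model/tokenizer setup are simplifications for illustration. Phase 1 is the one-time meta-training that contains the `loss.backward()` seen in `model.do_train()`; Phase 2 is downstream inference, which only concatenates demonstrations into the prompt and never touches the parameters.

```python
# Minimal sketch, assuming a Hugging Face causal LM; NOT the actual MetaICL code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2-large")
model = AutoModelForCausalLM.from_pretrained("gpt2-large")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Phase 1 -- meta-training, done once over many training tasks.
# Parameters ARE updated here (the loss.backward() in model.do_train()).
def meta_train_step(demonstrations: str, target_input: str, target_output: str):
    text = demonstrations + target_input + target_output
    ids = tokenizer(text, return_tensors="pt").input_ids
    # MetaICL computes the LM loss only on the output tokens;
    # the full-sequence loss below is a simplification.
    loss = model(input_ids=ids, labels=ids).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Phase 2 -- downstream inference: pure in-context learning, no gradient updates.
@torch.no_grad()
def predict(demonstrations: str, test_input: str, candidates: list[str]) -> str:
    model.eval()
    scores = []
    for cand in candidates:
        ids = tokenizer(demonstrations + test_input + cand, return_tensors="pt").input_ids
        scores.append(model(input_ids=ids, labels=ids).loss.item())
    # Return the candidate output with the lowest LM loss.
    return candidates[scores.index(min(scores))]
```

A downstream user only runs Phase 2: they load the meta-trained checkpoint, build a prompt from k demonstrations, and score candidates, with no optimizer involved.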

SongDark (Author) commented

@shmsw25
I see.
Another question is: does MetaICL bring extra improvements to big models like GPT-3, given that it already has strong semantic comprehension from its enormous parameter count (175B)?

shmsw25 (Contributor) commented Sep 22, 2023

Hi @SongDark,

Do you mean (a) MetaICL trained with GPT-3, or (b) MetaICL trained with GPT-2 Large (the current MetaICL)?

If you mean (a), I do think MetaICL training on top of GPT-3 is likely to bring significant gains, since in our experiments the gains seem orthogonal to the scale of the model. We are unable to conduct such experiments because we can't run fine-tuning on GPT-3 ourselves.

If you mean (b), I think this is an empirical question that is hard to answer without experiments. We saw in the paper (Table 6) that MetaICL based on 770M parameters outperforms GPT-J, which is 8x larger. But GPT-3 is about 230x larger, so outperforming GPT-3 with 230x fewer parameters might be hard. We don't know the exact answer since we did not explicitly compare with GPT-3 in the paper, but this is an interesting question that one could follow up on!
