
Does MetaICL still need to update its parameters during meta-training? #25

Open
SongDark opened this issue Sep 20, 2023 · 3 comments

SongDark commented Sep 20, 2023

In the Abstract of this paper, it says "with no parameter updates or task-specific templates".

I thought this project was a new method for "prompt tuning" through meta-learning, aiming to provide a better prompt/instruction than regular in-context learning.
But in the code, `model.do_train()` updates the model's parameters via backpropagation (`loss.backward()`). Is it still a type of fine-tuning?

If I change my base LM to something huge like GPT-3 (175B), it costs too much.

shmsw25 (Contributor) commented Sep 20, 2023

Hi @SongDark, thank you for the question - let me clarify the claim in the paper.

We think of meta-training as a task-agnostic method, perhaps similar to pre-training: it is done once at the beginning, and once completed, the model does not have to be updated again. In this sense, it is true that MetaICL requires training at the beginning, but after that it does not require any parameter updates in order to be used for downstream tasks. Let me know if this clarifies things!
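
To make the distinction concrete, here is a minimal sketch of the two phases, assuming a Hugging Face causal LM. This is not the actual MetaICL API: the function names, the loss masking, and the model/tokenizer setup are simplifications for illustration. Phase 1 is the one-time meta-training that contains the `loss.backward()` seen in `model.do_train()`; Phase 2 is downstream inference, which only concatenates demonstrations into the prompt and never touches the parameters.

```python
# Minimal sketch, assuming a Hugging Face causal LM; NOT the actual MetaICL code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2-large")
model = AutoModelForCausalLM.from_pretrained("gpt2-large")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Phase 1 -- meta-training, done once over many training tasks.
# Parameters ARE updated here (the loss.backward() in model.do_train()).
def meta_train_step(demonstrations: str, target_input: str, target_output: str):
    text = demonstrations + target_input + target_output
    ids = tokenizer(text, return_tensors="pt").input_ids
    # MetaICL computes the LM loss only on the output tokens;
    # the full-sequence loss below is a simplification.
    loss = model(input_ids=ids, labels=ids).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Phase 2 -- downstream inference: pure in-context learning, no gradient updates.
@torch.no_grad()
def predict(demonstrations: str, test_input: str, candidates: list[str]) -> str:
    model.eval()
    scores = []
    for cand in candidates:
        ids = tokenizer(demonstrations + test_input + cand, return_tensors="pt").input_ids
        scores.append(model(input_ids=ids, labels=ids).loss.item())
    # Return the candidate output with the lowest LM loss.
    return candidates[scores.index(min(scores))]
```

A downstream user only runs Phase 2: they load the meta-trained checkpoint, build a prompt from k demonstrations, and score candidates, with no optimizer involved.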

SongDark (Author) commented

@shmsw25
I see.
Another question is: does MetaICL bring extra improvements to big models like GPT-3, given that it already has strong semantic comprehension from its enormous parameter count (175B)?

shmsw25 (Contributor) commented Sep 22, 2023

Hi @SongDark,

Do you mean (a) MetaICL trained with GPT-3, or (b) MetaICL trained with GPT-2 Large (the current MetaICL)?

If you mean (a), I do think MetaICL training on top of GPT-3 is likely to bring significant gains, since in our experiments the gains seem orthogonal to the scale of the model. We are unable to conduct such experiments because we can't run fine-tuning on GPT-3 ourselves.

If you mean (b), I think this is an empirical question that is hard to answer without experiments. We saw in the paper (Table 6) that MetaICL based on 770M parameters outperforms GPT-J, which is 8x larger. But GPT-3 is about 230x larger, so outperforming GPT-3 with 230x fewer parameters might be hard. We don't know the exact answer since we did not explicitly compare with GPT-3 in the paper, but this is an interesting question that one could follow up on!
