In the Abstract of this paper, it says "with no parameter updates or task-specific templates".
I thought this project was a new method for "Prompt Tuning" via meta-learning, aiming to provide a better prompt/instruction than regular in-context learning.
But in the code, in "model.do_train()", it updates the model's parameters by backpropagation (loss.backward()). Isn't this still a type of fine-tuning (FT)?
If I change my base LM to something huge like GPT-3 (175B), it costs too much.
Hi @SongDark, thank you for the question - and let me clarify about the claim in the paper.
We think of meta-training as a task-agnostic method, perhaps similar to pre-training: it is done once in the beginning, and after that the model does not have to be updated. In this sense, it is true that MetaICL requires training up front, but once that is done, it does not require any parameter updates in order to be used for downstream tasks. Let me know if this clarifies things!
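Schematically, the distinction looks like this (a minimal PyTorch/transformers-style sketch, not the actual repo code; the helper names `meta_train` and `in_context_predict` are hypothetical):

```python
import torch

def meta_train(model, meta_train_loader, optimizer):
    # Phase 1: one-time, task-agnostic meta-training across many tasks.
    # This is the only place parameters are updated -- where loss.backward() lives.
    model.train()
    for batch in meta_train_loader:
        # Each batch packs k demonstration examples plus one target example;
        # the loss is the LM loss on the target conditioned on the demonstrations.
        loss = model(**batch).loss
        optimizer.zero_grad()
        loss.backward()   # parameter update happens here, and only here
        optimizer.step()

@torch.no_grad()  # Phase 2: no gradients, no parameter updates
def in_context_predict(model, tokenizer, demonstrations, test_input):
    # Downstream use: plain in-context learning with the frozen, meta-trained weights.
    prompt = "\n".join(demonstrations + [test_input])
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=20)
    return tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:])
```

So `model.do_train()` corresponds to the first phase, which happens once; downstream tasks only ever use the second phase.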
@shmsw25
I see.
Another question: does MetaICL bring extra improvements to big models like GPT-3, given that GPT-3 already has strong semantic-comprehension abilities from its enormous parameter count (175B)?
Do you mean (a) MetaICL trained with GPT-3, or (b) MetaICL trained with GPT-2 Large (the current MetaICL)?
If you mean (a), I do think MetaICL-training on top of GPT-3 is likely to yield significant gains, since the gains seem orthogonal to the scale of the model based on our experiments. We are unable to conduct such experiments, since we can't run fine-tuning over GPT-3 ourselves.
If you mean (b), I think this is an empirical question and it is hard to answer without experiments. We saw in the paper (Table 6) that MetaICL based on 770M params outperforms GPT-J, which is 8x larger. But GPT-3 is 230x larger, so outperforming GPT-3 with 230x fewer parameters might be hard. We don't know the exact answer, since we did not explicitly compare with GPT-3 in the paper, but this is an interesting question that one could follow up on!