Language Models are Few-Shot Learners

Key ideas

  • Recent advances (e.g. BERT) show that pre-training followed by fine-tuning on a specific task is successful
  • Still, this approach requires a task-specific fine-tuning dataset with thousands to tens of thousands of examples
  • Humans can perform a new language task from a few examples or simple instructions; they don't need thousands of examples
  • GPT-3: an autoregressive language model with 175B parameters, able to perform new tasks from a few demonstrations given in the prompt (few-shot), without fine-tuning


  • A major limitation of the fine-tuning approach remains: the model must be fine-tuned for each specific task, and some tasks might not have a dataset to train on
  • Fine-tuned models have the potential to exploit spurious correlations in narrow, task-specific training data
  • Apparently, a huge number of parameters lets models develop a sort of meta-learning: they recognize the pattern in the prompt at inference time and adapt to the task


  • Few-shot: a few demonstrations of the task are given at inference time, but no weight updates are allowed. Typically between 10 and 100 examples are provided
  • Few-shot results have so far been much worse than those of state-of-the-art fine-tuned models
  • Same model architecture as GPT-2, scaled up
  • Evaluation: for each example in the test set, randomly draw K examples from the task's training set and use them as demonstrations in the prompt
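
The few-shot setup and evaluation loop described above can be sketched roughly as follows. This is only an illustration under assumptions, not the paper's actual harness: the `Q:`/`A:` prompt format, the function names, and the `complete` callable (a stand-in for any prompt-to-text model) are all hypothetical. The point is that the K demonstrations only appear in the prompt; no weights are updated anywhere.

```python
import random


def build_few_shot_prompt(train_set, query, k, rng, instruction=""):
    """Format K randomly drawn demonstrations followed by the unanswered query.

    The "Q:/A:" layout is an assumed format chosen for illustration.
    """
    demos = rng.sample(train_set, k)  # K demonstrations, drawn without replacement
    blocks = [instruction] if instruction else []
    for text, label in demos:
        blocks.append(f"Q: {text}\nA: {label}")
    blocks.append(f"Q: {query}\nA:")  # the model is expected to continue from here
    return "\n\n".join(blocks)


def evaluate_few_shot(complete, train_set, test_set, k, seed=0):
    """Accuracy of `complete` (hypothetical prompt -> text callable) on test_set.

    A fresh set of K demonstrations is drawn for every test example.
    """
    rng = random.Random(seed)
    correct = 0
    for query, answer in test_set:
        prompt = build_few_shot_prompt(train_set, query, k, rng)
        prediction = complete(prompt).strip()
        correct += prediction == answer
    return correct / len(test_set)


def toy_complete(prompt):
    """Stand-in for a language model: upper-cases the final query in the prompt."""
    last_query = prompt.rsplit("Q: ", 1)[1].split("\nA:")[0]
    return last_query.upper()


if __name__ == "__main__":
    train = [("cat", "CAT"), ("dog", "DOG"), ("bird", "BIRD")]
    test = [("fish", "FISH"), ("ant", "ANT")]
    print(build_few_shot_prompt(train, "fish", 2, random.Random(0)))
    print(evaluate_few_shot(toy_complete, train, test, k=2))
```

In practice `toy_complete` would be replaced by a call to a real model, and K would be limited by how many demonstrations fit in the model's context window.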