How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources #19

Note

Summary

  • OpenAI's ChatGPT recently appeared as an impressive model
    • undoubtedly clever, capable, and very fun to talk to
  • The natural question is how ChatGPT gets there, and where these fantastic abilities come from
  • This note traces the answer by walking through the GPT-3.5 model family

Initial 2020 GPT-3, large-scale pretraining

  • GPT-3's three important abilities
    • Language generation: to follow a prompt and then generate a completion of the given prompt
    • In-context learning: to follow a few examples of a given task and then generate the solution for a new test case (see the prompting sketch after this list)
    • World knowledge: including factual knowledge and commonsense
  • Where do these abilities come from?
  • They come from large-scale pretraining: pretraining the 175B-parameter model on 300B tokens (60% 2016-2019 C4 + 22% WebText2 + 16% Books + 3% Wikipedia)
  • A curious question is how strong the initial GPT-3 is.
    • By ChatGPT's standard, it is hard to call it smart
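
A concrete way to see the difference between language generation and in-context learning is the shape of the prompt itself. The sketch below is illustrative only (the zero-shot and few-shot strings loosely follow the style of the GPT-3 paper's translation demos; nothing here is taken from this note):

```python
# Zero-shot: a bare instruction, relying on language generation alone.
zero_shot_prompt = "Translate English to French: cheese =>"

# Few-shot (in-context learning): a few demonstrations of the task, then a new
# test case; the model is expected to continue the pattern with no weight update.
few_shot_prompt = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"
    "peppermint => menthe poivrée\n"
    "cheese =>"
)

print(zero_shot_prompt)
print(few_shot_prompt)
```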

From 2020 GPT-3 to 2022 ChatGPT


  • text-davinci-003 recovered (but still worse than code-davinci-002) some in-context learning ability that is lost in text-davinci-002 (presumably because it tunes the model with LM mix-in) and further improved zero-shot ability (thanks to RLHF).
  • On the other hand, ChatGPT seems to have sacrificed nearly all of its in-context learning ability to trade for the ability to model dialog context.
  • Key points
    • Instruction tuning does not inject new abilities into the model; all abilities are already there. Instead, instruction tuning unlocks/elicits these abilities. This is mostly because the instruction tuning data is orders of magnitude smaller than the pretraining data (a minimal supervised instruction-tuning sketch follows this list).
    • Instruction tuning adjusts the skillset of GPT-3.5 towards different branches. Some are better at in-context learning like text-davinci-003, some are better at dialog like ChatGPT.
    • Instruction tuning trades performance for alignment with humans. The OpenAI authors call it “alignment tax” in their instruction tuning paper. Also, many papers have reported that code-davinci-002 achieves the best performance on benchmarks. Instruction tuning on code-davinci-002 gives the subsequent models alignment behaviors like zero-shot question answering, generating safe and impartial dialog responses, and rejecting questions beyond its knowledge scope.
  • Key points, restated
    • The abilities are likely already latent in the model; tuning just draws them out.
    • Depending on the instruction tuning, different abilities surface (described as branches): in-context learning ability drops while other abilities, such as zero-shot performance, grow.
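
To make "unlocking abilities with relatively little data" concrete, here is a minimal sketch of supervised instruction tuning (SFT). The model name, toy dataset, and prompt format are placeholders for illustration, not the actual GPT-3.5 recipe:

```python
# A minimal supervised instruction-tuning (SFT) sketch on a stand-in model.
# "gpt2" and the two toy (instruction, response) pairs are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # stand-in for a large pretrained LM
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Tiny illustrative tuning set; real instruction data is much larger, yet still
# orders of magnitude smaller than the pretraining corpus.
pairs = [
    ("Summarize: The cat sat on the mat all day.", "A cat spent the day on a mat."),
    ("Translate to French: cheese", "fromage"),
]
texts = [f"Instruction: {p}\nResponse: {r}{tokenizer.eos_token}" for p, r in pairs]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for text in texts:  # one pass, batch size 1, for brevity
    batch = tokenizer(text, return_tensors="pt")
    # Standard next-token prediction loss on the instruction-formatted text;
    # no new objective is introduced, only a new formatting of the data.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```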

Code-Davinci-002 & Text-Davinci-002, training on code, tuning on instructions

  • The -001 models cannot do CoT and similar tasks; their performance is worse than the -002 models

The sources of complex reasoning ability and the ability to generalize to new tasks

  • the first two GPT-3.5 models
    • code-davinci-002
    • text-davinci-002
  • four important abilities they exhibit that differentiate them from the initial GPT-3
    • Responding to human instruction: previously, the outputs of GPT-3 were mostly high-frequency prompt-completion patterns within the training set. Now the model generates reasonable answers to the prompt, rather than related but useless sentences.
    • Generalization to unseen tasks: when the number of instructions used for tuning the model is beyond a certain scale, the model can automatically generate completions for new instructions that are not in the training set. This ability is crucial for deployment, as users will always come up with new prompts.
    • Code generation and code understanding: obviously, because the model is trained on code.
    • Complex reasoning with chain-of-thought: previously, the model could not do tasks requiring multi-step reasoning with chain-of-thought. code-davinci-002 and text-davinci-002 are the two initial models exhibiting chain-of-thought reasoning ability.
      • Chain-of-thought is important because CoT is likely to be the key to unlocking emergent abilities and transcending scaling laws. See the previous blog post.
  • To summarize,
    • generates responses that better follow human instructions
    • handles unseen tasks better (once instruction tuning passes a certain scale the model also handles new instructions well, which is very important for deployment)
    • code generation and understanding
    • good at CoT (this may be why it shows emergent abilities and transcends scaling laws; see the CoT prompting sketch after this list)
      • The ability of complex reasoning with chain-of-thought is likely to be a magical side product of training on code:
      • The initial GPT-3 is not trained on code, and it cannot do chain-of-thought
      • PaLM has 5% code training data, and it can do chain-of-thought.
      • The code data in the codex paper is 159G, approximately 28% of the initial GPT-3 570G training data. code-davinci-002 and its subsequent variants can do chain-of-thought.
      • As an intuition, think about how procedure-oriented programming is similar to solving tasks step by step, and how object-oriented programming is similar to decomposing complex tasks into simpler ones.
  • Additionally, long-term dependency might also be a nice side effect of training on code. As pointed out by Peter Liu: “Next token prediction for language is usually very local, whereas code often requires longer dependencies to do things like close brackets or refer to distant defs”.
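
For reference, chain-of-thought prompting simply means the few-shot demonstration spells out intermediate reasoning steps before the answer. A minimal illustrative sketch (the worked example follows the classic one popularized by the chain-of-thought prompting paper; it is not from this note):

```python
# Chain-of-thought (CoT) prompt: the demonstration includes intermediate
# reasoning, so the model is expected to reason step by step for the new
# question before giving the final answer.
cot_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
    "Q: The cafeteria had 23 apples. They used 20 and bought 6 more. "
    "How many apples do they have?\n"
    "A:"  # expected continuation: step-by-step reasoning ending in "The answer is 9."
)
print(cot_prompt)
```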

Are these abilities already there after pretraining or later injected by fine-tuning?

  • Are the above abilities already there in the initial GPT-3 and merely triggered/unlocked by instruction and code training, or are they absent from the initial GPT-3 and injected by instruction and code training?

text-davinci-003 & ChatGPT, the power of Reinforcement Learning from Human Feedback (RLHF)

  • We first note the following comparisons between text-davinci-002, text-davinci-003, and ChatGPT:

    • All three models are instruction tuned.
    • text-davinci-002 is a supervised instruction-tuned model
    • text-davinci-003 and ChatGPT are instruction tuned with Reinforcement Learning from Human Feedback (RLHF). This is the most prominent difference (a minimal sketch of the RLHF reward-model objective follows this section).
  • This means that most of the new model behaviors are the product of RLHF.

    • In summary: informative responses, balanced responses, rejecting improper questions, and rejecting questions outside its knowledge scope
    • Informative responses: text-davinci-003’s generation is usually longer than text-davinci-002. ChatGPT’s response is even more verbose such that one has to explicitly ask, “answer me in one sentence” to make it concise. This is a direct product of RLHF.
    • Impartial responses: ChatGPT often gives very balanced responses on events involving interests from multiple entities, such as political events. This is also a product of RLHF.
    • Rejecting improper questions: this is the combination of a content filter and the model’s own ability induced by RLHF.
    • Rejecting questions outside its knowledge scope: for example, rejecting new events that happened after Jun 2021. This is the most amazing part of RLHF because it enables the model to implicitly and automatically classify which information is within its knowledge and which is not.
  • There are two important things to notice:

    • All the abilities are intrinsically within the model, not injected by RLHF. RLHF merely triggers/unlocks these abilities so that they emerge.
    • Knowing what it does not know is not achieved by writing rules; it is also unlocked by RLHF. This is a very surprising finding, as the original goal of RLHF is for alignment, which is more related to generating safe responses than knowing what the model does not know.
  • What happens behind the scene might be:

    • ChatGPT: trades in-context learning for dialog history modeling. This is an empirical observation, as ChatGPT seems not to be as strongly affected by in-context demonstrations as text-davinci-003 is.
    • text-davinci-003: recovers the in-context learning ability sacrificed by text-davinci-002 and improves the zero-shot ability. We are not sure if this is also a side product of RLHF or something else. According to the InstructGPT paper, this is from the LM-mixing during the RL tuning stage (not RLHF itself).
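
As referenced above, here is a minimal sketch of the reward-model objective at the heart of InstructGPT-style RLHF. Everything below (the toy RewardModel, random embeddings, and sizes) is an illustrative assumption rather than OpenAI's implementation; it only shows the pairwise preference loss that trains the reward model whose score PPO later maximizes:

```python
# Toy reward-model training step for RLHF (assumed, heavily simplified).
# The reward model is trained so that the human-preferred response y_w
# scores higher than the rejected response y_l for the same prompt x.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    def __init__(self, dim: int = 16):
        super().__init__()
        # Maps a (prompt, response) embedding to a scalar reward.
        self.score = nn.Linear(dim, 1)

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        return self.score(emb).squeeze(-1)

rm = RewardModel()
optimizer = torch.optim.AdamW(rm.parameters(), lr=1e-4)

# Random embeddings standing in for encoded (prompt, chosen) and (prompt, rejected) pairs.
emb_chosen = torch.randn(8, 16)
emb_rejected = torch.randn(8, 16)

# Pairwise preference loss: -log sigmoid(r(x, y_w) - r(x, y_l)).
loss = -F.logsigmoid(rm(emb_chosen) - rm(emb_rejected)).mean()
loss.backward()
optimizer.step()

# In the RL stage, the policy LM is then tuned (e.g. with PPO plus a KL penalty
# to the pretrained model) to maximize this learned reward.
```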