All the papers listed in this project come from my regular reading. If you come across new and interesting papers, I would appreciate it if you let me know!
- Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision: https://arxiv.org/abs/2312.09390
- Weak-to-Strong Reasoning: https://arxiv.org/abs/2407.13647
- Debating with More Persuasive LLMs Leads to More Truthful Answers: https://arxiv.org/abs/2402.06782
- CriticGPT: https://openai.com/index/finding-gpt4s-mistakes-with-gpt-4/
- Aligner: Efficient Alignment by Learning to Correct: https://arxiv.org/abs/2402.02416
- The Unreasonable Effectiveness of Easy Training Data for Hard Tasks: https://arxiv.org/abs/2401.06751
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision: https://arxiv.org/abs/2403.09472
- Self-playing Adversarial Language Game Enhances LLM Reasoning: https://arxiv.org/abs/2404.10642
- Theoretical Analysis of Weak-to-Strong Generalization: https://arxiv.org/abs/2405.16043
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models: https://arxiv.org/abs/2402.03749
- Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts: https://arxiv.org/abs/2402.15505
- Quantifying the Gain in Weak-to-Strong Generalization: https://arxiv.org/abs/2405.15116
- Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge: https://arxiv.org/abs/2407.19594
- Optimizing Language Model's Reasoning Abilities with Weak Supervision: https://arxiv.org/abs/2405.04086
- Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment: https://arxiv.org/abs/2405.17888
- Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization: https://arxiv.org/abs/2406.11431
- LLMs-as-Instructors: Learning from Errors Toward Automating Model Improvement: https://arxiv.org/abs/2407.00497
- Bayesian WeakS-to-Strong from Text Classification to Generation: https://arxiv.org/abs/2406.03199
- Transcendence: Generative Models Can Outperform The Experts That Train Them: https://arxiv.org/abs/2406.11741
- Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models: https://arxiv.org/abs/2405.19262
- On Scalable Oversight with Weak LLMs Judging Strong LLMs: https://arxiv.org/abs/2407.04622