LLM Reasoning

Survey

🌟 Towards Large Reasoning Models: A Survey on Scaling LLM Reasoning Capabilities, arXiv, 2501.09686, arxiv, pdf, citation: -1

Fengli Xu, Qianyue Hao, Zefang Zong, ..., Chen Gao, Yong Li
🌟 Test-time Computing: from System-1 Thinking to System-2 Thinking, arXiv, 2501.02497, arxiv, pdf, citation: -1

Yixin Ji, Juntao Li, Hai Ye, ..., Linjian Mo, Min Zhang · (Awesome_Test_Time_LLMs - Dereck0602)
A Survey on LLM Inference-Time Self-Improvement, arXiv, 2412.14352, arxiv, pdf, citation: -1

Xiangjue Dong, Maria Teleki, James Caverlee

Reasoning

Iterate to Accelerate: A Unified Framework for Iterative Reasoning and Feedback Convergence, arXiv, 2502.03787, arxiv, pdf, citation: -1

Jacob Fein-Ashley

· (reddit)
Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning, arXiv, 2502.03275, arxiv, pdf, citation: -1

DiJia Su, Hanlin Zhu, Yingchen Xu, ..., Yuandong Tian, Qinqing Zheng
🌟 LIMO: Less is More for Reasoning, arXiv, 2502.03387, arxiv, pdf, citation: -1

Yixin Ye, Zhen Huang, Yang Xiao, ..., Shijie Xia, Pengfei Liu · (LIMO - GAIR-NLP)
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate, arXiv, 2501.17703, arxiv, pdf, citation: -1

Yubo Wang, Xiang Yue, Wenhu Chen · (CritiqueFineTuning - TIGER-AI-Lab)
Reasoning Language Models: A Blueprint, arXiv, 2501.11223, arxiv, pdf, citation: -1

Maciej Besta, Julia Barth, Eric Schreiber, ..., Hubert Niewiadomski, Torsten Hoefler
PokerBench: Training Large Language Models to become Professional Poker Players, arXiv, 2501.08328, arxiv, pdf, citation: -1

Richard Zhuang, Akshat Gupta, Richard Yang, ..., Zhengyu Li, Gopala Anumanchipalli · (pokerbench - pokerllm)
🌟 OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking, arXiv, 2501.09751, arxiv, pdf, citation: -1

Zekun Xi, Wenbiao Yin, Jizhan Fang, ..., Fei Huang, Huajun Chen · (zjunlp.github)
🌟 Evolving Deeper LLM Thinking, arXiv, 2501.09891, arxiv, pdf, citation: -1

Kuang-Huei Lee, Ian Fischer, Yueh-Hua Wu, ..., Dale Schuurmans, Xinyun Chen
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong, arXiv, 2501.09775, arxiv, pdf, citation: -1

Tairan Fu, Javier Conde, Gonzalo Martínez, ..., María Grandury, Pedro Reviriego
Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains, arXiv, 2501.05707, arxiv, pdf, citation: -1

Vighnesh Subramaniam, Yilun Du, Joshua B. Tenenbaum, ..., Shuang Li, Igor Mordatch · (llm-multiagent-ft.github)
Towards AI Superhuman Reasoning for Math and beyond

· (youtu)
Aligning with Logic: Measuring, Evaluating and Improving Logical Consistency in Large Language Models, arXiv, 2410.02205, arxiv, pdf, citation: -1

Yinhong Liu, Zhijiang Guo, Tianya Liang, ..., Ivan Vulić, Nigel Collier
🌟 Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking, arXiv, 2403.09629, arxiv, pdf, citation: -1

Eric Zelikman, Georges Harik, Yijia Shao, ..., Nick Haber, Noah D. Goodman
🌟 Token-Budget-Aware LLM Reasoning, arXiv, 2412.18547, arxiv, pdf, citation: -1

Tingxu Han, Chunrong Fang, Shiyu Zhao, ..., Zhenyu Chen, Zhenting Wang · (TALE - GeniusHTX) · (𝕏)
🌟 B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners, arXiv, 2412.17256, arxiv, pdf, citation: -1

Weihao Zeng, Yuzhen Huang, Lulu Zhao, ..., Zifei Shan, Junxian He
Deliberation in Latent Space via Differentiable Cache Augmentation, arXiv, 2412.17747, arxiv, pdf, citation: -1

Luyang Liu, Jonas Pfeiffer, Jiaxing Wu, ..., Jun Xie, Arthur Szlam
Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning, arXiv, 2412.15797, arxiv, pdf, citation: -1

Sungjin Park, Xiao Liu, Yeyun Gong, ..., Edward Choi
🌟 Chain-of-Thought Reasoning Without Prompting, arXiv, 2402.10200, arxiv, pdf, citation: -1

Xuezhi Wang, Denny Zhou · (𝕏)
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models, arXiv, 2412.11605, arxiv, pdf, citation: -1

Jiale Cheng, Xiao Liu, Cunxiang Wang, ..., Hongning Wang, Minlie Huang · (SPaR - thu-coai)
🌟 Are Your LLMs Capable of Stable Reasoning?, arXiv, 2412.13147, arxiv, pdf, citation: -1

Junnan Liu, Hongwei Liu, Linchen Xiao, ..., Songyang Zhang, Kai Chen · (GPassK. - open-compass)
Compressed Chain of Thought: Efficient Reasoning Through Dense Representations, arXiv, 2412.13171, arxiv, pdf, citation: -1

Jeffrey Cheng, Benjamin Van Durme
LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks, arXiv, 2412.15204, arxiv, pdf, citation: -1

Yushi Bai, Shangqing Tu, Jiajie Zhang, ..., Jie Tang, Juanzi Li · (longbench2.github)
RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models, arXiv, 2412.02830, arxiv, pdf, citation: -1

Hieu Tran, Zonghai Yao, Junda Wang, ..., Zhichao Yang, Hong Yu
Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models, arXiv, 2412.02674, arxiv, pdf, citation: -1

Yuda Song, Hanlin Zhang, Carson Eisenach, ..., Dean Foster, Udaya Ghai · (𝕏)
Frontier Models are Capable of In-context Scheming, arXiv, 2412.04984, arxiv, pdf, citation: -1

Alexander Meinke, Bronson Schoen, Jérémy Scheurer, ..., Rusheb Shah, Marius Hobbhahn
🌟 Training Large Language Models to Reason in a Continuous Latent Space, arXiv, 2412.06769, arxiv, pdf, citation: -1

Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, ..., Jason Weston, Yuandong Tian · (𝕏)
🌟 Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability
MALT: Improving Reasoning with Multi-Agent LLM Training, arXiv, 2412.01928, arxiv, pdf, citation: -1

Sumeet Ramesh Motwani, Chandler Smith, Rocktim Jyoti Das, ..., Ronald Clark, Christian Schroeder de Witt
Reverse Thinking Makes LLMs Stronger Reasoners, arXiv, 2411.19865, arxiv, pdf, citation: -1

Justin Chih-Yao Chen, Zifeng Wang, Hamid Palangi, ..., Chen-Yu Lee, Tomas Pfister
🌟 Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS, arXiv, 2411.18478, arxiv, pdf, citation: -1

Jinyang Wu, Mingkuan Feng, Shuai Zhang, ..., Zengqi Wen, Jianhua Tao · (arxiv) · (jinyangwu.github)
DeepSeek-R1-Lite-Preview is now live: unleashing supercharged reasoning power 𝕏
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games, arXiv, 2411.13543, arxiv, pdf, citation: -1

Davide Paglieri, Bartłomiej Cupiał, Samuel Coward, ..., Jack Parker-Holder, Tim Rocktäschel
🌟 Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding, arXiv, 2411.04282, arxiv, pdf, citation: -1

Haolin Chen, Yihao Feng, Zuxin Liu, ..., Caiming Xiong, Huan Wang · (LaTRO - SalesforceAIResearch)
Large Language Models Can Self-Improve in Long-context Reasoning, arXiv, 2411.08147, arxiv, pdf, citation: -1

Siheng Li, Cheng Yang, Zesen Cheng, ..., Yujiu Yang, Wai Lam
🌟 Combining Induction and Transduction for Abstract Reasoning, arXiv, 2411.02272, arxiv, pdf, citation: -1

Wen-Ding Li, Keya Hu, Carter Larsen, ..., Yewen Pu, Kevin Ellis · (𝕏)
🌟 The Surprising Effectiveness of Test-Time Training for Abstract Reasoning

· (𝕏) · (marc - ekinakyurek)
Can Language Models Learn to Skip Steps?, arXiv, 2411.01855, arxiv, pdf, citation: -1

Tengxiao Liu, Qipeng Guo, Xiangkun Hu, ..., Xipeng Qiu, Zheng Zhang
SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization, arXiv, 2410.21411, arxiv, pdf, citation: -1

Wanhua Li, Zibin Meng, Jiawei Zhou, ..., Chuang Gan, Hanspeter Pfister · (SocialGPT - Mengzibin)
A Pointer Network-based Approach for Joint Extraction and Detection of Multi-Label Multi-Class Intents, arXiv, 2410.22476, arxiv, pdf, citation: -1

Ankan Mullick, Sombit Bose, Abhilash Nandy, ..., Gajula Sai Chaitanya, Pawan Goyal
Improve Vision Language Model Chain-of-thought Reasoning, arXiv, 2410.16198, arxiv, pdf, citation: -1

Ruohong Zhang, Bowen Zhang, Yanghao Li, ..., Ruoming Pang, Yiming Yang

· (LLaVA-Reasoner-DPO - RifleZhang)

Math Reasoning

MathArena: Evaluating LLMs on Uncontaminated Math Competitions
🌟 Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2, arXiv, 2502.03544, arxiv, pdf, citation: -1

Yuri Chervonyi, Trieu H. Trinh, Miroslav Olšák, ..., Quoc V. Le, Thang Luong · (𝕏)
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback, arXiv, 2501.10799, arxiv, pdf, citation: -1

Yen-Ting Lin, Di Jin, Tengyu Xu, ..., Hao Ma, Han Fang
🌟 The Lessons of Developing Process Reward Models in Mathematical Reasoning, arXiv, 2501.07301, arxiv, pdf, citation: -1

Zhenru Zhang, Chujie Zheng, Yangzhen Wu, ..., Jingren Zhou, Junyang Lin
🌟 BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning, arXiv, 2501.03226, arxiv, pdf, citation: -1

Beichen Zhang, Yuhong Liu, Xiaoyi Dong, ..., Dahua Lin, Jiaqi Wang · (BoostStep - beichenzbc)
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics, arXiv, 2501.04686, arxiv, pdf, citation: -1

Ruilin Luo, Zhuofan Zheng, Yifan Wang, ..., Jin Zeng, Yujiu Yang · (ursa-math.github)
🌟 DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models, arXiv, 2402.03300, arxiv, pdf, citation: 155

Zhihong Shao, Peiyi Wang, Qihao Zhu, ..., Y. Wu, Daya Guo · (𝕏)
Continual pre-training of Llama-3.2-3B on a mix of 📐 FineMath (our new high-quality math dataset) and FineWeb-Edu. 🤗
Slow Perception: Let's Perceive Geometric Figures Step-by-step, arXiv, 2412.20631, arxiv, pdf, citation: -1

Haoran Wei, Youyang Yin, Yumeng Li, ..., Zheng Ge, Xiangyu Zhang · (Slow-Perception - Ucas-HaoranWei)
HUNYUANPROVER: A Scalable Data Synthesis Framework and Guided Tree Search for Automated Theorem Proving, arXiv, 2412.20735, arxiv, pdf, citation: -1

Yang Li, Dong Du, Linfeng Song, ..., Tao Yang, Haitao Mi
AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling, arXiv, 2412.15084, arxiv, pdf, citation: -1

Zihan Liu, Yang Chen, Mohammad Shoeybi, ..., Bryan Catanzaro, Wei Ping · (research.nvidia)
Formal Mathematical Reasoning: A New Frontier in AI, arXiv, 2412.16075, arxiv, pdf, citation: -1

Kaiyu Yang, Gabriel Poesia, Jingxuan He, ..., Swarat Chaudhuri, Dawn Song
FineMath consists of 34B tokens (FineMath-3+) and 54B tokens (FineMath-3+ with InfiMM-WebMath-3+) of mathematical educational content filtered from CommonCrawl. 🤗

· (𝕏)
U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs, arXiv, 2412.03205, arxiv, pdf, citation: -1

Konstantin Chernyshev, Vitaliy Polshkov, Ekaterina Artemova, ..., Alexei Miasnikov, Sergei Tilga · (u-math - Toloka)
ProcessBench: Identifying Process Errors in Mathematical Reasoning, arXiv, 2412.06559, arxiv, pdf, citation: -1

Chujie Zheng, Zhenru Zhang, Beichen Zhang, ..., Jingren Zhou, Junyang Lin
FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI, arXiv, 2411.04872, arxiv, pdf, citation: -1

Elliot Glazer, Ege Erdil, Tamay Besiroglu, ..., Tetiana Grechuk, Shreepranav Varma Enugandla · (epochai) · (𝕏)
Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning, arXiv, 2410.22304, arxiv, pdf, citation: -1

Yihe Deng, Paul Mineiro
Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics, arXiv, 2410.21272, arxiv, pdf, citation: -1

Yaniv Nikankin, Anja Reusch, Aaron Mueller, ..., Yonatan Belinkov · (𝕏)
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models, arXiv, 2410.07985, arxiv, pdf, citation: -1

Bofei Gao, Feifan Song, Zhe Yang, ..., Tianyu Liu, Baobao Chang · (Omni-MATH - KbsdJames) · (omni-math.github) · (huggingface) · (huggingface)

O1 Reasoning

7B Model and 8K Examples: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient
There May Not be Aha Moment in R1-Zero-like Training — A Pilot Study
Qwen 0.5b on GRPO
DeepScaleR: Surpassing O1-Preview with a 1.5B Model by Scaling RL

· (𝕏) · (deepscaler - agentica-project)
🌟 s1: Simple test-time scaling, arXiv, 2501.19393, arxiv, pdf, citation: -1

Niklas Muennighoff, Zitong Yang, Weijia Shi, ..., Emmanuel Candès, Tatsunori Hashimoto · (s1 - simplescaling)
a fine-tuned version of Qwen/Qwen2.5-32B-Instruct on the Bespoke-Stratos-17k dataset. 🤗
🌟 Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs, arXiv, 2501.18585, arxiv, pdf, citation: -1

Yue Wang, Qiuzhi Liu, Jiahao Xu, ..., Haitao Mi, Dong Yu
How DeepSeek Changes the LLM Story 🎬
A few takeaways from replicating o1
🌟 simpleRL-reason - hkust-nlp

Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient · (hkust-nlp.notion)
🌟 open-r1 - huggingface
TinyZero - Jiayi-Pan

· (𝕏) · (wandb)
With R1, a lot of people have been asking “how come we didn't discover this 2 years ago?” 𝕏
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning, arXiv, 2501.12570, arxiv, pdf, citation: -1

Haotian Luo, Li Shen, Haiying He, ..., Xiaochun Cao, Dacheng Tao · (O1-Pruner - StarDewXXX)
DeepSeek R1's recipe to replicate o1 and the future of reasoning LMs
🌟 Meta Chain-of-Thought: Unlocking System 2 Reasoning in LLMs

· (𝕏)
O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning, arXiv, 2501.06458, arxiv, pdf, citation: -1

Zhongzhen Huang, Gui Geng, Shengyi Hua, ..., Pengfei Liu, Xiaofan Zhang
🌟 DeepSeek-R1 - deepseek-ai
Kimi-k1.5 - MoonshotAI

Scaling Reinforcement Learning with LLMs
🌟 Sky-T1: Train your own O1 preview model within $450

· (SkyThought - NovaSky-AI)
🌟 rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking, arXiv, 2501.04519, arxiv, pdf, citation: -1

Xinyu Guan, Li Lyna Zhang, Yifei Liu, ..., Fan Yang, Mao Yang · (rStar - microsoft)
🌟 Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought, arXiv, 2501.04682, arxiv, pdf, citation: -1

Violet Xiang, Charlie Snell, Kanishk Gandhi, ..., Nick Haber, Chelsea Finn
PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models, arXiv, 2501.03124, arxiv, pdf, citation: -1

Mingyang Song, Zhaochen Su, Xiaoye Qu, ..., Jiawei Zhou, Yu Cheng · (prmbench.github) · (PRMBench - ssmisya) · (arxiv) · (huggingface)
Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback, arXiv, 2501.03916, arxiv, pdf, citation: -1

Jiakang Yuan, Xiangchao Yan, Botian Shi, ..., Yu Qiao, Bowen Zhou
Search-o1: Agentic Search-Enhanced Large Reasoning Models, arXiv, 2501.05366, arxiv, pdf, citation: -1

Xiaoxi Li, Guanting Dong, Jiajie Jin, ..., Peitian Zhang, Zhicheng Dou · (Search-o1 - sunnynexus)
Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models, arXiv, 2412.15287, arxiv, pdf, citation: 1

Yinlam Chow, Guy Tennenholtz, Izzeddin Gur, ..., Aviral Kumar, Aleksandra Faust · (𝕏)
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs, arXiv, 2412.21187, arxiv, pdf, citation: -1

Xingyu Chen, Jiahao Xu, Tian Liang, ..., Haitao Mi, Dong Yu
Reasoning with o1
Reasoning with o1 🎬
PRIME - PRIME-RL

· (curvy-check-498.notion) · (𝕏)
SmallThinker-3B-preview, a new model fine-tuned from the Qwen2.5-3b-Instruct model. 🤗
Distill its thinking capabilities into a smaller model, enhancing its reasoning performance 𝕏

· (t)
OpenAI o1 System Card, arXiv, 2412.16720, arxiv, pdf, citation: -1

OpenAI, Aaron Jaech, ..., Zheng Shao, Zhuohan Li
Imitate, Explore, and Self-Improve: A Reproduction Report on Slow-thinking Reasoning Systems, arXiv, 2412.09413, arxiv, pdf, citation: -1

Yingqian Min, Zhipeng Chen, Jinhao Jiang, ..., Zhongyuan Wang, Ji-Rong Wen · (Slow_Thinking_with_LLMs - RUCAIBox)
🌟 search-and-learn - huggingface

· (huggingface) · (𝕏)
Beyond Decoding: Meta-Generation Algorithms for Large Language Models

· (cmu-l3.github)
uncensored version of Qwen/QwQ-32B-Preview created with abliteration 🤗

· (remove-refusals-with-transformers - Sumandora)
OpenAI's o1 using "search" was a PSYOP
Inference Time Compute 🎬
Free Process Rewards without Process Labels, arXiv, 2412.01981, arxiv, pdf, citation: -1

Lifan Yuan, Wendi Li, Huayu Chen, ..., Zhiyuan Liu, Hao Peng
Natural Language Reinforcement Learning, arXiv, 2411.14251, arxiv, pdf, citation: -1

Xidong Feng, Ziyu Wan, Haotian Fu, ..., Ying Wen, Jun Wang · (arxiv) · (Natural-language-RL - waterhorse1) · (mp.weixin.qq)
Can we make any smaller opensource LLM models smarter than human?
OpenAI o1 opens a new reinforcement learning paradigm for the "post-training" era

· (bilibili)
Inference Scaling fLaws: The Limits of LLM Resampling with Imperfect Verifiers, arXiv, 2411.17501, arxiv, pdf, citation: -1

Benedikt Stroebl, Sayash Kapoor, Arvind Narayanan · (𝕏) · (𝕏)
Exploring OpenAI O1 Model Replication
Open-O1 - Open-Source-O1

A Model Matching Proprietary Power with Open-Source Innovation
Patience Is The Key to Large Language Model Reasoning, arXiv, 2411.13082, arxiv, pdf, citation: -1

Yijiong Yu

· (huggingface)
🌟 O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?, arXiv, 2411.16489, arxiv, pdf, citation: -1

Zhen Huang, Haoyang Zou, Xuefeng Li, ..., Weizhe Yuan, Pengfei Liu · (O1-Journey - GAIR-NLP)
Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision, arXiv, 2411.16579, arxiv, pdf, citation: -1

Zhiheng Xi, Dingwen Yang, Jixuan Huang, ..., Xuanjing Huang, Yu-Gang Jiang · (mathcritique.github)
QwQ: Reflect Deeply on the Boundaries of the Unknown

· (huggingface)
Skywork o1 Open model series 🤗
🌟 O1-Journey - GAIR-NLP
Beyond Decoding: Meta-Generation Algorithms for Large Language Models

· (simons.berkeley)
🌟 From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models, arXiv, 2406.16838, arxiv, pdf, citation: -1

Sean Welleck, Amanda Bertsch, Matthew Finlayson, ..., Ilia Kulikov, Zaid Harchaoui · (cmu-l3.github)
Honorable mentions to Nous Forge Reasoning API and Fireworks f1; DeepSeek appears to have made the first convincing attempt
DeepSeek-R1-Lite-Preview is now live: unleashing supercharged reasoning power! 𝕏

· (t)
🌟 Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions, arXiv, 2411.14405, arxiv, pdf, citation: -1

Yu Zhao, Huifeng Yin, Bo Zeng, ..., Weihua Luo, Kaifu Zhang · (Marco-o1 - AIDC-AI)
🌟 entropix - xjdr-alt
Thinking-Claude - richards199999
Speculations on Test-Time Scaling (o1) 🎬
Tess-R1 is designed with test-time compute in mind, and has the capabilities to produce a Chain-of-Thought (CoT) reasoning before producing the final output. 🤗
LLaMA-O1 - SimpleBerry

Open Large Reasoning Model Frameworks For Training, Inference and Evaluation With PyTorch and HuggingFace · (qbitai)
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model, arXiv, 2410.13639, arxiv, pdf, citation: -1

Siwei Wu, Zhongyuan Peng, Xinrun Du, ..., Chenghua Lin, J. H. Liu
Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces, arXiv, 2410.09918, arxiv, pdf, citation: -1

DiJia Su, Sainbayar Sukhbaatar, Michael Rabbat, ..., Yuandong Tian, Qinqing Zheng
O1-Journey - GAIR-NLP

A Strategic Progress Report

Disentanglement

Disentangling Memory and Reasoning Ability in Large Language Models, arXiv, 2411.13504, arxiv, pdf, citation: -1

Mingyu Jin, Weidi Luo, Sitao Cheng, ..., William Yang Wang, Yongfeng Zhang · (Disentangling-Memory-and-Reasoning - MingyuJ666)

Self Correction

ProgCo: Program Helps Self-Correction of Large Language Models, arXiv, 2501.01264, arxiv, pdf, citation: -1

Xiaoshuai Song, Yanan Wu, Weixun Wang, ..., Wenbo Su, Bo Zheng

Knowledge

Context Learning

🌟 Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization, arXiv, 2412.18525, arxiv, pdf, citation: -1

Yang Shen, Xiu-Shen Wei, Yifan Sun, ..., Yazhou Yao, Errui Ding
The broader spectrum of in-context learning, arXiv, 2412.03782, arxiv, pdf, citation: -1

Andrew Kyle Lampinen, Stephanie C. Y. Chan, Aaditya K. Singh, ..., Murray Shanahan · (𝕏)

Chain Of Thought

🌟 Demystifying Long Chain-of-Thought Reasoning in LLMs, arXiv, 2502.03373, arxiv, pdf, citation: -1

Edward Yeo, Yuxuan Tong, Morry Niu, ..., Graham Neubig, Xiang Yue · (𝕏) · (demystify-long-cot - eddycmu)
Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step, arXiv, 2501.13926, arxiv, pdf, citation: -1

Ziyu Guo, Renrui Zhang, Chengzhuo Tong, ..., Hongsheng Li, Pheng-Ann Heng · (Image-Generation-CoT - ZiyuGuo99)
Audio-CoT: Exploring Chain-of-Thought Reasoning in Large Audio Language Model, arXiv, 2501.07246, arxiv, pdf, citation: -1

Ziyang Ma, Zhuo Chen, Yuping Wang, ..., Eng Siong Chng, Xie Chen
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning, arXiv, 2409.12183, arxiv, pdf, citation: 24

Zayne Sprague, Fangcong Yin, Juan Diego Rodriguez, ..., Kyle Mahowald, Greg Durrett · (To-CoT-or-not-to-CoT - Zayne-sprague) · (𝕏)
Internalize_CoT_Step_by_Step - da03

· (huggingface)
LLMs Do Not Think Step-by-step In Implicit Reasoning, arXiv, 2411.15862, arxiv, pdf, citation: -1

Yijiong Yu

· (𝕏)
A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration, arXiv, 2410.16540, arxiv, pdf, citation: -1

Yingqian Cui, Pengfei He, Xianfeng Tang, ..., Jiliang Tang, Yue Xing · (𝕏)
Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse, arXiv, 2410.21333, arxiv, pdf, citation: -1

Ryan Liu, Jiayi Geng, Addison J. Wu, ..., Tania Lombrozo, Thomas L. Griffiths

Prompt

Prompt Design at Character.AI
Does Prompt Formatting Have Any Impact on LLM Performance?, arXiv, 2411.10541, arxiv, pdf, citation: -1

Jia He, Mukund Rungta, David Koleczek, ..., Franklin X Wang, Sadid Hasan
automatic prompt optimization algorithms 𝕏
Automatic Prompt Optimization
MacOS 15.1 Apple Intelligence Prompt Templates 𝕏
use RL to automatically improve our prompts 𝕏
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs, arXiv, 2410.12405, arxiv, pdf, citation: -1

Jingming Zhuo, Songyang Zhang, Xinyu Fang, ..., Dahua Lin, Kai Chen · (ProSA - open-compass)

Projects

Open-Reasoning-Tasks - NousResearch
prompt-poet - character-ai
V0-system-prompt - 2-fly-4-ai

· (reddit)
steiner-preview: Reasoning models trained on synthetic data using reinforcement learning. 🤗

Planning

Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge, arXiv, 2501.18099, arxiv, pdf, citation: -1

Swarnadeep Saha, Xian Li, Marjan Ghazvininejad, ..., Jason Weston, Tianlu Wang · (𝕏)
Dynamic Planning with a LLM, arXiv, 2308.06391, arxiv, pdf, citation: -1

Gautier Dagan, Frank Keller, Alex Lascarides
On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark), arXiv, 2302.06706, arxiv, pdf, citation: -1

Karthik Valmeekam, Sarath Sreedharan, Matthew Marquez, ..., Alberto Olmo, Subbarao Kambhampati
Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement, arXiv, 2310.08559, arxiv, pdf, citation: -1

Linlu Qiu, Liwei Jiang, Ximing Lu, ..., Nouha Dziri, Xiang Ren
Revealing the Barriers of Language Agents in Planning, arXiv, 2410.12409, arxiv, pdf, citation: 1

Jian Xie, Kexun Zhang, Jiangjie Chen, ..., Lei Li, Yanghua Xiao

Misc

Denny Zhou: LLM Reasoning: Key Ideas and Limitations 🎬
Jason Wei: Scaling Paradigms for Large Language Models 🎬
Zhipu's version of o1 has finally arrived: it aces graduate entrance exam math and can build a mini-game from a single sentence!
Quick recap on the state of reasoning -- can LMs reason? 𝕏

· (youtube)
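[Exclusive analysis from the Peking University alignment team: OpenAI o1 opens a new reinforcement learning paradigm for the "post-training" era] 🎬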
The Problem with Reasoners
