A collection of Prompt & LLM papers, open-source data & models, and AIGC applications

DSXiangLi/DecryptPrompt

DecryptPrompt

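If the sudden arrival of LLMs has left you discouraged, consider reading Choose Your Weapon: Survival Strategies for Depressed AI Academics in the root directory. The content below is updated continuously. Star to keep updated~

LLM Resource Collection

Reading Papers Along with the Blog

Paper Collection

Paper List

Surveys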

  • A Survey of Large Language Models
  • Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing ⭐
  • Paradigm Shift in Natural Language Processing
  • Pre-Trained Models: Past, Present and Future
  • What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization? ⭐
  • Towards Reasoning in Large Language Models: A Survey
  • Reasoning with Language Model Prompting: A Survey ⭐
  • An Overview on Language Models: Recent Developments and Outlook ⭐
  • A Survey of Large Language Models [updated June 29 version]
  • Unifying Large Language Models and Knowledge Graphs: A Roadmap
  • Augmented Language Models: a Survey ⭐
  • Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey
  • Challenges and Applications of Large Language Models
  • The Rise and Potential of Large Language Model Based Agents: A Survey
  • Large Language Models for Information Retrieval: A Survey
  • AI Alignment: A Comprehensive Survey
  • Trends in Integration of Knowledge and Large Language Models: A Survey and Taxonomy of Methods, Benchmarks, and Applications
  • Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook
  • A Survey on Language Models for Code
  • Model-as-a-Service (MaaS): A Survey

Exploring LLM Capabilities

  • In Context Learning
    • LARGER LANGUAGE MODELS DO IN-CONTEXT LEARNING DIFFERENTLY
    • How does in-context learning work? A framework for understanding the differences from traditional supervised learning
    • Why can GPT learn in-context? Language Model Secretly Perform Gradient Descent as Meta-Optimizers ⭐
    • Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? ⭐
    • Trained Transformers Learn Linear Models In-Context
    • In-Context Learning Creates Task Vectors
    • FUNCTION VECTORS IN LARGE LANGUAGE MODELS
  • Emergent Abilities
    • Sparks of Artificial General Intelligence: Early experiments with GPT-4
    • Emergent Abilities of Large Language Models ⭐
    • LANGUAGE MODELS REPRESENT SPACE AND TIME
    • Are Emergent Abilities of Large Language Models a Mirage?
  • Capability Evaluation
    • IS CHATGPT A GENERAL-PURPOSE NATURAL LANGUAGE PROCESSING TASK SOLVER?
    • Can Large Language Models Infer Causation from Correlation?
    • Holistic Evaluation of Language Models
    • Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
    • Theory of Mind May Have Spontaneously Emerged in Large Language Models
    • Beyond The Imitation Game: Quantifying And Extrapolating The Capabilities Of Language Models
    • Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations
    • Demystifying GPT Self-Repair for Code Generation
    • Evidence of Meaning in Language Models Trained on Programs
    • Can Explanations Be Useful for Calibrating Black Box Models
    • On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective
    • Language acquisition: do children and language models follow similar learning stages?
    • Language is primarily a tool for communication rather than thought
  • Domain Capabilities
    • Capabilities of GPT-4 on Medical Challenge Problems
    • Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine

Prompt Tuning Paradigms

  • Tuning-Free Prompt
    • GPT2: Language Models are Unsupervised Multitask Learners
    • GPT3: Language Models are Few-Shot Learners ⭐
    • LAMA: Language Models as Knowledge Bases?
    • AutoPrompt: Eliciting Knowledge from Language Models
  • Fix-Prompt LM Tuning
    • T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
    • PET-TC(a): Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference ⭐
    • PET-TC(b): PETSGLUE It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
    • GenPET: Few-Shot Text Generation with Natural Language Instructions
    • LM-BFF: Making Pre-trained Language Models Better Few-shot Learners ⭐
    • ADEPT: Improving and Simplifying Pattern Exploiting Training
  • Fix-LM Prompt Tuning
    • Prefix-tuning: Optimizing continuous prompts for generation
    • Prompt-tuning: The power of scale for parameter-efficient prompt tuning ⭐
    • P-tuning: GPT Understands Too ⭐
    • WARP: Word-level Adversarial ReProgramming
  • LM + Prompt Tuning
    • P-tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks
    • PTR: Prompt Tuning with Rules for Text Classification
    • PADA: Example-based Prompt Learning for on-the-fly Adaptation to Unseen Domains
  • Fix-LM Adapter Tuning
    • LORA: LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS ⭐
    • LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning
    • Parameter-Efficient Transfer Learning for NLP
    • INTRINSIC DIMENSIONALITY EXPLAINS THE EFFECTIVENESS OF LANGUAGE MODEL FINE-TUNING
    • DoRA: Weight-Decomposed Low-Rank Adaptation
  • Representation Tuning
    • ReFT: Representation Finetuning for Language Models

Mainstream LLMs and Pretraining

  • GLM-130B: AN OPEN BILINGUAL PRE-TRAINED MODEL
  • PaLM: Scaling Language Modeling with Pathways
  • PaLM 2 Technical Report
  • GPT-4 Technical Report
  • Backpack Language Models
  • LLaMA: Open and Efficient Foundation Language Models
  • Llama 2: Open Foundation and Fine-Tuned Chat Models
  • Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
  • OpenBA: An Open-sourced 15B Bilingual Asymmetric seq2seq Model Pre-trained from Scratch
  • Mistral 7B
  • Ziya2: Data-centric Learning is All LLMs Need
  • MEGABLOCKS: EFFICIENT SPARSE TRAINING WITH MIXTURE-OF-EXPERTS
  • TUTEL: ADAPTIVE MIXTURE-OF-EXPERTS AT SCALE
  • Phi1- Textbooks Are All You Need ⭐
  • Phi1.5- Textbooks Are All You Need II: phi-1.5 technical report
  • Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
  • Gemini: A Family of Highly Capable Multimodal Models
  • In-Context Pretraining: Language Modeling Beyond Document Boundaries
  • LLAMA PRO: Progressive LLaMA with Block Expansion
  • QWEN TECHNICAL REPORT
  • Fewer Truncations Improve Language Modeling
  • ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

Instruction Tuning & Alignment (instruction_tunning)

  • Classic Approaches
    • Flan: FINETUNED LANGUAGE MODELS ARE ZERO-SHOT LEARNERS ⭐
    • Flan-T5: Scaling Instruction-Finetuned Language Models
    • ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning
    • Instruct-GPT: Training language models to follow instructions with human feedback ⭐
    • T0: MULTITASK PROMPTED TRAINING ENABLES ZERO-SHOT TASK GENERALIZATION
    • Natural Instructions: Cross-Task Generalization via Natural Language Crowdsourcing Instructions
    • Tk-INSTRUCT: SUPER-NATURALINSTRUCTIONS: Generalization via Declarative Instructions on 1600+ NLP Tasks
    • ZeroPrompt: Scaling Prompt-Based Pretraining to 1,000 Tasks Improves Zero-shot Generalization
    • Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor
    • INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models
  • SFT Data Scaling Law
    • LIMA: Less Is More for Alignment ⭐
    • Maybe Only 0.5% Data is Needed: A Preliminary Exploration of Low Training Data Instruction Tuning
    • AlpaGasus: Training A Better Alpaca with Fewer Data
    • InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4
    • Instruction Mining: High-Quality Instruction Data Selection for Large Language Models
    • Visual Instruction Tuning with Polite Flamingo
    • Exploring the Impact of Instruction Data Scaling on Large Language Models: An Empirical Study on Real-World Use Cases
    • Scaling Relationship on Learning Mathematical Reasoning with Large Language Models
    • WHEN SCALING MEETS LLM FINETUNING: THE EFFECT OF DATA, MODEL AND FINETUNING METHOD
  • New Alignment/Fine-Tuning Approaches
    • WizardLM: Empowering Large Language Models to Follow Complex Instructions ⭐
    • Becoming self-instruct: introducing early stopping criteria for minimal instruct tuning
    • Self-Alignment with Instruction Backtranslation ⭐
    • Mixture-of-Experts Meets Instruction Tuning:A Winning Combination for Large Language Models
    • Goat: Fine-tuned LLaMA Outperforms GPT-4 on Arithmetic Tasks
    • PROMPT2MODEL: Generating Deployable Models from Natural Language Instructions
    • OpinionGPT: Modelling Explicit Biases in Instruction-Tuned LLMs
    • Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback
    • Human-like systematic generalization through a meta-learning neural network
    • Magicoder: Source Code Is All You Need
    • Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
    • Generative Representational Instruction Tuning
    • InsCL: A Data-efficient Continual Learning Paradigm for Fine-tuning Large Language Models with Instructions
    • The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
    • Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
  • Instruction Data Generation
    • APE: LARGE LANGUAGE MODELS ARE HUMAN-LEVEL PROMPT ENGINEERS ⭐
    • SELF-INSTRUCT: Aligning Language Model with Self Generated Instructions ⭐
    • iPrompt: Explaining Data Patterns in Natural Language via Interpretable Autoprompting
    • Flipped Learning: Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners
    • Fairness-guided Few-shot Prompting for Large Language Models
    • Instruction induction: From few examples to natural language task descriptions
    • SELF-QA: Unsupervised Knowledge Guided Alignment
    • GPT Self-Supervision for a Better Data Annotator
    • The Flan Collection Designing Data and Methods
    • Self-Consuming Generative Models Go MAD
    • InstructEval: Systematic Evaluation of Instruction Selection Methods
    • Overwriting Pretrained Bias with Finetuning Data
    • Improving Text Embeddings with Large Language Models
    • MAGPIE: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
    • Scaling Synthetic Data Creation with 1,000,000,000 Personas
    • UNLEASHING REASONING CAPABILITY OF LLMS VIA SCALABLE QUESTION SYNTHESIS FROM SCRATCH
  • How to Reduce the Loss of General Capabilities
    • How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition
    • TWO-STAGE LLM FINE-TUNING WITH LESS SPECIALIZATION AND MORE GENERALIZATION
  • Fine-Tuning Experience/Experiment Reports
    • BELLE: Exploring the Impact of Instruction Data Scaling on Large Language Models: An Empirical Study on Real-World Use Cases
    • Baize: Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data
    • A Comparative Study between Full-Parameter and LoRA-based Fine-Tuning on Chinese Instruction Data for Large LM
    • Exploring ChatGPT’s Ability to Rank Content: A Preliminary Study on Consistency with Human Preferences
    • Towards Better Instruction Following Language Models for Chinese: Investigating the Impact of Training Data and Evaluation
    • Fine tuning LLMs for Enterprise: Practical Guidelines and Recommendations
  • Others
    • Crosslingual Generalization through Multitask Finetuning
    • Cross-Task Generalization via Natural Language Crowdsourcing Instructions
    • UNIFIEDSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models
    • PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts
    • ROLELLM: BENCHMARKING, ELICITING, AND ENHANCING ROLE-PLAYING ABILITIES OF LARGE LANGUAGE MODELS

Dialogue Models

  • LaMDA: Language Models for Dialog Applications
  • Sparrow: Improving alignment of dialogue agents via targeted human judgements ⭐
  • BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage
  • How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation
  • DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI
  • Enhancing Chat Language Models by Scaling High-quality Instructional Conversations
  • DiagGPT: An LLM-based Chatbot with Automatic Topic Management for Task-Oriented Dialogue

Chain of Thought (prompt_chain_of_thought)

  • Basic & Advanced Usage
    • 【zero-shot-COT】 Large Language Models are Zero-Shot Reasoners ⭐
    • 【few-shot COT】 Chain of Thought Prompting Elicits Reasoning in Large Language Models ⭐
    • 【SELF-CONSISTENCY 】IMPROVES CHAIN OF THOUGHT REASONING IN LANGUAGE MODELS
    • 【LEAST-TO-MOST】 PROMPTING ENABLES COMPLEX REASONING IN LARGE LANGUAGE MODELS ⭐
    • 【TOT】Tree of Thoughts: Deliberate Problem Solving with Large Language Models ⭐
    • 【Plan-and-Solve】 Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
    • 【Verify-and-Edit】: A Knowledge-Enhanced Chain-of-Thought Framework
    • 【GOT】Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models
    • 【TOMT】Tree-of-Mixed-Thought: Combining Fast and Slow Thinking for Multi-hop Visual Reasoning
    • 【LAMBADA】: Backward Chaining for Automated Reasoning in Natural Language
    • 【AOT】Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models ⭐
    • 【GOT】Graph of Thoughts: Solving Elaborate Problems with Large Language Models ⭐
    • 【PHP】Progressive-Hint Prompting Improves Reasoning in Large Language Models
    • 【HtT】LARGE LANGUAGE MODELS CAN LEARN RULES ⭐
    • 【DIVSE】DIVERSITY OF THOUGHT IMPROVES REASONING ABILITIES OF LARGE LANGUAGE MODELS
    • 【CogTree】From Complex to Simple: Unraveling the Cognitive Tree for Reasoning with Small Language Models
    • 【Step-Back】Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models ⭐
    • 【OPRO】LARGE LANGUAGE MODELS AS OPTIMIZERS ⭐
    • 【BOT】Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
    • Abstraction-of-Thought Makes Language Models Better Reasoners
    • 【SymbCoT】Faithful Logical Reasoning via Symbolic Chain-of-Thought
    • 【XOT】EVERYTHING OF THOUGHTS: DEFYING THE LAW OF PENROSE TRIANGLE FOR THOUGHT GENERATION
    • 【IoT】Iteration of Thought: Leveraging Inner Dialogue for Autonomous Large Language Model Reasoning
    • 【DOT】On the Diagram of Thought
    • 【ROT】Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-up
    • Thinking Forward and Backward: Effective Backward Planning with Large Language Models
  • Non-Traditional CoT: Problem Decomposition
    • Decomposed Prompting: A Modular Approach for Solving Complex Tasks
    • Successive Prompting for Decomposing Complex Questions
  • Domain-Specific CoT [Math, Code, Tabular, QA]
    • Solving Quantitative Reasoning Problems with Language Models
    • SHOW YOUR WORK: SCRATCHPADS FOR INTERMEDIATE COMPUTATION WITH LANGUAGE MODELS
    • Solving math word problems with process- and outcome-based feedback
    • CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning
    • T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Large Language Model Signals for Science Question Answering
    • LEARNING PERFORMANCE-IMPROVING CODE EDITS
    • Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
  • Analysis of Underlying Principles
    • Chain of Thought Empowers Transformers to Solve Inherently Serial Problems ⭐
    • Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters ⭐
    • TEXT AND PATTERNS: FOR EFFECTIVE CHAIN OF THOUGHT IT TAKES TWO TO TANGO
    • Towards Revealing the Mystery behind Chain of Thought: a Theoretical Perspective
    • Large Language Models Can Be Easily Distracted by Irrelevant Context
    • Chain-of-Thought Reasoning Without Prompting
    • Inductive or Deductive? Rethinking the Fundamental Reasoning Abilities of LLMs
    • Beyond Chain-of-Thought: A Survey of Chain-of-X Paradigms for LLMs
    • To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning ⭐
    • Why think step by step? Reasoning emerges from the locality of experience
  • CoT Distillation into Small Models
    • Specializing Smaller Language Models towards Multi-Step Reasoning ⭐
    • Teaching Small Language Models to Reason
    • Large Language Models are Reasoning Teachers
    • Distilling Reasoning Capabilities into Smaller Language Models
    • The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning
    • Distilling System 2 into System 1
  • Automatic CoT Example Construction/Selection
    • AutoCOT: AUTOMATIC CHAIN OF THOUGHT PROMPTING IN LARGE LANGUAGE MODELS
    • Active Prompting with Chain-of-Thought for Large Language Models
    • COMPLEXITY-BASED PROMPTING FOR MULTI-STEP REASONING
  • Learning CoT Capabilities
    • Large Language Models Can Self-Improve
    • Training Chain-of-Thought via Latent-Variable Inference
    • Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
    • STaR: Self-Taught Reasoner Bootstrapping Reasoning With Reasoning
    • V-STaR: Training Verifiers for Self-Taught Reasoners
    • THINK BEFORE YOU SPEAK: TRAINING LANGUAGE MODELS WITH PAUSE TOKENS
    • SELF-DIRECTED SYNTHETIC DIALOGUES AND REVISIONS TECHNICAL REPORT
  • others
    • OlaGPT Empowering LLMs With Human-like Problem-Solving abilities
    • Challenging BIG-Bench tasks and whether chain-of-thought can solve them
    • Large Language Models are Better Reasoners with Self-Verification
    • ThoughtSource A central hub for large language model reasoning data
    • Two Failures of Self-Consistency in the Multi-Step Reasoning of LLMs

RLHF

  • Deepmind
    • Teaching language models to support answers with verified quotes
    • Sparrow: Improving alignment of dialogue agents via targeted human judgements ⭐
    • STATISTICAL REJECTION SAMPLING IMPROVES PREFERENCE OPTIMIZATION
    • Reinforced Self-Training (ReST) for Language Modeling
    • SLiC-HF: Sequence Likelihood Calibration with Human Feedback
    • CALIBRATING SEQUENCE LIKELIHOOD IMPROVES CONDITIONAL LANGUAGE GENERATION
    • REWARD DESIGN WITH LANGUAGE MODELS
    • Final-Answer RL: Solving math word problems with process- and outcome-based feedback
    • Solving math word problems with process- and outcome-based feedback
    • Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
    • BOND: Aligning LLMs with Best-of-N Distillation
    • RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold
    • Generative Verifiers: Reward Modeling as Next-Token Prediction
    • Training Language Models to Self-Correct via Reinforcement Learning
  • openai
    • PPO: Proximal Policy Optimization Algorithms ⭐
    • Deep Reinforcement Learning from Human Preferences
    • Fine-Tuning Language Models from Human Preferences
    • learning to summarize from human feedback
    • InstructGPT: Training language models to follow instructions with human feedback ⭐
    • Scaling Laws for Reward Model Overoptimization ⭐
    • WEAK-TO-STRONG GENERALIZATION: ELICITING STRONG CAPABILITIES WITH WEAK SUPERVISION ⭐
    • PRM: Let's verify step by step ⭐
    • Training Verifiers to Solve Math Word Problems [prerequisite for PRM]
    • OpenAI Super Alignment Blog
    • LLM Critics Help Catch LLM Bugs ⭐
    • PROVER-VERIFIER GAMES IMPROVE LEGIBILITY OF LLM OUTPUTS
    • Rule Based Rewards for Language Model Safety
    • Self-critiquing models for assisting human evaluators
  • Anthropic
    • A General Language Assistant as a Laboratory for Alignment
    • Measuring Progress on Scalable Oversight for Large Language Models
    • Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
    • Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback ⭐
    • Constitutional AI: Harmlessness from AI Feedback ⭐
    • Pretraining Language Models with Human Preferences
    • The Capacity for Moral Self-Correction in Large Language Models
    • Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
  • AllenAI, RL4LM:IS REINFORCEMENT LEARNING (NOT) FOR NATURAL LANGUAGE PROCESSING BENCHMARKS
  • Improved Approaches
    • RRHF: Rank Responses to Align Language Models with Human Feedback without tears
    • Chain of Hindsight Aligns Language Models with Feedback
    • AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
    • RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
    • RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
    • Training Socially Aligned Language Models in Simulated Human Society
    • RAIN: Your Language Models Can Align Themselves without Finetuning
    • Generative Judge for Evaluating Alignment
    • PEERING THROUGH PREFERENCES: UNRAVELING FEEDBACK ACQUISITION FOR ALIGNING LARGE LANGUAGE MODELS
    • SALMON: SELF-ALIGNMENT WITH PRINCIPLE-FOLLOWING REWARD MODELS
    • Large Language Model Unlearning ⭐
    • ADVERSARIAL PREFERENCE OPTIMIZATION ⭐
    • Preference Ranking Optimization for Human Alignment
    • A Long Way to Go: Investigating Length Correlations in RLHF
    • ENABLE LANGUAGE MODELS TO IMPLICITLY LEARN SELF-IMPROVEMENT FROM DATA
    • REWARD MODEL ENSEMBLES HELP MITIGATE OVEROPTIMIZATION
    • LEARNING OPTIMAL ADVANTAGE FROM PREFERENCES AND MISTAKING IT FOR REWARD
    • ULTRAFEEDBACK: BOOSTING LANGUAGE MODELS WITH HIGH-QUALITY FEEDBACK
    • MOTIF: INTRINSIC MOTIVATION FROM ARTIFICIAL INTELLIGENCE FEEDBACK
    • STABILIZING RLHF THROUGH ADVANTAGE MODEL AND SELECTIVE REHEARSAL
    • Shepherd: A Critic for Language Model Generation
    • LEARNING TO GENERATE BETTER THAN YOUR LLM
    • Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
    • Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
    • Direct Preference Optimization: Your Language Model is Secretly a Reward Model
    • HIR The Wisdom of Hindsight Makes Language Models Better Instruction Followers
    • Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction
    • A Minimaximalist Approach to Reinforcement Learning from Human Feedback
    • PANDA: Preference Adaptation for Enhancing Domain-Specific Abilities of LLMs
    • Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models
    • Weak-to-Strong Extrapolation Expedites Alignment
    • Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
    • Token-level Direct Preference Optimization
    • SimPO: Simple Preference Optimization with a Reference-Free Reward
    • AUTODETECT: Towards a Unified Framework for Automated Weakness Detection in Large Language Models
    • META-REWARDING LANGUAGE MODELS: Self-Improving Alignment with LLM-as-a-Meta-Judge
    • HELPSTEER: Multi-attribute Helpfulness Dataset for STEERLM
    • Recursive Introspection: Teaching Language Model Agents How to Self-Improve
    • Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization
  • Exploring RL
    • UNDERSTANDING THE EFFECTS OF RLHF ON LLM GENERALISATION AND DIVERSITY
    • A LONG WAY TO GO: INVESTIGATING LENGTH CORRELATIONS IN RLHF
    • THE TRICKLE-DOWN IMPACT OF REWARD (IN-)CONSISTENCY ON RLHF
    • Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
    • HUMAN FEEDBACK IS NOT GOLD STANDARD
    • CONTRASTIVE POST-TRAINING LARGE LANGUAGE MODELS ON DATA CURRICULUM
    • Language Models Resist Alignment

Post Train

  • Inference Scaling
    • An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models
    • Are More LM Calls All You Need? Towards the Scaling Properties of Compound AI Systems
    • Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
    • Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters ⭐
    • Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning
    • Planning In Natural Language Improves LLM Search For Code Generation
    • ReST-MCTS∗ : LLM Self-Training via Process Reward Guided Tree Search
    • AlphaZero-Like Tree-Search can Guide Large Language Model Decoding and Training
    • Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
    • The Surprising Effectiveness of Test-Time Training for Abstract Reasoning

LLM Agent: Enabling Models to Use Tools (llm_agent)

  • A Survey on Large Language Model based Autonomous Agents
  • PERSONAL LLM AGENTS: INSIGHTS AND SURVEY ABOUT THE CAPABILITY, EFFICIENCY AND SECURITY
  • General Prompt-Based Approaches
    • ReAct: SYNERGIZING REASONING AND ACTING IN LANGUAGE MODELS ⭐
    • Self-ask: MEASURING AND NARROWING THE COMPOSITIONALITY GAP IN LANGUAGE MODELS ⭐
    • MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning
    • PAL: Program-aided Language Models
    • ART: Automatic multi-step reasoning and tool-use for large language models
    • ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models ⭐
    • Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions
    • Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models ⭐
    • Faithful Chain-of-Thought Reasoning
    • Reflexion: Language Agents with Verbal Reinforcement Learning ⭐
    • Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework
    • RestGPT: Connecting Large Language Models with Real-World RESTful APIs
    • ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models
    • InstructTODS: Large Language Models for End-to-End Task-Oriented Dialogue Systems
    • TPTU: Task Planning and Tool Usage of Large Language Model-based AI Agents
    • ControlLLM: Augment Language Models with Tools by Searching on Graphs
    • Reflexion: an autonomous agent with dynamic memory and self-reflection
    • AutoAgents: A Framework for Automatic Agent Generation
    • GitAgent: Facilitating Autonomous Agent with GitHub by Tool Extension
    • PreAct: Predicting Future in ReAct Enhances Agent's Planning Ability
    • TOOLLLM: FACILITATING LARGE LANGUAGE MODELS TO MASTER 16000+ REAL-WORLD APIS ⭐
    • AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls
    • AIOS: LLM Agent Operating System
    • LLMCompiler: An LLM Compiler for Parallel Function Calling
    • Re-Invoke: Tool Invocation Rewriting for Zero-Shot Tool Retrieval
  • General Fine-Tuning-Based Approaches
    • TALM: Tool Augmented Language Models
    • Toolformer: Language Models Can Teach Themselves to Use Tools ⭐
    • Tool Learning with Foundation Models
    • Tool Maker: Large Language Models as Tool Makers
    • TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs
    • AgentTuning: Enabling Generalized Agent Abilities for LLMs
    • SWIFTSAGE: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks
    • FireAct: Toward Language Agent Fine-tuning
    • Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning
    • REST MEETS REACT: SELF-IMPROVEMENT FOR MULTI-STEP REASONING LLM AGENT
    • Efficient Tool Use with Chain-of-Abstraction Reasoning
    • Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models
    • AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning
    • Agent Lumos: Unified and Modular Training for Open-Source Language Agents
    • ToolGen: Unified Tool Retrieval and Calling via Generation
  • Model-Calling Approaches
    • HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace
    • Gorilla: Large Language Model Connected with Massive APIs ⭐
    • OpenAGI: When LLM Meets Domain Experts
  • Vertical Domains
    • Data Analysis
      • DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning
      • InsightLens: Discovering and Exploring Insights from Conversational Contexts in Large-Language-Model-Powered Data Analysis
      • Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow
      • Demonstration of InsightPilot: An LLM-Empowered Automated Data Exploration System
      • TaskWeaver: A Code-First Agent Framework
      • Automated Social Science: Language Models as Scientist and Subjects
      • Data Interpreter: An LLM Agent For Data Science
    • Finance
      • WeaverBird: Empowering Financial Decision-Making with Large Language Model, Knowledge Base, and Search Engine
      • FinGPT: Open-Source Financial Large Language Models
      • FinMem: A Performance-Enhanced LLM Trading Agent with Layered Memory and Character Design
      • AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework
      • A Multimodal Foundation Agent for Financial Trading: Tool-Augmented, Diversified, and Generalist ⭐
      • Can Large Language Models Beat Wall Street? Unveiling the Potential of AI in stock Selection
      • ENHANCING ANOMALY DETECTION IN FINANCIAL MARKETS WITH AN LLM-BASED MULTI-AGENT FRAMEWORK
      • TRADINGGPT: MULTI-AGENT SYSTEM WITH LAYERED MEMORY AND DISTINCT CHARACTERS FOR ENHANCED FINANCIAL TRADING PERFORMANCE
      • FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models
      • LLMFactor: Extracting Profitable Factors through Prompts for Explainable Stock Movement Prediction
      • Alpha-GPT: Human-AI Interactive Alpha Mining for Quantitative Investment
      • Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs
    • Biomedicine
      • GeneGPT: Augmenting Large Language Models with Domain Tools for Improved Access to Biomedical Information
      • ChemCrow: Augmenting large language models with chemistry tools
      • Generating Explanations in Medical Question-Answering by Expectation Maximization Inference over Evidence
      • Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents
      • Integrating Chemistry Knowledge in Large Language Models via Prompt Engineering
    • web/mobile Agent
      • AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent
      • A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
      • Mind2Web: Towards a Generalist Agent for the Web
      • MiniWoB++ Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration
      • WEBARENA: A REALISTIC WEB ENVIRONMENT FOR BUILDING AUTONOMOUS AGENTS
      • AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation
      • WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
      • WebVoyager: Building an End-to-end Web Agent with Large Multimodal Models
      • CogAgent: A Visual Language Model for GUI Agents
      • Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
      • WebCanvas: Benchmarking Web Agents in Online Environments
    • Software Engineering
      • Agents in Software Engineering: Survey, Landscape, and Vision
      • ChatDev: Communicative Agents for Software Development
    • Others
      • ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models
      • WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
      • ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings
      • PointLLM: Empowering Large Language Models to Understand Point Clouds
      • Interpretable Long-Form Legal Question Answering with Retrieval-Augmented Large Language Models
      • CarExpert: Leveraging Large Language Models for In-Car Conversational Question Answering
      • SCIAGENTS: AUTOMATING SCIENTIFIC DISCOVERY THROUGH MULTI-AGENT INTELLIGENT GRAPH REASONING
  • Evaluation
    • Evaluating Verifiability in Generative Search Engines
    • Auto-GPT for Online Decision Making: Benchmarks and Additional Opinions
    • API-Bank: A Benchmark for Tool-Augmented LLMs
    • ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
    • Automatic Evaluation of Attribution by Large Language Models
    • Benchmarking Large Language Models in Retrieval-Augmented Generation
    • ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems
    • Agent-as-a-Judge: Evaluate Agents with Agents
  • MultiAgent
    • GENERATIVE AGENTS
    • LET MODELS SPEAK CIPHERS: MULTIAGENT DEBATE THROUGH EMBEDDINGS
    • War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars
    • Small LLMs Are Weak Tool Learners: A Multi-LLM Agent
    • Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models
    • Generative Agents: Interactive Simulacra of Human Behavior ⭐
    • AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors in Agents
    • System-1.x: Learning to Balance Fast and Slow Planning with Language Models
    • Agents Thinking Fast and Slow: A Talker-Reasoner Architecture
    • Multi-agent systems
      • Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence
      • MULTI-AGENT COLLABORATION: HARNESSING THE POWER OF INTELLIGENT LLM AGENTS
      • Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks
  • Task-oriented agent collaboration
    • METAAGENTS: SIMULATING INTERACTIONS OF HUMAN BEHAVIORS FOR LLM-BASED TASK-ORIENTED COORDINATION VIA COLLABORATIVE
    • CAMEL: Communicative Agents for "Mind" Exploration of Large Scale Language Model Society ⭐
    • Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf
    • Communicative Agents for Software Development ⭐
    • MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning
    • METAGPT: META PROGRAMMING FOR A MULTI-AGENT COLLABORATIVE FRAMEWORK
  • Agent routing
    • One Agent To Rule Them All: Towards Multi-agent Conversational AI
    • A Multi-Agent Conversational Recommender System
  • Base model routing & ensemble
    • Large Language Model Routing with Benchmark Datasets
    • LLM-BLENDER: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion
    • RouteLLM: Learning to Route LLMs with Preference Data
    • More Agents Is All You Need
    • Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models
  • Autonomous learning and exploratory evolution
    • AppAgent: Multimodal Agents as Smartphone Users
    • Investigate-Consolidate-Exploit: A General Strategy for Inter-Task Agent Self-Evolution
    • LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error
    • Empowering Large Language Model Agents through Action Learning
    • Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents
    • OS-COPILOT: TOWARDS GENERALIST COMPUTER AGENTS WITH SELF-IMPROVEMENT
    • LLAMA RIDER: SPURRING LARGE LANGUAGE MODELS TO EXPLORE THE OPEN WORLD
    • PAST AS A GUIDE: LEVERAGING RETROSPECTIVE LEARNING FOR PYTHON CODE COMPLETION
    • AutoGuide: Automated Generation and Selection of State-Aware Guidelines for Large Language Model Agents
    • A Survey on Self-Evolution of Large Language Models
    • ExpeL: LLM Agents Are Experiential Learners
    • ReAct Meets ActRe: When Language Agents Enjoy Training Data Autonomy
  • Others
    • LLM+P: Empowering Large Language Models with Optimal Planning Proficiency
    • Inference with Reference: Lossless Acceleration of Large Language Models
    • RecallM: An Architecture for Temporal Context Understanding and Question Answering
    • LLaMA Rider: Spurring Large Language Models to Explore the Open World
    • LLMs Can’t Plan, But Can Help Planning in LLM-Modulo Frameworks

RAG

  • Classic papers
    • WebGPT: Browser-assisted question-answering with human feedback
    • WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences
    • WebCPM: Interactive Web Search for Chinese Long-form Question Answering ⭐
    • REPLUG: Retrieval-Augmented Black-Box Language Models ⭐
    • RETA-LLM: A Retrieval-Augmented Large Language Model Toolkit
    • Atlas: Few-shot Learning with Retrieval Augmented Language Models
    • RRAML: Reinforced Retrieval Augmented Machine Learning
    • FRESHLLMS: REFRESHING LARGE LANGUAGE MODELS WITH SEARCH ENGINE AUGMENTATION
  • Fine-tuning
    • RLCF: Aligning the Capabilities of Large Language Models with the Context of Information Retrieval via Contrastive Feedback
    • RA-DIT: RETRIEVAL-AUGMENTED DUAL INSTRUCTION TUNING
    • CHAIN-OF-NOTE: ENHANCING ROBUSTNESS IN RETRIEVAL-AUGMENTED LANGUAGE MODELS
    • RAFT: Adapting Language Model to Domain Specific RAG
    • Rich Knowledge Sources Bring Complex Knowledge Conflicts: Recalibrating Models to Reflect Conflicting Evidence
  • Other papers
    • Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation
    • PDFTriage: Question Answering over Long, Structured Documents
    • Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading ⭐
    • Active Retrieval Augmented Generation
    • kNN-LM Does Not Improve Open-ended Text Generation
    • Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model
    • DORIS-MAE: Scientific Document Retrieval using Multi-level Aspect-based Queries
    • Factuality Enhanced Language Models for Open-Ended Text Generation
    • KwaiAgents: Generalized Information-seeking Agent System with Large Language Models
    • Complex Claim Verification with Evidence Retrieved in the Wild
    • Retrieval-Augmented Generation for Large Language Models: A Survey
    • ChatQA: Building GPT-4 Level Conversational QA Models
    • RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture
    • Benchmarking Large Language Models in Retrieval-Augmented Generation
    • T-RAG: Lessons from the LLM Trenches
    • ARAGOG: Advanced RAG Output Grading
    • ActiveRAG: Revealing the Treasures of Knowledge via Active Learning
    • OpenResearcher: Unleashing AI for Accelerated Scientific Research
    • Contextual.ai-RAG2.0
    • Mindful-RAG: A Study of Points of Failure in Retrieval Augmented Generation
    • Memory3: Language Modeling with Explicit Memory
  • Retrieval optimization
    • IAG: Induction-Augmented Generation Framework for Answering Reasoning Questions
    • HyDE: Precise Zero-Shot Dense Retrieval without Relevance Labels
    • PROMPTAGATOR: FEW-SHOT DENSE RETRIEVAL FROM 8 EXAMPLES
    • Query Rewriting for Retrieval-Augmented Large Language Models
    • Query2doc: Query Expansion with Large Language Models ⭐
    • Query Expansion by Prompting Large Language Models ⭐
    • Anthropic Contextual Retrieval
    • Multi-Level Querying using A Knowledge Pyramid
  • Ranking
    • A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with Large Language Models
    • RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models
    • Improving Passage Retrieval with Zero-Shot Question Generation
    • Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting
    • RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs
    • Ranking Manipulation for Conversational Search Engines
    • Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents
    • Open-source Large Language Models are Strong Zero-shot Query Likelihood Models for Document Ranking
    • T2Ranking: A large-scale Chinese Benchmark for Passage Ranking
    • Learning to Filter Context for Retrieval-Augmented Generation
  • Traditional search approaches
    • ASK THE RIGHT QUESTIONS: ACTIVE QUESTION REFORMULATION WITH REINFORCEMENT LEARNING
    • Query Expansion Techniques for Information Retrieval: A Survey
    • Learning to Rewrite Queries
    • Managing Diversity in Airbnb Search
  • New embedding models for recall and ranking
    • Augmented Embeddings for Custom Retrievals
    • BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation
    • NetEase technical report on BCE Embedding, designed for RAG
    • BGE Landmark Embedding: A Chunking-Free Embedding Method For Retrieval Augmented Long-Context Large Language Models
    • D2LLM: Decomposed and Distilled Large Language Models for Semantic Search
    • Piccolo2: General Text Embedding with Multi-task Hybrid Loss Training
  • Optimizing inference results
    • Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
  • Dynamic RAG (When to Search & Search Plan)
    • SELF-RAG: LEARNING TO RETRIEVE, GENERATE, AND CRITIQUE THROUGH SELF-REFLECTION ⭐
    • Self-Knowledge Guided Retrieval Augmentation for Large Language Models
    • Self-DC: When to retrieve and When to generate? Self Divide-and-Conquer for Compositional Unknown Questions
    • Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs
    • Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity
    • REAPER: Reasoning based Retrieval Planning for Complex RAG Systems
    • When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively
    • PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers
    • ONEGEN: EFFICIENT ONE-PASS UNIFIED GENERATION AND RETRIEVAL FOR LLMS
  • Graph RAG
    • GRAPH Retrieval-Augmented Generation: A Survey
    • From Local to Global: A Graph RAG Approach to Query-Focused Summarization
    • GRAG: Graph Retrieval-Augmented Generation
    • GNN-RAG: Graph Neural Retrieval for Large Language Model Reasoning
    • THINK-ON-GRAPH: DEEP AND RESPONSIBLE REASONING OF LARGE LANGUAGE MODEL ON KNOWLEDGE GRAPH
    • LightRAG: Simple and Fast Retrieval-Augmented Generation
    • StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
  • Multistep RAG
    • SYNERGISTIC INTERPLAY BETWEEN SEARCH AND LARGE LANGUAGE MODELS FOR INFORMATION RETRIEVAL
    • Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions
    • Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy
    • RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation
    • IM-RAG: Multi-Round Retrieval-Augmented Generation Through Learning Inner Monologues
    • Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP
    • Search-in-the-Chain: Towards Accurate, Credible and Traceable Large Language Models for Knowledge-intensive Tasks
    • MindSearch 思·索: Mimicking Human Minds Elicits Deep AI Searcher

LLM table and chart understanding and generation

  • survey
    • Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study
    • Large Language Models (LLMs) on Tabular Data: Prediction, Generation, and Understanding - A Survey
    • Exploring the Numerical Reasoning Capabilities of Language Models: A Comprehensive Analysis on Tabular Data
  • prompt
    • Large Language Models are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning
    • Tab-CoT: Zero-shot Tabular Chain of Thought
    • Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding
  • finetuning
    • TableLlama: Towards Open Large Generalist Models for Tables
    • TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios
  • multimodal
    • MMC: Advancing Multimodal Chart Understanding with Large-scale Instruction Tuning
    • ChartLlama: A Multimodal LLM for Chart Understanding and Generation
    • ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning
    • ChartInstruct: Instruction Tuning for Chart Comprehension and Reasoning
    • ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning
    • MATCHA: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering
    • UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning
    • TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning
    • Tables as Texts or Images: Evaluating the Table Reasoning Ability of LLMs and MLLMs
    • TableVQA-Bench: A Visual Question Answering Benchmark on Multiple Table Domains
    • TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy

LLM+KG

  • Surveys
  • KG for LLM reasoning
    • Using Large Language Models for Zero-Shot Natural Language Generation from Knowledge Graphs
    • MindMap: Knowledge Graph Prompting Sparks Graph of Thoughts in Large Language Models
    • Knowledge-Augmented Language Model Prompting for Zero-Shot Knowledge Graph Question Answering
    • Domain Specific Question Answering Over Knowledge Graphs Using Logical Programming and Large Language Models
    • BRING YOUR OWN KG: Self-Supervised Program Synthesis for Zero-Shot KGQA
    • StructGPT: A General Framework for Large Language Model to Reason over Structured Data
  • LLMs for KG construction
    • Enhancing Knowledge Graph Construction Using Large Language Models
    • LLM-assisted Knowledge Graph Engineering: Experiments with ChatGPT
    • ITERATIVE ZERO-SHOT LLM PROMPTING FOR KNOWLEDGE GRAPH CONSTRUCTION
    • Exploring Large Language Models for Knowledge Graph Completion

Humanoid Agents

  • HABITAT 3.0: A CO-HABITAT FOR HUMANS, AVATARS AND ROBOTS
  • Humanoid Agents: Platform for Simulating Human-like Generative Agents
  • Voyager: An Open-Ended Embodied Agent with Large Language Models
  • Shaping the future of advanced robotics
  • AUTORT: EMBODIED FOUNDATION MODELS FOR LARGE SCALE ORCHESTRATION OF ROBOTIC AGENTS
  • ROBOTIC TASK GENERALIZATION VIA HINDSIGHT TRAJECTORY SKETCHES
  • ALFWORLD: ALIGNING TEXT AND EMBODIED ENVIRONMENTS FOR INTERACTIVE LEARNING
  • MINEDOJO: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
  • LEGENT: Open Platform for Embodied Agents

pretrain_data & pretrain

  • DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
  • The Pile: An 800GB Dataset of Diverse Text for Language Modeling
  • CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data
  • WanJuan: A Comprehensive Multimodal Dataset for Advancing English and Chinese Large Models
  • CLUECorpus2020: A Large-scale Chinese Corpus for Pre-training Language Model
  • In-Context Pretraining: Language Modeling Beyond Document Boundaries
  • Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
  • Zyda: A 1.3T Dataset for Open Language Modeling
  • Entropy Law: The Story Behind Data Compression and LLM Performance
  • Data, Data Everywhere: A Guide for Pretraining Dataset Construction
  • Data curation via joint example selection further accelerates multimodal learning
  • IMPROVING PRETRAINING DATA USING PERPLEXITY CORRELATIONS
  • AI models collapse when trained on recursively generated data

Domain model SFT (domain_llms)

  • Finance
    • BloombergGPT: A Large Language Model for Finance
    • FinVis-GPT: A Multimodal Large Language Model for Financial Chart Analysis
    • CFGPT: Chinese Financial Assistant with Large Language Model
    • CFBenchmark: Chinese Financial Assistant Benchmark for Large Language Model
    • InvestLM: A Large Language Model for Investment using Financial Domain Instruction Tuning
    • BBT-Fin: Comprehensive Construction of Chinese Financial Domain Pre-trained Language Model, Corpus and Benchmark
    • PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance
    • The FinBen: An Holistic Financial Benchmark for Large Language Models
    • XuanYuan 2.0: A Large Chinese Financial Chat Model with Hundreds of Billions Parameters
    • Towards Trustworthy Large Language Models in Industry Domains
    • When AI Meets Finance (StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments
    • A Survey of Large Language Models for Financial Applications: Progress, Prospects and Challenges
  • Biomedical
    • MedGPT: Medical Concept Prediction from Clinical Narratives
    • BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining
    • PubMed GPT: A Domain-specific large language model for biomedical text ⭐
    • ChatDoctor: Medical Chat Model Fine-tuned on LLaMA Model using Medical Domain Knowledge
    • Med-PaLM: Large Language Models Encode Clinical Knowledge [V1,V2] ⭐
    • SMILE: Single-turn to Multi-turn Inclusive Language Expansion via ChatGPT for Mental Health Support
    • Zhongjing: Enhancing the Chinese Medical Capabilities of Large Language Model through Expert Feedback and Real-world Multi-turn Dialogue
  • Others
    • Galactica: A Large Language Model for Science
    • Augmented Large Language Models with Parametric Knowledge Guiding
    • ChatLaw: Open-Source Legal Large Language Model ⭐
    • MediaGPT: A Large Language Model For Chinese Media
    • KITLM: Domain-Specific Knowledge InTegration into Language Models for Question Answering
    • EcomGPT: Instruction-tuning Large Language Models with Chain-of-Task Tasks for E-commerce
    • TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT
    • LLEMMA: AN OPEN LANGUAGE MODEL FOR MATHEMATICS
    • MEDITAB: SCALING MEDICAL TABULAR DATA PREDICTORS VIA DATA CONSOLIDATION, ENRICHMENT, AND REFINEMENT
    • PLLaMa: An Open-source Large Language Model for Plant Science
    • ADAPTING LARGE LANGUAGE MODELS VIA READING COMPREHENSION

LLM long-text processing (long_input)

  • Positional encoding and attention mechanism optimization
  • Context compression and reordering approaches
    • Lost in the Middle: How Language Models Use Long Contexts ⭐
    • LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models
    • LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression ⭐
    • Learning to Compress Prompts with Gist Tokens
    • Unlocking Context Constraints of LLMs: Enhancing Context Efficiency of LLMs with Self-Information-Based Content Filtering
    • LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration
    • PCToolkit: A Unified Plug-and-Play Prompt Compression Toolkit of Large Language Models
    • Are Long-LLMs A Necessity For Long-Context Tasks?
  • Training and model architecture approaches
    • Never Train from Scratch: FAIR COMPARISON OF LONG-SEQUENCE MODELS REQUIRES DATA-DRIVEN PRIORS
    • Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
    • Never Lost in the Middle: Improving Large Language Models via Attention Strengthening Question Answering
    • Focused Transformer: Contrastive Training for Context Scaling
    • Effective Long-Context Scaling of Foundation Models
    • ON THE LONG RANGE ABILITIES OF TRANSFORMERS
    • Efficient Long-Range Transformers: You Need to Attend More, but Not Necessarily at Every Layer
    • POSE: EFFICIENT CONTEXT WINDOW EXTENSION OF LLMS VIA POSITIONAL SKIP-WISE TRAINING
    • LONGLORA: EFFICIENT FINE-TUNING OF LONG-CONTEXT LARGE LANGUAGE MODELS
    • LongAlign: A Recipe for Long Context Alignment of Large Language Models
    • Data Engineering for Scaling Language Models to 128K Context
    • MEGALODON: Efficient LLM Pretraining and Inference with Unlimited Context Length
    • Make Your LLM Fully Utilize the Context
    • Untie the Knots: An Efficient Data Augmentation Strategy for Long-Context Pre-Training in Language Models
  • Efficiency optimization
    • Efficient Attention: Attention with Linear Complexities
    • Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
    • HyperAttention: Long-context Attention in Near-Linear Time
    • FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
    • With Greater Text Comes Greater Necessity: Inference-Time Training Helps Long Text Generation

LLM long-text generation (long_output)

  • Re3: Generating Longer Stories With Recursive Reprompting and Revision
  • RECURRENTGPT: Interactive Generation of (Arbitrarily) Long Text
  • DOC: Improving Long Story Coherence With Detailed Outline Control
  • Weaver: Foundation Models for Creative Writing
  • Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models

NL2SQL

  • LLM approaches
    • DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction ⭐
    • C3: Zero-shot Text-to-SQL with ChatGPT ⭐
    • SQL-PALM: IMPROVED LARGE LANGUAGE MODEL ADAPTATION FOR TEXT-TO-SQL
    • BIRD: Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQL ⭐
    • A Case-Based Reasoning Framework for Adaptive Prompting in Cross-Domain Text-to-SQL
    • ChatDB: AUGMENTING LLMS WITH DATABASES AS THEIR SYMBOLIC MEMORY
    • A comprehensive evaluation of ChatGPT’s zero-shot Text-to-SQL capability
    • Few-shot Text-to-SQL Translation using Structure and Content Prompt Learning
    • Tool-Assisted Agent on SQL Inspection and Refinement in Real-World Scenarios
  • Domain Knowledge Intensive
    • Towards Knowledge-Intensive Text-to-SQL Semantic Parsing with Formulaic Knowledge
    • Bridging the Generalization Gap in Text-to-SQL Parsing with Schema Expansion
    • Towards Robustness of Text-to-SQL Models against Synonym Substitution
    • FinQA: A Dataset of Numerical Reasoning over Financial Data
  • others
    • RESDSQL: Decoupling Schema Linking and Skeleton Parsing for Text-to-SQL
    • MIGA: A Unified Multi-task Generation Framework for Conversational Text-to-SQL

Code Generation

  • Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering
  • Codeforces as an Educational Platform for Learning Programming in Digitalization
  • Competition-Level Code Generation with AlphaCode
  • CODECHAIN: TOWARDS MODULAR CODE GENERATION THROUGH CHAIN OF SELF-REVISIONS WITH REPRESENTATIVE SUB-MODULES
  • AI Coders Are Among Us: Rethinking Programming Language Grammar Towards Efficient Code Generation

Reducing model hallucination (reliability)

  • Survey
    • Large language models and the perils of their hallucinations
    • Survey of Hallucination in Natural Language Generation
    • Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models
    • A Survey of Hallucination in Large Foundation Models
    • A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
    • Calibrated Language Models Must Hallucinate
    • Why Does ChatGPT Fall Short in Providing Truthful Answers?
  • Prompt or Tuning
    • R-Tuning: Teaching Large Language Models to Refuse Unknown Questions
    • PROMPTING GPT-3 TO BE RELIABLE
    • ASK ME ANYTHING: A SIMPLE STRATEGY FOR PROMPTING LANGUAGE MODELS ⭐
    • On the Advance of Making Language Models Better Reasoners
    • RefGPT: Reference → Truthful & Customized Dialogues Generation by GPTs and for GPTs
    • Rethinking with Retrieval: Faithful Large Language Model Inference
    • GENERATE RATHER THAN RETRIEVE: LARGE LANGUAGE MODELS ARE STRONG CONTEXT GENERATORS
    • Large Language Models Struggle to Learn Long-Tail Knowledge
  • Decoding Strategy
    • Trusting Your Evidence: Hallucinate Less with Context-aware Decoding ⭐
    • SELF-REFINE: ITERATIVE REFINEMENT WITH SELF-FEEDBACK ⭐
    • Enhancing Self-Consistency and Performance of Pre-Trained Language Models through Natural Language Inference
    • Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
    • Enabling Large Language Models to Generate Text with Citations
    • Factuality Enhanced Language Models for Open-Ended Text Generation
    • KL-Divergence Guided Temperature Sampling
    • KCTS: Knowledge-Constrained Tree Search Decoding with Token-Level Hallucination Detection
    • CONTRASTIVE DECODING IMPROVES REASONING IN LARGE LANGUAGE MODELS
    • Contrastive Decoding: Open-ended Text Generation as Optimization
  • Probing and Detection
    • Automatic Evaluation of Attribution by Large Language Models
    • QAFactEval: Improved QA-Based Factual Consistency Evaluation for Summarization
    • Zero-Resource Hallucination Prevention for Large Language Models
    • LLM Lies: Hallucinations are not Bugs, but Features as Adversarial Examples
    • Language Models (Mostly) Know What They Know ⭐
    • LM vs LM: Detecting Factual Errors via Cross Examination
    • Do Language Models Know When They’re Hallucinating References?
    • SELFCHECKGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
    • SELF-CONTRADICTORY HALLUCINATIONS OF LLMS: EVALUATION, DETECTION AND MITIGATION
    • Self-consistency for open-ended generations
    • Improving Factuality and Reasoning in Language Models through Multiagent Debate
    • Selective-LAMA: Selective Prediction for Confidence-Aware Evaluation of Language Models
    • Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs
  • Reviewing and Calibration
    • Truth-o-meter: Collaborating with llm in fighting its hallucinations
    • RARR: Researching and Revising What Language Models Say, Using Language Models
    • CRITIC: LARGE LANGUAGE MODELS CAN SELF-CORRECT WITH TOOL-INTERACTIVE CRITIQUING
    • VALIDATING LARGE LANGUAGE MODELS WITH RELM
    • PURR: Efficiently Editing Language Model Hallucinations by Denoising Language Model Corruptions
    • Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback
    • Adaptive Chameleon or Stubborn Sloth: Unraveling the Behavior of Large Language Models in Knowledge Clashes
    • Woodpecker: Hallucination Correction for Multimodal Large Language Models
    • Zero-shot Faithful Factual Error Correction

LLM evaluation (evaluation)

  • Factuality evaluation
    • TRUSTWORTHY LLMS: A SURVEY AND GUIDELINE FOR EVALUATING LARGE LANGUAGE MODELS’ ALIGNMENT
    • TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models
    • TRUE: Re-evaluating Factual Consistency Evaluation
    • FACTSCORE: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
    • KoLA: Carefully Benchmarking World Knowledge of Large Language Models
    • When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories
    • FACTOOL: Factuality Detection in Generative AI - A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios
    • LONG-FORM FACTUALITY IN LARGE LANGUAGE MODELS
  • Detection tasks
    • Detecting Pretraining Data from Large Language Models
    • Scalable Extraction of Training Data from (Production) Language Models
    • Rethinking Benchmark and Contamination for Language Models with Rephrased Samples

Inference optimization (inference)

  • Fast Transformer Decoding: One Write-Head is All You Need
  • Fast Inference from Transformers via Speculative Decoding
  • GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
  • Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding
  • SkipDecode: Autoregressive Skip Decoding with Batching and Caching for Efficient LLM Inference
  • BatchPrompt: Accomplish more with less
  • You Only Cache Once: Decoder-Decoder Architectures for Language Models

Model knowledge editing tricks (model_edit)

  • ROME: Locating and Editing Factual Associations in GPT
  • Transformer Feed-Forward Layers Are Key-Value Memories
  • MEMIT: Mass-Editing Memory in a Transformer
  • MEND: Fast Model Editing at Scale
  • Editing Large Language Models: Problems, Methods, and Opportunities
  • Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
  • Automata-based constraints for language model decoding
  • SGLang: Efficient Execution of Structured Language Model Programs
  • PROMPT CACHE: MODULAR ATTENTION REUSE FOR LOW-LATENCY INFERENCE

Model merging and pruning (model_merge)

  • Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
  • DARE: Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
  • EDITING MODELS WITH TASK ARITHMETIC
  • TIES-Merging: Resolving Interference When Merging Models
  • LM-Cocktail: Resilient Tuning of Language Models via Model Merging
  • SLICEGPT: COMPRESS LARGE LANGUAGE MODELS BY DELETING ROWS AND COLUMNS
  • Checkpoint Merging via Bayesian Optimization in LLM Pretraining
  • Arcee's MergeKit: A Toolkit for Merging Large Language Models

MOE

  • Tricks for Training Sparse Translation Models
  • ST-MoE: Designing Stable and Transferable Sparse Expert Models
  • Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
  • GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
  • GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
  • OUTRAGEOUSLY LARGE NEURAL NETWORKS: THE SPARSELY-GATED MIXTURE-OF-EXPERTS LAYER
  • DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
  • Dense-to-Sparse Gate for Mixture-of-Experts
  • Efficient Large Scale Language Modeling with Mixtures of Experts

Other Prompt Engineering (prompt_engineer)

  • Calibrate Before Use: Improving Few-Shot Performance of Language Models
  • In-Context Instruction Learning
  • LEARNING PERFORMANCE-IMPROVING CODE EDITS
  • Boosting Theory-of-Mind Performance in Large Language Models via Prompting
  • Generated Knowledge Prompting for Commonsense Reasoning
  • RECITATION-AUGMENTED LANGUAGE MODELS
  • kNN PROMPTING: BEYOND-CONTEXT LEARNING WITH CALIBRATION-FREE NEAREST NEIGHBOR INFERENCE
  • EmotionPrompt: Leveraging Psychology for Large Language Models Enhancement via Emotional Stimulus
  • Causality-aware Concept Extraction based on Knowledge-guided Prompting
  • LARGE LANGUAGE MODELS AS OPTIMIZERS
  • Prompts As Programs: A Structure-Aware Approach to Efficient Compile-Time Prompt Optimization
  • Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
  • RePrompt: Automatic Prompt Editing to Refine AI-Generative Art Towards Precise Expressions
  • MedPrompt: Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine
  • DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines
  • Prompts as Auto-Optimized Training Hyperparameters: Training Best-in-Class IR Models from Scratch with 10 Gold Labels
  • In-Context Learning for Extreme Multi-Label Classification
  • Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs
  • DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
  • CONNECTING LARGE LANGUAGE MODELS WITH EVOLUTIONARY ALGORITHMS YIELDS POWERFUL PROMPT OPTIMIZERS
  • TextGrad: Automatic "Differentiation" via Text
  • Task Facet Learning: A Structured Approach to Prompt Optimization
  • LangGPT: Rethinking Structured Reusable Prompt Design Framework for LLMs from the Programming Language
  • PAS: Data-Efficient Plug-and-Play Prompt Augmentation System
  • Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models
  • From Pen to Prompt: How Creative Writers Integrate AI into their Writing Practice

Multimodal

  • InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
  • Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
  • LLaVA: Visual Instruction Tuning
  • MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
  • BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions
  • mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
  • LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models
  • Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities
  • PaLM-E: An Embodied Multimodal Language Model
  • TabLLM: Few-shot Classification of Tabular Data with Large Language Models
  • AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
  • Sora tech report
  • Towards General Computer Control: A Multimodal Agent for Red Dead Redemption II as a Case Study
  • OCR
    • Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models
    • Large OCR Model: An Empirical Study of Scaling Law for OCR
    • ON THE HIDDEN MYSTERY OF OCR IN LARGE MULTIMODAL MODELS
  • PreFLMR: Scaling Up Fine-Grained Late-Interaction Multi-modal Retrievers
  • Many-Shot In-Context Learning in Multimodal Foundation Models
  • Adding Conditional Control to Text-to-Image Diffusion Models
  • Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

Timeseries LLM

  • TimeGPT-1
  • Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook
  • TIME-LLM: TIME SERIES FORECASTING BY REPROGRAMMING LARGE LANGUAGE MODELS
  • Large Language Models Are Zero-Shot Time Series Forecasters
  • TEMPO: PROMPT-BASED GENERATIVE PRE-TRAINED TRANSFORMER FOR TIME SERIES FORECASTING
  • Generative Pre-Training of Time-Series Data for Unsupervised Fault Detection in Semiconductor Manufacturing
  • Lag-Llama: Towards Foundation Models for Time Series Forecasting
  • PromptCast: A New Prompt-based Learning Paradigm for Time Series Forecasting

Quantization

  • AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
  • LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
  • LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
  • SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Adversarial Attacking

  • Curiosity-driven Red-teaming for Large Language Models
  • Red Teaming Language Models with Language Models
  • EXPLORE, ESTABLISH, EXPLOIT: RED-TEAMING LANGUAGE MODELS FROM SCRATCH

Others

  • Pretraining on the Test Set Is All You Need (haha, the author clearly understands satire)
  • Learnware: Small Models Do Big
  • The economic potential of generative AI
  • A PhD Student’s Perspective on Research in NLP in the Era of Very Large Language Models