Agentic Software Issue Resolution with Large Language Models: A Survey

📰 News

📅 2026.01: Paper update! We added 34 papers from 2025.10 to 2025.12. The survey now covers 160 papers.
📅 2025.12: We release the first survey on agentic software issue resolution!
📅 2025.10: We summarize 126 papers about issue resolution, from 2023.10 to 2025.10!

Introduction

This survey is organized into three main parts: Benchmarks, Technologies, and Empirical Studies.

As of 2026-01-06, automated issue-solving technologies can be surveyed mainly from 2 perspectives: Scaffold Design and Learning Strategy.

Benchmarks

We group the existing benchmarks into 3 categories according to the task they target:

@End-To-End
@Reproduction Test Generation
@Localization

Literature	Name	Scope	Journal/Conference	Time	Link
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?	SWE-bench	End-To-End	ICLR'24	2023-10	Paper Code
SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents	SWT-Bench	Reproduction Test Generation	NeurIPS'24	2024-06	Paper Code
SWE-bench-java: A GitHub Issue Resolving Benchmark for Java	Multi-SWE-bench	End-To-End	ARXIV	2024-08	Paper Code
SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?	SWE-bench Multimodal	End-To-End	ICLR'25	2024-10	Paper Code
SWE-Bench+: Enhanced Coding Benchmark for LLMs	SWE-Bench+	End-To-End	ARXIV	2024-10	Paper
TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark	TestGenEval	Reproduction Test Generation	ICLR'25	2024-10	Paper Code
A Real-World Benchmark for Evaluating Fine-Grained Issue Solving Capabilities of Large Language Models	FAUN-Eval	End-To-End	ARXIV	2024-11	Paper
TDD-Bench Verified: Can LLMs Generate Tests for Issues Before They Get Resolved?	TDD-Bench	Reproduction Test Generation	ARXIV	2024-11	Paper Code
CodeV: Issue Resolving with Visual Data	Visual SWE-bench	End-To-End	ACL Findings'25	2024-12	Paper Code
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving	Multi-SWE-bench	End-To-End	ARXIV	2025-04	Paper Code
LiveSWEBench	LiveSWEBench	End-To-End	BLOG	2025-04	link Code
LocAgent: Graph-Guided LLM Agents for Code Localization	LocBench	Localization	ARXIV	2025-03	Paper Code
Automated Benchmark Generation for Repository-Level Coding Tasks	SWEE-Bench/SWA-Bench	End-To-End	ARXIV	2025-03	Paper
FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation	FEA-Bench	End-To-End	ACL'25	2025-03	Paper Code
OmniGIRL: A Multilingual and Multimodal Benchmark for GitHub Issue Resolution	OmniGIRL	End-To-End	ISSTA'25	2025-05	Paper Code
-	SWE-bench Multilingual	End-To-End	BLOG	2025-05	link Code
SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents	SWE-PolyBench	End-To-End	ARXIV	2025-04	Paper Code
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents	SWE-rebench	End-To-End	ARXIV	2025-05	Paper Code
GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents	GSO	End-To-End	NeurIPS'25	2025-05	Paper Code
SWE-bench Goes Live!	SWE-bench-Live	End-To-End	ARXIV	2025-05	Paper Code
UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench	UTBoost	-	ARXIV	2025-06	Paper Code
SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks	SWE-Factory	End-To-End	ARXIV	2025-06	Paper Code
SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving	Swing-Arena	End-To-End	ARXIV	2025-06	Paper Code
SPICE: An Automated SWE-Bench Labeling Pipeline for Issue Clarity, Test Coverage, and Effort Estimation	SPICE	-	ASE'25	2025-07	Paper
SWE-MERA: A Dynamic Benchmark for Agenticly Evaluating Large Language Models on Software Engineering Tasks	SWE-MERA	End-To-End	ARXIV	2025-07	Paper
SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?	SWE-Perf	End-To-End	ARXIV	2025-07	Paper Code
NoCode-bench: A Benchmark for Evaluating Natural Language-Driven Feature Addition	NoCode-bench	End-To-End	ARXIV	2025-08	Paper Code
SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?	SWE-Bench Pro	End-To-End	ARXIV	2025-09	Paper Code
SWE-QA: Can Language Models Answer Repository-level Code Questions?	SWE-QA-Bench	QA	ARXIV	2025-09	Paper Code
A Benchmark for Localizing Code and Non-Code Issues in Software Projects	MULocBench	Localization	ARXIV	2025-10	Paper Code
SWE-Compass: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models	SWE-Compass	End-To-End	ARXIV	2025-11	Paper Code
SWE-fficiency: Can Language Models Optimize Real-World Repositories on Real Workloads?	SWE-fficiency	End-To-End	ARXIV	2025-11	Paper Code
SWE-Bench++: A Framework for the Scalable Generation of Software Engineering Benchmarks from Open-Source Repositories	SWE-Bench++	End-To-End	ARXIV	2025-12	Paper Code
SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios	SWE-EVO	End-To-End	ARXIV	2025-12	Paper Code

Technologies

Scaffold Design

From the perspective of design paradigms, scaffolds fall into 2 categories, mirroring the benchmark tasks:

@End-To-End
@Single-Phased

End-to-End

End-to-end methods can be further classified into 2 categories:

@Agent-Based Method
@Pipeline-Based Method

Literature	Name	Journal/Conference	Time	Label	URL
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?	BM25 RAG	ICLR 2024	2023-10	@Pipeline	Paper Code
SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering	SWE-Agent	NeurIPS 2024	2024-05	@Agent	Paper Code
AutoCodeRover: Autonomous Program Improvement	AutoCodeRover	ISSTA 2024	2024-04	@Agent	Paper Code
CodeR: Issue Resolving with Multi-Agent and Task Graphs	CodeR	Arxiv	2024-06	@Agent	Paper Code
Alibaba LingmaAgent: Improving Automated Issue Resolution via Comprehensive Repository Exploration	LingmaAgent/RepoUnderstander	FSE Companion 2025	2024-06	@Agent	Paper Code
MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution	MAGIS	NeurIPS 2024	2024-03	@Agent	Paper Code
MASAI: Modular Architecture for Software-engineering AI Agents	MASAI	Arxiv	2024-06	@Agent	Paper
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents	OpenDevin(AllHands)	Arxiv	2024-06	@Agent	Paper
Agentless: Demystifying LLM-based Software Engineering Agents	Agentless	FSE 2025	2024-07	@Pipeline	Paper Code
OpenHands: An Open Platform for AI Software Developers as Generalist Agents	OpenHands	ICLR 2025	2024-07	@Agent	Paper Code
SpecRover: Code Intent Extraction via LLMs	SpecRover (AutoCodeRover-v2)	ICSE 2025	2024-08	@Agent	Paper Code
CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases	CodexGraph	Arxiv	2024-08	@Agent	Paper Code
SuperCoder2.0: Technical Report on Exploring the feasibility of LLMs as Autonomous Programmer	SuperCoder	Arxiv	2024-09	@Agent	Paper
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale	HyperAgent	Arxiv	2024-09	@Agent	Paper
RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph	RepoGraph	ICLR 2025	2024-10	@Pipeline	Paper Code
SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement	SWE-Search	ICLR 2025	2024-10	@Agent	Paper Code
OpenHands: An Open Platform for AI Software Developers as Generalist Agents	OpenHands CodeAct	ICLR 2025	2024-10	@Agent	Paper Code
-	Composio SWE-Kit	Blog	2024-10	@Pipeline	Link Code
Infant Agent: A Tool-Integrated, Logic-Driven Agent with Cost-Effective API Usage	Infant Agent	Arxiv	2024-11	@Agent	Paper
MarsCode Agent: AI-native Automated Bug Fixing	MarsCode Agent	Arxiv	2024-11	@Agent	Paper
Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement	SWESynInfer	FSE 2025 Industry	2024-11	@Pipeline	Paper Code
-	Nebius AI	Blog	2024-11	@Agent	Paper
CodeV: Issue Resolving with Visual Data	CodeV	Arxiv	2024-12	@Pipeline	Paper Code
-	Aide	Blog	2024-12	@Agent	Link
Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments	Learn-By-Interact	Arxiv	2025-01	@Agent	Paper
PatchPilot: A Stable and Cost-Efficient Agentic Patching Framework	PatchPilot	ICML 2025	2025-02	@Pipeline	Paper Code
CodeMonkeys: Scaling Test-Time Compute for Software Engineering	CodeMonkeys	Arxiv	2025-02	@Pipeline	Paper Code
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution	Agentless Mini	ARXIV	2025-02	@Pipeline	Paper Code
-	Agentless Lite	Blog	2025-02	@Pipeline	Code
-	Syntheo	Blog	2025-02	@Agent	Link
SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution	SWE-Fixer	ACL Findings 2025	2025-02	@Pipeline	Paper
-	AgentScope	Blog	2025-03	@Agent	Link
DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal	DARS	Arxiv	2025-03	@Agent	Paper Code
Enhancing Repository-Level Software Repair via Repository-Aware Knowledge Graphs	KGCompass	Arxiv	2025-03	@Pipeline	Paper
-	Augment Agent v0	Blog	2025-03	@Agent	Link Code
-	CORTEXA	Blog	2025-03	@Pipeline	Link
-	Refact.ai	Blog	2025-03	@Agent	Link Code
-	Lingxi	Blog	2025-04	@Agent	Link Code
-	Trae IDE	Blog	2025-05	@Agent	Link
-	devlo	Blog	2025-05	@Agent	Link
Putting It All into Context: Simplifying Agents with LCLMs	LCLM	Arxiv	2025-05	@Pipeline	Paper
Code Graph Model (CGM): A Graph-Integrated Large Language Model for Repository-Level Software Engineering Tasks	CGM-SWE-PY	NeurIPS'25	2025-05	@Pipeline	Paper
InfantAgent-Next: A Multimodal Generalist Agent for Automated Computer Interaction	InfantAgent-Next	Arxiv	2025-05	@Agent	Paper Code
Coding Agents with Multimodal Browsing are Generalist Problem Solvers	OpenHands-Versa	Arxiv	2025-06	@Agent	Paper Code
EXPEREPAIR: Dual-Memory Enhanced LLM-based Repository-Level Program Repair	EXPEREPAIR	Arxiv	2025-06	@Agent	Paper
Seeing is Fixing: Cross-Modal Reasoning with Multimodal LLMs for Visual Software Issue Fixing	GUIRepair	ASE'25	2025-06	@Pipeline	Paper
SemAgent: A Semantics Aware Program Repair Agent	SemAgent	Arxiv	2025-06	@Pipeline	Paper
Nemotron-Cortexa: Enhancing LLM Agents for Software Engineering Tasks via Improved Localization and Solution Diversity	Nemotron-Cortexa	ICML'25	2025-06	@Pipeline	Paper Code
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving	Agent KB	Arxiv	2025-07	@Agent	Paper Code
Prometheus: Unified Knowledge Graphs for Issue Resolution in Multilingual Codebases	Prometheus	Arxiv	2025-07	@Agent	Paper Code
SWE-Exp: Experience-Driven Software Issue Resolution	SWE-Exp	Arxiv	2025-07	@Agent	Paper Code
SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution	SWE-Debate	Arxiv	2025-07	@Agent	Paper Code
Trae Agent: An LLM-based Agent for Software Engineering with Test-time Scaling	Trae Agent	Arxiv	2025-07	@Agent	Paper Code
SynFix: Dependency-Aware Program Repair via RelationGraph Analysis	SynFix	ACL Findings'25	2025-07	@Pipeline	Paper
SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents	SE-Agent	NeurIPS'25	2025-08	@Agent	Paper Code
CoreThink: A Symbolic Reasoning Layer to reason over Long Horizon Tasks with LLMs	CoreThink	Arxiv	2025-09	@Agent	Paper
Improving the Efficiency of LLM Agent Systems through Trajectory Reduction	AgentDiet	FSE'26	2025-09	@Agent	Paper
Lita: Light Agent Uncovers the Agentic Coding Capabilities of LLMs	Lita	Arxiv	2025-10	@Agent	Paper
Lingxi: Repository-Level Issue Resolution Framework Enhanced by Procedural Knowledge Guided Scaling	Lingxi	Arxiv	2025-10	@Agent	Paper Code
SIADAFIX: issue description response for adaptive program repair	SIADAFIX	Arxiv	2025-10	@Agent	Paper Code
TOM-SWE: User Mental Modeling For Software Engineering Agents	TOM-SWE	Arxiv	2025-10	@Agent	Paper Code
TDFlow: Agentic Workflows for Test Driven Software Engineering	TDFlow	Arxiv	2025-10	@Pipeline	Paper
Live-SWE-agent: Can Software Engineering Agents Self-Evolve on the Fly?	Live-SWE-agent	Arxiv	2025-11	@Agent	Paper Code
InfCode: Adversarial Iterative Refinement of Tests and Patches for Reliable Software Issue Resolution	InfCode	Arxiv	2025-11	@Agent	Paper
Confucius Code Agent: An Open-sourced AI Software Engineer at Industrial Scale	CCA	Arxiv	2025-12	@Agent	Paper

Single-Phased

Single-phased methods are discussed in 3 separate categories:

@Localization
@Reproduction
@Regression

Here, @Reproduction denotes reproduction test generation and @Regression denotes regression test selection.

Issue Localization

Literature	Name	Journal/Conference	Time	URL
BLAZE: Cross-Language and Cross-Project Bug Localization via Dynamic Chunking and Hard Example Learning	BLAZE	Arxiv	2024-08	Paper Code
OrcaLoca: An LLM Agent Framework for Software Issue Localization	OrcaLoca	ICML 2025	2025-02	Paper Code
Bridging Bug Localization and Issue Fixing: A Hierarchical Localization Framework Leveraging Large Language Models	BugCerberus	Arxiv	2025-02	Paper
LocAgent: Graph-Guided LLM Agents for Code Localization	LocAgent	ACL 2025	2025-03	Paper Code
CoSIL: Software Issue Localization via LLM-Driven Code Repository Graph Searching	CoSIL	ASE 2025	2025-03	Paper Code
CoRNStack: High-Quality Contrastive Data for Better Code Retrieval and Reranking	CoRNStack	ICLR 2025	2025-03	Paper Code
SweRank: Software Issue Localization with Code Ranking	SweRank	Arxiv	2025-05	Paper Code
CoRet: Improved Retriever for Code Editing	CoRet	Arxiv	2025-06	Paper
SACL: Understanding and Combating Textual Bias in Code Retrieval with Semantic-Augmented Reranking and Localization	SACL	Arxiv	2025-07	Paper
Meta-RAG on Large Codebases Using Code Summarization	Meta-RAG	Arxiv	2025-08	Paper
Tool-integrated Reinforcement Learning for Repo Deep Search	RepoSearcher	Arxiv	2025-08	Paper
Improving Code Localization with Repository Memory	RepoMem	Arxiv	2025-10	Paper
Hierarchical Reward Modeling for Fault Localization in Large Code Repositories	HiLoRM	EMNLP Findings 2026	2025-11	Paper Code
SweRank+: Multilingual, Multi-Turn Code Ranking for Software Issue Localization	SweRank+	Arxiv	2025-12	Paper Code
One Tool Is Enough: Reinforcement Learning for Repository-Level LLM Agents	RepoNavigator	Arxiv	2025-12	Paper
GraphLocator: Graph-guided Causal Reasoning for Issue Localization	GraphLocator	FSE 2026	2025-12	Paper

Issue Reproduction

Literature	Name	Journal/Conference	Time	URL
AEGIS: An Agent-based Framework for General Bug Reproduction from Issue Descriptions	AEGIS	FSE 2025 Industry	2024-11	Paper
LLMs as Continuous Learners: Improving the Reproduction of Defective Code in Software Issues	EvoCoder	ARXIV	2024-11	Paper
Agentic Bug Reproduction for Effective Automated Program Repair at Google	BRT Agent	Arxiv	2025-02	Paper
Otter: Generating Tests from Issues to Validate SWE Patches	Otter	ICML 2025	2025-02	Paper
Issue2Test: Generating Reproducing Test Cases from Issue Reports	Issue2Test	Arxiv	2025-03	Paper
AssertFlip: Reproducing Bugs via Inversion of LLM-Generated Passing Tests	AssertFlip	Arxiv	2025-07	Paper
Execution-Feedback Driven Test Generation from SWE Issues	Otter++	Arxiv	2025-08	Paper
Automated Generation of Issue-Reproducing Tests by Combining LLMs and Search-Based Testing	BLAST	Arxiv	2025-09	Paper Code

Regression Test Selection

Literature	Name	Journal/Conference	Time	URL
When Old Meets New: Evaluating the Impact of Regression Tests on SWE Issue Resolution	TestPrune	Arxiv	2025-10	Paper

Learning Strategy

From the perspective of learning strategy, we discuss the methods along 2 aspects:

@Data
@Training

Data

Literature	Name	Journal/Conference	Time	URL
R2E: Turning any GitHub Repository into a Programming Agent Environment	R2E	ICML 2024	2024-07	Paper Code
Training Software Engineering Agents and Verifiers with SWE-Gym	SWE-Gym	ICML 2025	2024-12	Paper Code
R2E-Gym: Procedural Environments and Hybrid Verifiers for Scaling Open-Weights SWE Agents	R2E-Gym	ARXIV	2025-04	Paper Code
SWE-Synth: Synthesizing Verifiable Bug-Fix Data to Enable Large Language Models in Resolving Real-World Bugs	SWE-Synth	ARXIV	2025-04	Paper Code
SWE-smith: Scaling Data for Software Engineering Agents	SWE-smith	NeurIPS 2025	2025-04	Paper Code
SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks	SWE-Factory	ARXIV	2025-06	Paper Code
SWE-Dev: Building Software Engineering Agents with Training and Inference Scaling	SWE-Dev	ACL Findings 2025	2025-06	Paper Code
SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development	SWE-Dev	ARXIV	2025-06	Paper Code
Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs	Skywork-SWE	ARXIV	2025-06	Paper
SWE-Mirror: Scaling Issue-Resolving Datasets by Mirroring Issues Across Repositories	SWE-Mirror	ARXIV	2025-09	Paper
SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving	SWE-Lego	ARXIV	2026-01	Paper

Training

Training-based methods can be further classified into 2 categories:

@SFT-Based Method
@RL-Based Method

If a method uses both SFT and RL techniques, we display only @RL.

Literature	Name	Evaluation Method	Journal/Conference	Time	Label	URL
Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement	Lingma SWE-GPT	SWESynInfer	FSE 2025 Industry	2024-11	@SFT	Paper
Repository Structure-Aware Training Makes SLMs Better Issue Resolver	ReSAT	Agentless	ARXIV	2024-12	@SFT	Paper
SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution	SWE-Fixer	SWE-Fixer	ARXIV	2025-02	@SFT	Paper
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution	SWE-RL	Agentless Mini	ARXIV	2025-02	@RL	Paper Code
SoRFT: Issue Resolving with Subtask-oriented Reinforced Fine-Tuning	SoRFT	Agentless	ACL 2025	2025-02	@RL	Paper
SEAlign: Alignment Training for Software Engineering Agent	SEAlign	OpenHands	ARXIV	2025-03	@SFT	Paper
Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute	SWE-Reasoner	SWE-SynInfer+	ARXIV	2025-04	@RL	Paper Code
Co-PatcheR: Collaborative Software Patching with Component(s)-specific Small Reasoning Models	Co-PatcheR	PatchPilot	ARXIV	2025-05	@SFT	Paper Code
Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering	EvoScale	Satori-SWE	ARXIV	2025-05	@RL	Paper Code
Agent-RLVR: Training Software Engineering Agents via Guidance and Environment Rewards	Agent-RLVR	Agentless	ARXIV	2025-06	@RL	Paper
MCTS-Refined CoT: High-Quality Fine-Tuning Data for LLM-Based Repository Issue Resolution	MCTS-Refined	Agentless-1.0	ASE 2025	2025-06	@SFT	Paper
-	DeepSWE	-	Blog	2025-07	@RL	Link
-	SWE-Swiss	-	Blog	2025-08	@RL	Link Code
RepoForge: Training a SOTA Fast-thinking SWE Agent with an End-to-End Data Curation Pipeline Synergizing SFT and RL at Scale	RepoForge	OpenHands	ARXIV	2025-08	@RL	Paper
Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning	-	-	ARXIV	2025-08	@RL	Paper
Devstral: Fine-tuning Language Models for Coding Agent Applications	Devstral-Small	OpenHands	ARXIV	2025-08	@RL	Paper
When Agents go Astray: Course-Correcting SWE Agents with PRMs	SWE-PRM	-	ARXIV	2025-09	@RL	Paper
Kimi-Dev: Agentless Training as Skill Prior for SWE-Agents	Kimi-Dev	Kimi-Dev	ARXIV	2025-09	@RL	Paper
CWM: An Open-Weights LLM for Research on Code Generation with World Models	CWM	CWM	ARXIV	2025-09	@RL	Paper
Building Coding Agents via Entropy-Enhanced Multi-Turn Preference Optimization	EntroPO	R2E	ARXIV	2025-09	@SFT	Paper Code
KAT-Coder Technical Report	KAT-Coder	Claude Code	ARXIV	2025-09	@RL	Paper
BugPilot: Complex Bug Generation for Efficient Learning of SWE Skills	BugPilot	R2E	ARXIV	2025-10	@RL	Paper
Think-Search-Patch: A Retrieval-Augmented Reasoning Framework for Repository-Level Code Repair	TSP	TSP	EMNLP 2025	2025-11	@SFT	Paper Code
Training Versatile Coding Agents in Synthetic Environments	SWE-Playground	OpenHands	ARXIV	2025-12	@SFT	Paper Code
Toward Training Superintelligent Software Agents through Self-Play SWE-RL	Self-Play SWE-RL	bash+editor	ARXIV	2025-12	@RL	Paper
Context as a Tool: Context Management for Long-Horizon SWE-Agents	CAT/SWE-Compressor	OpenHands	Arxiv	2025-12	@SFT	Paper
SWE-RM: Execution-free Feedback For Software Engineering Agents	SWE-RM	OpenHands	Arxiv	2025-12	@RL	Paper

Empirical Studies

Literature	Journal/Conference	Time	URL
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents	ICLR 2025	2024-08	Paper Code
Evaluating Software Development Agents: Patch Patterns, Code Quality, and Issue Complexity in Real-World GitHub Scenarios	SANER	2024-10	Paper
An Empirical Study on LLM-based Agents for Automated Bug Fixing	ARXIV	2024-11	Paper
Large Language Model Critics for Execution-Free Evaluation of Code Changes	ARXIV	2025-01	Paper
Interactive Agents to Overcome Ambiguity in Software Engineering	ARXIV	2025-02	Paper
Unveiling Pitfalls: Understanding Why AI-driven Code Agents Fail at GitHub Issue Resolution	ARXIV	2025-03	Paper
Are "Solved Issues" in SWE-bench Really Solved Correctly? An Empirical Study	ICSE 2026	2025-03	Paper
SWE-Bench-CL: Continual Learning for Coding Agents	ARXIV	2025-06	Paper
The SWE-Bench Illusion: When State-of-the-Art LLMs Remember Instead of Reason	ARXIV	2025-06	Paper
Dissecting the SWE-Bench Leaderboards: Profiling Submitters and Architectures of LLM- and Agent-Based Repair Systems	ARXIV	2025-06	Paper
PAGENT: Learning to Patch Software Engineering Agents	ARXIV	2025-06	Paper
Understanding Software Engineering Agents: A Study of Thought-Action-Result Trajectories	ARXIV	2025-06	Paper
Are AI-Generated Fixes Secure? Analyzing LLM and Agent Patches on SWE-bench	ARXIV	2025-06	Paper
SWE-Effi: Re-Evaluating Software AI Agent System Effectiveness Under Resource Constraints	ARXIV	2025-09	Paper
An Empirical Study on Failures in Automated Issue Solving	ARXIV	2025-09	Paper
Saving SWE-Bench: A Benchmark Mutation Approach for Realistic Agent Evaluation	ARXIV	2025-10	Paper
More with Less: An Empirical Study of Turn-Control Strategies for Efficient Coding Agents	ICSE 2026	2025-10	Paper
Understanding Code Agent Behaviour: An Empirical Study of Success and Failure Trajectories	ARXIV	2025-10	Paper
SABER: Small Actions, Big Errors -- Safeguarding Mutating Steps in LLM Agents	ARXIV	2025-11	Paper
Process-Centric Analysis of Agentic Software Systems	ARXIV	2025-12	Paper
SWEnergy: An Empirical Study on Energy Efficiency in Agentic Issue Resolution Frameworks with SLMs	ARXIV	2025-12	Paper
Does SWE-Bench-Verified Test Agent Ability or Model Memory?	ARXIV	2025-12	Paper

ZhonghaoJiang/Awesome-Issue-Solving