ZhonghaoJiang/Awesome-Issue-Solving

Agentic Software Issue Resolution with Large Language Models: A Survey

📰 News

  • 📅 2026.01: Paper update! We added 34 papers published from 2025.10 to 2025.12; the survey now covers 160 papers.
  • 📅 2025.12: We release the first survey on agentic software issue resolution!
  • 📅 2025.10: We summarize 126 papers about issue resolution, from 2023.10 to 2025.10!

Introduction

We organize this survey into three main parts: Benchmarks, Technologies, and Empirical Studies.

As of 2026-01-06, we survey automated issue-solving technologies from two perspectives: Scaffold Design and Learning Strategy.

Benchmarks

We group existing benchmarks into three categories according to the task they target:

@End-To-End
@Reproduction Test Generation
@Localization

Literature Name Scope Journal/Conference Time Link
SWE-bench: Can Language Models Resolve Real-World GitHub Issues? SWE-bench End-To-End ICLR'24 2023-10 Paper
Code
SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents SWT-Bench Reproduction Test Generation NeurIPS'24 2024-06 Paper
Code
SWE-bench-java: A GitHub Issue Resolving Benchmark for Java Multi-SWE-bench End-To-End ARXIV 2024-08 Paper
Code
SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains? SWE-bench Multimodal End-To-End ICLR'25 2024-10 Paper
Code
SWE-Bench+: Enhanced Coding Benchmark for LLMs SWE-Bench+ End-To-End ARXIV 2024-10 Paper
TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark TestGenEval Reproduction Test Generation ICLR'25 2024-10 Paper
Code
A Real-World Benchmark for Evaluating Fine-Grained Issue Solving Capabilities of Large Language Models FAUN-Eval End-To-End ARXIV 2024-11 Paper
TDD-Bench Verified: Can LLMs Generate Tests for Issues Before They Get Resolved? TDD-Bench Reproduction Test Generation ARXIV 2024-11 Paper
Code
CodeV: Issue Resolving with Visual Data Visual SWE-bench End-To-End ACL Findings'25 2024-12 Paper
Code
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving Multi-SWE-bench End-To-End ARXIV 2025-04 Paper
Code
LiveSWEBench LiveSWEBench End-To-End BLOG 2025-04 Link
Code
LocAgent: Graph-Guided LLM Agents for Code Localization LocBench Localization ARXIV 2025-03 Paper
Code
Automated Benchmark Generation for Repository-Level Coding Tasks SWEE-Bench/SWA-Bench End-To-End ARXIV 2025-03 Paper
FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation FEA-Bench End-To-End ACL'25 2025-03 Paper
Code
OmniGIRL: A Multilingual and Multimodal Benchmark for GitHub Issue Resolution OmniGIRL End-To-End ISSTA'25 2025-05 Paper
Code
- SWE-bench Multilingual End-To-End BLOG 2025-05 Link
Code
SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents SWE-PolyBench End-To-End ARXIV 2025-04 Paper
Code
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents SWE-rebench End-To-End ARXIV 2025-05 Paper
Code
GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents GSO End-To-End NeurIPS'25 2025-05 Paper
Code
SWE-bench Goes Live! SWE-bench-Live End-To-End ARXIV 2025-05 Paper
Code
UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench UTBoost - ARXIV 2025-06 Paper
Code
SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks SWE-Factory End-To-End ARXIV 2025-06 Paper
Code
SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving Swing-Arena End-To-End ARXIV 2025-06 Paper
Code
SPICE: An Automated SWE-Bench Labeling Pipeline for Issue Clarity, Test Coverage, and Effort Estimation SPICE - ASE'25 2025-07 Paper
SWE-MERA: A Dynamic Benchmark for Agenticly Evaluating Large Language Models on Software Engineering Tasks SWE-MERA End-To-End ARXIV 2025-07 Paper
SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? SWE-Perf End-To-End ARXIV 2025-07 Paper
Code
NoCode-bench: A Benchmark for Evaluating Natural Language-Driven Feature Addition NoCode-bench End-To-End ARXIV 2025-08 Paper
Code
SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks? SWE-Bench Pro End-To-End ARXIV 2025-09 Paper
Code
SWE-QA: Can Language Models Answer Repository-level Code Questions? SWE-QA-Bench QA ARXIV 2025-09 Paper
Code
A Benchmark for Localizing Code and Non-Code Issues in Software Projects MULocBench Localization ARXIV 2025-10 Paper
Code
SWE-Compass: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models SWE-Compass End-To-End ARXIV 2025-11 Paper
Code
SWE-fficiency: Can Language Models Optimize Real-World Repositories on Real Workloads? SWE-fficiency End-To-End ARXIV 2025-11 Paper
Code
SWE-Bench++: A Framework for the Scalable Generation of Software Engineering Benchmarks from Open-Source Repositories SWE-Bench++ End-To-End ARXIV 2025-12 Paper
Code
SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios SWE-EVO End-To-End ARXIV 2025-12 Paper
Code

Technologies

Scaffold Design

From the perspective of Design Paradigms, we classify scaffolds into two categories, mirroring the benchmark tasks:

@End-To-End
@Single-Phased

End-to-End

We further classify End-To-End methods into two categories:

@Agent-Based Method
@Pipeline-Based Method

Literature Name Journal/Conference Time Label URL
SWE-bench: Can Language Models Resolve Real-World GitHub Issues? BM25 RAG ICLR 2024 2023-10 @Pipeline Paper
Code
SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering SWE-Agent NeurIPS 2024 2024-05 @Agent Paper
Code
AutoCodeRover: Autonomous Program Improvement AutoCodeRover ISSTA 2024 2024-04 @Agent Paper
Code
CodeR: Issue Resolving with Multi-Agent and Task Graphs CodeR Arxiv 2024-06 @Agent Paper
Code
Alibaba LingmaAgent: Improving Automated Issue Resolution via Comprehensive Repository Exploration LingmaAgent/RepoUnderstander FSE Companion 2025 2024-06 @Agent Paper
Code
MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution MAGIS NeurIPS 2024 2024-03 @Agent Paper
Code
MASAI: Modular Architecture for Software-engineering AI Agents MASAI Arxiv 2024-06 @Agent Paper
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents OpenDevin (AllHands) Arxiv 2024-06 @Agent Paper
Agentless: Demystifying LLM-based Software Engineering Agents Agentless FSE 2025 2024-07 @Pipeline Paper
Code
OpenHands: An Open Platform for AI Software Developers as Generalist Agents OpenHands ICLR 2025 2024-07 @Agent Paper
Code
SpecRover: Code Intent Extraction via LLMs SpecRover (AutoCodeRover-v2) ICSE 2025 2024-08 @Agent Paper
Code
CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases CodexGraph Arxiv 2024-08 @Agent Paper
Code
SuperCoder2.0: Technical Report on Exploring the feasibility of LLMs as Autonomous Programmer SuperCoder Arxiv 2024-09 @Agent Paper
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale HyperAgent Arxiv 2024-09 @Agent Paper
RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph RepoGraph ICLR 2025 2024-10 @Pipeline Paper
Code
SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement SWE-Search ICLR 2025 2024-10 @Agent Paper
Code
OpenHands: An Open Platform for AI Software Developers as Generalist Agents OpenHands CodeAct ICLR 2025 2024-10 @Agent Paper
Code
- Composio SWE-Kit Blog 2024-10 @Pipeline Link
Code
Infant Agent: A Tool-Integrated, Logic-Driven Agent with Cost-Effective API Usage Infant Agent Arxiv 2024-11 @Agent Paper
MarsCode Agent: AI-native Automated Bug Fixing MarsCode Agent Arxiv 2024-11 @Agent Paper
Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement SWESynInfer FSE 2025 Industry 2024-11 @Pipeline Paper
Code
- Nebius AI Blog 2024-11 @Agent Paper
CodeV: Issue Resolving with Visual Data CodeV Arxiv 2024-12 @Pipeline Paper
Code
- Aide Blog 2024-12 @Agent Link
Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments Learn-By-Interact Arxiv 2025-01 @Agent Paper
PatchPilot: A Stable and Cost-Efficient Agentic Patching Framework PatchPilot ICML 2025 2025-02 @Pipeline Paper
Code
CodeMonkeys: Scaling Test-Time Compute for Software Engineering CodeMonkeys Arxiv 2025-02 @Pipeline Paper
Code
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution Agentless Mini ARXIV 2025-02 @Pipeline Paper
Code
- Agentless Lite Blog 2025-02 @Pipeline Code
- Syntheo Blog 2025-02 @Agent Link
SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution SWE-Fixer ACL Findings 2025 2025-02 @Pipeline Paper
- AgentScope Blog 2025-03 @Agent Link
DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal DARS Arxiv 2025-03 @Agent Paper
Code
Enhancing Repository-Level Software Repair via Repository-Aware Knowledge Graphs KGCompass Arxiv 2025-03 @Pipeline Paper
- Augment Agent v0 Blog 2025-03 @Agent Link
Code
- CORTEXA Blog 2025-03 @Pipeline Link
- Refact.ai Blog 2025-03 @Agent Link
Code
- Lingxi Blog 2025-04 @Agent Link
Code
- Trae IDE Blog 2025-05 @Agent Link
- devlo Blog 2025-05 @Agent Link
Putting It All into Context: Simplifying Agents with LCLMs LCLM Arxiv 2025-05 @Pipeline Paper
Code Graph Model (CGM): A Graph-Integrated Large Language Model for Repository-Level Software Engineering Tasks CGM-SWE-PY NeurIPS'25 2025-05 @Pipeline Paper
InfantAgent-Next: A Multimodal Generalist Agent for Automated Computer Interaction InfantAgent-Next Arxiv 2025-05 @Agent Paper
Code
Coding Agents with Multimodal Browsing are Generalist Problem Solvers OpenHands-Versa Arxiv 2025-06 @Agent Paper
Code
EXPEREPAIR: Dual-Memory Enhanced LLM-based Repository-Level Program Repair EXPEREPAIR Arxiv 2025-06 @Agent Paper
Seeing is Fixing: Cross-Modal Reasoning with Multimodal LLMs for Visual Software Issue Fixing GUIRepair ASE'25 2025-06 @Pipeline Paper
SemAgent: A Semantics Aware Program Repair Agent SemAgent Arxiv 2025-06 @Pipeline Paper
Nemotron-Cortexa: Enhancing LLM Agents for Software Engineering Tasks via Improved Localization and Solution Diversity Nemotron-Cortexa ICML'25 2025-06 @Pipeline Paper
Code
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving Agent KB Arxiv 2025-07 @Agent Paper
Code
Prometheus: Unified Knowledge Graphs for Issue Resolution in Multilingual Codebases Prometheus Arxiv 2025-07 @Agent Paper
Code
SWE-Exp: Experience-Driven Software Issue Resolution SWE-Exp Arxiv 2025-07 @Agent Paper
Code
SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution SWE-Debate Arxiv 2025-07 @Agent Paper
Code
Trae Agent: An LLM-based Agent for Software Engineering with Test-time Scaling Trae Agent Arxiv 2025-07 @Agent Paper
Code
SynFix: Dependency-Aware Program Repair via RelationGraph Analysis SynFix ACL Findings'25 2025-07 @Pipeline Paper
SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents SE-Agent NeurIPS'25 2025-08 @Agent Paper
Code
CoreThink: A Symbolic Reasoning Layer to reason over Long Horizon Tasks with LLMs CoreThink Arxiv 2025-09 @Agent Paper
Improving the Efficiency of LLM Agent Systems through Trajectory Reduction AgentDiet FSE'26 2025-09 @Agent Paper
Lita: Light Agent Uncovers the Agentic Coding Capabilities of LLMs Lita Arxiv 2025-10 @Agent Paper
Lingxi: Repository-Level Issue Resolution Framework Enhanced by Procedural Knowledge Guided Scaling Lingxi Arxiv 2025-10 @Agent Paper
Code
SIADAFIX: Issue Description Response for Adaptive Program Repair SIADAFIX Arxiv 2025-10 @Agent Paper
Code
TOM-SWE: User Mental Modeling For Software Engineering Agents TOM-SWE Arxiv 2025-10 @Agent Paper
Code
TDFlow: Agentic Workflows for Test Driven Software Engineering TDFlow Arxiv 2025-10 @Pipeline Paper
Live-SWE-agent: Can Software Engineering Agents Self-Evolve on the Fly? Live-SWE-agent Arxiv 2025-11 @Agent Paper
Code
InfCode: Adversarial Iterative Refinement of Tests and Patches for Reliable Software Issue Resolution InfCode Arxiv 2025-11 @Agent Paper
Confucius Code Agent: An Open-sourced AI Software Engineer at Industrial Scale CCA Arxiv 2025-12 @Agent Paper

Single-Phased

We discuss Single-Phased methods in three categories:

@Localization
@Reproduction
@Regression

Here, @Reproduction denotes reproduction test generation and @Regression denotes regression test selection.

Issue Localization
Literature Name Journal/Conference Time URL
BLAZE: Cross-Language and Cross-Project Bug Localization via Dynamic Chunking and Hard Example Learning BLAZE Arxiv 2024-08 Paper
Code
OrcaLoca: An LLM Agent Framework for Software Issue Localization OrcaLoca ICML 2025 2025-02 Paper
Code
Bridging Bug Localization and Issue Fixing: A Hierarchical Localization Framework Leveraging Large Language Models BugCerberus Arxiv 2025-02 Paper
LocAgent: Graph-Guided LLM Agents for Code Localization LocAgent ACL 2025 2025-03 Paper
Code
CoSIL: Software Issue Localization via LLM-Driven Code Repository Graph Searching CoSIL ASE 2025 2025-03 Paper
Code
CoRNStack: High-Quality Contrastive Data for Better Code Retrieval and Reranking CoRNStack ICLR 2025 2025-03 Paper
Code
SweRank: Software Issue Localization with Code Ranking SweRank Arxiv 2025-05 Paper
Code
CoRet: Improved Retriever for Code Editing CoRet Arxiv 2025-06 Paper
SACL: Understanding and Combating Textual Bias in Code Retrieval with Semantic-Augmented Reranking and Localization SACL Arxiv 2025-07 Paper
Meta-RAG on Large Codebases Using Code Summarization Meta-RAG Arxiv 2025-08 Paper
Tool-integrated Reinforcement Learning for Repo Deep Search RepoSearcher Arxiv 2025-08 Paper
Improving Code Localization with Repository Memory RepoMem Arxiv 2025-10 Paper
Hierarchical Reward Modeling for Fault Localization in Large Code Repositories HiLoRM EMNLP Findings 2026 2025-11 Paper
Code
SweRank+: Multilingual, Multi-Turn Code Ranking for Software Issue Localization SweRank+ Arxiv 2025-12 Paper
Code
One Tool Is Enough: Reinforcement Learning for Repository-Level LLM Agents RepoNavigator Arxiv 2025-12 Paper
GraphLocator: Graph-guided Causal Reasoning for Issue Localization GraphLocator FSE 2026 2025-12 Paper
Issue Reproduction
Literature Name Journal/Conference Time URL
AEGIS: An Agent-based Framework for General Bug Reproduction from Issue Descriptions AEGIS FSE 2025 Industry 2024-11 Paper
LLMs as Continuous Learners: Improving the Reproduction of Defective Code in Software Issues EvoCoder ARXIV 2024-11 Paper
Agentic Bug Reproduction for Effective Automated Program Repair at Google BRT Agent Arxiv 2025-02 Paper
Otter: Generating Tests from Issues to Validate SWE Patches Otter ICML 2025 2025-02 Paper
Issue2Test: Generating Reproducing Test Cases from Issue Reports Issue2Test Arxiv 2025-03 Paper
AssertFlip: Reproducing Bugs via Inversion of LLM-Generated Passing Tests AssertFlip Arxiv 2025-07 Paper
Execution-Feedback Driven Test Generation from SWE Issues Otter++ Arxiv 2025-08 Paper
Automated Generation of Issue-Reproducing Tests by Combining LLMs and Search-Based Testing BLAST Arxiv 2025-09 Paper
Code
Regression Test Selection
Literature Name Journal/Conference Time URL
When Old Meets New: Evaluating the Impact of Regression Tests on SWE Issue Resolution TestPrune Arxiv 2025-10 Paper

Learning Strategy

We discuss Learning Strategy from two aspects:

@Data
@Training

Data

Literature Name Journal/Conference Time URL
R2E: Turning any GitHub Repository into a Programming Agent Environment R2E ICML 2024 2024-07 Paper
Code
Training Software Engineering Agents and Verifiers with SWE-Gym SWE-Gym ICML 2025 2024-12 Paper
Code
R2E-Gym: Procedural Environments and Hybrid Verifiers for Scaling Open-Weights SWE Agents R2E-Gym ARXIV 2025-04 Paper
Code
SWE-Synth: Synthesizing Verifiable Bug-Fix Data to Enable Large Language Models in Resolving Real-World Bugs SWE-Synth ARXIV 2025-04 Paper
Code
SWE-smith: Scaling Data for Software Engineering Agents SWE-smith NeurIPS 2025 2025-04 Paper
Code
SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks SWE-Factory ARXIV 2025-06 Paper
Code
SWE-Dev: Building Software Engineering Agents with Training and Inference Scaling SWE-Dev ACL Findings 2025 2025-06 Paper
Code
SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development SWE-Dev ARXIV 2025-06 Paper
Code
Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs Skywork-SWE ARXIV 2025-06 Paper
SWE-Mirror: Scaling Issue-Resolving Datasets by Mirroring Issues Across Repositories SWE-Mirror ARXIV 2025-09 Paper
SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving SWE-Lego ARXIV 2026-01 Paper

Training

We further classify training-based methods into two categories:

@SFT-Based Method
@RL-Based Method

If a method uses both SFT and RL techniques, we label it only as @RL.
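
The labeling convention above can be read as a simple precedence rule. The sketch below is only an illustration: the function name `training_label` and its boolean parameters are hypothetical and not part of this repository.

```python
def training_label(uses_sft: bool, uses_rl: bool) -> str:
    """Return the label shown in the Training table (hypothetical helper)."""
    if uses_rl:
        # RL takes precedence: methods combining SFT and RL are shown as @RL only.
        return "@RL"
    if uses_sft:
        return "@SFT"
    # Methods using neither technique do not belong in the Training table.
    raise ValueError("method uses neither SFT nor RL")
```

Under this reading, an entry marked @SFT uses supervised fine-tuning without reinforcement learning, while an entry marked @RL may or may not also include an SFT stage.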

Literature Name Evaluation Method Journal/Conference Time Label URL
Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement Lingma SWE-GPT SWESynInfer FSE 2025 Industry 2024-11 @SFT Paper
Repository Structure-Aware Training Makes SLMs Better Issue Resolver ReSAT Agentless ARXIV 2024-12 @SFT Paper
SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution SWE-Fixer SWE-Fixer ARXIV 2025-02 @SFT Paper
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution SWE-RL Agentless Mini ARXIV 2025-02 @RL Paper
Code
SoRFT: Issue Resolving with Subtask-oriented Reinforced Fine-Tuning SoRFT Agentless ACL 2025 2025-02 @RL Paper
SEAlign: Alignment Training for Software Engineering Agent SEAlign OpenHands ARXIV 2025-03 @SFT Paper
Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute SWE-Reasoner SWE-SynInfer+ ARXIV 2025-04 @RL Paper
Code
Co-PatcheR: Collaborative Software Patching with Component(s)-specific Small Reasoning Models Co-PatcheR PatchPilot ARXIV 2025-05 @SFT Paper
Code
Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering EvoScale Satori-SWE ARXIV 2025-05 @RL Paper
Code
Agent-RLVR: Training Software Engineering Agents via Guidance and Environment Rewards Agent-RLVR Agentless ARXIV 2025-06 @RL Paper
MCTS-Refined CoT: High-Quality Fine-Tuning Data for LLM-Based Repository Issue Resolution MCTS-Refined Agentless-1.0 ASE 2025 2025-06 @SFT Paper
- DeepSWE - Blog 2025-07 @RL Link
- SWE-Swiss - Blog 2025-08 @RL Link
Code
RepoForge: Training a SOTA Fast-thinking SWE Agent with an End-to-End Data Curation Pipeline Synergizing SFT and RL at Scale RepoForge OpenHands ARXIV 2025-08 @RL Paper
Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning - - ARXIV 2025-08 @RL Paper
Devstral: Fine-tuning Language Models for Coding Agent Applications Devstral-Small OpenHands ARXIV 2025-08 @RL Paper
When Agents go Astray: Course-Correcting SWE Agents with PRMs SWE-PRM - ARXIV 2025-09 @RL Paper
Kimi-Dev: Agentless Training as Skill Prior for SWE-Agents Kimi-Dev Kimi-Dev ARXIV 2025-09 @RL Paper
CWM: An Open-Weights LLM for Research on Code Generation with World Models CWM CWM ARXIV 2025-09 @RL Paper
Building Coding Agents via Entropy-Enhanced Multi-Turn Preference Optimization EntroPO R2E ARXIV 2025-09 @SFT Paper
Code
KAT-Coder Technical Report KAT-Coder Claude Code ARXIV 2025-09 @RL Paper
BugPilot: Complex Bug Generation for Efficient Learning of SWE Skills BugPilot R2E ARXIV 2025-10 @RL Paper
Think-Search-Patch: A Retrieval-Augmented Reasoning Framework for Repository-Level Code Repair TSP TSP EMNLP 2025 2025-11 @SFT Paper
Code
Training Versatile Coding Agents in Synthetic Environments SWE-Playground OpenHands ARXIV 2025-12 @SFT Paper
Code
Toward Training Superintelligent Software Agents through Self-Play SWE-RL Self-Play SWE-RL bash+editor ARXIV 2025-12 @RL Paper
Context as a Tool: Context Management for Long-Horizon SWE-Agents CAT/SWE-Compressor OpenHands Arxiv 2025-12 @SFT Paper
SWE-RM: Execution-free Feedback For Software Engineering Agents SWE-RM OpenHands Arxiv 2025-12 @RL Paper

Empirical Studies

Literature Journal/Conference Time URL
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents ICLR 2025 2024-08 Paper
Code
Evaluating Software Development Agents: Patch Patterns, Code Quality, and Issue Complexity in Real-World GitHub Scenarios SANER 2024-10 Paper
An Empirical Study on LLM-based Agents for Automated Bug Fixing ARXIV 2024-11 Paper
Large Language Model Critics for Execution-Free Evaluation of Code Changes ARXIV 2025-01 Paper
Interactive Agents to Overcome Ambiguity in Software Engineering ARXIV 2025-02 Paper
Unveiling Pitfalls: Understanding Why AI-driven Code Agents Fail at GitHub Issue Resolution ARXIV 2025-03 Paper
Are "Solved Issues" in SWE-bench Really Solved Correctly? An Empirical Study ICSE 2026 2025-03 Paper
SWE-Bench-CL: Continual Learning for Coding Agents ARXIV 2025-06 Paper
The SWE-Bench Illusion: When State-of-the-Art LLMs Remember Instead of Reason ARXIV 2025-06 Paper
Dissecting the SWE-Bench Leaderboards: Profiling Submitters and Architectures of LLM- and Agent-Based Repair Systems ARXIV 2025-06 Paper
PAGENT: Learning to Patch Software Engineering Agents ARXIV 2025-06 Paper
Understanding Software Engineering Agents: A Study of Thought-Action-Result Trajectories ARXIV 2025-06 Paper
Are AI-Generated Fixes Secure? Analyzing LLM and Agent Patches on SWE-bench ARXIV 2025-06 Paper
SWE-Effi: Re-Evaluating Software AI Agent System Effectiveness Under Resource Constraints ARXIV 2025-09 Paper
An Empirical Study on Failures in Automated Issue Solving ARXIV 2025-09 Paper
Saving SWE-Bench: A Benchmark Mutation Approach for Realistic Agent Evaluation ARXIV 2025-10 Paper
More with Less: An Empirical Study of Turn-Control Strategies for Efficient Coding Agents ICSE 2026 2025-10 Paper
Understanding Code Agent Behaviour: An Empirical Study of Success and Failure Trajectories ARXIV 2025-10 Paper
SABER: Small Actions, Big Errors -- Safeguarding Mutating Steps in LLM Agents ARXIV 2025-11 Paper
Process-Centric Analysis of Agentic Software Systems ARXIV 2025-12 Paper
SWEnergy: An Empirical Study on Energy Efficiency in Agentic Issue Resolution Frameworks with SLMs ARXIV 2025-12 Paper
Does SWE-Bench-Verified Test Agent Ability or Model Memory? ARXIV 2025-12 Paper
