A comprehensive toolkit for analyzing Software Engineering (SWE) agents, focusing on performance evaluation, feature analysis, and benchmarking.
This project provides tools and utilities for:
- Analyzing agent performance on software engineering tasks
- Computing various metrics (code, dependency, error, instance, patch, type)
- Evaluating performance gaps between different agent implementations
- Processing and analyzing data from OpenHands and SWE-bench
Repository layout:
- `analysis/`: Core analysis modules
  - `features/metrics/`: Various metric implementations for agent analysis
  - `models/`: Data models for OpenHands and SWE-bench
  - `performance_gap.py`: Performance gap analysis utilities
  - `usage.py`: Usage analysis tools
- `notebooks/`: Jupyter notebooks for analysis and visualization
  - `condenser_results.ipynb`: Analysis of condenser results
  - `localization_metrics.ipynb`: Metrics for code localization
  - `performance_gap.ipynb`: Performance gap analysis
- Python ≥ 3.12
- Dependencies are managed through Poetry
- Ensure you have Poetry installed
- Clone this repository
- Run `poetry install` to install dependencies
The toolkit can be used either through its Python modules or via the provided Jupyter notebooks for interactive analysis.
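As a rough illustration of the kind of comparison a performance gap analysis involves, the sketch below compares two agents by the sets of task instances each one resolved. It does not use this toolkit's modules: `GapReport`, `performance_gap`, and the instance IDs are hypothetical names chosen for the example, and the definition of "gap" used here (difference in resolve rates, plus the instances solved by only one agent) is an assumption, not necessarily the one implemented in `analysis/performance_gap.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GapReport:
    """Hypothetical summary of how two agents differ on a shared task set."""

    rate_a: float            # fraction of instances resolved by agent A
    rate_b: float            # fraction of instances resolved by agent B
    gap: float               # rate_a - rate_b
    only_a: frozenset[str]   # instances resolved by A but not by B
    only_b: frozenset[str]   # instances resolved by B but not by A


def performance_gap(
    resolved_a: set[str],
    resolved_b: set[str],
    all_instances: set[str],
) -> GapReport:
    """Compare two agents on the same set of instances (illustrative only)."""
    if not all_instances:
        raise ValueError("all_instances must not be empty")

    # Ignore any resolved IDs that are not part of the shared task set.
    a = resolved_a & all_instances
    b = resolved_b & all_instances
    n = len(all_instances)

    return GapReport(
        rate_a=len(a) / n,
        rate_b=len(b) / n,
        gap=(len(a) - len(b)) / n,
        only_a=frozenset(a - b),
        only_b=frozenset(b - a),
    )


if __name__ == "__main__":
    # Made-up instance IDs, purely for demonstration.
    instances = {"task-1", "task-2", "task-3", "task-4"}
    report = performance_gap(
        resolved_a={"task-1", "task-2", "task-3"},
        resolved_b={"task-2"},
        all_instances=instances,
    )
    print(report)
```

Working with sets of resolved instance IDs keeps the comparison independent of how each agent's results are stored. The `only_a` and `only_b` fields point at the specific instances worth inspecting, which an aggregate rate alone would hide.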
This project is licensed under the MIT License - see the LICENSE file for details.