Hi maintainers,
Thanks for the great work on this project. I’d like to request adding evaluation support for SWE-bench_Multilingual.
Motivation
SWE-bench has become a widely used benchmark for evaluating code-fixing agents on tasks drawn from real GitHub issues and the pull requests that resolved them. The original SWE-bench covers only Python repositories. SWE-bench_Multilingual extends the same setting to repositories written in other programming languages. This matters more and more for measuring real-world performance beyond Python-only codebases.
Supporting these benchmarks in this repo would make it easier to:
- run standardized evaluations and compare results across models/agents,
- reproduce published numbers,
- evaluate multilingual code repair capabilities in a consistent pipeline.