Skip to content

Encountering Empty Patches and Severe Looping Issues while Testing the swe-bench_verified Dataset using Openhands Framework #14

@Succoney

Description

@Succoney

I have been testing the swe-bench_verified dataset using the Seed-coder-8b-instruct model along with the Openhands framework. Out of 179 test cases, the accuracy achieved was only 3.9%, which is significantly lower than the reported 11.2% in the paper.

Key Findings:

Empty Patches: During the testing process, I observed that the Seed-coder model produced many empty patches, which seems to be the primary reason for the low solution rate. Out of the 179 test cases, only 7 were solved, 37 produced patches, and 138 did not produce any patches at all.

Looping Issues: Additionally, I encountered significant looping issues during inference. For instance, the system keeps repeating similar tasks without making progress, as shown in the following example:
{
"role": "assistant",
"content": "<function=execute_bash><parameter=command> grep -rn 'ORDER BY' /workspace/django__django__3.0/django/db/models/sql/compiler.py:86-100 "
},
{
"role": "user",
"content": "EXECUTION RESULT of [execute_bash]: grep: /workspace/django__django__3.0/django/db/models/sql/compiler.py:86-100: No such file or directory [The command completed with exit code 2.] [Current working directory: /workspace/django__django__3.0] [Python interpreter: /opt/miniconda3/envs/testbed/bin/python] [Command finished with exit code 2]"
},
{
"role": "assistant",
"content": "<function=execute_bash><parameter=command> grep -rn 'ORDER BY' /workspace/django__django__3.0/django/db/models/sql/compiler.py:257-300 "
},
{
"role": "user",
"content": "EXECUTION RESULT of [execute_bash]: grep: /workspace/django__django__3.0/django/db/models/sql/compiler.py:257-300: No such file or directory [The command completed with exit code 2.] [The command finished with exit code 2]"
},
{
"role": "assistant",
"content": "<function=execute_bash><parameter=command> grep -rn 'ORDER BY' /workspace/django__django__3.0/django/db/models/sql/compiler.py:359-400 "
}
Additional Observations:

Looping Issue: In total, 66 test cases out of the 179 displayed looping behavior during inference.

Excessive Inference Steps: 114 test cases exceeded 120 rounds of inference, with 5 test cases exhibiting both looping and excessive rounds.

Experimental Setup:
Openhands Framework: Maximum inference steps set to 120; using CodeActAgent with parallelism set to 5.

Seed-coder 7b: Temperature set to 0.0; using OpenAI parameter format and deployed locally with vllm. Maximum model length is 32768.

Could you kindly assist in resolving these issues or provide any insights on potential causes?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions