
[Bug]: Get empty git diff after inference although the agent did some work #228

Closed
kevin-support-bot bot opened this issue Jan 22, 2025 · 35 comments

kevin-support-bot bot commented Jan 22, 2025

All-Hands-AI#6407 Issue


@tangken333, in matplotlib__matplotlib-25442, the agent got stuck in a loop because the workspace was empty due to the symlink.

You can use this script to visualize the eval like this.

@tangken333

So why is the workspace empty? What can I do about that? Thanks!

@tangken333

Besides, I found that some diffs are in an invalid format. Do you know what's wrong with that? Thanks!

@SmartManoj (Owner)

So why is the workspace empty? What can I do about that? Thanks!

Check the /testbed folder.

I found that some diffs are in an invalid format.

It shouldn't be. It is generated using git diff. Would you isolate that diff?
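For what it's worth, a rough way to isolate one instance's diff from the inference output and sanity-check its format (the test_result.git_patch field name is an assumption about the output.jsonl layout):

  # pull a single instance's patch out of the output file
  jq -r 'select(.instance_id == "matplotlib__matplotlib-25442") | .test_result.git_patch' output.jsonl > one_instance.patch

  # a well-formed patch should pass a dry-run apply against the matching checkout
  cd /path/to/matplotlib && git apply --check /path/to/one_instance.patch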

@tangken333

Thanks for the reply. Could you please tell me more about how to solve the empty-workspace problem? I don't know how to get inside the environment. I would also like to ask whether this problem comes from building the Docker image or from each run. Thanks!

@SmartManoj (Owner)

SmartManoj commented Jan 23, 2025

You could use Docker Desktop.

Image

Maybe there is a bug in the Docker image. Here, the files are copied.

@tangken333

I am testing on a Linux server, so I might not be able to use Docker Desktop.

So if I have already finished building the Docker images, is there anything I can do now to save the test? I have tested on two different Linux servers, and both have the empty git diff problem. Is there any temperate patch I can do?

I'm in a bit of a hurry with the testing. Please let me know. Thanks!

@SmartManoj (Owner)

SmartManoj commented Jan 23, 2025

You could run docker exec -it <container-id> /bin/bash to open the container terminal, and docker ps -a to get the container ID.
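For example (the container id is a placeholder; /testbed is the path mentioned above):

  # list all containers, including ones that have already exited
  docker ps -a
  docker start <container-id>   # only needed if the container has already exited
  docker exec -it <container-id> /bin/bash

  # inside the container, inspect what the agent actually changed
  ls /testbed
  cd /testbed && git status && git diff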

I am debugging that instance now.


is there anything I can do now to save the test

Which test?

temperate patch

u meant temporary?

@tangken333

tangken333 commented Jan 23, 2025

You could run docker exec -it <container-id> /bin/bash to open the container terminal, and docker ps -a to get the container ID.

Yeah, I know how to run Docker, but the container is deleted after the run, so I cannot go into the environment now. When I docker run bash, I can see the testbed there.

I am debugging that instance now.

Really thanks for your help!

Which test?

I mean my run on SWE-bench.

u meant temporary?

Yes, temporary (damn writing completion tools)

@SmartManoj (Owner)

but the container is deleted after the run

Is the keep_runtime_alive sandbox config set?
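If it is not set, a minimal sketch of enabling it (assuming the option lives under a [sandbox] section of config.toml in this OpenHands version; add the key to an existing [sandbox] section if you already have one):

  # keep the sandbox container around after the run instead of removing it
  printf '\n[sandbox]\nkeep_runtime_alive = true\n' >> config.toml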

@tangken333

Seems not.
All I do is (the full sequence is sketched below):

  • clone the project and check out the CodeAct V2.1 version.
  • install the project by activating a conda environment and running pip install .
  • run the command: ./evaluation/swe_bench/scripts/run_infer.sh llm.eval_o3 HEAD CodeActAgent 300 100 1 princeton-nlp/SWE-bench_Lite test
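Roughly, that sequence looks like this (a sketch; the repo URL and conda environment name are assumptions, and <commit-hash> is whatever commit you pin):

  # clone and pin the version under test
  git clone https://github.com/All-Hands-AI/OpenHands.git && cd OpenHands
  git checkout <commit-hash>

  # install into a conda environment (environment name is a placeholder)
  conda activate openhands && pip install .

  # run inference on SWE-bench Lite (arguments as in the command above)
  ./evaluation/swe_bench/scripts/run_infer.sh llm.eval_o3 HEAD CodeActAgent 300 100 1 princeton-nlp/SWE-bench_Lite test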

@SmartManoj (Owner)

Workaround: You can change the directory to /testbed here

@SmartManoj (Owner)

  • clone the project and check out the CodeAct V2.1 version.

Would you give the commit hash?

Works in the latest version.
Image

@tangken333

tangken333 commented Jan 23, 2025

I am using the commit: 6498204
Not sure if this is a random problem, since it should not happen if you are always running and developing.

Workaround: You can change the directory to /testbed

This seems to work! I am running with that now.

@SmartManoj (Owner)

All-Hands-AI#5549 This change fixed that.

@tangken333

All-Hands-AI#5549 This change fixed that.

I went through this bug fix! Before I merged it, all the git diffs were empty. After merging it, some of the git diffs are still empty. Not sure whether they come from the same cause.

@SmartManoj (Owner)

All-Hands-AI#5659 Did you apply this one too?

@tangken333

No, I haven't tried 5659 yet. I will try it later. Thanks for your help!
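If it helps, one way to pull both referenced fixes onto a pinned checkout is to fetch the PR heads and merge them (the remote name and merge approach here are assumptions, not the project's documented workflow):

  # fetch the two PRs referenced above from the upstream repo
  git remote add upstream https://github.com/All-Hands-AI/OpenHands.git
  git fetch upstream pull/5549/head:pr-5549 pull/5659/head:pr-5659

  # merge them on top of the commit you are testing
  git merge pr-5549 pr-5659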

SmartManoj added a commit that referenced this issue Jan 23, 2025
For #228
@SmartManoj (Owner)

SmartManoj commented Jan 23, 2025

I am using the commit: 6498204

Added 9262560 on top of that so the container isn't removed and only that instance is run.


@tangken333 Edit:
find /workspace/matplotlib__matplotlib__3.7 works after the mv as well.

Image
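To check whether the workspace entry is a dangling symlink into /testbed or a real copy, something like this inside the runtime container helps (paths are the ones from this instance):

  # is the workspace entry a symlink, and does its target resolve?
  ls -la /workspace
  readlink -f /workspace/matplotlib__matplotlib__3.7

  # a real copy should list actual files here
  find /workspace/matplotlib__matplotlib__3.7 -maxdepth 1 | head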

@SmartManoj (Owner)

I am using the commit: 6498204

Why did you choose this commit?

The original trajectory uses ls -R.

@tangken333

tangken333 commented Jan 23, 2025

Why did you choose this commit?

I am following the commitId from the metadata. And I checked the version of the CodeAct agent: it is V2.1 (correct).

Image

By the way, after I got the result, I faced some new problems.

  1. If I directly run your evaluation script
    ./evaluation/benchmarks/swe_bench/scripts/eval_infer.sh /root/OpenHands/evaluation/evaluation_outputs/outputs/princeton-nlp__SWE-bench_Lite-test/CodeActAgent/gpt-4o-2024-08-06_maxiter_100_N_v2.1-no-hint-run_1/output.jsonl
    I got this error:
Detecting whether PROCESS_FILEPATH is in OH format or in SWE-bench format
==============================================================
The file IS NOT in SWE-bench format.
Merged output file with fine-grained report will be saved to /root/OpenHands/evaluation/evaluation_outputs/outputs/princeton-nlp__SWE-bench_Lite-test/CodeActAgent/gpt-4o-2024-08-06_maxiter_100_N_v2.1-no-hint-run_1
Traceback (most recent call last):
  File "/root/Openhands_new/evaluation/benchmarks/swe_bench/scripts/eval/convert_oh_output_to_swe_json.py", line 6, in <module>
    from evaluation.benchmarks.swe_bench.eval_infer import process_git_patch
  File "/root/Openhands_new/evaluation/benchmarks/swe_bench/eval_infer.py", line 8, in <module>
    from swebench.harness.grading import get_eval_report
  File "/root/anaconda3/envs/openhands/lib/python3.12/site-packages/swebench/__init__.py", line 46, in <module>
    from swebench.harness.run_evaluation import (
  File "/root/anaconda3/envs/openhands/lib/python3.12/site-packages/swebench/harness/run_evaluation.py", line 48, in <module>
    from swebench.harness.modal_eval import (
  File "/root/anaconda3/envs/openhands/lib/python3.12/site-packages/swebench/harness/modal_eval/__init__.py", line 1, in <module>
    from swebench.harness.modal_eval.run_evaluation_modal import run_instances_modal
  File "/root/anaconda3/envs/openhands/lib/python3.12/site-packages/swebench/harness/modal_eval/run_evaluation_modal.py", line 206, in <module>
    image=swebench_image.add_local_file(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Image' object has no attribute 'add_local_file'. Did you mean: 'copy_local_file'?
SWEBENCH_FORMAT_JSONL: /root/OpenHands/evaluation/evaluation_outputs/outputs/princeton-nlp__SWE-bench_Lite-test/CodeActAgent/gpt-4o-2024-08-06_maxiter_100_N_v2.1-no-hint-run_1/output.swebench.jsonl
Error: /root/OpenHands/evaluation/evaluation_outputs/outputs/princeton-nlp__SWE-bench_Lite-test/CodeActAgent/gpt-4o-2024-08-06_maxiter_100_N_v2.1-no-hint-run_1/output.swebench.jsonl does not exist. There is probably an error in the conversion process.
  2. Then I just copied the git diff from the output, converted it to the SWE-bench evaluation format manually (a sketch of that conversion is below), and ran the evaluation myself. I got some invalid patches (I am using GPT-4o):

Image
Image

After I changed the model from GPT-4o to Claude-3.5-Sonnet and tested on three instances, I still get 1 invalid patch, but it is better. Is that related to the model?
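The manual conversion from item 2 can be sketched like this (the field names test_result.git_patch, model_patch, and model_name_or_path are assumptions about the two formats, not verified against this exact version):

  # build a SWE-bench-style predictions file from the OpenHands output
  jq -c '{instance_id: .instance_id, model_name_or_path: "CodeActAgent", model_patch: .test_result.git_patch}' output.jsonl > predictions.jsonl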

@SmartManoj (Owner)

SmartManoj commented Jan 24, 2025

I am following the commitId from the metadata.

Would you provide the URL?


Your swebench version?

Is there any difference between the two diffs? eval_infer.sh uses this script to convert to swe_bench format.

@tangken333

Would you provide the URL?

All-Hands-AI#4537

Your swebench version?

swebench 3.0.4

Is there any difference between the two diffs? eval_infer.sh uses this script to convert to swe_bench format.

I don't think there are differences, because I directly take the value from the dict. I'll give you the output.jsonl.

openhands_44_100_2.jsonl.zip

@SmartManoj (Owner)

SmartManoj commented Jan 24, 2025

Would you provide the URL?

All-Hands-AI#4537

The outputs folder is in .gitignore. Would you give the direct link to the JSON file?


swebench 3.0.4

Would you use this old SWE-bench version v2.0.13?
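Pinning that release is just (a sketch, assuming swebench is installed from PyPI in this environment):

  pip install "swebench==2.0.13"
  pip show swebench | grep -i version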

@tangken333

The outputs folder is in .gitignore. Would you give the direct link to the JSON file?

Sorry, I didn't get this. What do you mean?

Would you use this old SWE-bench version v2.0.13?

Yes, I will try that now.

@SmartManoj (Owner)

I am following the commitId from the metadata.

Would you give the direct link to the metadata.json file?

@tangken333

I checked out this commitId and ran this command using "HEAD":
./evaluation/swe_bench/scripts/run_infer.sh llm.eval_o3 HEAD CodeActAgent 300 100 1 princeton-nlp/SWE-bench_Lite test

@SmartManoj (Owner)

How did you get this image?

Image

@tangken333

I got valid patches now, by changing the swebench version and running the script. Thanks!

@SmartManoj (Owner)

But the commit ID is ea2cca3 as mentioned here.

@tangken333

tangken333 commented Jan 25, 2025

Interesting, I don't know. But they both belong to one pull request, All-Hands-AI#4537, which is strongly related to this task.

I guess there is not too much difference except for some small bugfixes?

@SmartManoj (Owner)

79 commits

@tangken333

My bad, I will look at this also. But I am using a newer version. Should be better?

@SmartManoj (Owner)

Should be better?

Is the objective to just evaluate using GPT-4o? Why not verified-mini?
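For reference, the dataset is just an argument to run_infer.sh; switching to the Verified split would look like this (the exact Hugging Face id for a verified-mini subset is not given in this thread, so SWE-bench_Verified is used as a stand-in):

  # same invocation as before, only the dataset argument changes
  ./evaluation/swe_bench/scripts/run_infer.sh llm.eval_o3 HEAD CodeActAgent 300 100 1 princeton-nlp/SWE-bench_Verified test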

@tangken333

Is the objective to just evaluate using GPT-4o?

I am only using 4o now for my test. poor qwq

Why not verified-mini?

Oh, I only just learned about that. Thanks, I will take a look.
