
fix: initialize default metadata with all required fields #6583

Merged: 1 commit into main from fix/default-eval-metadata, Feb 3, 2025

Conversation

@xingyaoww (Collaborator) commented on Feb 2, 2025:

This PR fixes an issue where the evaluation script would error out if metadata.json doesn't exist. Now it will initialize a default EvalMetadata with all required fields when the file is missing.

Changes:

  • Added a fallback to create default metadata when metadata.json doesn't exist
  • Initializes EvalMetadata with all required fields:
    • agent_class: "dummy_agent" (placeholder)
    • llm_config: LLMConfig with "dummy_model"
    • max_iterations: 1
    • eval_output_dir: input file directory
    • start_time: current time
    • git_commit: current commit hash
    • dataset: from args.dataset
  • Maintains type safety by ensuring metadata is always an EvalMetadata instance (see the sketch below)
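
For reference, the fallback roughly amounts to the sketch below. The import paths, the helper name `load_or_default_metadata`, and the exact `EvalMetadata`/`LLMConfig` signatures are assumptions for illustration, not the PR's literal diff:

```python
import json
import os
import subprocess
import time

# Import paths are assumptions; adjust to the actual repo layout.
from evaluation.utils.shared import EvalMetadata
from openhands.core.config import LLMConfig


def load_or_default_metadata(input_file: str, dataset: str) -> EvalMetadata:
    """Load metadata.json next to the input file, or build a default one."""
    metadata_path = os.path.join(os.path.dirname(input_file), 'metadata.json')
    if os.path.exists(metadata_path):
        with open(metadata_path) as f:
            return EvalMetadata(**json.load(f))
    # Fallback: metadata.json is missing, so fill every required field with a
    # placeholder so downstream code still receives a typed instance.
    return EvalMetadata(
        agent_class='dummy_agent',  # placeholder; unused for patch evaluation
        llm_config=LLMConfig(model='dummy_model'),
        max_iterations=1,
        eval_output_dir=os.path.dirname(input_file),
        start_time=time.strftime('%Y-%m-%d %H:%M:%S'),
        git_commit=subprocess.check_output(['git', 'rev-parse', 'HEAD'])
        .decode()
        .strip(),
        dataset=dataset,
    )
```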

To run this PR locally, use the following command:

```bash
docker run -it --rm \
    -p 3000:3000 \
    -v /var/run/docker.sock:/var/run/docker.sock \
    --add-host host.docker.internal:host-gateway \
    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:543d0ba-nikolaik \
    --name openhands-app-543d0ba \
    docker.all-hands.dev/all-hands-ai/openhands:543d0ba
```

@xingyaoww marked this pull request as ready for review on February 2, 2025 05:02
@xingyaoww requested review from csmith49 and neubig on February 2, 2025 05:02
@csmith49 (Collaborator) left a comment:


LGTM.

Is it worth moving the defaults to the fallbacks in EvalMetadata so they're shared across all benchmarks? Looks pretty specific to SWE-bench.

@xingyaoww (Collaborator, Author) replied:

> Is it worth moving the defaults to the fallbacks in EvalMetadata so they're shared across all benchmarks?

I think most other benchmarks won't have an eval_infer.py, which is pretty special/specific to SWE-Bench (e.g., it only does patch evaluation, not inference), so I think keeping this as a special case for SWE-Bench is probably OK?
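
For concreteness, the reviewer's suggestion would amount to declaring the fallbacks once on the model rather than at each call site. A minimal sketch, assuming EvalMetadata is a pydantic model with the fields listed in the PR description (the real class may differ):

```python
# Sketch only: defaults declared on the model itself, so every benchmark
# shares them instead of each script building its own placeholder instance.
from pydantic import BaseModel


class EvalMetadata(BaseModel):
    agent_class: str = 'dummy_agent'
    max_iterations: int = 1
    dataset: str | None = None
    # ...the remaining fields would carry similar fallback defaults
```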

@xingyaoww merged commit 90bbd4e into main on Feb 3, 2025
14 checks passed
@xingyaoww deleted the fix/default-eval-metadata branch on February 3, 2025 18:52
zchn pushed a commit to zchn/OpenHands that referenced this pull request Feb 4, 2025
adityasoni9998 pushed a commit to adityasoni9998/OpenHands that referenced this pull request Feb 7, 2025