Conversation
all-hands-bot
left a comment
There was a problem hiding this comment.
Review Summary
This PR removes two model configurations (claude-haiku-4-5-20251001 and deepseek-chat) from .github/run-eval/resolve_model_config.py, but leaves numerous references to these models throughout the codebase. This will cause test failures and CI failures.
🔴 Critical Issues (Must Fix)
1. Test Failure
File: tests/github_workflows/test_resolve_model_config.py
Line: 49
Issue: Test explicitly asserts result[1]["id"] == "deepseek-chat", but this model no longer exists in MODELS.
Fix: Update the test assertion to reference a model that still exists.
2. CI Workflow Failure - run-examples.yml
File: .github/workflows/run-examples.yml
Line: 66
Issue: Uses LLM_MODEL: openhands/claude-haiku-4-5-20251001 which is no longer in supported configurations.
Fix: Update to a supported model like claude-sonnet-4-5-20250929.
3. CI Workflow Failure - integration-runner.yml
File: .github/workflows/integration-runner.yml
Line: 92
Issue: Uses model: litellm_proxy/deepseek/deepseek-chat which is no longer supported.
Fix: Update to a supported model or remove this test configuration.
🟠 Important Issues (Should Address)
4. Verified Models List - claude-haiku
File: openhands-sdk/openhands/sdk/llm/utils/verified_models.py
Line: 23
Issue: claude-haiku-4-5-20251001 still listed in VERIFIED_ANTHROPIC_MODELS.
Fix: Remove for consistency, or clarify in PR description why it remains verified for general use but removed from eval configs.
5. Verified Models List - deepseek-chat
File: openhands-sdk/openhands/sdk/llm/utils/verified_models.py
Line: 53
Issue: deepseek-chat still listed in VERIFIED_OPENHANDS_MODELS.
Fix: Remove for consistency, or clarify in PR description why it remains verified for general use but removed from eval configs.
6. Test Default Model
File: tests/integration/utils/llm_judge.py
Line: 105
Issue: Default LLM_JUDGE_MODEL is litellm_proxy/claude-haiku-4-5-20251001.
Fix: Update to a currently supported model to avoid runtime failures.
Additional References Found
These may also need updates:
tests/sdk/llm/test_model_features.py- Contains test cases referencing both modelstests/fixtures/llm_data/- Test data files and README referencingdeepseek-chatexamples/03_github_workflows/*/README.md- Documentation mentioningclaude-haiku-4-5-20251001
Recommendation
Before merging, please:
- Fix all Critical issues (items 1-3) to prevent CI/test failures
- Address Important issues (items 4-6) for consistency
- Search for and update any remaining references in tests and documentation
- Run the test suite to verify no failures:
pytest tests/github_workflows/test_resolve_model_config.py
|
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment, like or Feel free to include any additional details that might help me get this PR into a better state. You can manage your notification settings |
|
My bad, spoke too fast. I see these are only the models for evals! I thought they were available to people e.g. in proxies or such. Edited to add: as long as it has conflicts, I don't see the dismiss review option. Maybe PEBKAC! 😅 |
|
@OpenHands Merge main into this PR and fix the conflicts. Commit and push here. |
|
I'm on it! enyst can track my progress at all-hands.dev |
Final summary (new since my last update)What I verified
Conciseness / extra changes check
Conflict resolution performed
Commit(s)
|
Summary
Remove models we dont want to evaluate
And delete the useless test: it's not even run by the CI and it doesn't test anything important.
Checklist
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.12-nodejs22golang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:4d1fad5-pythonRun
All tags pushed for this build
About Multi-Architecture Support
4d1fad5-python) is a multi-arch manifest supporting both amd64 and arm644d1fad5-python-amd64) are also available if needed