DRAFT: Critic agent that forks generator settings for PR-style review#1706
|
@OpenHands do a /codereview-roasted and help me address all review comments in this PR. When you are done, help me write a script in the examples/0 folder that demonstrates this critic in action. You can refer to the existing delegation example, as well as the example introduced in #1269 |
|
I'm on it! xingyaoww can track my progress at all-hands.dev |
Co-authored-by: openhands <openhands@all-hands.dev>
Final summary (double-checked)
✅ Checklist against your request
🔍 Conciseness / extraneous changes check
📌 What changed (since the last state on main)
🚀 Push status
|
|
@OpenHands review all current PR reviews and modify the code to address all the review comments. If you check the examples/01_standalone_sdk/33_hooks/33_hooks.py example, you can see how we can set a hook on the Finish event. Can you extend the hook system so that we can pass in a custom Python callback function for the finish event hook? Then modify the critic agent's implementation so that the critic agent runs in a callback function in a finish hook. |
|
I'm on it! xingyaoww can track my progress at all-hands.dev |
Address PR review comments:
- Make llm field required in AgentReviewCritic (not optional)
- Remove _extract_llm and _extract_agent methods
- Add agent_factory parameter to allow custom agent creation
- Add create_critic_stop_hook() helper for callback-based stop hooks
- Create get_critic_agent in openhands.tools.preset.critic

Extend hook system:
- Add HookType.CALLBACK for Python callback functions
- Update HookDefinition to support callback field
- Update HookExecutor to execute callback hooks
- Add validation to ensure callback is provided for CALLBACK type

Move example:
- Move from examples/00_critic/ to examples/01_standalone_sdk/34_critic_agent_hook.py
- Update example to use callback-based hook with agent_factory

Add tests:
- Add tests for callback hook functionality
- Update critic tests to use mock LLM fixture

Co-authored-by: openhands <openhands@all-hands.dev>
Summary
I've successfully addressed all the PR review comments and implemented the requested features for PR #1706. The changes have been pushed to the PR branch.
Changes Made
1. PR Review Comments Addressed ✅
2. Hook System Extended for Python Callbacks ✅
3. Critic Stop Hook Helper ✅
4. Tests Added ✅
Files Changed
Example Usage
# HookConfig and HookMatcher are assumed to be importable from openhands.sdk.hooks
from openhands.sdk.hooks import HookConfig, HookMatcher
from openhands.sdk.critic.impl.agent_review import (
    AgentReviewCritic,
    create_critic_stop_hook,
)
from openhands.tools.preset.critic import get_critic_agent

# Create critic with custom agent factory
# (llm and workspace are expected to be defined earlier in the script)
critic = AgentReviewCritic(
    llm=llm,
    agent_factory=get_critic_agent,  # Optional: uses default if not provided
    review_style="roasted",
)

# Create callback-based stop hook
hook_config = HookConfig(
    stop=[
        HookMatcher(
            hooks=[create_critic_stop_hook(critic, str(workspace))]
        )
    ]
)
The PR is now updated at: #1706 |
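As a rough illustration of the callback hook type described in the commit above, here is a minimal, standard-library sketch. It is not the SDK's implementation: `HookType`, `HookDefinition`, `HookResult`, `execute_hook`, and the `critic_score` event key are simplified stand-ins introduced for this example, and the real classes may differ in fields and behavior.

```python
import subprocess
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class HookType(Enum):
    """Hypothetical stand-in: a hook is either a shell command or a Python callback."""
    COMMAND = "command"
    CALLBACK = "callback"


@dataclass
class HookResult:
    """Hypothetical result type: whether the agent may stop, plus feedback text."""
    allow_stop: bool
    feedback: str = ""


@dataclass
class HookDefinition:
    type: HookType = HookType.COMMAND
    command: Optional[str] = None
    callback: Optional[Callable[[dict], HookResult]] = None

    def __post_init__(self) -> None:
        # Validation: each hook type requires its matching payload.
        if self.type is HookType.CALLBACK and self.callback is None:
            raise ValueError("callback is required when type is CALLBACK")
        if self.type is HookType.COMMAND and not self.command:
            raise ValueError("command is required when type is COMMAND")


def execute_hook(hook: HookDefinition, event: dict) -> HookResult:
    """Dispatch on hook type: call the Python function or run the shell command."""
    if hook.type is HookType.CALLBACK:
        assert hook.callback is not None  # guaranteed by __post_init__
        return hook.callback(event)
    completed = subprocess.run(
        hook.command, shell=True, capture_output=True, text=True
    )
    return HookResult(allow_stop=completed.returncode == 0, feedback=completed.stdout)


def critic_callback(event: dict) -> HookResult:
    """Toy critic: block the stop event when the (assumed) score is too low."""
    score = event.get("critic_score", 0.0)
    if score >= 0.5:
        return HookResult(allow_stop=True)
    return HookResult(allow_stop=False, feedback="Critic rejected the changes.")


stop_hook = HookDefinition(type=HookType.CALLBACK, callback=critic_callback)
```

Keeping the validation in the definition itself means a misconfigured hook fails when it is constructed rather than when the stop event fires.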
Remove the main() function and write the example as a direct script, consistent with other examples in the 01_standalone_sdk folder. Co-authored-by: openhands <openhands@all-hands.dev>
Add EXAMPLES_STYLE section noting that examples in 01_standalone_sdk should be written as direct scripts without a main() function wrapper. Co-authored-by: openhands <openhands@all-hands.dev>
4f2ad9a to 4475dfc
|
[Automatic Post]: It has been a while since there was any activity on this PR. @xingyaoww, are you still working on it? If so, please go ahead; if not, please request review, close it, or request that someone else follow up. |
|
@OpenHands please merge from main and resolve all the conflicts |
|
I'm on it! xingyaoww can track my progress at all-hands.dev |
Resolve conflicts in critic module __init__.py files by including both AgentReviewCritic (from this branch) and APIBasedCritic (from main). Co-authored-by: openhands <openhands@all-hands.dev>
|
Successfully merged main into the PR branch.
Summary of Changes
Conflicts Resolved:
Additional Fix:
The changes have been committed and pushed to the PR branch. |
|
@OpenHands please fix the failing actions on PR #1706 at branch |
|
I'm on it! xingyaoww can track my progress at all-hands.dev |
- Fix circular import in agent_review.py by using lazy imports for Agent and Conversation
- Add model_rebuild() call in SDK __init__.py to resolve forward references
- Use SkipJsonSchema for agent_factory field to prevent JSON schema generation errors
- Rename 34_critic_example.py to 36_critic_example.py to avoid duplicate example numbers

Co-authored-by: openhands <openhands@all-hands.dev>
Summary
I've fixed the failing CI checks on PR #1706 by addressing the following issues:
Changes Made
Verification
Commit
The changes have been pushed to the PR branch.
The CI should now pass for the SDK tests, OpenAPI schema check, and duplicate example number check. |
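The lazy-import fix mentioned in the commit above follows a common Python pattern: import the cyclic dependency only for type checkers at module level, and import it for real inside the function that needs it. The sketch below is illustrative only; `fractions.Fraction` stands in for a class such as Agent, and `build_reviewer` is a hypothetical name.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers, never at runtime,
    # so this line cannot take part in an import cycle.
    from fractions import Fraction  # stand-in for a cyclic dependency


def build_reviewer(ratio: str) -> "Fraction":
    # Lazy import: resolved when the function is first called, by which
    # time every module in the cycle has finished initializing.
    from fractions import Fraction

    return Fraction(ratio)
```

The return annotation is written as a string so that it is not evaluated at definition time, when the name is not yet bound at runtime.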
…onents
- Add IterativeRefinement class for automatic retry loop based on critic scores
- Add CriticResultCollector for capturing critic results via callbacks
- Add get_default_critic() utility for auto-configuring All-Hands LLM proxy
- Add default_followup_prompt() for generating follow-up prompts
- Add IterativeRefinementResult dataclass for structured results
- Update 34_critic_example.py to use the new SDK components
- Add comprehensive tests for all new components

This refactoring enables PR #1706 (AgentReviewCritic) to reuse the same iterative refinement infrastructure defined here.

Related: OpenHands/OpenHands#2221

Co-authored-by: openhands <openhands@all-hands.dev>
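The retry loop that this commit describes can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the SDK's IterativeRefinement class: `refine`, `RefinementResult`, `followup_prompt`, the `threshold` default, and the callable signatures are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RefinementResult:
    """Hypothetical structured result of a refinement run."""
    success: bool
    iterations: int
    scores: List[float] = field(default_factory=list)


def followup_prompt(task: str, score: float, threshold: float) -> str:
    """Build the prompt for the next attempt (illustrative wording)."""
    return (
        f"The previous attempt scored {score:.2f}, below the required "
        f"{threshold:.2f}. Please improve your solution to: {task}"
    )


def refine(
    run_agent: Callable[[str], str],
    critic: Callable[[str], float],
    task: str,
    threshold: float = 0.7,
    max_iterations: int = 3,
) -> RefinementResult:
    """Run the agent, score its output, and retry until the score passes."""
    prompt = task
    scores: List[float] = []
    for iteration in range(1, max_iterations + 1):
        output = run_agent(prompt)
        score = critic(output)
        scores.append(score)
        if score >= threshold:
            return RefinementResult(True, iteration, scores)
        # Feed the critic's verdict back into the next attempt.
        prompt = followup_prompt(task, score, threshold)
    return RefinementResult(False, max_iterations, scores)
```

Passing the agent and critic as plain callables keeps the loop independent of how either one is implemented, which is what lets different critics share the same retry logic.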
|
@OpenHands we've just merged #1879 into main. Please check the diff against main for this PR and understand what it implements. Then merge from main, resolve all conflicts, and rewrite this PR based on the schema/pattern that PR #1879 sets up for iterative refinement. This PR is essentially doing iterative refinement with a CodeReviewCritic. |
|
I'm on it! xingyaoww can track my progress at all-hands.dev |
- Add get_followup_prompt() override for code-review-specific feedback
- Add workspace_dir parameter to get git diff automatically
- Update evaluate() to fetch git diff from workspace when not provided
- Update 34_critic_agent_hook.py example to use iterative refinement pattern
- Merge from main to incorporate IterativeRefinementConfig from PR #1879

This aligns AgentReviewCritic with the iterative refinement pattern established in PR #1879, allowing the critic to be used with the built-in retry mechanism instead of requiring stop hooks.

Co-authored-by: openhands <openhands@all-hands.dev>
Summary
I've successfully merged from main and rewritten PR #1706 to use the iterative refinement pattern established in PR #1879. The changes have been pushed to the PR branch.
What was done:
Key Design Changes:
Verification:
The PR is now aligned with the iterative refinement schema/pattern from PR #1879. |
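To make the idea of a code-review-specific follow-up prompt concrete, here is a small sketch of fetching a git diff from a workspace and folding it, together with the review text, into the next prompt. The function names, the `git diff HEAD` invocation, the prompt wording, and the truncation limit are assumptions made for illustration; the PR's actual get_followup_prompt() and evaluate() may work differently.

```python
import subprocess
from pathlib import Path


def get_git_diff(workspace_dir: str) -> str:
    """Return the uncommitted diff for the workspace, or "" if unavailable."""
    try:
        completed = subprocess.run(
            ["git", "diff", "HEAD"],
            cwd=Path(workspace_dir),
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # git is not installed or the directory does not exist.
        return ""
    return completed.stdout if completed.returncode == 0 else ""


def build_followup_prompt(review: str, diff: str, max_diff_chars: int = 4000) -> str:
    """Combine reviewer feedback with the (possibly truncated) diff."""
    shown = diff[:max_diff_chars]
    note = "\n[diff truncated]" if len(diff) > max_diff_chars else ""
    return (
        "A reviewer left the following comments on your changes:\n"
        f"{review}\n\n"
        "Current diff:\n"
        f"{shown}{note}\n\n"
        "Please address every comment and update the code."
    )
```

Truncating the diff is a guess at a sensible safeguard for large changes; a real implementation might summarize or page through the diff instead.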
|
@OpenHands please fix the failing actions on PR #1706 at branch |
|
I'm on it! xingyaoww can track my progress at all-hands.dev |
…licate number Co-authored-by: openhands <openhands@all-hands.dev>
Summary
I fixed the failing CI actions on PR #1706 by addressing the duplicate example number issue.
Problem
After merging from main, there were two example files with the same numeric prefix
This caused the duplicate example number check to fail.
Fix
Renamed one of the conflicting example files.
CI Status
All required checks are now passing:
The only remaining failing check is
Commit
The changes have been pushed to the PR branch. |
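A check for duplicate example numbers like the one discussed above could look roughly like this. It is a sketch, not the repository's CI script: the `NN_` prefix pattern, the function name, and the sample file names are assumptions for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumed naming convention: examples start with a numeric prefix and "_".
PREFIX = re.compile(r"^(\d+)_")


def find_duplicate_prefixes(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group names by numeric prefix and return only prefixes used more than once."""
    by_prefix: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        match = PREFIX.match(name)
        if match:
            by_prefix[match.group(1)].append(name)
    return {
        prefix: sorted(files)
        for prefix, files in by_prefix.items()
        if len(files) > 1
    }


# Illustrative file names only; not the actual directory listing.
sample = ["33_hooks", "34_first_example.py", "34_second_example.py", "README.md"]
duplicates = find_duplicate_prefixes(sample)
```

In a CI job, a non-empty result would typically be printed and turned into a non-zero exit code.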
…ritic_agent_hook.py The AgentReviewCritic example (34_critic_agent_hook.py) replaces the old APIBasedCritic example. The old example was renamed to 38 to avoid conflicts but is no longer needed. Co-authored-by: openhands <openhands@all-hands.dev>
…elector

Combines API-based critic and AgentReviewCritic into a single example file. Use CRITIC_MODE=api (default) or CRITIC_MODE=agent_review to select.
- Removes 34_critic_agent_hook.py and 38_critic_example.py
- Creates unified 34_critic_example.py with both critic modes
- API mode: uses external critic API endpoint (auto-configures for All-Hands proxy)
- Agent review mode: spawns separate OpenHands agent for PR-style code review

Co-authored-by: openhands <openhands@all-hands.dev>
…elector

Combines API-based critic and AgentReviewCritic into a single example file. Use CRITIC_MODE=api (default) or CRITIC_MODE=agent_review to select.
- Removes 34_critic_agent_hook.py and 38_critic_example.py
- Creates unified 34_critic_example.py with both critic modes
- API mode: uses external critic API endpoint (auto-configures for All-Hands proxy)
- Agent review mode: spawns separate OpenHands agent for PR-style code review
- Fix serialization issue: exclude agent_factory from JSON serialization

Co-authored-by: openhands <openhands@all-hands.dev>
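The CRITIC_MODE selector described in these commits can be read from the environment along the following lines. Only the variable name, its two values, and the `api` default come from the commit message; the function name, the whitespace and case normalization, and the error handling are assumptions for this sketch.

```python
import os
from typing import Mapping

VALID_MODES = ("api", "agent_review")


def select_critic_mode(env: Mapping[str, str] = os.environ) -> str:
    """Return the critic mode from CRITIC_MODE, defaulting to "api"."""
    # Normalizing case and whitespace is an assumption, not confirmed behavior.
    mode = env.get("CRITIC_MODE", "api").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(
            f"CRITIC_MODE must be one of {VALID_MODES}, got {mode!r}"
        )
    return mode
```

Accepting the environment as a parameter makes the selector easy to exercise without mutating `os.environ`.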
|
@OpenHands please fix the failing actions on PR #1706 at branch |
|
I'm on it! xingyaoww can track my progress at all-hands.dev |
- Use SkipJsonSchema for AgentReviewCritic.agent_factory to prevent OpenAPI schema generation errors
- Use Field(exclude=True) for HookDefinition.callback to exclude from serialization

This fixes the 'Cannot generate a JsonSchema for core_schema.CallableSchema' error that was causing the CI 'Check OpenAPI Schema' job to fail.

Co-authored-by: openhands <openhands@all-hands.dev>
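Summary
I've fixed the failing CI action on PR #1706 by addressing the "Check OpenAPI Schema" job failure.
Problem
The CI was failing with the error: 'Cannot generate a JsonSchema for core_schema.CallableSchema'
This occurred because the Callable-typed fields (AgentReviewCritic.agent_factory and HookDefinition.callback) cannot be represented in a JSON schema.
Solution
Made minimal changes to 2 files:
Verification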
The changes have been pushed and should trigger a new CI run that passes the "Check OpenAPI Schema" job. |
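The schema failure and its fix can be reproduced in isolation with Pydantic v2. In the sketch below, `BrokenCritic`, `FixedCritic`, and `schema_error` are hypothetical names, and a single field combines both `SkipJsonSchema` and `Field(exclude=True)` for brevity, whereas the PR applies them to different fields.

```python
from typing import Callable, Optional

from pydantic import BaseModel, Field
from pydantic.errors import PydanticInvalidForJsonSchema
from pydantic.json_schema import SkipJsonSchema


class BrokenCritic(BaseModel):
    """A plain Callable field has no JSON schema representation."""
    name: str
    agent_factory: Callable[[], str]


class FixedCritic(BaseModel):
    """SkipJsonSchema omits the field from the schema; exclude=True omits it from dumps."""
    name: str
    agent_factory: SkipJsonSchema[Optional[Callable[[], str]]] = Field(
        default=None, exclude=True
    )


def schema_error(model: type) -> Optional[str]:
    """Return the schema-generation error message, or None on success."""
    try:
        model.model_json_schema()
    except PydanticInvalidForJsonSchema as exc:
        return str(exc)
    return None


critic = FixedCritic(name="reviewer", agent_factory=lambda: "agent")
```

The two mechanisms are independent: `SkipJsonSchema` affects only schema generation, while `exclude=True` affects only `model_dump()` and `model_dump_json()`.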
|
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment. Feel free to include any additional details that might help me get this PR into a better state. |
Summary
Adds AgentReviewCritic, a CriticBase implementation that spawns a separate OpenHands agent to review the current git diff.
Fixes #1704
Checklist
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdk
nikolaik/python-nodejs:python3.12-nodejs22
golang:1.21-bookworm
Pull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:74b963f-python
Run
All tags pushed for this build
About Multi-Architecture Support
- Each variant tag (e.g., 74b963f-python) is a multi-arch manifest supporting both amd64 and arm64
- Architecture-specific tags (e.g., 74b963f-python-amd64) are also available if needed