Skip to content

Comments

DRAFT: Critic agent that forks generator settings for PR-style review#1706

Draft
xingyaoww wants to merge 17 commits intomainfrom
openhands/critic-agent-reviewer
Draft

DRAFT: Critic agent that forks generator settings for PR-style review#1706
xingyaoww wants to merge 17 commits intomainfrom
openhands/critic-agent-reviewer

Conversation

@xingyaoww
Copy link
Collaborator

@xingyaoww xingyaoww commented Jan 12, 2026

Summary

  • Add AgentReviewCritic, a CriticBase implementation that spawns a separate OpenHands agent to review the current git diff.
  • The critic forks the current agent settings (LLM + agent config) from the running conversation rather than reading critic-specific environment variables.
  • Includes unit tests for critic output parsing.

Fixes #1704

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this?
  • If there is an example, have you run the example to make sure that it works?
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
  • Is the github CI passing?

@xingyaoww can click here to continue refining the PR


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.12-nodejs22 Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:74b963f-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-74b963f-python \
  ghcr.io/openhands/agent-server:74b963f-python

All tags pushed for this build

ghcr.io/openhands/agent-server:74b963f-golang-amd64
ghcr.io/openhands/agent-server:74b963f-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:74b963f-golang-arm64
ghcr.io/openhands/agent-server:74b963f-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:74b963f-java-amd64
ghcr.io/openhands/agent-server:74b963f-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:74b963f-java-arm64
ghcr.io/openhands/agent-server:74b963f-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:74b963f-python-amd64
ghcr.io/openhands/agent-server:74b963f-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:74b963f-python-arm64
ghcr.io/openhands/agent-server:74b963f-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:74b963f-golang
ghcr.io/openhands/agent-server:74b963f-java
ghcr.io/openhands/agent-server:74b963f-python

About Multi-Architecture Support

  • Each variant tag (e.g., 74b963f-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 74b963f-python-amd64) are also available if needed

@github-actions
Copy link
Contributor

github-actions bot commented Jan 12, 2026

Coverage

Coverage Report •
FileStmtsMissCoverMissing
openhands-sdk/openhands/sdk
   __init__.py24291%67–68
openhands-sdk/openhands/sdk/critic/impl
   agent_review.py1096540%38–39, 41, 47, 136–138, 144–146, 153, 156–157, 159, 161, 166–167, 169–172, 174–175, 180, 182, 185, 188–190, 195, 207–217, 236–238, 258, 272, 303–305, 307, 309, 311, 319–320, 326, 328–329, 335, 343–345, 350–352, 357
openhands-sdk/openhands/sdk/hooks
   config.py1401390%77, 105, 117–118, 124, 126, 204, 209, 270, 319–320, 322–323
   executor.py911187%36, 65, 77, 96, 111, 160–161, 175–176, 181–182
TOTAL17890555068% 

@xingyaoww
Copy link
Collaborator Author

@OpenHands do a /codereview-roasted and help me address all review comments in this PR.

When you are done, you should help me write a script in examples/0 folder that demonstrate this critic work in action. You can refer to the existing delagation example, as well as the example introduced in #1269

@openhands-ai
Copy link

openhands-ai bot commented Jan 13, 2026

I'm on it! xingyaoww can track my progress at all-hands.dev

Co-authored-by: openhands <openhands@all-hands.dev>
@openhands-ai
Copy link

openhands-ai bot commented Jan 13, 2026

Final summary (double-checked)

✅ Checklist against your request

  • Address all PR review comments found on PR DRAFT: Critic agent that forks generator settings for PR-style review #1706
    • Switched parsing/output to use CriticResult directly (no custom output model).
    • Added llm: LLM | None field to AgentReviewCritic so callers can pass the LLM explicitly (removes the need for the “extract_llm hack” as the primary path).
  • Add an example script demonstrating the critic working in action
    • Added examples/00_critic/00_agent_review_critic_stop_hook.py showing a Stop-hook gating flow with AgentReviewCritic.

🔍 Conciseness / extraneous changes check

  • The diff vs origin/main contains only:
    • New critic implementation + exports
    • New unit tests for parsing
    • New example script
  • No unrelated refactors or drive-by formatting changes outside those files.

📌 What changed (since the last state on main)

  • New critic: openhands-sdk/openhands/sdk/critic/impl/agent_review.py
    • AgentReviewCritic.llm: LLM | None = None
    • _parse_output() now returns CriticResult
    • evaluate() returns parsed CriticResult directly
  • Exports wired up
    • openhands-sdk/openhands/sdk/critic/__init__.py
    • openhands-sdk/openhands/sdk/critic/impl/__init__.py
  • New tests: tests/sdk/critic/test_agent_review_critic.py
    • Updated assertions to match CriticResult
  • New example: examples/00_critic/00_agent_review_critic_stop_hook.py
    • Creates a temp git repo, generates a diff, runs the critic, and uses a Stop hook to gate stopping until approval.

🚀 Push status

View full conversation

@xingyaoww
Copy link
Collaborator Author

@OpenHands review all current PR reviews and modify the code to address all the review comments. If you check examples/01_standalone_sdk/33_hooks/33_hooks.py example, you can see how we can set a Hook at Finish event. Can you extend the hook system, so we can pass in custom callback function python for the finish event hook, then we should modify the critic agent's implementation so critic agent runs in a callback function in a finish hook?

@openhands-ai
Copy link

openhands-ai bot commented Jan 20, 2026

I'm on it! xingyaoww can track my progress at all-hands.dev

Address PR review comments:
- Make llm field required in AgentReviewCritic (not optional)
- Remove _extract_llm and _extract_agent methods
- Add agent_factory parameter to allow custom agent creation
- Add create_critic_stop_hook() helper for callback-based stop hooks
- Create get_critic_agent in openhands.tools.preset.critic

Extend hook system:
- Add HookType.CALLBACK for Python callback functions
- Update HookDefinition to support callback field
- Update HookExecutor to execute callback hooks
- Add validation to ensure callback is provided for CALLBACK type

Move example:
- Move from examples/00_critic/ to examples/01_standalone_sdk/34_critic_agent_hook.py
- Update example to use callback-based hook with agent_factory

Add tests:
- Add tests for callback hook functionality
- Update critic tests to use mock LLM fixture

Co-authored-by: openhands <openhands@all-hands.dev>
@openhands-ai
Copy link

openhands-ai bot commented Jan 20, 2026

Summary

I've successfully addressed all the PR review comments and implemented the requested features for PR #1706. The changes have been pushed to the openhands/critic-agent-reviewer branch.

Changes Made

1. PR Review Comments Addressed ✅

  • Made llm field required in AgentReviewCritic (no longer optional)
  • Removed _extract_llm method - no longer needed since LLM is now required
  • Removed _extract_agent method - replaced with agent_factory parameter pattern
  • Added agent_factory parameter - allows users to provide custom agent creation (similar to get_planning_agent pattern)
  • Created get_critic_agent in openhands.tools.preset.critic - provides a pre-configured critic agent with file viewing tools
  • Moved example from examples/00_critic/ to examples/01_standalone_sdk/34_critic_agent_hook.py

2. Hook System Extended for Python Callbacks ✅

  • Added HookType.CALLBACK - new hook type for Python callback functions
  • Updated HookDefinition - added callback field for Python functions
  • Updated HookExecutor - handles callback execution alongside command execution
  • Added validation - ensures callback is provided for CALLBACK type hooks

3. Critic Stop Hook Helper ✅

  • Created create_critic_stop_hook() - helper function that creates a callback-based stop hook
  • The callback:
    • Gets the current git diff from the workspace
    • Runs the critic agent to review the diff
    • Returns allow/deny based on the critic's decision
    • Provides feedback to the agent if denied

4. Tests Added ✅

  • Added tests for callback hook functionality (6 new tests)
  • Updated critic tests to use mock LLM fixture
  • Added test to verify llm is required

Files Changed

  • openhands-sdk/openhands/sdk/critic/impl/agent_review.py - Refactored critic implementation
  • openhands-sdk/openhands/sdk/hooks/config.py - Added CALLBACK hook type
  • openhands-sdk/openhands/sdk/hooks/executor.py - Added callback execution support
  • openhands-tools/openhands/tools/preset/critic.py - New file with get_critic_agent
  • examples/01_standalone_sdk/34_critic_agent_hook.py - New example demonstrating callback hooks
  • tests/sdk/critic/test_agent_review_critic.py - Updated tests
  • tests/sdk/hooks/test_executor.py - Added callback hook tests

Example Usage

from openhands.sdk.critic.impl.agent_review import (
    AgentReviewCritic,
    create_critic_stop_hook,
)
from openhands.tools.preset.critic import get_critic_agent

# Create critic with custom agent factory
critic = AgentReviewCritic(
    llm=llm,
    agent_factory=get_critic_agent,  # Optional: uses default if not provided
    review_style="roasted",
)

# Create callback-based stop hook
hook_config = HookConfig(
    stop=[
        HookMatcher(
            hooks=[create_critic_stop_hook(critic, str(workspace))]
        )
    ]
)

The PR is now updated at: #1706

View full conversation

Remove the main() function and write the example as a direct script,
consistent with other examples in the 01_standalone_sdk folder.

Co-authored-by: openhands <openhands@all-hands.dev>
Add EXAMPLES_STYLE section noting that examples in 01_standalone_sdk
should be written as direct scripts without a main() function wrapper.

Co-authored-by: openhands <openhands@all-hands.dev>
@xingyaoww xingyaoww force-pushed the openhands/critic-agent-reviewer branch from 4f2ad9a to 4475dfc Compare January 20, 2026 02:23
@all-hands-bot
Copy link
Collaborator

[Automatic Post]: It has been a while since there was any activity on this PR. @xingyaoww, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up.

@xingyaoww
Copy link
Collaborator Author

@OpenHands please merge from main and resolve all the conflicts

@openhands-ai
Copy link

openhands-ai bot commented Feb 4, 2026

I'm on it! xingyaoww can track my progress at all-hands.dev

Resolve conflicts in critic module __init__.py files by including both
AgentReviewCritic (from this branch) and APIBasedCritic (from main).

Co-authored-by: openhands <openhands@all-hands.dev>
@openhands-ai
Copy link

openhands-ai bot commented Feb 4, 2026

Successfully merged main into the openhands/critic-agent-reviewer branch and resolved all conflicts.

Summary of Changes

Conflicts Resolved:

  • openhands-sdk/openhands/sdk/critic/__init__.py - Merged to include both AgentReviewCritic (from this branch) and APIBasedCritic (from main)
  • openhands-sdk/openhands/sdk/critic/impl/__init__.py - Same resolution, including both critic implementations

Additional Fix:

  • Applied auto-fix from pre-commit hook for a typing annotation in openhands-sdk/openhands/sdk/hooks/config.py (changed "HookConfig" string annotation to direct HookConfig reference)

The changes have been committed and pushed to the openhands/critic-agent-reviewer branch on GitHub. PR #1706 is now updated with the merge from main.

View full conversation

@xingyaoww
Copy link
Collaborator Author

@OpenHands please fix the failing actions on PR #1706 at branch openhands/critic-agent-reviewer

@openhands-ai
Copy link

openhands-ai bot commented Feb 4, 2026

I'm on it! xingyaoww can track my progress at all-hands.dev

- Fix circular import in agent_review.py by using lazy imports for Agent and Conversation
- Add model_rebuild() call in SDK __init__.py to resolve forward references
- Use SkipJsonSchema for agent_factory field to prevent JSON schema generation errors
- Rename 34_critic_example.py to 36_critic_example.py to avoid duplicate example numbers

Co-authored-by: openhands <openhands@all-hands.dev>
@openhands-ai
Copy link

openhands-ai bot commented Feb 4, 2026

Summary

I've fixed the failing CI checks on PR #1706 by addressing the following issues:

Changes Made

  1. Fixed circular import in agent_review.py:

    • Moved Agent, Conversation imports to lazy imports inside functions that use them
    • Moved LLMConvertibleEvent, HookDefinition, HookResult, HookEvent to TYPE_CHECKING block
    • Added from __future__ import annotations for forward references
    • Changed AgentFactory type alias to use string forward reference "Agent"
  2. Fixed OpenAPI schema generation:

    • Added model_rebuild() call in openhands/sdk/__init__.py to resolve forward references after all imports are done
    • Used SkipJsonSchema wrapper for the agent_factory field since Callable types can't be serialized to JSON schema
  3. Fixed duplicate example numbers:

    • Renamed 34_critic_example.py to 36_critic_example.py to avoid conflict with existing 34_critic_agent_hook.py

Verification

  • ✅ SDK imports work correctly: from openhands.sdk import Agent, Conversation
  • ✅ OpenAPI schema generation succeeds
  • ✅ All 2087 SDK tests pass (with pytest-xdist parallel execution)
  • ✅ All 45 critic tests pass
  • ✅ Duplicate example number check passes
  • ✅ Pre-commit hooks pass (ruff, pyright, etc.)

Commit

The changes have been pushed to the openhands/critic-agent-reviewer branch:

  • Commit: 843e4ab0 - "Fix circular import and duplicate example numbers"

The CI should now pass for the SDK tests, OpenAPI schema check, and duplicate example number check.

View full conversation

xingyaoww pushed a commit that referenced this pull request Feb 8, 2026
…onents

- Add IterativeRefinement class for automatic retry loop based on critic scores
- Add CriticResultCollector for capturing critic results via callbacks
- Add get_default_critic() utility for auto-configuring All-Hands LLM proxy
- Add default_followup_prompt() for generating follow-up prompts
- Add IterativeRefinementResult dataclass for structured results
- Update 34_critic_example.py to use the new SDK components
- Add comprehensive tests for all new components

This refactoring enables PR #1706 (AgentReviewCritic) to reuse the same
iterative refinement infrastructure defined here.

Related: OpenHands/OpenHands#2221

Co-authored-by: openhands <openhands@all-hands.dev>
@xingyaoww
Copy link
Collaborator Author

@OpenHands we've just merged #1879 into main. please check the diff against main for this PR, understand what it implements. And then try to merge from main resolve all conflicts and rewrite this PR based on the schema/pattern that PR 1879 sets up for iterative refinement. This PR is essentially doing Iterative Refinement with a CodeReviewCritic.

@openhands-ai
Copy link

openhands-ai bot commented Feb 9, 2026

I'm on it! xingyaoww can track my progress at all-hands.dev

- Add get_followup_prompt() override for code-review-specific feedback
- Add workspace_dir parameter to get git diff automatically
- Update evaluate() to fetch git diff from workspace when not provided
- Update 34_critic_agent_hook.py example to use iterative refinement pattern
- Merge from main to incorporate IterativeRefinementConfig from PR #1879

This aligns AgentReviewCritic with the iterative refinement pattern
established in PR #1879, allowing the critic to be used with the
built-in retry mechanism instead of requiring stop hooks.

Co-authored-by: openhands <openhands@all-hands.dev>
@openhands-ai
Copy link

openhands-ai bot commented Feb 9, 2026

Summary

I've successfully merged from main and rewritten PR #1706 to use the iterative refinement pattern established in PR #1879. The changes have been pushed to the openhands/critic-agent-reviewer branch.

What was done:

  1. Merged from main - Incorporated all changes from PR feat(examples): update critic example to demonstrate iterative refinement #1879 including:

    • IterativeRefinementConfig class
    • CriticMixin with _check_iterative_refinement method
    • get_followup_prompt pattern in CriticBase
  2. Updated AgentReviewCritic (openhands-sdk/openhands/sdk/critic/impl/agent_review.py):

    • Added workspace_dir parameter to specify where to get git diff from
    • Added _get_git_patch() method to automatically fetch git diff from workspace
    • Updated evaluate() to fetch git diff when not provided (enables iterative refinement)
    • Added get_followup_prompt() override with code-review-specific feedback
  3. Updated example (examples/01_standalone_sdk/34_critic_agent_hook.py):

    • Rewrote to demonstrate iterative refinement pattern instead of stop hooks
    • Shows how to use AgentReviewCritic with IterativeRefinementConfig
    • The critic now automatically evaluates git diffs when the agent tries to finish

Key Design Changes:

The AgentReviewCritic now supports two usage patterns:

  1. Iterative Refinement (Recommended): Attach to an agent with IterativeRefinementConfig. The critic automatically gets the git diff from the workspace and triggers follow-up prompts if quality threshold isn't met.

  2. Stop Hook (Alternative): Use with create_critic_stop_hook() for callback-based blocking.

Verification:

  • All critic tests pass (45 tests)
  • All iterative refinement tests pass (16 tests)
  • Lint checks pass (ruff)
  • Type checks pass (pyright)

The PR is now aligned with the iterative refinement schema/pattern from PR #1879.

View full conversation

@xingyaoww
Copy link
Collaborator Author

@OpenHands please fix the failing actions on PR #1706 at branch openhands/critic-agent-reviewer

@openhands-ai
Copy link

openhands-ai bot commented Feb 9, 2026

I'm on it! xingyaoww can track my progress at all-hands.dev

…licate number

Co-authored-by: openhands <openhands@all-hands.dev>
@openhands-ai
Copy link

openhands-ai bot commented Feb 9, 2026

Summary

I fixed the failing CI actions on PR #1706 by addressing the duplicate example number issue.

Problem

After merging from main, there were two example files with the same numeric prefix 36_:

  • 36_critic_example.py (from this branch)
  • 36_event_json_to_openai_messages.py (from main)

This caused the check-duplicates CI check to fail.

Fix

Renamed 36_critic_example.py to 38_critic_example.py to avoid the duplicate number conflict.

CI Status

All required checks are now passing:

  • ✅ pre-commit
  • ✅ tools-tests
  • ✅ sdk-tests
  • ✅ cross-tests
  • ✅ agent-server-tests
  • ✅ build-binary-and-test (ubuntu-latest)
  • ✅ Check OpenAPI Schema
  • ✅ check-duplicates (was failing, now fixed)

The only remaining failing check is check-examples (named [Optional] Docs example), which is not a required check. It fails because the new example files are not yet documented in the docs repository - this is expected for a draft PR and can be addressed when the PR is ready for final review.

Commit

0a2b7ed0 fix: rename 36_critic_example.py to 38_critic_example.py to avoid duplicate number

The changes have been pushed to the openhands/critic-agent-reviewer branch.

View full conversation

…ritic_agent_hook.py

The AgentReviewCritic example (34_critic_agent_hook.py) replaces the old
APIBasedCritic example. The old example was renamed to 38 to avoid conflicts
but is no longer needed.

Co-authored-by: openhands <openhands@all-hands.dev>
…elector

Combines API-based critic and AgentReviewCritic into a single example file.
Use CRITIC_MODE=api (default) or CRITIC_MODE=agent_review to select.

- Removes 34_critic_agent_hook.py and 38_critic_example.py
- Creates unified 34_critic_example.py with both critic modes
- API mode: uses external critic API endpoint (auto-configures for All-Hands proxy)
- Agent review mode: spawns separate OpenHands agent for PR-style code review

Co-authored-by: openhands <openhands@all-hands.dev>
…elector

Combines API-based critic and AgentReviewCritic into a single example file.
Use CRITIC_MODE=api (default) or CRITIC_MODE=agent_review to select.

- Removes 34_critic_agent_hook.py and 38_critic_example.py
- Creates unified 34_critic_example.py with both critic modes
- API mode: uses external critic API endpoint (auto-configures for All-Hands proxy)
- Agent review mode: spawns separate OpenHands agent for PR-style code review
- Fix serialization issue: exclude agent_factory from JSON serialization

Co-authored-by: openhands <openhands@all-hands.dev>
@xingyaoww
Copy link
Collaborator Author

@OpenHands please fix the failing actions on PR #1706 at branch openhands/critic-agent-reviewer

@openhands-ai
Copy link

openhands-ai bot commented Feb 10, 2026

I'm on it! xingyaoww can track my progress at all-hands.dev

- Use SkipJsonSchema for AgentReviewCritic.agent_factory to prevent
  OpenAPI schema generation errors
- Use Field(exclude=True) for HookDefinition.callback to exclude from
  serialization

This fixes the 'Cannot generate a JsonSchema for core_schema.CallableSchema'
error that was causing the CI 'Check OpenAPI Schema' job to fail.

Co-authored-by: openhands <openhands@all-hands.dev>
@openhands-ai
Copy link

openhands-ai bot commented Feb 10, 2026

Summary

I've fixed the failing CI action on PR #1706 by addressing the "Check OpenAPI Schema" job failure.

Problem

The CI was failing with the error:

pydantic.errors.PydanticInvalidForJsonSchema: Cannot generate a JsonSchema for core_schema.CallableSchema

This occurred because Callable types (like agent_factory in AgentReviewCritic and callback in HookDefinition) cannot be serialized to JSON schema, which broke OpenAPI schema generation.

Solution

Made minimal changes to 2 files:

  1. openhands-sdk/openhands/sdk/critic/impl/agent_review.py:

    • Added SkipJsonSchema wrapper to the agent_factory field to exclude it from JSON schema generation
    • Import added: from pydantic.json_schema import SkipJsonSchema
  2. openhands-sdk/openhands/sdk/hooks/config.py:

    • Changed callback field to use Field(default=None, exclude=True) to exclude it from serialization

Verification

  • make test-server-schema now passes
  • ✅ All existing tests pass (tests/sdk/critic/test_agent_review_critic.py)
  • ✅ Pre-commit checks pass
  • ✅ Changes pushed to openhands/critic-agent-reviewer branch

The changes have been pushed and should trigger a new CI run that passes the "Check OpenAPI Schema" job.

View full conversation

@openhands-ai
Copy link

openhands-ai bot commented Feb 10, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Review Thread Gate

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1706 at branch `openhands/critic-agent-reviewer`

Feel free to include any additional details that might help me get this PR into a better state.

You can manage your notification settings

@enyst enyst added behavior-initiative This is related to the system prompt sections and LLM steering. and removed behavior-initiative This is related to the system prompt sections and LLM steering. labels Feb 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Interleave codereview-roasted with agent task: add critic-agent model that reviews git diff and returns pass/fail

4 participants