feat(hooks): add async hook execution support #1849

Merged
neubig merged 15 commits into OpenHands:main from ixchio:feat/async-hooks-support
Feb 19, 2026
Conversation

@ixchio
Contributor

@ixchio ixchio commented Jan 29, 2026

Summary

Implements async (background) hook execution as proposed in #1828. Hooks marked with async: true now run fire-and-forget, without blocking the agent's execution.

Changes

  • config.py: Added async field to HookDefinition with alias handling for Python's reserved keyword
  • executor.py: Implemented fire-and-forget execution path using subprocess.Popen + AsyncProcessManager for lifecycle tracking
  • manager.py: Added warning for async PreToolUse hooks (can't block), cleanup on session end
  • Tests: Full coverage for async config parsing, execution, timeout handling, and cleanup

Usage

{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Bash",
      "hooks": [{
        "type": "command",
        "command": "./analytics.sh",
        "async": true,
        "timeout": 30
      }]
    }]
  }
}

Notes

  • Async hooks return immediately with async_started: true in the result
  • Background processes are tracked and cleaned up on session end or timeout
  • PreToolUse async hooks log a warning since they can't block tool execution

Closes #1828

Implements background hook execution for non-blocking operations like
logging, notifications, and analytics. Async hooks fire-and-forget
without blocking the agent's execution flow.

Changes:
- Add async field to HookDefinition config
- Implement fire-and-forget execution via subprocess.Popen
- Add AsyncProcessManager for background process lifecycle
- Handle cleanup on session end and timeout expiration
- Add warning for async PreToolUse hooks (cannot block)
- Comprehensive test coverage for new functionality

Closes OpenHands#1828
Collaborator

@all-hands-bot all-hands-bot left a comment

Summary

This PR implements async hook execution well overall with good test coverage. However, there are several important issues around process lifecycle management and observability that should be addressed before merging.


🟠 Important Issues

1. No Periodic Cleanup of Expired Processes

Location: executor.py:60 (AsyncProcessManager.cleanup_expired)

cleanup_expired() is never called during execution - only cleanup_all() runs at session end. This means async hooks that exceed their timeout will keep running until the session ends.

Recommendation: Call cleanup_expired() periodically:

  • In execute() before starting new async processes
  • In execute_all() between hook executions
  • In run_post_tool_use() and other hook trigger points
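
A rough sketch of what calling `cleanup_expired()` at a trigger point could look like, assuming a tracker that stores a deadline per process. `ProcessTracker` and its methods are hypothetical names for illustration and are not the PR's `AsyncProcessManager`.

```python
import subprocess
import sys
import time


class ProcessTracker:
    """Hypothetical tracker for illustration; not the PR's AsyncProcessManager."""

    def __init__(self) -> None:
        # Each entry pairs a process with its monotonic-clock deadline.
        self._entries: list[tuple[subprocess.Popen, float]] = []

    def add(self, process: subprocess.Popen, timeout: float) -> None:
        self._entries.append((process, time.monotonic() + timeout))

    @staticmethod
    def _stop(process: subprocess.Popen) -> None:
        # Terminate, then wait so the child is reaped and does not linger as a zombie.
        process.terminate()
        try:
            process.wait(timeout=1)
        except subprocess.TimeoutExpired:
            process.kill()
            process.wait()

    def cleanup_expired(self) -> int:
        """Stop processes past their deadline; return how many were stopped."""
        now = time.monotonic()
        stopped = 0
        remaining = []
        for process, deadline in self._entries:
            if process.poll() is not None:
                continue  # already exited; poll() has reaped it
            if now >= deadline:
                self._stop(process)
                stopped += 1
            else:
                remaining.append((process, deadline))
        self._entries = remaining
        return stopped

    def cleanup_all(self) -> None:
        for process, _ in self._entries:
            if process.poll() is None:
                self._stop(process)
        self._entries = []

    def active_count(self) -> int:
        return len(self._entries)


tracker = ProcessTracker()
sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
tracker.add(subprocess.Popen(sleeper), timeout=0.1)  # will expire
tracker.add(subprocess.Popen(sleeper), timeout=60)   # still within budget

time.sleep(0.3)
stopped = tracker.cleanup_expired()  # what a hook trigger point would call
remaining = tracker.active_count()
tracker.cleanup_all()                # session-end cleanup
after_session_end = tracker.active_count()
```

Enforcement in this sketch still depends on something calling `cleanup_expired()`, so an idle session would not stop an overdue hook until the next trigger point.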

2. Zombie Process Risk

Location: executor.py:65, executor.py:76

Using terminate() without wait() can create zombie processes. The process becomes a zombie until the parent calls wait().

Fix:

try:
    process.terminate()
    process.wait(timeout=1)  # Wait for graceful termination
except subprocess.TimeoutExpired:
    process.kill()  # Force kill if it doesn't terminate
    process.wait()
except OSError:
    pass

3. Unsafe Stdin Write Operation

Location: executor.py:114-124

Writing to stdin is not protected and could fail or hang:

  • Process might exit before stdin write completes
  • No timeout on the write operation
  • Errors are silently caught in outer try/except

Recommendation:

try:
    if process.stdin and process.poll() is None:
        process.stdin.write(event_json.encode())
        process.stdin.flush()  # Ensure data is sent
        process.stdin.close()
except (BrokenPipeError, OSError) as e:
    logger.warning(f"Failed to write to async hook stdin: {e}")

4. No Observability for Async Hooks

Location: executor.py:120-123

Redirecting stdout/stderr to DEVNULL makes debugging async hooks very difficult. Users have no way to know if their hooks succeeded or failed.

Suggestions:

  1. Add a debug mode that logs to files: stdout=open(f"/tmp/hook-{pid}.out", "w")
  2. Document that users should handle their own logging within hooks
  3. At minimum, document this limitation in the PR description

🟡 Suggestions

5. Silent Error Handling

Location: executor.py:67, executor.py:78

Silent except OSError: pass makes debugging difficult.

Fix:

except OSError as e:
    logger.debug(f"Process already terminated: {e}")

6. Missing Docstrings

Location: executor.py:43-79 (AsyncProcessManager)

Consider adding docstrings to explain the lifecycle management:

class AsyncProcessManager:
    """Manages background hook processes for cleanup.
    
    Tracks async hook processes and ensures they are terminated when they
    exceed their timeout or when the session ends.
    """
    
    def add_process(self, process: subprocess.Popen, timeout: int) -> None:
        """Track a background process for cleanup.
        
        Args:
            process: The subprocess to track
            timeout: Maximum runtime in seconds before termination
        """

✅ What's Good

  • Clean API design with the async field and proper aliasing
  • Comprehensive test coverage
  • Warning for async PreToolUse hooks (good UX)
  • Proper cleanup on session end
  • Backward compatible (defaults to False)

Recommended Changes Priority

  1. Must fix: Add periodic cleanup_expired() calls (#1)
  2. Must fix: Fix zombie process issue with wait() (#2)
  3. Should fix: Protect stdin write operation (#3)
  4. Should document: Explain observability limitations (#4)
  5. Nice to have: Add logging to error handlers (#5)

- Fix zombie process issue with proper wait() after terminate
- Add _terminate_process helper with graceful termination + force kill
- Protect stdin write with BrokenPipeError handling
- Add periodic cleanup_expired() call before starting new async hooks
- Improve docstrings with Args documentation
- Add debug logging for process lifecycle events
@ixchio ixchio requested a review from all-hands-bot January 29, 2026 04:09
Pyright doesn't recognize async_ as a constructor param when using
Field(alias='async'). Use model_validate with dict containing 'async'
key instead, which is the proper way to instantiate with aliased fields.
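
The aliasing problem can be illustrated without Pydantic. The sketch below is a stdlib-only stand-in, not the PR's `HookDefinition`: `HookSpec`, `from_config`, `to_config`, and the default timeout of 60 are all assumptions made for the example. It shows the same idea as `Field(alias='async')`: the config key stays `async` while the Python attribute is `async_`.

```python
from dataclasses import dataclass


@dataclass
class HookSpec:
    """Hypothetical stand-in for the PR's HookDefinition (which uses Pydantic)."""

    command: str
    # "async" is a reserved word in Python, so the attribute gets a trailing underscore.
    async_: bool = False
    timeout: int = 60  # assumed default, for illustration only

    @classmethod
    def from_config(cls, raw: dict) -> "HookSpec":
        data = dict(raw)
        data.pop("type", None)
        # Map the external key "async" onto the internal attribute name.
        data["async_"] = data.pop("async", False)
        return cls(**data)

    def to_config(self) -> dict:
        # Always serialize under the external name so configs round-trip.
        return {
            "type": "command",
            "command": self.command,
            "async": self.async_,
            "timeout": self.timeout,
        }


spec = HookSpec.from_config(
    {"type": "command", "command": "./analytics.sh", "async": True, "timeout": 30}
)
round_tripped = spec.to_config()
```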
@enyst
Collaborator

enyst commented Jan 29, 2026

I'm going to ask my little OH for an opinion. Please don't mind it entirely, sometimes it's right, sometimes it's wrong. It has a "Linus-style prompt" it obeys, and that might make it a bit weird!

@OpenHands Do a /codereview-roasted on this PR.

Post your feedback as comment on github directly. Note that it will be rendered as markdown.

@smolpaws
Contributor

🟡 Taste Rating: Acceptable

Linus-Style Analysis

[CRITICAL ISSUES]

  • [openhands-sdk/openhands/sdk/hooks/executor.py, Lines 131–170] Timeout semantics are toothless for async hooks. You only call cleanup_expired() when starting another hook. If a session goes idle after launching an async hook, that “timeout” is never enforced. That’s not a timeout; it’s wishful thinking. You need a real timeout enforcement strategy (periodic cleanup, or schedule a per-process timer).
  • [openhands-sdk/openhands/sdk/hooks/executor.py, Lines 137–156] You leak process trees. Popen(..., shell=True) spawns a shell, and you only terminate the parent PID. Any child process can keep running forever. If you’re going to support timeouts and cleanup, you must kill the process group (start new session and killpg), otherwise your “cleanup” is theater.
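
The "per-process timer" option mentioned above could be sketched with `threading.Timer`, so the timeout fires even if no other hook ever runs. `start_with_deadline` is a hypothetical helper, not code from the PR, and this version signals only the direct child; killing a whole process tree needs a process group as well.

```python
import subprocess
import sys
import threading
import time


def start_with_deadline(command: list, timeout: float):
    """Start a process and schedule a timer that kills it once the timeout elapses."""
    process = subprocess.Popen(command)

    def enforce() -> None:
        # Only act if the process is still running when the timer fires.
        if process.poll() is None:
            process.kill()

    timer = threading.Timer(timeout, enforce)
    timer.daemon = True  # do not keep the interpreter alive just for the timer
    timer.start()
    return process, timer


sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
started = time.monotonic()
process, timer = start_with_deadline(sleeper, timeout=0.2)
return_code = process.wait(timeout=10)  # returns once the timer has killed the child
elapsed = time.monotonic() - started
timer.cancel()  # harmless if the timer has already fired
```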

[IMPROVEMENT OPPORTUNITIES]

  • [openhands-sdk/openhands/sdk/hooks/executor.py, Lines 134–166] Async execution path silently drops stdout/stderr. That’s fine for “fire and forget”, but you still return success=True even when Popen succeeds but the command immediately exits with failure. At least document that async hooks are “best effort” and errors are not observable. Otherwise, users will assume success=True means the hook actually worked.
  • [openhands-sdk/openhands/sdk/hooks/manager.py, Lines 59–77] PreToolUse + async is inherently contradictory. You log a warning but still run them. If it can’t block, why allow it at all in a blocking stage? Either reject these configs or run them under PostToolUse semantics to avoid confusing users who think they’re enforcing policy.
  • [tests/sdk/hooks/test_executor.py, Lines 166–334] Tests are poking private state. Reaching into _processes makes tests brittle and locks you into internals. Expose minimal introspection or assert on observable behavior (e.g., process termination) instead of internal lists.

[STYLE NOTES]

  • [openhands-sdk/openhands/sdk/hooks/config.py, Lines 52–56] async_ aliasing is fine, but you didn’t normalize output fields. You now serialize async_ unless by_alias is manually passed. That’s a recipe for inconsistent configs. Either always by_alias=True in config dumps or override model_dump to default to aliasing for this field.

VERDICT:
Needs rework — you’re calling it async with timeouts, but the timeouts don’t actually fire unless another hook runs, and cleanup doesn’t kill child processes. Fix those two and it’s in decent shape.

KEY INSIGHT:
You added an async escape hatch, but you didn’t add a lifecycle manager — without real timeout enforcement and process-group cleanup, you’ve just moved the blocking to “later and maybe never.”

@openhands-ai

openhands-ai bot commented Jan 29, 2026

Final summary:

  • Completed request: ✅ Posted a roasted code review comment on PR feat(hooks): add async hook execution support #1849 using the smolpaws token (markdown-rendered).
  • Code changes: ✅ None made (no files modified, no commits).
  • Conciseness: ✅ No extraneous changes to revert.
  • Push required: ❌ Not applicable (no code changes).

PR comment link: #1849 (comment)


@OpenHands OpenHands deleted a comment from openhands-ai bot Jan 29, 2026
- Use process groups (start_new_session) to kill child processes properly
- Add periodic cleanup_expired() call in execute_all()
- Fix pyright error with pgid possibly unbound
@ixchio
Contributor Author

ixchio commented Jan 29, 2026

Hey! Addressed the review feedback 🙌

changes:

  • Switched to process groups (start_new_session=True + os.killpg) so we properly kill the entire process tree on cleanup, not just the parent shell
  • Added periodic cleanup_expired() call in execute_all() for more frequent timeout enforcement

Tests are passing, ready for another look when you get a chance!
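
For readers unfamiliar with the process-group approach, here is a minimal POSIX-only sketch. `kill_process_tree` is a hypothetical helper and not the PR's `_terminate_process`. It relies on `start_new_session=True` making the child a session and group leader, so its PID doubles as the process group ID.

```python
import os
import signal
import subprocess
import time


def kill_process_tree(process: subprocess.Popen, grace: float = 1.0) -> None:
    """Stop a process started with start_new_session=True, plus its children."""
    pgid = process.pid  # session leader, so PID == process group ID
    try:
        os.killpg(pgid, signal.SIGTERM)  # polite request to the whole group
    except ProcessLookupError:
        return  # nothing left to signal
    try:
        process.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(pgid, signal.SIGKILL)  # force kill if SIGTERM was ignored
        except ProcessLookupError:
            pass
        process.wait()  # reap the shell so it does not stay a zombie


# shell=True spawns a shell, which in turn spawns a background and a foreground sleep.
process = subprocess.Popen(
    "sleep 30 & sleep 30",
    shell=True,
    start_new_session=True,
    stdout=subprocess.PIPE,
)
time.sleep(0.2)  # give the shell a moment to start its children
kill_process_tree(process)

# read() returns only once every process holding the pipe has exited,
# so a prompt return indicates the children are gone too.
leftover_output = process.stdout.read()
process.stdout.close()
```

If only the shell's PID were terminated, the background `sleep` would keep the pipe open and the final `read()` would block until it finished.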

@enyst
Collaborator

enyst commented Feb 1, 2026

Could you please post screenshots or logs, to show how it works?

Contributor Author

ixchio commented Feb 2, 2026

Hi @enyst! Here's a demonstration showing how async hooks work:

Demo Output

======================================================================
 ASYNC HOOKS DEMONSTRATION
 Shows fire-and-forget background execution for hooks
======================================================================

======================================================================
DEMO 1: Sync vs Async Hook Timing
======================================================================

[Sync Hook] Running 'sleep 2' synchronously...
[Sync Hook] Completed in 2.00s
[Sync Hook] Result: success=True, async_started=False
[Sync Hook] Stdout: 'sync done'

[Async Hook] Running 'sleep 2' asynchronously...
[Async Hook] Returned in 0.0006s (fire-and-forget)
[Async Hook] Result: success=True, async_started=True
[Async Hook] Note: stdout is empty for async hooks: ''

[Tracker] Tracked background processes: 1

======================================================================
DEMO 2: Async Hooks Receive Event Data via stdin
======================================================================

[Async Hook] Starting hook that captures stdin to file...
[Async Hook] Started: async_started=True
[Async Hook] Waiting for background process...

[Captured Event Data]
{
  "event_type": "PostToolUse",
  "tool_name": "TerminalTool",
  "tool_input": {
    "command": "git status"
  },
  "tool_response": {
    "output": "On branch main\n..."
  },
  "message": null,
  "session_id": "demo-session-123",
  "working_dir": null,
  "metadata": {}
}

======================================================================
DEMO 3: Process Lifecycle and Cleanup
======================================================================

[Starting] Launching 3 background hooks...
  Hook 0: async_started=True
  Hook 1: async_started=True
  Hook 2: async_started=True

[Tracked] Active processes: 3
  PID 811: running=True, timeout=2s
  PID 812: running=True, timeout=2s
  PID 814: running=True, timeout=2s

[Waiting] Waiting 3 seconds for timeouts to expire...

[Cleanup] Running cleanup_expired()...
[Tracked] Remaining processes: 0
[Done] All processes terminated.

======================================================================
DEMO 4: HookManager Integration
======================================================================

[HookManager] Running PostToolUse hooks for 'BashTool'...
[HookManager] Completed in 0.002s
  Hook 0: success=True, async_started=False, stdout='Sync logging hook ran'
  Hook 1: success=True, async_started=True, stdout=''

[Background] Tracked processes: 1
[Background] Waiting 1.5s for async hook to complete...
[Background] Analytics marker content: 'Analytics sent'

[SessionEnd] Running session end cleanup...
[SessionEnd] Remaining processes: 0

======================================================================
DEMO 5: PreToolUse Async Hook Warning
======================================================================

[HookManager] Running PreToolUse with async hook...
[Expected] WARNING about async hooks in PreToolUse:
{"levelname": "WARNING", "name": "openhands.sdk.hooks.manager", 
 "message": "Async hooks in PreToolUse cannot block tool execution. Found 1 async hook(s) that will run in background."}

[HookManager] should_continue=True
[HookManager] Hook ran but cannot block: async_started=True

======================================================================
 DEMONSTRATION COMPLETE
======================================================================

Key Behaviors Demonstrated

| Feature | Behavior |
| --- | --- |
| Sync Hook | Blocks for 2.00s, captures stdout |
| Async Hook | Returns in 0.0006s (fire-and-forget) |
| Event Data | Async hooks receive full JSON event via stdin |
| Process Tracking | Background processes tracked for cleanup |
| Timeout Enforcement | cleanup_expired() terminates processes past timeout |
| Process Group Cleanup | Uses start_new_session=True + os.killpg() to kill entire process tree |
| PreToolUse Warning | Logs warning when async hooks can't block tool execution |

Example Configuration

{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Bash",
      "hooks": [{
        "type": "command",
        "command": "./analytics.sh",
        "async": true,
        "timeout": 30
      }]
    }]
  }
}

All 75 hook tests pass ✅

@all-hands-bot
Collaborator

[Automatic Post]: It has been a while since there was any activity on this PR. @ixchio, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up.

@neubig neubig self-requested a review February 12, 2026 00:18
Contributor

neubig commented Feb 12, 2026

[automated message] @neubig assigned for review according to git blame

@all-hands-bot
Collaborator

[Automatic Post]: This PR seems to be currently waiting for review. @all-hands-bot, @neubig, could you please take a look when you have a chance?

@enyst enyst requested review from all-hands-bot and removed request for all-hands-bot February 17, 2026 12:52
Collaborator

@all-hands-bot all-hands-bot left a comment

🟢 LGTM - Solid implementation

Taste Rating: Good taste - clean solution that solves a real problem.

What works well:

  • ✅ Process lifecycle management is technically correct (process groups + proper cleanup)
  • ✅ Tests verify real behavior (file creation, process tracking, stdin delivery) not just mocks
  • ✅ Previous review concern addressed: cleanup_expired() now called periodically before new hooks and in execute_all()
  • ✅ No breaking changes (async defaults to false)
  • ✅ Good defensive coding with exception handling

Key insight: The data structure is simple (list of process tuples) and the cleanup logic is straightforward. Process group handling with start_new_session=True + os.killpg() correctly handles shell=True subprocesses.

Nice work! 🚀

@neubig
Contributor

neubig commented Feb 19, 2026

@OpenHands run the plugin loading example with the async hook, and then examine the resulting agent trajectory. tell me if it seems that the hook ran async. you could confirm this by noting in the agent trajectory that the agent continued being functional while the hook ran, and then the hook result was provided to the agent at a later time. If you are not able to confirm it with the current example, you can either modify the current example so it is more obvious, or create a new example specifically demonstrating async hooks. examples/05_skills_and_plugins/02_loading_plugins/example_plugins/code-quality/hooks/hooks.json

@openhands-ai

openhands-ai bot commented Feb 19, 2026

I'm on it! neubig can track my progress at all-hands.dev

@openhands-ai

openhands-ai bot commented Feb 19, 2026

Summary

I investigated whether async hooks work correctly in PR #1849 by running the plugin loading example with async hook configuration.

Key Findings

Yes, async hooks are working correctly. Here's the evidence:

  1. Agent continued processing without blocking: With 3 tool uses and 3 LLM calls (~3s each), total execution was 14.41s. If 2-second hooks had blocked synchronously, minimum time would have been ~15-18s (9s LLM + 6s hooks). The actual time shows hooks didn't fully block.

  2. Hook result shows async_started: true: The executor returns immediately after spawning the background process.

  3. Hook log confirms execution: All hooks started, ran their 2-second sleep, and completed in the background while the agent continued.
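
The timing argument in finding 1 can be checked with a few lines of arithmetic. The figures below are the approximate ones quoted above (3 LLM calls at about 3s each, 3 hooks sleeping 2s each, 14.41s observed); the variable names are only for illustration.

```python
llm_calls = 3
llm_latency_s = 3.0        # approximate per-call latency quoted above
tool_uses = 3
hook_sleep_s = 2.0         # each hook sleeps for two seconds
observed_total_s = 14.41

# If every hook had blocked, the run could not finish faster than this.
blocking_lower_bound_s = llm_calls * llm_latency_s + tool_uses * hook_sleep_s

hooks_fully_blocked = observed_total_s >= blocking_lower_bound_s
```

The lower bound comes out at 15.0s, so the observed 14.41s falls below it. Because the margin is small and the latencies are approximate, the `async_started: true` result and the hook log are the stronger evidence.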

Why Timestamps Appear Sequential

The hook timestamps don't show dramatic overlap because LLM latency (~2-3s) is similar to the hook sleep time (2s). By the time each subsequent tool call occurs, the previous hook has often finished. This is expected - async means "fire-and-forget, don't block," not necessarily "run simultaneously."

Changes Made (Local Only - Not Pushed)

I modified the example files temporarily to add timing analysis and hook log examination for testing purposes. However, I should NOT push these changes because:

  1. They significantly change the example's purpose from "plugin loading" to "async hooks demonstration"
  2. The modifications were for investigation/analysis, not intended as PR improvements
  3. The original example should remain focused on its documented purpose

Recommendation

The async hooks feature in PR #1849 appears to be working correctly. The implementation properly:

  • Returns immediately with async_started: true
  • Spawns background processes
  • Tracks processes for cleanup on session end/timeout

@neubig
Contributor

neubig commented Feb 19, 2026

Looks great, thank you!

@neubig neubig merged commit 050991d into OpenHands:main Feb 19, 2026
21 checks passed
Development

Successfully merging this pull request may close these issues.

Add support for async hooks (background execution)

5 participants