feat(hooks): add async hook execution support#1849
Conversation
Implements background hook execution for non-blocking operations like logging, notifications, and analytics. Async hooks fire-and-forget without blocking the agent's execution flow. Changes: - Add async field to HookDefinition config - Implement fire-and-forget execution via subprocess.Popen - Add AsyncProcessManager for background process lifecycle - Handle cleanup on session end and timeout expiration - Add warning for async PreToolUse hooks (cannot block) - Comprehensive test coverage for new functionality Closes OpenHands#1828
all-hands-bot
left a comment
There was a problem hiding this comment.
Summary
This PR implements async hook execution well overall with good test coverage. However, there are several important issues around process lifecycle management and observability that should be addressed before merging.
🟠 Important Issues
1. No Periodic Cleanup of Expired Processes
Location: executor.py:60 (AsyncProcessManager.cleanup_expired)
cleanup_expired() is never called during execution - only cleanup_all() runs at session end. This means async hooks that exceed their timeout will keep running until the session ends.
Recommendation: Call cleanup_expired() periodically:
- In
execute()before starting new async processes - In
execute_all()between hook executions - In
run_post_tool_use()and other hook trigger points
2. Zombie Process Risk
Location: executor.py:65, executor.py:76
Using terminate() without wait() can create zombie processes. The process becomes a zombie until the parent calls wait().
Fix:
try:
process.terminate()
process.wait(timeout=1) # Wait for graceful termination
except subprocess.TimeoutExpired:
process.kill() # Force kill if it doesn't terminate
process.wait()
except OSError:
pass3. Unsafe Stdin Write Operation
Location: executor.py:114-124
Writing to stdin is not protected and could fail or hang:
- Process might exit before stdin write completes
- No timeout on the write operation
- Errors are silently caught in outer try/except
Recommendation:
try:
if process.stdin and process.poll() is None:
process.stdin.write(event_json.encode())
process.stdin.flush() # Ensure data is sent
process.stdin.close()
except (BrokenPipeError, OSError) as e:
logger.warning(f"Failed to write to async hook stdin: {e}")4. No Observability for Async Hooks
Location: executor.py:120-123
Redirecting stdout/stderr to DEVNULL makes debugging async hooks very difficult. Users have no way to know if their hooks succeeded or failed.
Suggestions:
- Add a debug mode that logs to files:
stdout=open(f"/tmp/hook-{pid}.out", "w") - Document that users should handle their own logging within hooks
- At minimum, document this limitation in the PR description
🟡 Suggestions
5. Silent Error Handling
Location: executor.py:67, executor.py:78
Silent except OSError: pass makes debugging difficult.
Fix:
except OSError as e:
logger.debug(f"Process already terminated: {e}")6. Missing Docstrings
Location: executor.py:43-79 (AsyncProcessManager)
Consider adding docstrings to explain the lifecycle management:
class AsyncProcessManager:
"""Manages background hook processes for cleanup.
Tracks async hook processes and ensures they are terminated when they
exceed their timeout or when the session ends.
"""
def add_process(self, process: subprocess.Popen, timeout: int) -> None:
"""Track a background process for cleanup.
Args:
process: The subprocess to track
timeout: Maximum runtime in seconds before termination
"""✅ What's Good
- Clean API design with the
asyncfield and proper aliasing - Comprehensive test coverage
- Warning for async PreToolUse hooks (good UX)
- Proper cleanup on session end
- Backward compatible (defaults to False)
Recommended Changes Priority
- Fix zombie process issue with proper wait() after terminate - Add _terminate_process helper with graceful termination + force kill - Protect stdin write with BrokenPipeError handling - Add periodic cleanup_expired() call before starting new async hooks - Improve docstrings with Args documentation - Add debug logging for process lifecycle events
Pyright doesn't recognize async_ as a constructor param when using Field(alias='async'). Use model_validate with dict containing 'async' key instead, which is the proper way to instantiate with aliased fields.
|
I'm going to ask my little OH for an opinion. Please don't mind it entirely, sometimes it's right, sometimes it's wrong. It has a "Linus-style prompt" it obeys, and that might make it a bit weird! @OpenHands Do a /codereview-roasted on this PR. Post your feedback as comment on github directly. Note that it will be rendered as markdown. |
|
🟡 Taste Rating: Acceptable Linus-Style Analysis [CRITICAL ISSUES]
[IMPROVEMENT OPPORTUNITIES]
[STYLE NOTES]
VERDICT: KEY INSIGHT: |
|
Final summary:
PR comment link: #1849 (comment) |
- Use process groups (start_new_session) to kill child processes properly - Add periodic cleanup_expired() call in execute_all() - Fix pyright error with pgid possibly unbound
|
Hey! Addressed the review feedback 🙌 changes:
Tests are passing, ready for another look when you get a chance! |
|
Could you please post screenshots or logs, to show how it works? |
|
Hi @enyst! Here's a demonstration showing how async hooks work: Demo OutputKey Behaviors Demonstrated
Example Configuration{
"hooks": {
"PostToolUse": [{
"matcher": "Bash",
"hooks": [{
"type": "command",
"command": "./analytics.sh",
"async": true,
"timeout": 30
}]
}]
}
}All 75 hook tests pass ✅ |
|
[Automatic Post]: It has been a while since there was any activity on this PR. @ixchio, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up. |
1 similar comment
|
[Automatic Post]: It has been a while since there was any activity on this PR. @ixchio, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up. |
|
[automated message] @neubig assigned for review according to git blame |
|
[Automatic Post]: This PR seems to be currently waiting for review. @all-hands-bot, @neubig, could you please take a look when you have a chance? |
all-hands-bot
left a comment
There was a problem hiding this comment.
🟢 LGTM - Solid implementation
Taste Rating: Good taste - clean solution that solves a real problem.
What works well:
- ✅ Process lifecycle management is technically correct (process groups + proper cleanup)
- ✅ Tests verify real behavior (file creation, process tracking, stdin delivery) not just mocks
- ✅ Previous review concern addressed:
cleanup_expired()now called periodically before new hooks and inexecute_all() - ✅ No breaking changes (async defaults to false)
- ✅ Good defensive coding with exception handling
Key insight: The data structure is simple (list of process tuples) and the cleanup logic is straightforward. Process group handling with start_new_session=True + os.killpg() correctly handles shell=True subprocesses.
Nice work! 🚀
|
@OpenHands run the plugin loading example with the async hook, and then examine the resulting agent trajectory. tell me if it seems that the hook ran async. you could confirm this by noting in the agent trajectory that the agent continued being functional while the hook ran, and then the hook result was provided to the agent at a later time. If you are not able to confirm it with the current example, you can either modify the current example so it is more obvious, or create a new example specifically demonstrating async hooks. examples/05_skills_and_plugins/02_loading_plugins/example_plugins/code-quality/hooks/hooks.json |
|
I'm on it! neubig can track my progress at all-hands.dev |
SummaryI investigated whether async hooks work correctly in PR #1849 by running the plugin loading example with async hook configuration. Key FindingsYes, async hooks are working correctly. Here's the evidence:
Why Timestamps Appear SequentialThe hook timestamps don't show dramatic overlap because LLM latency (~2-3s) is similar to the hook sleep time (2s). By the time each subsequent tool call occurs, the previous hook has often finished. This is expected - async means "fire-and-forget, don't block," not necessarily "run simultaneously." Changes Made (Local Only - Not Pushed)I modified the example files temporarily to add timing analysis and hook log examination for testing purposes. However, I should NOT push these changes because:
RecommendationThe async hooks feature in PR #1849 appears to be working correctly. The implementation properly:
|
|
Looks great, thank you! |
Summary
Implements async (background) hook execution as proposed in #1828. Hooks marked with
async: truenow fire-and-forget without blocking the agent's execution.Changes
config.py: Addedasyncfield toHookDefinitionwith alias handling for Python's reserved keywordexecutor.py: Implemented fire-and-forget execution path usingsubprocess.Popen+AsyncProcessManagerfor lifecycle trackingmanager.py: Added warning for async PreToolUse hooks (can't block), cleanup on session endUsage
{ "hooks": { "PostToolUse": [{ "matcher": "Bash", "hooks": [{ "type": "command", "command": "./analytics.sh", "async": true, "timeout": 30 }] }] } }Notes
async_started: truein the resultCloses #1828