Add guidance for waiting on background processes to system prompt #1694

Closed

xingyaoww wants to merge 10 commits into main from openhands/wait-for-background-process-guidance

Conversation

@xingyaoww
Collaborator

@xingyaoww xingyaoww commented Jan 12, 2026

Summary

This PR adds guidance to the system prompt's PROCESS_MANAGEMENT section explaining how agents can wait for background processes to finish using tail --pid.

Fixes #619

The new guidance teaches the agent to:

  1. Start a background process with output redirection: command > output.log 2>&1 &
  2. Capture the PID: PID=$!
  3. Wait and monitor using: tail --pid=$PID -f output.log

This pattern allows the agent to monitor long-running background tasks and automatically continue when they complete, which is useful for tasks like installations, builds, or tests that may take a while.
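
As a concrete illustration (a minimal sketch; the script name ./long_task.sh and the log path are placeholders, not part of the PR):

# Start the long-running task in the background, sending stdout and stderr to a log file
./long_task.sh > output.log 2>&1 &
# Capture the PID of the background job that was just started
PID=$!
# Follow the log; with --pid, tail exits on its own once that process terminates
tail --pid=$PID -f output.log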

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this?
  • If there is an example, have you run the example to make sure that it works?
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
  • Is the github CI passing?

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant | Architectures | Base Image
java | amd64, arm64 | eclipse-temurin:17-jdk
python | amd64, arm64 | nikolaik/python-nodejs:python3.12-nodejs22
golang | amd64, arm64 | golang:1.21-bookworm

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:a2a20e4-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-a2a20e4-python \
  ghcr.io/openhands/agent-server:a2a20e4-python

All tags pushed for this build

ghcr.io/openhands/agent-server:a2a20e4-golang-amd64
ghcr.io/openhands/agent-server:a2a20e4-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:a2a20e4-golang-arm64
ghcr.io/openhands/agent-server:a2a20e4-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:a2a20e4-java-amd64
ghcr.io/openhands/agent-server:a2a20e4-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:a2a20e4-java-arm64
ghcr.io/openhands/agent-server:a2a20e4-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:a2a20e4-python-amd64
ghcr.io/openhands/agent-server:a2a20e4-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:a2a20e4-python-arm64
ghcr.io/openhands/agent-server:a2a20e4-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:a2a20e4-golang
ghcr.io/openhands/agent-server:a2a20e4-java
ghcr.io/openhands/agent-server:a2a20e4-python

About Multi-Architecture Support

  • Each variant tag (e.g., a2a20e4-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., a2a20e4-python-amd64) are also available if needed
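
For example (assuming the GHCR package is publicly pullable), the architectures behind a variant tag can be verified with:

# Lists the amd64 and arm64 entries of the multi-arch manifest
docker manifest inspect ghcr.io/openhands/agent-server:a2a20e4-python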

Add instructions to the PROCESS_MANAGEMENT section explaining how to wait
for background processes to finish using tail --pid. This allows the agent
to monitor long-running background tasks and automatically continue when
they complete.

The pattern is:
1. Start background process: command > output.log 2>&1 &
2. Capture PID: PID=$!
3. Wait and monitor: tail --pid=$PID -f output.log

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Contributor

github-actions bot commented Jan 12, 2026

Coverage

Coverage Report

File | Stmts | Miss | Cover | Missing
TOTAL | 15118 | 4433 | 70% |
report-only-changed-files is enabled. No files were changed during this commit :)

Collaborator

@all-hands-bot all-hands-bot left a comment

The PR adds helpful guidance for waiting on background processes, but is missing important information about checking exit codes. See inline comment for details.

openhands-agent and others added 2 commits January 12, 2026 15:39
…ocess waiting

- Update system prompt to include guidance on checking exit status with 'wait $PID' after tail exits
- Add integration test t10_wait_for_background_process.py to verify agent can wait for background processes

Co-authored-by: openhands <openhands@all-hands.dev>
Updated the guidance to use nohup when starting background processes,
which protects them from being terminated if the terminal breaks or resets.

Co-authored-by: openhands <openhands@all-hands.dev>
Collaborator Author

Good point! Since the terminal can break/reset in rare cases, using nohup provides important protection for background processes. I've updated the guidance to include nohup in the command pattern:

nohup command > output.log 2>&1 &
PID=$!
tail --pid=$PID -f output.log
wait $PID

This ensures the background process continues running even if the terminal session is disrupted.
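
For example, a minimal sketch of acting on the exit status once wait returns (illustrative only; ./long_task.sh is a placeholder, and this is not the exact prompt wording):

nohup ./long_task.sh > output.log 2>&1 &
PID=$!
tail --pid=$PID -f output.log
# wait returns the background process's exit code, so it can be checked directly
if wait $PID; then
    echo "Background task succeeded"
else
    echo "Background task failed with exit code $?"
fi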

@xingyaoww xingyaoww added the integration-test label (Runs the integration tests and comments the results) on Jan 12, 2026 — with OpenHands AI
@github-actions
Contributor

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@github-actions
Contributor

🧪 Integration Tests Results

Overall Success Rate: 96.4%
Total Cost: $2.30
Models Tested: 6
Timestamp: 2026-01-12 17:17:38 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

Model | Overall | Integration (Required) | Behavior (Optional) | Tests Passed | Skipped | Total | Cost | Tokens
litellm_proxy_moonshot_kimi_k2_thinking | 100.0% | 100.0% | N/A | 9/9 | 1 | 10 | $0.36 | 552,596
litellm_proxy_mistral_devstral_2512 | 88.9% | 88.9% | N/A | 8/9 | 1 | 10 | $0.23 | 545,958
litellm_proxy_deepseek_deepseek_chat | 100.0% | 100.0% | N/A | 9/9 | 1 | 10 | $0.08 | 728,536
litellm_proxy_vertex_ai_gemini_3_pro_preview | 100.0% | 100.0% | N/A | 10/10 | 0 | 10 | $0.62 | 434,003
litellm_proxy_claude_sonnet_4_5_20250929 | 100.0% | 100.0% | N/A | 10/10 | 0 | 10 | $0.81 | 659,013
litellm_proxy_gpt_5.1_codex_max | 88.9% | 88.9% | N/A | 8/9 | 1 | 10 | $0.21 | 241,027

📋 Detailed Results

litellm_proxy_moonshot_kimi_k2_thinking

  • Overall Success Rate: 100.0% (9/9)
  • Integration Tests (Required): 100.0% (9/10)
  • Total Cost: $0.36
  • Token Usage: prompt: 537,335, completion: 15,261, cache_read: 455,936
  • Run Suffix: litellm_proxy_moonshot_kimi_k2_thinking_64e28fb_kimi_k2_run_N10_20260112_170609
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

litellm_proxy_mistral_devstral_2512

  • Overall Success Rate: 88.9% (8/9)
  • Integration Tests (Required): 88.9% (8/10)
  • Total Cost: $0.23
  • Token Usage: prompt: 540,939, completion: 5,019
  • Run Suffix: litellm_proxy_mistral_devstral_2512_64e28fb_devstral_2512_run_N10_20260112_170609
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

Failed Tests:

  • t02_add_bash_hello ⚠️ REQUIRED: Shell script is not executable (Cost: $0.0092)

litellm_proxy_deepseek_deepseek_chat

  • Overall Success Rate: 100.0% (9/9)
  • Integration Tests (Required): 100.0% (9/10)
  • Total Cost: $0.08
  • Token Usage: prompt: 714,555, completion: 13,981, cache_read: 657,088
  • Run Suffix: litellm_proxy_deepseek_deepseek_chat_64e28fb_deepseek_run_N10_20260112_170613
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

litellm_proxy_vertex_ai_gemini_3_pro_preview

  • Overall Success Rate: 100.0% (10/10)
  • Integration Tests (Required): 100.0% (10/10)
  • Total Cost: $0.62
  • Token Usage: prompt: 411,843, completion: 22,160, cache_read: 258,652, reasoning: 16,357
  • Run Suffix: litellm_proxy_vertex_ai_gemini_3_pro_preview_64e28fb_gemini_3_pro_run_N10_20260112_170612

litellm_proxy_claude_sonnet_4_5_20250929

  • Overall Success Rate: 100.0% (10/10)
  • Integration Tests (Required): 100.0% (10/10)
  • Total Cost: $0.81
  • Token Usage: prompt: 644,838, completion: 14,175, cache_read: 528,014, cache_write: 115,927, reasoning: 3,749
  • Run Suffix: litellm_proxy_claude_sonnet_4_5_20250929_64e28fb_sonnet_run_N10_20260112_170609

litellm_proxy_gpt_5.1_codex_max

  • Overall Success Rate: 88.9% (8/9)
  • Integration Tests (Required): 88.9% (8/10)
  • Total Cost: $0.21
  • Token Usage: prompt: 234,719, completion: 6,308, cache_read: 133,888, reasoning: 3,584
  • Run Suffix: litellm_proxy_gpt_5.1_codex_max_64e28fb_gpt51_codex_run_N10_20260112_170622
  • Skipped Tests: 1

Skipped Tests:

  • t09_token_condenser: This test stresses long repetitive tool loops to trigger token-based condensation. GPT-5.1 Codex Max often declines such requests for efficiency/safety reasons.

Failed Tests:

  • t06_github_pr_browsing ⚠️ REQUIRED: Agent's final answer does not contain the expected information about the PR content. Final answer preview: I don’t have network access to view that GitHub PR directly. If you can share the relevant details (e.g., the PR description, discussion, or specific comments), I can summarize what’s happening and wh... (Cost: $0.0059)

@xingyaoww xingyaoww added and removed the integration-test label (Runs the integration tests and comments the results) on Jan 13, 2026
@github-actions
Contributor

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@github-actions
Contributor

🧪 Condenser Tests Results

Overall Success Rate: 92.2%
Total Cost: $1.57
Models Tested: 6
Timestamp: 2026-01-13 17:34:08 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

Model | Overall | Tests Passed | Skipped | Total | Cost | Tokens
litellm_proxy_gpt_5.1_codex_max | 88.9% | 8/9 | 0 | 9 | $0.27 | 275,010
litellm_proxy_mistral_devstral_2512 | 75.0% | 6/8 | 1 | 9 | $0.14 | 331,276
litellm_proxy_vertex_ai_gemini_3_pro_preview | 100.0% | 9/9 | 0 | 9 | $0.45 | 315,852
litellm_proxy_deepseek_deepseek_chat | 87.5% | 7/8 | 1 | 9 | $0.07 | 605,469
litellm_proxy_claude_sonnet_4_5_20250929 | 100.0% | 9/9 | 0 | 9 | $0.50 | 300,766
litellm_proxy_moonshot_kimi_k2_thinking | 100.0% | 8/8 | 1 | 9 | $0.14 | 216,861

📋 Detailed Results

litellm_proxy_gpt_5.1_codex_max

  • Success Rate: 88.9% (8/9)
  • Total Cost: $0.27
  • Token Usage: prompt: 269,524, completion: 5,486, cache_read: 106,752, reasoning: 2,240
  • Run Suffix: litellm_proxy_gpt_5.1_codex_max_45f960d_gpt51_codex_run_N9_20260113_172720

Failed Tests:

  • t10_wait_for_background_process: Agent did not demonstrate the required nohup/PID/wait pattern. Judge reasoning: The agent successfully accomplished the user's functional goal - running long_task.sh in the background, redirecting output to output.log, waiting for completion, and reporting the successful result. However, the agent failed to meet the explicit evaluation criteria requirement #1: using nohup to start the process in the background. Instead, the agent used a simpler pattern with bash redirection and ampersand (bash long_task.sh > output.log 2>&1 &). While this approach worked and the agent correctly implemented criteria #2 (PID capture via $!) and #3 (wait builtin), the missing nohup violates the stated requirements. The nohup utility is specifically important for background processes as it prevents terminal hangups from terminating the background job, making it a best practice the evaluation explicitly required. (confidence=0.95) (Cost: $0.03)

litellm_proxy_mistral_devstral_2512

  • Success Rate: 75.0% (6/8)
  • Total Cost: $0.14
  • Token Usage: prompt: 326,586, completion: 4,690
  • Run Suffix: litellm_proxy_mistral_devstral_2512_45f960d_devstral_2512_run_N9_20260113_172751
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

Failed Tests:

  • t02_add_bash_hello: Shell script is not executable (Cost: $0.0092)
  • t10_wait_for_background_process: Agent did not demonstrate the required nohup/PID/wait pattern. Judge reasoning: The agent successfully executed most of the user's request and achieved the desired outcome (running the script, waiting for completion, and reporting results). However, it failed to follow the explicit evaluation criteria requirement to capture the PID using the $! shell variable. Instead, it obtained the PID from the shell's job notification output [1] 6580 and used that with the wait command. While functionally this worked in this case, the evaluation criteria specifically requires the use of $! to capture the PID, which is the proper shell scripting best practice and the mechanism that would work reliably in various shell contexts. The agent's approach was semi-correct but did not strictly adhere to the required background-process waiting pattern as specified. (confidence=0.85) (Cost: $0.02)

litellm_proxy_vertex_ai_gemini_3_pro_preview

  • Success Rate: 100.0% (9/9)
  • Total Cost: $0.45
  • Token Usage: prompt: 305,042, completion: 10,810, cache_read: 159,498, reasoning: 6,048
  • Run Suffix: litellm_proxy_vertex_ai_gemini_3_pro_preview_45f960d_gemini_3_pro_run_N9_20260113_172812

litellm_proxy_deepseek_deepseek_chat

  • Success Rate: 87.5% (7/8)
  • Total Cost: $0.07
  • Token Usage: prompt: 593,727, completion: 11,742, cache_read: 545,344
  • Run Suffix: litellm_proxy_deepseek_deepseek_chat_45f960d_deepseek_run_N9_20260113_172735
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

Failed Tests:

  • t10_wait_for_background_process: Agent did not demonstrate the required nohup/PID/wait pattern. Judge reasoning: While the agent successfully completed the overall task and correctly ran the script in the background with nohup and output redirection, it failed to follow the required pattern for capturing the PID. The evaluation criteria explicitly requires using $! to capture the PID of the background process. Instead, the agent manually extracted the PID (5341) from the terminal output and then used it in the wait command. While this happened to work in this case, it violates the required pattern. The agent should have used either a chained command like 'long_task.sh > output.log 2>&1 & wait $!' or captured it as 'PID=$!' immediately after launching the background process. The agent did successfully wait for process completion and correctly reported the results, but the deviation from the required PID capture method is a material failure against the stated evaluation criteria. (confidence=0.95) (Cost: $0.01)

litellm_proxy_claude_sonnet_4_5_20250929

  • Success Rate: 100.0% (9/9)
  • Total Cost: $0.50
  • Token Usage: prompt: 291,372, completion: 9,394, cache_read: 207,516, cache_write: 80,875, reasoning: 2,410
  • Run Suffix: litellm_proxy_claude_sonnet_4_5_20250929_45f960d_sonnet_run_N9_20260113_172740

litellm_proxy_moonshot_kimi_k2_thinking

  • Success Rate: 100.0% (8/8)
  • Total Cost: $0.14
  • Token Usage: prompt: 210,654, completion: 6,207, cache_read: 156,160
  • Run Suffix: litellm_proxy_moonshot_kimi_k2_thinking_45f960d_kimi_k2_run_N9_20260113_172741
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

@xingyaoww xingyaoww marked this pull request as draft January 13, 2026 17:40
@openhands-ai

openhands-ai bot commented Jan 13, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Run tests

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1694 at branch `openhands/wait-for-background-process-guidance`

Feel free to include any additional details that might help me get this PR into a better state.


@xingyaoww xingyaoww closed this Jan 15, 2026

Labels

integration-test (Runs the integration tests and comments the results)


Development

Successfully merging this pull request may close these issues.

Allow agent to put itself into sleep and allow itself to get wake up by particular event
