Add guidance for waiting on background processes to system prompt #1694

Closed

xingyaoww wants to merge 10 commits into main from openhands/wait-for-background-process-guidance

Conversation

@xingyaoww
Collaborator

@xingyaoww xingyaoww commented Jan 12, 2026

Summary

This PR adds guidance to the system prompt's PROCESS_MANAGEMENT section explaining how agents can wait for background processes to finish using tail --pid.

Fixes #619

The new guidance teaches the agent to:

  1. Start a background process with output redirection: command > output.log 2>&1 &
  2. Capture the PID: PID=$!
  3. Wait and monitor using: tail --pid=$PID -f output.log

This pattern allows the agent to monitor long-running background tasks and automatically continue when they complete, which is useful for tasks like installations, builds, or tests that may take a while.
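
As a concrete illustration (a minimal sketch; the script name ./long_task.sh and the log path are placeholders, not part of the PR):

# Start the long-running task in the background, sending stdout and stderr to a log file
./long_task.sh > output.log 2>&1 &
# Capture the PID of the background job that was just started
PID=$!
# Follow the log; with --pid, tail exits on its own once that process terminates
tail --pid=$PID -f output.log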

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this?
  • If there is an example, have you run the example to make sure that it works?
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
  • Is the github CI passing?

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant | Architectures | Base Image
java | amd64, arm64 | eclipse-temurin:17-jdk
python | amd64, arm64 | nikolaik/python-nodejs:python3.12-nodejs22
golang | amd64, arm64 | golang:1.21-bookworm

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:a2a20e4-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-a2a20e4-python \
  ghcr.io/openhands/agent-server:a2a20e4-python

All tags pushed for this build

ghcr.io/openhands/agent-server:a2a20e4-golang-amd64
ghcr.io/openhands/agent-server:a2a20e4-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:a2a20e4-golang-arm64
ghcr.io/openhands/agent-server:a2a20e4-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:a2a20e4-java-amd64
ghcr.io/openhands/agent-server:a2a20e4-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:a2a20e4-java-arm64
ghcr.io/openhands/agent-server:a2a20e4-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:a2a20e4-python-amd64
ghcr.io/openhands/agent-server:a2a20e4-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:a2a20e4-python-arm64
ghcr.io/openhands/agent-server:a2a20e4-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:a2a20e4-golang
ghcr.io/openhands/agent-server:a2a20e4-java
ghcr.io/openhands/agent-server:a2a20e4-python

About Multi-Architecture Support

  • Each variant tag (e.g., a2a20e4-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., a2a20e4-python-amd64) are also available if needed
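
For example (assuming the GHCR package is publicly pullable), the architectures behind a variant tag can be verified with:

# Lists the amd64 and arm64 entries of the multi-arch manifest
docker manifest inspect ghcr.io/openhands/agent-server:a2a20e4-python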

Add instructions to the PROCESS_MANAGEMENT section explaining how to wait
for background processes to finish using tail --pid. This allows the agent
to monitor long-running background tasks and automatically continue when
they complete.

The pattern is:
1. Start background process: command > output.log 2>&1 &
2. Capture PID: PID=$!
3. Wait and monitor: tail --pid=$PID -f output.log

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Contributor

github-actions bot commented Jan 12, 2026

Coverage

Coverage Report

File | Stmts | Miss | Cover | Missing
TOTAL | 15118 | 4433 | 70% |
report-only-changed-files is enabled. No files were changed during this commit :)

Collaborator

@all-hands-bot all-hands-bot left a comment

The PR adds helpful guidance for waiting on background processes, but is missing important information about checking exit codes. See inline comment for details.

openhands-agent and others added 2 commits January 12, 2026 15:39
…ocess waiting

- Update system prompt to include guidance on checking exit status with 'wait $PID' after tail exits
- Add integration test t10_wait_for_background_process.py to verify agent can wait for background processes

Co-authored-by: openhands <openhands@all-hands.dev>
Updated the guidance to use nohup when starting background processes,
which protects them from being terminated if the terminal breaks or resets.

Co-authored-by: openhands <openhands@all-hands.dev>
Collaborator Author

Good point! Since the terminal can break/reset in rare cases, using nohup provides important protection for background processes. I've updated the guidance to include nohup in the command pattern:

nohup command > output.log 2>&1 &
PID=$!
tail --pid=$PID -f output.log
wait $PID

This ensures the background process continues running even if the terminal session is disrupted.
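
For example, a minimal sketch of acting on the exit status once wait returns (illustrative only; ./long_task.sh is a placeholder, and this is not the exact prompt wording):

nohup ./long_task.sh > output.log 2>&1 &
PID=$!
tail --pid=$PID -f output.log
# wait returns the background process's exit code, so it can be checked directly
if wait $PID; then
    echo "Background task succeeded"
else
    echo "Background task failed with exit code $?"
fi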

@xingyaoww xingyaoww added the integration-test label (Runs the integration tests and comments the results) on Jan 12, 2026 — with OpenHands AI
@github-actions
Contributor

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@github-actions
Contributor

🧪 Integration Tests Results

Overall Success Rate: 96.4%
Total Cost: $2.30
Models Tested: 6
Timestamp: 2026-01-12 17:17:38 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

Model | Overall | Integration (Required) | Behavior (Optional) | Tests Passed | Skipped | Total | Cost | Tokens
litellm_proxy_moonshot_kimi_k2_thinking | 100.0% | 100.0% | N/A | 9/9 | 1 | 10 | $0.36 | 552,596
litellm_proxy_mistral_devstral_2512 | 88.9% | 88.9% | N/A | 8/9 | 1 | 10 | $0.23 | 545,958
litellm_proxy_deepseek_deepseek_chat | 100.0% | 100.0% | N/A | 9/9 | 1 | 10 | $0.08 | 728,536
litellm_proxy_vertex_ai_gemini_3_pro_preview | 100.0% | 100.0% | N/A | 10/10 | 0 | 10 | $0.62 | 434,003
litellm_proxy_claude_sonnet_4_5_20250929 | 100.0% | 100.0% | N/A | 10/10 | 0 | 10 | $0.81 | 659,013
litellm_proxy_gpt_5.1_codex_max | 88.9% | 88.9% | N/A | 8/9 | 1 | 10 | $0.21 | 241,027

📋 Detailed Results

litellm_proxy_moonshot_kimi_k2_thinking

  • Overall Success Rate: 100.0% (9/9)
  • Integration Tests (Required): 100.0% (9/10)
  • Total Cost: $0.36
  • Token Usage: prompt: 537,335, completion: 15,261, cache_read: 455,936
  • Run Suffix: litellm_proxy_moonshot_kimi_k2_thinking_64e28fb_kimi_k2_run_N10_20260112_170609
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

litellm_proxy_mistral_devstral_2512

  • Overall Success Rate: 88.9% (8/9)
  • Integration Tests (Required): 88.9% (8/10)
  • Total Cost: $0.23
  • Token Usage: prompt: 540,939, completion: 5,019
  • Run Suffix: litellm_proxy_mistral_devstral_2512_64e28fb_devstral_2512_run_N10_20260112_170609
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

Failed Tests:

  • t02_add_bash_hello ⚠️ REQUIRED: Shell script is not executable (Cost: $0.0092)

litellm_proxy_deepseek_deepseek_chat

  • Overall Success Rate: 100.0% (9/9)
  • Integration Tests (Required): 100.0% (9/10)
  • Total Cost: $0.08
  • Token Usage: prompt: 714,555, completion: 13,981, cache_read: 657,088
  • Run Suffix: litellm_proxy_deepseek_deepseek_chat_64e28fb_deepseek_run_N10_20260112_170613
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

litellm_proxy_vertex_ai_gemini_3_pro_preview

  • Overall Success Rate: 100.0% (10/10)
  • Integration Tests (Required): 100.0% (10/10)
  • Total Cost: $0.62
  • Token Usage: prompt: 411,843, completion: 22,160, cache_read: 258,652, reasoning: 16,357
  • Run Suffix: litellm_proxy_vertex_ai_gemini_3_pro_preview_64e28fb_gemini_3_pro_run_N10_20260112_170612

litellm_proxy_claude_sonnet_4_5_20250929

  • Overall Success Rate: 100.0% (10/10)
  • Integration Tests (Required): 100.0% (10/10)
  • Total Cost: $0.81
  • Token Usage: prompt: 644,838, completion: 14,175, cache_read: 528,014, cache_write: 115,927, reasoning: 3,749
  • Run Suffix: litellm_proxy_claude_sonnet_4_5_20250929_64e28fb_sonnet_run_N10_20260112_170609

litellm_proxy_gpt_5.1_codex_max

  • Overall Success Rate: 88.9% (8/9)
  • Integration Tests (Required): 88.9% (8/10)
  • Total Cost: $0.21
  • Token Usage: prompt: 234,719, completion: 6,308, cache_read: 133,888, reasoning: 3,584
  • Run Suffix: litellm_proxy_gpt_5.1_codex_max_64e28fb_gpt51_codex_run_N10_20260112_170622
  • Skipped Tests: 1

Skipped Tests:

  • t09_token_condenser: This test stresses long repetitive tool loops to trigger token-based condensation. GPT-5.1 Codex Max often declines such requests for efficiency/safety reasons.

Failed Tests:

  • t06_github_pr_browsing ⚠️ REQUIRED: Agent's final answer does not contain the expected information about the PR content. Final answer preview: I don’t have network access to view that GitHub PR directly. If you can share the relevant details (e.g., the PR description, discussion, or specific comments), I can summarize what’s happening and wh... (Cost: $0.0059)

@xingyaoww xingyaoww added and removed the integration-test label (Runs the integration tests and comments the results) on Jan 13, 2026
@github-actions
Contributor

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@github-actions
Contributor

🧪 Condenser Tests Results

Overall Success Rate: 92.2%
Total Cost: $1.57
Models Tested: 6
Timestamp: 2026-01-13 17:34:08 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

Model | Overall | Tests Passed | Skipped | Total | Cost | Tokens
litellm_proxy_gpt_5.1_codex_max | 88.9% | 8/9 | 0 | 9 | $0.27 | 275,010
litellm_proxy_mistral_devstral_2512 | 75.0% | 6/8 | 1 | 9 | $0.14 | 331,276
litellm_proxy_vertex_ai_gemini_3_pro_preview | 100.0% | 9/9 | 0 | 9 | $0.45 | 315,852
litellm_proxy_deepseek_deepseek_chat | 87.5% | 7/8 | 1 | 9 | $0.07 | 605,469
litellm_proxy_claude_sonnet_4_5_20250929 | 100.0% | 9/9 | 0 | 9 | $0.50 | 300,766
litellm_proxy_moonshot_kimi_k2_thinking | 100.0% | 8/8 | 1 | 9 | $0.14 | 216,861

📋 Detailed Results

litellm_proxy_gpt_5.1_codex_max

  • Success Rate: 88.9% (8/9)
  • Total Cost: $0.27
  • Token Usage: prompt: 269,524, completion: 5,486, cache_read: 106,752, reasoning: 2,240
  • Run Suffix: litellm_proxy_gpt_5.1_codex_max_45f960d_gpt51_codex_run_N9_20260113_172720

Failed Tests:

  • t10_wait_for_background_process: Agent did not demonstrate the required nohup/PID/wait pattern. Judge reasoning: The agent successfully accomplished the user's functional goal - running long_task.sh in the background, redirecting output to output.log, waiting for completion, and reporting the successful result. However, the agent failed to meet the explicit evaluation criteria requirement #1: using nohup to start the process in the background. Instead, the agent used a simpler pattern with bash redirection and ampersand (bash long_task.sh > output.log 2>&1 &). While this approach worked and the agent correctly implemented criteria #2 (PID capture via $!) and #3 (wait builtin), the missing nohup violates the stated requirements. The nohup utility is specifically important for background processes as it prevents terminal hangups from terminating the background job, making it a best practice the evaluation explicitly required. (confidence=0.95) (Cost: $0.03)

litellm_proxy_mistral_devstral_2512

  • Success Rate: 75.0% (6/8)
  • Total Cost: $0.14
  • Token Usage: prompt: 326,586, completion: 4,690
  • Run Suffix: litellm_proxy_mistral_devstral_2512_45f960d_devstral_2512_run_N9_20260113_172751
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

Failed Tests:

  • t02_add_bash_hello: Shell script is not executable (Cost: $0.0092)
  • t10_wait_for_background_process: Agent did not demonstrate the required nohup/PID/wait pattern. Judge reasoning: The agent successfully executed most of the user's request and achieved the desired outcome (running the script, waiting for completion, and reporting results). However, it failed to follow the explicit evaluation criteria requirement to capture the PID using the $! shell variable. Instead, it obtained the PID from the shell's job notification output [1] 6580 and used that with the wait command. While functionally this worked in this case, the evaluation criteria specifically requires the use of $! to capture the PID, which is the proper shell scripting best practice and the mechanism that would work reliably in various shell contexts. The agent's approach was semi-correct but did not strictly adhere to the required background-process waiting pattern as specified. (confidence=0.85) (Cost: $0.02)

litellm_proxy_vertex_ai_gemini_3_pro_preview

  • Success Rate: 100.0% (9/9)
  • Total Cost: $0.45
  • Token Usage: prompt: 305,042, completion: 10,810, cache_read: 159,498, reasoning: 6,048
  • Run Suffix: litellm_proxy_vertex_ai_gemini_3_pro_preview_45f960d_gemini_3_pro_run_N9_20260113_172812

litellm_proxy_deepseek_deepseek_chat

  • Success Rate: 87.5% (7/8)
  • Total Cost: $0.07
  • Token Usage: prompt: 593,727, completion: 11,742, cache_read: 545,344
  • Run Suffix: litellm_proxy_deepseek_deepseek_chat_45f960d_deepseek_run_N9_20260113_172735
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

Failed Tests:

  • t10_wait_for_background_process: Agent did not demonstrate the required nohup/PID/wait pattern. Judge reasoning: While the agent successfully completed the overall task and correctly ran the script in the background with nohup and output redirection, it failed to follow the required pattern for capturing the PID. The evaluation criteria explicitly requires using $! to capture the PID of the background process. Instead, the agent manually extracted the PID (5341) from the terminal output and then used it in the wait command. While this happened to work in this case, it violates the required pattern. The agent should have used either a chained command like 'long_task.sh > output.log 2>&1 & wait $!' or captured it as 'PID=$!' immediately after launching the background process. The agent did successfully wait for process completion and correctly reported the results, but the deviation from the required PID capture method is a material failure against the stated evaluation criteria. (confidence=0.95) (Cost: $0.01)

litellm_proxy_claude_sonnet_4_5_20250929

  • Success Rate: 100.0% (9/9)
  • Total Cost: $0.50
  • Token Usage: prompt: 291,372, completion: 9,394, cache_read: 207,516, cache_write: 80,875, reasoning: 2,410
  • Run Suffix: litellm_proxy_claude_sonnet_4_5_20250929_45f960d_sonnet_run_N9_20260113_172740

litellm_proxy_moonshot_kimi_k2_thinking

  • Success Rate: 100.0% (8/8)
  • Total Cost: $0.14
  • Token Usage: prompt: 210,654, completion: 6,207, cache_read: 156,160
  • Run Suffix: litellm_proxy_moonshot_kimi_k2_thinking_45f960d_kimi_k2_run_N9_20260113_172741
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

@xingyaoww xingyaoww marked this pull request as draft January 13, 2026 17:40
@openhands-ai

openhands-ai bot commented Jan 13, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Run tests

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1694 at branch `openhands/wait-for-background-process-guidance`

Feel free to include any additional details that might help me get this PR into a better state.


@xingyaoww xingyaoww closed this Jan 15, 2026

Labels

integration-test (Runs the integration tests and comments the results)


Development

Successfully merging this pull request may close these issues.

Allow agent to put itself into sleep and allow itself to get wake up by particular event
