Codex: move system prompts into user input for Responses API #1683
Conversation
all-hands-bot left a comment
Found several issues that need attention: overly broad model matching, potential data loss bug, and missing test coverage. Details in inline comments.
all-hands-bot left a comment
Overall this PR addresses a real need for Codex subscription compatibility, but there are several issues that should be addressed before merging. The main concerns are around detection logic consistency and unsafe content manipulation.
```python
# Subscription transport gate: only apply this workaround when calling
# ChatGPT subscription Codex backend (not the standard OpenAI API).
base = (self.base_url or "").lower()
is_subscription_codex_transport = (
    "chatgpt.com" in base and "backend-api" in base and "codex" in base
)
```
🔴 Critical: The detection logic only checks base_url but ignores the model name. This creates inconsistency with the PR description which states "Expanded Responses-model detection from only codex-mini-latest to any model containing codex".
Issues:
- A non-Codex model (e.g., "gpt-4") with `base_url="https://chatgpt.com/backend-api/codex"` will get Codex handling (likely incorrect)
- A Codex model (e.g., "gpt-5-codex") with a standard OpenAI `base_url` won't get Codex handling (may be intended, but contradicts the PR description)
Suggestion: Include model name in the detection logic for consistency:
```diff
 # Subscription transport gate: only apply this workaround when calling
 # ChatGPT subscription Codex backend (not the standard OpenAI API).
 base = (self.base_url or "").lower()
+model_name = (self.model or "").lower()
 is_subscription_codex_transport = (
     "chatgpt.com" in base and "backend-api" in base and "codex" in base
+    and "codex" in model_name
 )
```
Or clarify in documentation that only the base_url matters, not the model name.
```python
content = item.get("content")
if not isinstance(content, list):
    content = [content] if content else []
item["content"] = (
    [{"type": "input_text", "text": prefix}] + content
)
```
🟠 Important: Unsafe content manipulation. The code assumes that if content is not a list, it can be safely wrapped as [content]. However, this doesn't validate what type content actually is.
Issues:
- If `content` is a dict (which is possible from `to_responses_dict`), wrapping it as `[dict]` creates `[{...}]` instead of the expected list of content items
- If `content` is already a string or other type, this could create an incorrect structure (illustrated below)
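For example (illustrative values only), wrapping a plain string leaves a bare string next to the structured prefix item rather than producing a proper `input_text` part:

```python
prefix = "Context (system prompt):\n...\n\n"
content = "plain string content"  # not a list

# Current wrapping: the raw string is appended as-is.
item_content = [{"type": "input_text", "text": prefix}] + [content]
# -> [{'type': 'input_text', 'text': 'Context (system prompt):\n...\n\n'},
#     'plain string content']
```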
Suggestion: Add type validation and proper handling:
```diff
-content = item.get("content")
-if not isinstance(content, list):
-    content = [content] if content else []
-item["content"] = (
-    [{"type": "input_text", "text": prefix}] + content
-)
+if item.get("type") == "message" and item.get("role") == "user":
+    content = item.get("content")
+    if not isinstance(content, list):
+        # Content should already be a list from to_responses_dict
+        # If not, this indicates an unexpected state
+        content = [] if not content else [{"type": "input_text", "text": str(content)}]
+    item["content"] = (
+        [{"type": "input_text", "text": prefix}] + content
+    )
```
```python
normalized.append(
    {
        "role": item.get("role"),
        "content": item.get("content") or [],
```
🟠 Important: Inconsistent empty content handling. Compare this line with line 1069:
- Line 1069: `content = [content] if content else []` - preserves truthy content, uses an empty list for falsy values
- Line 1096: `item.get("content") or []` - replaces ANY falsy value (including empty string `""`) with an empty list
This inconsistency could cause bugs if content is legitimately an empty string or 0.
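A quick illustration with an empty string (values chosen only for this example) shows how an explicit `None` check differs from the truthiness-based handling:

```python
content = ""  # a legitimately empty string

assert ([content] if content else []) == []            # truthiness wrap drops it
assert (content or []) == []                            # `or` drops it too
assert (content if content is not None else []) == ""   # explicit None check keeps it
```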
Suggestion: Use consistent handling:
| "content": item.get("content") or [], | |
| "content": item.get("content") if item.get("content") is not None else [], |
```python
if is_subscription_codex_transport and system_chunks:
    merged_system = "\n\n---\n\n".join(system_chunks).strip()
    if merged_system:
        prefix = f"Context (system prompt):\n{merged_system}\n\n"
        injected = False
        for item in input_items:
            if item.get("type") == "message" and item.get("role") == "user":
                content = item.get("content")
                if not isinstance(content, list):
                    content = [content] if content else []
                item["content"] = (
                    [{"type": "input_text", "text": prefix}] + content
                )
                injected = True
                break

        if not injected:
            input_items.insert(
                0,
                {
                    "role": "user",
                    "content": [{"type": "input_text", "text": prefix}],
                },
            )
```
There was a problem hiding this comment.
🟡 Suggestion: These two operations (system prompt injection + normalization) iterate through input_items twice. For better performance, consider combining them into a single loop:
```python
if is_subscription_codex_transport and system_chunks:
    merged_system = "\n\n---\n\n".join(system_chunks).strip()
    prefix = f"Context (system prompt):\n{merged_system}\n\n" if merged_system else None
    normalized: list[dict[str, Any]] = []
    injected = False
    for item in input_items:
        # Normalize first
        if item.get("type") == "message":
            normalized_item = {
                "role": item.get("role"),
                "content": item.get("content") or [],
            }
        else:
            normalized_item = item
        # Then inject prefix if applicable
        if prefix and not injected and normalized_item.get("role") == "user":
            content = normalized_item["content"] if isinstance(normalized_item["content"], list) else []
            normalized_item["content"] = [{"type": "input_text", "text": prefix}] + content
            injected = True
        normalized.append(normalized_item)
    # If we never found a user message, add synthetic one
    if prefix and not injected:
        normalized.insert(0, {
            "role": "user",
            "content": [{"type": "input_text", "text": prefix}],
        })
    input_items = normalized
```

```python
DEFAULT_CODEX_INSTRUCTIONS = (
    "You are OpenHands agent, a helpful AI assistant that can interact "
    "with a computer to solve tasks."
)
```
🟡 Suggestion: The hardcoded default instructions should be documented. Add a comment explaining:
- Why this specific text is used
- Why it's required by the endpoint
- Whether it could/should be configurable
Alternatively, consider making it a class constant with better visibility:
```diff
+# Codex subscription endpoints require a simple, stable instructions string.
+# Complex system prompts are moved to user content to avoid validation errors.
 DEFAULT_CODEX_INSTRUCTIONS = (
     "You are OpenHands agent, a helpful AI assistant that can interact "
     "with a computer to solve tasks."
 )
```
```python
m_sys = Message(role="system", content=[TextContent(text="SYS")])
m_user = Message(role="user", content=[TextContent(text="USER")])

llm = LLM(model="gpt-5.1-codex", base_url="https://chatgpt.com/backend-api/codex")
```
🟢 Nit: The test uses model="gpt-5.1-codex" but the actual implementation doesn't check the model name at all - only the base_url. This makes the test somewhat misleading.
Either:
- Update the implementation to check model name (see comment on llm.py:1033)
- Or update this test to make it clear that only the base_url matters:
```diff
-llm = LLM(model="gpt-5.1-codex", base_url="https://chatgpt.com/backend-api/codex")
+llm = LLM(model="any-model", base_url="https://chatgpt.com/backend-api/codex")
```
```python
assert instr is not None
assert "OpenHands agent" in instr
```
🟢 Nit: The assertion is too weak. It should explicitly verify that the default instructions are used, not the system prompt content:
```diff
-assert instr is not None
-assert "OpenHands agent" in instr
+# Verify default instructions are used, not the system prompt
+assert instr == "You are OpenHands agent, a helpful AI assistant that can interact with a computer to solve tasks."
+assert "SYS" not in instr  # System prompt should NOT be in instructions
```
xingyaoww left a comment
Thank you! Gonna merge this into my PR first so we can do clean up later!
Merged commit 57cea16 into OpenHands:feat/openai-subscription-auth
Summary
This PR fixes OpenHands Codex subscription model support by making the Responses API payload Codex-compatible. Codex subscription endpoints can reject complex/long top-level instructions (often returning 400 / “Instructions are not valid”), which breaks agent flows that rely on a large system prompt.
What changed
- When a request targets the ChatGPT subscription Codex backend (detected from a `base_url` containing `chatgpt.com`, `backend-api`, and `codex`), the merged system prompt chunks are prepended to the first user message as an `input_text` item under a "Context (system prompt):" prefix; if no user message exists, a synthetic one is inserted.
- The top-level `instructions` field is set to a short default string (`DEFAULT_CODEX_INSTRUCTIONS`) that the endpoint accepts.
- Responses-model detection is expanded from only `codex-mini-latest` to any model containing `codex`.
Why this is needed
Codex subscription endpoints require server-validated instructions and can reject the complex, tool-heavy system prompts typically used by agent frameworks. Moving that content into user input avoids the strict validation while preserving the same guidance/context for the model.
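As a rough sketch of the resulting request shape, here is a hypothetical before/after payload; the placeholder values ("USER", "<large agent system prompt>") are illustrative, while the prefix wording and default instructions follow the diffs above:

```python
# Before: the large agent system prompt travels as top-level instructions,
# which the Codex subscription backend may reject as invalid.
request_before = {
    "model": "gpt-5.1-codex",
    "instructions": "<large agent system prompt>",
    "input": [
        {"type": "message", "role": "user",
         "content": [{"type": "input_text", "text": "USER"}]},
    ],
}

# After: instructions shrink to the short default, and the system prompt is
# moved into the first user message as an extra input_text item.
request_after = {
    "model": "gpt-5.1-codex",
    "instructions": (
        "You are OpenHands agent, a helpful AI assistant that can interact "
        "with a computer to solve tasks."
    ),
    "input": [
        {"type": "message", "role": "user",
         "content": [
             {"type": "input_text",
              "text": "Context (system prompt):\n<large agent system prompt>\n\n"},
             {"type": "input_text", "text": "USER"},
         ]},
    ],
}
```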
Tests
- Added a unit test that builds a system and a user message with `model="gpt-5.1-codex"` and `base_url="https://chatgpt.com/backend-api/codex"` and asserts that the instructions sent to the Responses API contain the default "OpenHands agent" text.