Summary

  • integrate kontext-dev SDK for single MCP gateway (SEARCH/EXECUTE); see the sketch after this list
  • wire config + token handling for the gateway
  • document config and examples
  • stabilize tests and locale-sensitive formatting
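
A rough sketch of the single-gateway shape, purely for illustration; the type and method names below are hypothetical, not the kontext-dev SDK's actual surface:

```rust
// Hypothetical sketch of one MCP gateway fronting SEARCH/EXECUTE.
// All names here are illustrative stand-ins, not the real SDK API.
enum GatewayRequest {
    Search { query: String },
    Execute { command: String },
}

struct Gateway {
    // Bearer token wired in from config, per the config/token work in this PR.
    token: String,
}

impl Gateway {
    fn handle(&self, req: GatewayRequest) -> Result<String, String> {
        if self.token.is_empty() {
            return Err("missing gateway token".to_string());
        }
        match req {
            // SEARCH requests are routed to the gateway's search backend.
            GatewayRequest::Search { query } => Ok(format!("results for {query:?}")),
            // EXECUTE requests are routed to the execution backend.
            GatewayRequest::Execute { command } => Ok(format!("ran {command:?}")),
        }
    }
}
```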

Testing

  • just fmt
  • just fix -p codex-core
  • just fix -p codex-exec-server
  • just fix -p codex-protocol
  • cargo test -p codex-core
  • cargo test -p codex-exec-server
  • cargo test -p codex-protocol
  • cargo test --all-features

michiosw pushed a commit that referenced this pull request Jan 10, 2026
…ai#8950)

The agent wouldn't "see" attached images and would instead try to use the
`view_file` tool:

![image](https://github.com/user-attachments/assets/68a705bb-f962-4fc1-9087-e932a6859b12)

In this PR, we wrap each image content item in XML tags carrying the image's
name (currently just a numbered name like `[Image #1]`), so that the model can
resolve inline image references by name. We also place the image content items
above the user message, which the model seems to prefer (perhaps because it is
more used to definitions appearing before references).
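
The layout looks roughly like the sketch below; `ContentItem` and the exact tag format are illustrative stand-ins, not the real codex-core protocol types:

```rust
// Illustrative sketch of the wrapping described above. The enum and the
// tag format are hypothetical stand-ins for the actual protocol types.
enum ContentItem {
    Text(String),
    Image { url: String },
}

fn build_user_content(message: &str, image_urls: &[&str]) -> Vec<ContentItem> {
    let mut items = Vec::new();
    for (i, url) in image_urls.iter().enumerate() {
        // Each image gets a numbered name the model can reference in prose.
        let name = format!("[Image #{}]", i + 1);
        items.push(ContentItem::Text(format!("<image name={name:?}>")));
        items.push(ContentItem::Image { url: (*url).to_string() });
        items.push(ContentItem::Text("</image>".to_string()));
    }
    // Image definitions come first; the user's message follows them.
    items.push(ContentItem::Text(message.to_string()));
    items
}
```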

We also tweak the `view_file` tool description, which seemed to help a bit.

Results on a simple eval set of images:

Before
<img width="980" height="310" alt="image"
src="https://github.com/user-attachments/assets/ba838651-2565-4684-a12e-81a36641bf86"
/>

After
<img width="918" height="322" alt="image"
src="https://github.com/user-attachments/assets/10a81951-7ee6-415e-a27e-e7a3fd0aee6f"
/>

```json
[
  {
    "id": "single_describe",
    "prompt": "Describe the attached image in one sentence.",
    "images": ["image_a.png"]
  },
  {
    "id": "single_color",
    "prompt": "What is the dominant color in the image? Answer with a single color word.",
    "images": ["image_b.png"]
  },
  {
    "id": "orientation_check",
    "prompt": "Is the image portrait or landscape? Answer in one sentence.",
    "images": ["image_c.png"]
  },
  {
    "id": "detail_request",
    "prompt": "Look closely at the image and call out any small details you notice.",
    "images": ["image_d.png"]
  },
  {
    "id": "two_images_compare",
    "prompt": "I attached two images. Are they the same or different? Briefly explain.",
    "images": ["image_a.png", "image_b.png"]
  },
  {
    "id": "two_images_captions",
    "prompt": "Provide a short caption for each image (Image 1, Image 2).",
    "images": ["image_c.png", "image_d.png"]
  },
  {
    "id": "multi_image_rank",
    "prompt": "Rank the attached images from most colorful to least colorful.",
    "images": ["image_a.png", "image_b.png", "image_c.png"]
  },
  {
    "id": "multi_image_choice",
    "prompt": "Which image looks more vibrant? Answer with 'Image 1' or 'Image 2'.",
    "images": ["image_b.png", "image_d.png"]
  }
]
```

michiosw pushed a commit that referenced this pull request Feb 7, 2026
I've seen this test fail with:

```
 - Mock #1.
        	Expected range of matching incoming requests: == 2
        	Number of matched incoming requests: 1
```

This happens because we pop the wrong task_complete events and the test then
exits. I think the underlying cause is that MCP events are now buffered after
openai#8874.

So:
1. clear the buffer before sending any user message,
2. additionally listen for task start before task complete, and
3. use the ID from task start to find the correct task_complete event, as sketched below.
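
A minimal sketch of the matching strategy in steps 2–3, with hypothetical event types standing in for the real test harness:

```rust
// Sketch of steps 2-3: track the ID from the task-start event and only
// accept the matching task_complete. `Event` is a hypothetical stand-in
// for the real harness types; step 1 (draining the buffer) is assumed
// to have happened before this is called.
enum Event {
    TaskStarted { id: u64 },
    TaskComplete { id: u64 },
    Other,
}

fn wait_for_matching_complete(events: &mut impl Iterator<Item = Event>) -> Option<u64> {
    // Wait for the task to start so we know which ID to track, skipping
    // any stale buffered events from earlier turns.
    let task_id = loop {
        match events.next()? {
            Event::TaskStarted { id } => break id,
            _ => continue,
        }
    };
    // Only accept the task_complete whose ID matches the start event.
    loop {
        match events.next()? {
            Event::TaskComplete { id } if id == task_id => return Some(id),
            _ => continue,
        }
    }
}
```
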
michiosw closed this Feb 7, 2026