Summary

  • integrate kontext-dev SDK for single MCP gateway (SEARCH/EXECUTE); see the sketch after this list
  • wire config + token handling for the gateway
  • document config and examples
  • stabilize tests and locale-sensitive formatting
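
A rough sketch of the single-gateway shape, purely for illustration; the type and method names below are hypothetical, not the kontext-dev SDK's actual surface:

```rust
// Hypothetical sketch of one MCP gateway fronting SEARCH/EXECUTE.
// All names here are illustrative stand-ins, not the real SDK API.
enum GatewayRequest {
    Search { query: String },
    Execute { command: String },
}

struct Gateway {
    // Bearer token wired in from config, per the config/token work in this PR.
    token: String,
}

impl Gateway {
    fn handle(&self, req: GatewayRequest) -> Result<String, String> {
        if self.token.is_empty() {
            return Err("missing gateway token".to_string());
        }
        match req {
            // SEARCH requests are routed to the gateway's search backend.
            GatewayRequest::Search { query } => Ok(format!("results for {query:?}")),
            // EXECUTE requests are routed to the execution backend.
            GatewayRequest::Execute { command } => Ok(format!("ran {command:?}")),
        }
    }
}
```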

Testing

  • just fmt
  • just fix -p codex-core
  • just fix -p codex-exec-server
  • just fix -p codex-protocol
  • cargo test -p codex-core
  • cargo test -p codex-exec-server
  • cargo test -p codex-protocol
  • cargo test --all-features

michiosw pushed a commit that referenced this pull request Jan 10, 2026
…ai#8950)

The agent wouldn't "see" attached images and would instead try to use the
`view_file` tool:

![image](https://github.com/user-attachments/assets/68a705bb-f962-4fc1-9087-e932a6859b12)

In this PR, we wrap each image content item in XML tags carrying the image's
name (currently just a numbered name like `[Image #1]`), so that the model can
resolve inline image references by name. We also place the image content items
above the user message, which the model seems to prefer (perhaps because it is
more used to definitions appearing before references).
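
The layout looks roughly like the sketch below; `ContentItem` and the exact tag format are illustrative stand-ins, not the real codex-core protocol types:

```rust
// Illustrative sketch of the wrapping described above. The enum and the
// tag format are hypothetical stand-ins for the actual protocol types.
enum ContentItem {
    Text(String),
    Image { url: String },
}

fn build_user_content(message: &str, image_urls: &[&str]) -> Vec<ContentItem> {
    let mut items = Vec::new();
    for (i, url) in image_urls.iter().enumerate() {
        // Each image gets a numbered name the model can reference in prose.
        let name = format!("[Image #{}]", i + 1);
        items.push(ContentItem::Text(format!("<image name={name:?}>")));
        items.push(ContentItem::Image { url: (*url).to_string() });
        items.push(ContentItem::Text("</image>".to_string()));
    }
    // Image definitions come first; the user's message follows them.
    items.push(ContentItem::Text(message.to_string()));
    items
}
```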

We also tweak the `view_file` tool description, which seemed to help a bit.

Results on a simple eval set of images:

Before
<img width="980" height="310" alt="image"
src="https://github.com/user-attachments/assets/ba838651-2565-4684-a12e-81a36641bf86"
/>

After
<img width="918" height="322" alt="image"
src="https://github.com/user-attachments/assets/10a81951-7ee6-415e-a27e-e7a3fd0aee6f"
/>

```json
[
  {
    "id": "single_describe",
    "prompt": "Describe the attached image in one sentence.",
    "images": ["image_a.png"]
  },
  {
    "id": "single_color",
    "prompt": "What is the dominant color in the image? Answer with a single color word.",
    "images": ["image_b.png"]
  },
  {
    "id": "orientation_check",
    "prompt": "Is the image portrait or landscape? Answer in one sentence.",
    "images": ["image_c.png"]
  },
  {
    "id": "detail_request",
    "prompt": "Look closely at the image and call out any small details you notice.",
    "images": ["image_d.png"]
  },
  {
    "id": "two_images_compare",
    "prompt": "I attached two images. Are they the same or different? Briefly explain.",
    "images": ["image_a.png", "image_b.png"]
  },
  {
    "id": "two_images_captions",
    "prompt": "Provide a short caption for each image (Image 1, Image 2).",
    "images": ["image_c.png", "image_d.png"]
  },
  {
    "id": "multi_image_rank",
    "prompt": "Rank the attached images from most colorful to least colorful.",
    "images": ["image_a.png", "image_b.png", "image_c.png"]
  },
  {
    "id": "multi_image_choice",
    "prompt": "Which image looks more vibrant? Answer with 'Image 1' or 'Image 2'.",
    "images": ["image_b.png", "image_d.png"]
  }
]
```

michiosw pushed a commit that referenced this pull request Feb 7, 2026
I've seen this test fail with:

```
 - Mock #1.
        	Expected range of matching incoming requests: == 2
        	Number of matched incoming requests: 1
```

This happens because we pop the wrong task_complete events and the test then
exits. I think the underlying cause is that MCP events are now buffered after
openai#8874.

So:
1. clear the buffer before sending any user message,
2. additionally listen for task start before task complete, and
3. use the ID from task start to find the correct task_complete event, as sketched below.
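
A minimal sketch of the matching strategy in steps 2–3, with hypothetical event types standing in for the real test harness:

```rust
// Sketch of steps 2-3: track the ID from the task-start event and only
// accept the matching task_complete. `Event` is a hypothetical stand-in
// for the real harness types; step 1 (draining the buffer) is assumed
// to have happened before this is called.
enum Event {
    TaskStarted { id: u64 },
    TaskComplete { id: u64 },
    Other,
}

fn wait_for_matching_complete(events: &mut impl Iterator<Item = Event>) -> Option<u64> {
    // Wait for the task to start so we know which ID to track, skipping
    // any stale buffered events from earlier turns.
    let task_id = loop {
        match events.next()? {
            Event::TaskStarted { id } => break id,
            _ => continue,
        }
    };
    // Only accept the task_complete whose ID matches the start event.
    loop {
        match events.next()? {
            Event::TaskComplete { id } if id == task_id => return Some(id),
            _ => continue,
        }
    }
}
```
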
michiosw closed this Feb 7, 2026