Description
What feature would you like to see?
Multimodal Image Support via @ File Search
Problem: Codex CLI already has the backend infrastructure for the OpenAI Vision API, but interactive mode gives users no way to send images.
Solution: Extend the existing @ file search system to detect image files and display them as [Image #1] placeholders while sending actual image data to the Vision API.
Usage Example:
user> @screenshot.png explain this UI
[TAB to select image]
user> [Image #1] explain this UI
codex> This interface shows a terminal with...
Benefits:
- Unlocks existing unused multimodal capabilities
- Uses familiar @ syntax with TAB completion
- Clean visual feedback with numbered placeholders
- Zero breaking changes to existing functionality
- Maintains chat history readability
Are you interested in implementing this feature?
Yes, I have a working prototype that extends the existing file search system. I will wait for acknowledgement before opening a PR.
Additional information
The implementation builds on existing infrastructure:
- InputItem::LocalImage already exists in protocol
- OpenAI Vision API integration already works
- @ file search popup system already exists
- The only missing piece is image detection in the file search UI
The implementation preserves all existing functionality while adding image support.
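To illustrate the missing detection step, here is a minimal sketch of an extension check. It is not taken from the prototype: the helper name `is_image_path` and the constant `IMAGE_EXTENSIONS` are hypothetical, the extension list comes from the Technical Details below, and case-insensitive matching is an assumption.

```rust
use std::path::Path;

/// Formats named in this proposal. Hypothetical constant, not part of Codex.
const IMAGE_EXTENSIONS: [&str; 6] = ["jpg", "jpeg", "png", "gif", "webp", "bmp"];

/// Hypothetical helper: returns true when the path's extension matches one of
/// the listed image formats. Matching is case-insensitive by assumption.
pub fn is_image_path(path: &Path) -> bool {
    path.extension()
        // Extensions that are not valid UTF-8 are treated as non-images.
        .and_then(|ext| ext.to_str())
        .map(|ext| {
            IMAGE_EXTENSIONS
                .iter()
                .any(|known| ext.eq_ignore_ascii_case(known))
        })
        .unwrap_or(false)
}
```

A check like this could run when the user confirms a selection in the @ popup, so that non-image paths keep the existing behavior unchanged.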
Technical Details:
- Supports jpg, jpeg, png, gif, webp, bmp formats
- Extends ChatComposer::insert_selected_path() to detect images
- Creates numbered placeholders [Image #1], [Image #2] for display
- Sends cleaned text + actual image data to AI processing
- Maintains backward compatibility with existing @ file search
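The placeholder and submission steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the prototype's code: `OutgoingItem` is a hypothetical stand-in for the protocol's input items (it only mirrors the idea of `InputItem::LocalImage`), `ImageAttachments` is a hypothetical piece of composer state, and the choices to send text before images and to drop images whose placeholder the user deleted are assumptions.

```rust
use std::path::PathBuf;

/// Hypothetical stand-in for the items sent to the model.
#[derive(Debug, PartialEq)]
pub enum OutgoingItem {
    Text(String),
    LocalImage(PathBuf),
}

/// Hypothetical composer state: images attached so far, in insertion order.
#[derive(Default)]
pub struct ImageAttachments {
    paths: Vec<PathBuf>,
}

impl ImageAttachments {
    /// Registers an image and returns its display placeholder, numbered from 1.
    pub fn attach(&mut self, path: PathBuf) -> String {
        self.paths.push(path);
        format!("[Image #{}]", self.paths.len())
    }

    /// Builds the outgoing items from the composed text. Placeholders are
    /// removed from the text, and an image is sent only if its placeholder
    /// is still present (assumption: deleting it detaches the image).
    pub fn into_items(self, text: &str) -> Vec<OutgoingItem> {
        let mut cleaned = text.to_string();
        let mut items = Vec::new();
        for (index, path) in self.paths.into_iter().enumerate() {
            let placeholder = format!("[Image #{}]", index + 1);
            if cleaned.contains(&placeholder) {
                cleaned = cleaned.replace(&placeholder, "");
                items.push(OutgoingItem::LocalImage(path));
            }
        }
        // Simplification: collapse all whitespace runs left by removed placeholders.
        let cleaned = cleaned.split_whitespace().collect::<Vec<_>>().join(" ");
        if !cleaned.is_empty() {
            // Assumption: the text item goes first, followed by the images.
            items.insert(0, OutgoingItem::Text(cleaned));
        }
        items
    }
}
```

With this shape, the chat history keeps showing the short placeholder, while the request would carry the cleaned prompt and the image paths as separate items.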