
Vision support: image input for local and cloud LLM providers #490

@bug-ops


Epic

Add image/vision support across the Zeph stack: accept images from TUI, CLI, and Telegram channels, flow them through the content model, and send to LLM providers that support vision APIs.

Architecture overview

Content model changes (zeph-llm)

  • Add MessagePart::Image { data: Vec<u8>, mime_type: String } variant
  • Add LlmProvider::supports_vision() -> bool method
  • When a provider doesn't support vision, skip image parts and log a warning
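
The content model changes above could look roughly like the following. This is a minimal sketch, not the actual zeph-llm code: only `MessagePart::Image` and `supports_vision()` come from this issue. The `Text` variant, the `name()` accessor, the default of `false`, the `filter_parts` helper, and the two sample providers are assumptions made for illustration, and `eprintln!` stands in for whatever logging the crate really uses.

```rust
/// One piece of a chat message.
#[derive(Debug, Clone, PartialEq)]
pub enum MessagePart {
    /// Assumed pre-existing text variant.
    Text(String),
    /// Variant proposed in this issue.
    Image { data: Vec<u8>, mime_type: String },
}

pub trait LlmProvider {
    /// Hypothetical accessor, used here only for the warning text.
    fn name(&self) -> &str;

    /// Method proposed in this issue. Defaulting to `false` is an assumption.
    fn supports_vision(&self) -> bool {
        false
    }
}

/// Hypothetical helper: returns the parts the provider can accept,
/// plus the number of image parts that were dropped.
pub fn filter_parts(
    provider: &dyn LlmProvider,
    parts: Vec<MessagePart>,
) -> (Vec<MessagePart>, usize) {
    if provider.supports_vision() {
        return (parts, 0);
    }
    let before = parts.len();
    let kept: Vec<MessagePart> = parts
        .into_iter()
        .filter(|p| !matches!(p, MessagePart::Image { .. }))
        .collect();
    let skipped = before - kept.len();
    if skipped > 0 {
        // Stand-in for a real logging call.
        eprintln!(
            "warning: provider {} does not support vision; skipped {} image part(s)",
            provider.name(),
            skipped
        );
    }
    (kept, skipped)
}

/// Sample providers for illustration only.
pub struct TextOnly;
impl LlmProvider for TextOnly {
    fn name(&self) -> &str {
        "text-only"
    }
}

pub struct Vision;
impl LlmProvider for Vision {
    fn name(&self) -> &str {
        "vision"
    }
    fn supports_vision(&self) -> bool {
        true
    }
}
```

Returning the skipped count alongside the filtered parts lets a channel tell the user that an image was ignored, rather than relying on the log alone.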

Provider-specific handling

  • Ollama: ollama-rs already supports .with_images() on ChatMessage. The config gets a vision_model field for a dedicated image-to-text model (e.g. llava, bakllava)
  • Claude: Add an AnthropicContentBlock::Image variant and switch from a plain string to the structured content format when images are present
  • OpenAI: Switch content from a string to the array format [{type: "text"}, {type: "image_url"}] when images are present
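
The "plain string unless images are present" switch for OpenAI can be sketched as below. This is an illustration under assumptions, not the provider code: `OpenAiContent`, `OpenAiPart`, `Image`, and `openai_content` are hypothetical names, JSON serialization is left out, and base64 is hand-rolled only to keep the example dependency-free (a real implementation would likely use a crate). It assumes images are passed inline as `data:` URLs in the `image_url` part. The same shape of switch would apply to Claude's structured content blocks.

```rust
const B64: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard padded base64 (RFC 4648 alphabet).
pub fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(B64[((n >> 18) & 63) as usize] as char);
        out.push(B64[((n >> 12) & 63) as usize] as char);
        out.push(if chunk.len() > 1 { B64[((n >> 6) & 63) as usize] as char } else { '=' });
        out.push(if chunk.len() > 2 { B64[(n & 63) as usize] as char } else { '=' });
    }
    out
}

/// Hypothetical borrowed view of an image attachment.
pub struct Image<'a> {
    pub data: &'a [u8],
    pub mime_type: &'a str,
}

/// One element of the array-format content.
#[derive(Debug, PartialEq)]
pub enum OpenAiPart {
    Text { text: String },
    ImageUrl { url: String },
}

/// Either the existing plain-string content or the array format.
#[derive(Debug, PartialEq)]
pub enum OpenAiContent {
    Plain(String),
    Parts(Vec<OpenAiPart>),
}

/// Keeps the plain string when there are no images, so text-only
/// requests are unchanged; otherwise emits text followed by images.
pub fn openai_content(text: &str, images: &[Image<'_>]) -> OpenAiContent {
    if images.is_empty() {
        return OpenAiContent::Plain(text.to_string());
    }
    let mut parts = vec![OpenAiPart::Text { text: text.to_string() }];
    for img in images {
        let url = format!("data:{};base64,{}", img.mime_type, base64_encode(img.data));
        parts.push(OpenAiPart::ImageUrl { url });
    }
    OpenAiContent::Parts(parts)
}
```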

Channel/input changes

  • ChannelMessage gets an images: Vec<ImageData> field
  • TUI: /image <path> command (crossterm lacks drag-drop/clipboard image support)
  • CLI: --image <path> flag or /image command
  • Telegram: handle msg.photo() via teloxide, then download the photo and attach it
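
For the TUI/CLI side, the `/image <path>` command and the new `images` field might be wired up along these lines. This is a sketch under assumptions: the fields of `ImageData` and the `text` field of `ChannelMessage` are guessed, and `parse_image_command`, `mime_from_path`, and `load_image` are hypothetical helpers. Detecting the MIME type from the file extension is one simple option; sniffing magic bytes would be more robust.

```rust
use std::path::Path;

/// Assumed shape of an attached image.
#[derive(Debug, Clone, PartialEq)]
pub struct ImageData {
    pub data: Vec<u8>,
    pub mime_type: String,
}

/// Assumed shape of a channel message with the new `images` field.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct ChannelMessage {
    pub text: String,
    pub images: Vec<ImageData>,
}

/// Extracts the path from an "/image <path>" input line.
pub fn parse_image_command(line: &str) -> Option<&str> {
    let rest = line.trim().strip_prefix("/image")?;
    // Require whitespace after the command so "/imagefoo" is not matched.
    if !rest.starts_with(char::is_whitespace) {
        return None;
    }
    let path = rest.trim();
    if path.is_empty() {
        None
    } else {
        Some(path)
    }
}

/// Guesses a MIME type from the file extension (illustrative subset).
pub fn mime_from_path(path: &str) -> Option<&'static str> {
    let ext = Path::new(path).extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "png" => Some("image/png"),
        "jpg" | "jpeg" => Some("image/jpeg"),
        "gif" => Some("image/gif"),
        "webp" => Some("image/webp"),
        _ => None,
    }
}

/// Reads the file and wraps it as `ImageData`.
pub fn load_image(path: &str) -> std::io::Result<ImageData> {
    let mime = mime_from_path(path).ok_or_else(|| {
        std::io::Error::new(std::io::ErrorKind::InvalidInput, "unsupported image type")
    })?;
    let data = std::fs::read(path)?;
    Ok(ImageData { data, mime_type: mime.to_string() })
}
```

The CLI `--image <path>` flag could reuse `load_image` directly, and the Telegram handler would construct `ImageData` from the downloaded bytes instead of a local path.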

Config

  • [llm.ollama] gets an optional vision_model field for a dedicated vision model
  • Cloud providers (Claude, OpenAI) use the same model for text and images, since vision is implicit in the API
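
Model selection for Ollama could then reduce to a small lookup. This is a sketch: only `vision_model` is named in this issue, while the `model` field, the `OllamaConfig` struct name, and the `model_for` helper are assumptions. Falling back to the main model when `vision_model` is unset is also an assumed behavior, chosen so that a single multimodal model works without extra config.

```rust
/// Assumed in-memory form of the [llm.ollama] config section.
#[derive(Debug, Clone)]
pub struct OllamaConfig {
    /// Assumed existing field: the default chat model.
    pub model: String,
    /// Optional field proposed in this issue.
    pub vision_model: Option<String>,
}

impl OllamaConfig {
    /// Picks the vision model only when the request carries images
    /// and a vision model is configured; otherwise uses the main model.
    pub fn model_for(&self, has_images: bool) -> &str {
        match (&self.vision_model, has_images) {
            (Some(vm), true) => vm.as_str(),
            _ => self.model.as_str(),
        }
    }
}
```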

Sub-issues

Tracked below. Implementation order follows the dependency chain.

Acceptance criteria

  • A user can attach an image in TUI/CLI/Telegram
  • The image is sent to a vision-capable provider and the response is displayed
  • Ollama can use a separate vision model for image processing
  • Non-vision providers gracefully skip images and log a warning


Labels: epic (milestone-level tracking issue), llm (LLM provider related)
