feat(llm): vision (image input) support across all providers #569
Merged
1e1edf7 to a717f38
Introduce MessagePart::Image variant with base64 serde, LlmProvider::supports_vision() trait method, and per-provider implementations:
- Claude: structured content path with base64 ImageSource blocks
- OpenAI: array content format with data-URI image_url entries
- Ollama: with_images() call on user messages, optional separate vision_model

Agent layer extracts AttachmentKind::Image from ChannelMessage into MessagePart::Image (capped at 20 MB) and constructs Message::from_parts when the provider supports vision.

Channels:
- CLI: /image <path> command reads a local file into an image attachment
- Telegram: downloads the largest PhotoSize and attaches it as an Image

Config: vision_model field on LlmConfig; the --init wizard prompts for it (Ollama only). Bootstrap wires vision_model into OllamaProvider.

Closes #490
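The agent-layer step in this commit (images pulled from channel attachments, capped at 20 MB, and only forwarded when the provider supports vision) can be sketched in plain Rust. This is a minimal sketch under stated assumptions, not Zeph's code: the `Attachment` struct, the `resolve_parts` helper, the `MAX_IMAGE_BYTES` constant (taken as 20 MiB), and the fields on `MessagePart::Image` are hypothetical. Only the names `MessagePart::Image`, `AttachmentKind::Image`, and `LlmProvider::supports_vision()` come from the PR, and dropping images for non-vision providers is an assumed behavior.

```rust
/// Hypothetical attachment kinds; only `Image` is named in the PR.
#[derive(Debug, PartialEq)]
pub enum AttachmentKind {
    Image,
    Other,
}

/// Hypothetical channel attachment carrying raw bytes.
#[derive(Debug)]
pub struct Attachment {
    pub kind: AttachmentKind,
    pub mime_type: String,
    pub data: Vec<u8>,
}

/// Assumed shape of message parts; the field names are illustrative.
#[derive(Debug, PartialEq)]
pub enum MessagePart {
    Text(String),
    Image { mime_type: String, data: Vec<u8> },
}

/// Assumed cap: the PR says "20 MB"; 20 MiB is used here.
pub const MAX_IMAGE_BYTES: usize = 20 * 1024 * 1024;

/// Providers opt in to vision; text-only providers keep the default `false`.
pub trait LlmProvider {
    fn supports_vision(&self) -> bool {
        false
    }
}

/// Builds message parts from user text plus attachments (hypothetical helper).
/// Images are kept only if the provider supports vision and the payload fits
/// under the cap; otherwise they are dropped (assumption).
pub fn resolve_parts(
    provider: &dyn LlmProvider,
    text: String,
    attachments: Vec<Attachment>,
) -> Vec<MessagePart> {
    let mut parts = vec![MessagePart::Text(text)];
    if !provider.supports_vision() {
        return parts;
    }
    // Iterating by value consumes `attachments`, so each `data` buffer is
    // moved into the new part instead of being cloned.
    for att in attachments {
        if att.kind == AttachmentKind::Image && att.data.len() <= MAX_IMAGE_BYTES {
            parts.push(MessagePart::Image {
                mime_type: att.mime_type,
                data: att.data,
            });
        }
    }
    parts
}
```

Taking `Vec<Attachment>` by value is what makes the move possible: a borrowed `&[Attachment]` would force a clone of every image buffer.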
PERF: replace iter().clone() with a consuming partition in resolve_message so image data is moved rather than cloned (saving up to 20 MB of copying per message).
SEC: reject path traversal (Component::ParentDir) in the /image command before calling std::fs::read.
SEC: add a pre-download size guard in the Telegram photo handler; skip the download if photo.file.size exceeds 20 MB, and pass the size as a capacity hint to Vec::with_capacity.
BUG: remove redundant to_llm_content() wrapping in convert_messages_vision; collect text parts directly so the emptiness check no longer misfires on image messages.
TEST: add unit tests for OpenAI has_image_parts and convert_messages_vision (data-URI format, text-only, and image-only cases).
TEST: add unit tests for Claude AnthropicContentBlock::Image serialization, split_messages_structured with image parts, and has_image_parts detection.
TEST: add unit tests for the Ollama with_vision_model builder, convert_message with Image parts (base64 encoding), and model selection switching.
TEST: add unit tests for handle_image_command path traversal rejection, the missing-file error path, and successful file load with MIME detection.
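The two SEC items can be illustrated with std only. This is a minimal sketch, not the code in this PR: the helper names `check_image_path`, `load_image`, `should_download`, and `download_buffer` are hypothetical, `reported_size` stands in for the platform-reported size (the PR's `photo.file.size`), and letting an unknown size through is an assumption.

```rust
use std::path::{Component, Path};

/// Assumed cap: the PR says "20 MB"; 20 MiB is used here.
pub const MAX_IMAGE_BYTES: u64 = 20 * 1024 * 1024;

/// Rejects any path with a `..` component before the filesystem is touched.
pub fn check_image_path(path: &Path) -> Result<(), String> {
    if path.components().any(|c| matches!(c, Component::ParentDir)) {
        return Err(format!("path traversal rejected: {}", path.display()));
    }
    Ok(())
}

/// Validates the path first, then reads the whole file into memory.
pub fn load_image(path: &Path) -> Result<Vec<u8>, String> {
    check_image_path(path)?;
    std::fs::read(path).map_err(|e| format!("failed to read {}: {e}", path.display()))
}

/// Pre-download guard: skip the download when the reported size exceeds the cap.
/// Assumption: an unknown size (`None`) is allowed through.
pub fn should_download(reported_size: Option<u64>) -> bool {
    reported_size.map_or(true, |size| size <= MAX_IMAGE_BYTES)
}

/// Allocates the download buffer using the reported size as a capacity hint,
/// clamped to the cap so a bogus size cannot force a huge allocation.
pub fn download_buffer(reported_size: Option<u64>) -> Vec<u8> {
    let hint = reported_size.unwrap_or(0).min(MAX_IMAGE_BYTES);
    Vec::with_capacity(hint as usize)
}
```

Like the check described in the commit, this only rejects `..` components; an absolute path such as /etc/passwd still passes, so any stricter sandboxing would be a separate decision.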
46665eb to f9fb4fd
…nd photo handling
Summary
Add vision (image input) support across the Zeph stack for local and cloud LLM providers.
- Core: MessagePart::Image variant with base64 serde and LlmProvider::supports_vision() trait method
- Claude: AnthropicContentBlock::Image (base64 source)
- OpenAI: image_url data-URI encoding
- Ollama: with_images() on user messages, optional vision_model config for dedicated model routing
- Agent: images extracted from ChannelMessage attachments, 20 MB limit, path traversal protection
- Channels: /image <path> command in CLI/TUI, Telegram msg.photo() download with pre-download size guard
- Config: vision_model field in LlmConfig, --init wizard update

Closes #490, Closes #491, Closes #492, Closes #493, Closes #494, Closes #495, Closes #496
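The Claude and OpenAI items differ mainly in how the same base64 payload is wrapped. A dependency-free sketch, assuming the standard RFC 4648 alphabet with `=` padding; the helper names `base64_encode`, `data_uri`, `openai_image_part`, and `anthropic_image_block` are hypothetical. JSON is assembled by hand only to stay std-only and assumes the MIME type needs no escaping; real code would more likely use the `base64` and `serde_json` crates.

```rust
const ALPHABET: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard base64 (RFC 4648) with `=` padding.
pub fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling missing bytes.
        let b0 = u32::from(chunk[0]);
        let b1 = u32::from(chunk.get(1).copied().unwrap_or(0));
        let b2 = u32::from(chunk.get(2).copied().unwrap_or(0));
        let n = (b0 << 16) | (b1 << 8) | b2;
        // Emit four 6-bit indices; pad with '=' where input bytes were missing.
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { ALPHABET[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Data URI in the form `data:<mime>;base64,<payload>`.
pub fn data_uri(mime_type: &str, data: &[u8]) -> String {
    format!("data:{mime_type};base64,{}", base64_encode(data))
}

/// OpenAI Chat Completions content part carrying the image as a data URI.
pub fn openai_image_part(mime_type: &str, data: &[u8]) -> String {
    format!(r#"{{"type":"image_url","image_url":{{"url":"{}"}}}}"#, data_uri(mime_type, data))
}

/// Anthropic Messages API image block with a base64 source.
pub fn anthropic_image_block(mime_type: &str, data: &[u8]) -> String {
    format!(
        r#"{{"type":"image","source":{{"type":"base64","media_type":"{}","data":"{}"}}}}"#,
        mime_type,
        base64_encode(data)
    )
}
```

Ollama's chat API, by contrast, takes bare base64 strings in an `images` array, so only `base64_encode` would apply there.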
Test plan
- cargo +nightly fmt --check passes
- cargo clippy --workspace -- -D warnings passes (zero warnings)
- cargo nextest run --workspace --lib --bins passes (1811 tests, 0 failures)
- /image <path> in CLI: verify a vision-capable provider processes it