An AI-powered image analysis extension for SwarmUI that generates detailed image descriptions for your prompts. It can use a local LLM through Ollama, the OpenAI API, or the OpenRouter API.
- Features
- Prerequisites
- Installation
- Usage Guide
- LLM Toys Guide
- Example Outputs
- Planned Features
- Acknowledgments
- Multiple backend options:
  - Local LLM with Ollama (including remote Ollama installations)
  - OpenAI API integration
  - OpenRouter API support
- Advanced model settings:
  - Ollama: temperature, top_p, top_k, max_tokens, repeat_penalty, seed
  - OpenAI: temperature, max_tokens, top_p, frequency_penalty, presence_penalty
  - OpenRouter: temperature, max_tokens, top_p, frequency_penalty, presence_penalty, repetition_penalty, top_k, min_p, top_a, seed
- Multiple preset User Prompts (Artistic Style, Facial Features, Color Palette, etc.)
- Creative LLM Toys:
  - Image Fusion: Combine separate analyses of style, subject, and setting into cohesive prompts
  - Object + Subject Fusion: Transform objects with character designs and create unique combinations
  - Story Time: Transform images into detailed narratives with beginning, middle, and end
  - Character Creator: Generate detailed character profiles for stories, games, or roleplay
- System Prompt support for all backends to customize model behavior
- Prompt Prepending for adding instructions to the front of all requests
- Custom preset support with reordering capability
- Direct-to-prompt generation
- Zero impact on VRAM when not in use (when using unload model setting)
- Image paste/upload support
- Image drag-and-drop support
- Remote Ollama server connection support
- Image compression option to prevent memory issues
- Analysis history with thumbnails and parameter reuse
- Enhanced error handling with helpful troubleshooting suggestions
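The per-backend settings listed above can be read as a simple lookup table. The sketch below is illustrative only: `SUPPORTED_SETTINGS` and `filter_settings` are hypothetical names, not OllamaVision's actual code, and the parameter names are copied from the feature list.

```python
# Hypothetical helper: the names here are illustrative, not part of OllamaVision.
# The parameter names mirror the "Advanced model settings" list.
SUPPORTED_SETTINGS = {
    "ollama": {"temperature", "top_p", "top_k", "max_tokens", "repeat_penalty", "seed"},
    "openai": {"temperature", "max_tokens", "top_p", "frequency_penalty", "presence_penalty"},
    "openrouter": {
        "temperature", "max_tokens", "top_p", "frequency_penalty",
        "presence_penalty", "repetition_penalty", "top_k", "min_p", "top_a", "seed",
    },
}


def filter_settings(backend: str, settings: dict) -> dict:
    """Keep only the settings that the chosen backend accepts."""
    allowed = SUPPORTED_SETTINGS[backend.lower()]
    return {name: value for name, value in settings.items() if name in allowed}


if __name__ == "__main__":
    requested = {"temperature": 0.7, "top_k": 40, "presence_penalty": 0.2}
    # top_k is dropped because it is not listed for the OpenAI backend.
    print(filter_settings("openai", requested))
```

Filtering this way would keep a setting tuned for one backend from being sent to another that does not list it.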
Make sure you have SwarmUI installed and set up on your system.
- Ollama with a vision model installed
- For remote connections:
  - Ollama server must be accessible on your network
  - Port 11434 must be open on the server
  - Server must be properly configured for remote access
- Valid OpenAI API key with access to vision models. Sign up and create an API key here: OpenAI
- Valid OpenRouter API key. Sign up and create an API key here: OpenRouter
  - Optional: Custom site name for API requests
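Before connecting from SwarmUI, it can help to confirm that the Ollama server answers on port 11434. The sketch below is a standalone check, not part of OllamaVision. It uses Ollama's `/api/tags` endpoint, which lists installed models. The function names are hypothetical and `localhost` is an assumed address.

```python
import json
import urllib.request


def tags_url(host: str, port: int = 11434) -> str:
    """Build the URL of Ollama's model-listing endpoint."""
    return f"http://{host}:{port}/api/tags"


def model_names(tags_response: dict) -> list:
    """Extract model names from a parsed /api/tags response."""
    return [entry["name"] for entry in tags_response.get("models", [])]


def list_models(host: str, port: int = 11434, timeout: float = 5.0) -> list:
    """Query the server; raises an OSError subclass if it cannot be reached."""
    with urllib.request.urlopen(tags_url(host, port), timeout=timeout) as response:
        return model_names(json.load(response))


if __name__ == "__main__":
    # "localhost" is an assumption; use your server's address for remote setups.
    try:
        print(list_models("localhost"))
    except OSError as error:  # urllib's URLError is a subclass of OSError
        print(f"Ollama is not reachable: {error}")
```

If the list comes back empty, the server is reachable but has no models installed, so a vision model still needs to be pulled.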
- Follow the Prerequisites section for your chosen backend
- Open SwarmUI
- Click on "Server" at the top of the page
- Click on "Extensions"
- Find "OllamaVision" in the list of available extensions
- Click the "Install" button
- When the confirmation message appears, click "Restart Now"
- SwarmUI will restart, and OllamaVision will appear in the Utilities tab
- Open SwarmUI and navigate to the "Utilities" tab
- Click the "OllamaVision" tab
- Click the settings button to configure your preferred backend:
  - Ollama (local or remote)
  - OpenAI
  - OpenRouter
- Click "Connect" to establish the connection
- Select your preferred vision model from the dropdown list
- Configure model settings (optional):
  - Adjust temperature for creativity level
  - Fine-tune Top P and Top K for response diversity
  - Set maximum tokens for response length
  - Adjust repeat penalty to prevent repetition
  - Set custom seed for reproducible results
- Choose your User Prompt:
  - Use the default preset
  - Select from included presets
  - Create and manage custom presets
- Load your image:
  - Quick Paste: Click paste button + CTRL+V
  - File Upload: Click upload button to select local file
  - Drag and Drop: Drag and drop your image directly into the preview area.
- Image preview will appear
- Click "Analyze Image" to begin processing
⚠️ Processing time varies based on your setup. If no error appears, analysis is in progress.
- Once analysis completes, click "Send to Prompt"
- The AI-generated description will appear in the Generate tab
- Use the description as-is or customize it for your needs directly inside OllamaVision
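For readers curious what an analysis request to a local Ollama server can look like, here is a minimal sketch of a request body for Ollama's `POST /api/generate` endpoint. It is an assumption that OllamaVision sends something similar, since the extension's internals are not described here. `build_generate_payload` and its arguments are hypothetical, and `llava` is only an example model name.

```python
import base64
import json


def build_generate_payload(model, user_prompt, image_bytes,
                           system_prompt="", prepend="", settings=None,
                           unload_after=False):
    """Assemble a JSON-serializable body for Ollama's /api/generate endpoint."""
    options = dict(settings or {})
    # Ollama names the response-length limit "num_predict"; -1 means no limit.
    if "max_tokens" in options:
        options["num_predict"] = options.pop("max_tokens")
    payload = {
        "model": model,
        # Prepended instructions go in front of the user prompt.
        "prompt": f"{prepend} {user_prompt}".strip(),
        # Ollama expects images as a list of base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
        "options": options,
    }
    if system_prompt:
        payload["system"] = system_prompt
    if unload_after:
        payload["keep_alive"] = 0  # ask Ollama to unload the model after replying
    return payload


if __name__ == "__main__":
    body = build_generate_payload(
        "llava",                    # example model name (assumption)
        "Describe this image in detail.",
        b"placeholder image bytes",  # stand-in for real file contents
        settings={"temperature": 0.8, "max_tokens": -1},
        unload_after=True,
    )
    print(json.dumps(body, indent=2))
```

Setting `keep_alive` to 0 corresponds to the idea behind the unload model setting: the model is released once the response is returned.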
- If you're using a local LLM, ensure Ollama is running BEFORE you try to connect
- Larger images may take longer to process; enable compression if you run into memory errors
- Custom presets are saved between sessions
- You can edit descriptions before generating images directly in the Analysis Results text area
- For best results in LLM Toys, keep MAXTOKENS at -1 (set by default)
- Load your images using paste, upload, or drag & drop
- Analyze each image separately
- Edit the descriptions to your liking
- Click "Combine Analyses" to create a single prompt
- Edit the prompt to your liking
- Click "Send to Prompt" to generate an image
- Perfect for creating rich, multi-layered image generation prompts
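The combine step can be pictured as merging the three edited analyses into one instruction for the LLM. The sketch below is purely illustrative: `build_fusion_request` and the instruction wording are assumptions, not the prompt OllamaVision actually sends.

```python
def build_fusion_request(style: str, subject: str, setting: str) -> str:
    """Merge three edited analyses into one instruction for the LLM."""
    sections = [("Style", style), ("Subject", subject), ("Setting", setting)]
    # Refuse to combine until every analysis has been filled in.
    missing = [label for label, text in sections if not text.strip()]
    if missing:
        raise ValueError(f"Missing analysis for: {', '.join(missing)}")
    body = "\n".join(f"{label}: {text.strip()}" for label, text in sections)
    # The instruction sentence below is a made-up example, not the real prompt.
    return ("Combine the following analyses into a single, cohesive "
            "image generation prompt.\n" + body)


if __name__ == "__main__":
    print(build_fusion_request(
        "watercolor with soft edges",
        "a red fox wearing a scarf",
        "a snowy forest at dawn",
    ))
```

Labeling each analysis keeps style, subject, and setting distinct, so the model can weave them together rather than treat them as one block of text.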
- Click the "Fusion" button
- Select "Object + Subject" mode
- Load your object image (paste, upload, or drag & drop)
- Analyze and edit result as needed
- Load your subject image
- Analyze and edit result as needed
- Click "Combine" to generate fusion prompt
- Edit final prompt if desired
- Click "Send to Prompt" to generate
Perfect for:
- Creating custom designs on products (t-shirts, mugs, skateboards)
- Transforming furniture into character-themed pieces
- Designing custom figurines, sculptures, or plush toys
Create detailed characters with customizable attributes:
- Name, Sex, Species, Setting, Alignment, Class/Role
- Editable input fields for custom characters
- Editable response field to edit character before saving
- Smart controls with field locking and randomization
- Detailed output including personality, physical description, abilities, and backstory
- Export characters to text files
- Generates an AI image prompt for your character's profile picture
- An "Export Prompt" button extracts the image prompt from the results and sends it to the generation page, so you can generate your new character right away
NOTE: If your creations are getting cut off, make sure MAXTOKENS is set to -1 (set by default)
- Load your image using paste, upload, or drag & drop
- Click "Tell me a story"
- Stories are displayed in a wide-format reading area for comfort
- For best results, ensure MAXTOKENS is set to -1 (set by default)
Here's a showcase of OllamaVision's capabilities using different presets. Each example includes the source image, AI-generated description, and the final generated output.
This image features a vibrant array of rainbow-colored umbrellas suspended in the air, creating a visually stunning display against the backdrop of a clear blue sky. The color palette consists of:
- Red
- Orange
- Yellow
- Green
- Blue
- Purple
Each umbrella is distinctly colored, with no discernible pattern. The umbrellas appear evenly spaced throughout the frame, creating a sense of harmony and balance within the composition.
Facial Characteristics:
- Eyes: Brown
- Eyebrows: Thick and well-groomed
- Nose: Straight and moderately sized
- Mouth: Shaped like a smile with full teeth showing
- Chin: Rounded in shape
- Skin tone: Light brown
- Facial hair: Well-groomed beard
- Hair color: Brown
- Ears: Small, located just below the jawline
- Distinguishing features:
  - Numerous freckles across face and neck
  - 2 small moles under eyes
- LoRA captioner that saves each caption to a .txt file named after its image, for building datasets
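Since this captioner is only planned, the following is a rough sketch of the file naming it describes, not an existing OllamaVision feature. `caption_path` and `save_caption` are hypothetical names, and the UTF-8 encoding is an assumption.

```python
from pathlib import Path


def caption_path(image_path) -> Path:
    """Return the .txt path that sits next to the image and shares its name."""
    return Path(image_path).with_suffix(".txt")


def save_caption(image_path, caption: str) -> Path:
    """Write the caption beside the image and return the file that was written."""
    target = caption_path(image_path)
    target.write_text(caption.strip() + "\n", encoding="utf-8")
    return target


if __name__ == "__main__":
    # "dataset/img_001.jpg" is a made-up example path; nothing is written here.
    print(caption_path("dataset/img_001.jpg"))
```

Many LoRA training tools look for an image and a same-named text file side by side, which is why the caption shares the image's base name.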
- mcmonkey for making OllamaVision official and for giving us SwarmUI
- SouthbayJay for testing and feedback and all the late nights!