OllamaVision

An AI-powered image analysis extension for SwarmUI that generates detailed image descriptions for your prompts. Options to use local LLM with Ollama, OpenAI API, or OpenRouter API.

🌟 Table of Contents

Features
Prerequisites
Installation
Usage Guide
LLM Toys Guide
Example Outputs
- Color Palette Analysis
- Facial Features Analysis
Planned Features
Acknowledgments

🌟 Features

Multiple backend options:
- Local LLM with Ollama (including remote Ollama installations)
- OpenAI API integration
- OpenRouter API support
Advanced model settings:
- Ollama: temperature, top_p, top_k, max_tokens, repeat_penalty, seed
- OpenAI: temperature, max_tokens, top_p, frequency_penalty, presence_penalty
- OpenRouter: temperature, max_tokens, top_p, frequency_penalty, presence_penalty, repetition_penalty, top_k, min_p, top_a, seed
Multiple preset User Prompts (Artistic Style, Facial Features, Color Palette, etc.)
Creative LLM Toys:
- Image Fusion: Combine separate analyses of style, subject, and setting into cohesive prompts
- Object + Subject Fusion: Transform objects with character designs and create unique combinations
- Story Time: Transform images into detailed narratives with beginning, middle, and end
- Character Creator: Generate detailed character profiles for stories, games, or roleplay
System Prompt support for all backends to customize model behavior
Prompt Prepending for adding instructions in the front of all requests
Custom preset support with reordering capability
Direct-to-prompt generation
Zero impact on VRAM when not in use (when using unload model setting)
Image paste/upload support
Image Drag and drop support
Remote Ollama server connection support
Image compression option to prevent memory issues
Analysis history with thumbnails and parameter reuse
Enhanced error handling with helpful troubleshooting suggestions

📋 Prerequisites

First and foremost:

Make sure you have SwarmUI installed and setup on your system.

For Ollama:

Ollama with a vision model installed
For remote connections:
- Ollama server must be accessible on your network
- Port 11434 must be open on the server
- Server must be properly configured for remote access

For OpenAI:

Valid OpenAI API key with access to vision models. Sign up and create an API key here: OpenAI

For OpenRouter:

Valid OpenRouter API key. Sign up and create an API key here: OpenRouter
Optional: Custom site name for API requests

🛠️ Installation

Follow the Prerequisites section for your chosen backend
Open SwarmUI
Click on "Server" at the top of the page
Click on "Extensions"
Find "OllamaVision" in the list of available extensions
Click the "Install" button
A message will appear and click on "Restart Now"
SwarmUI will restart and OllamaVision will be installed into the Utilities tab

💡 Usage Guide

🚀 Getting Started

Open SwarmUI and navigate to the "Utilities" tab
Click the "OllamaVision" tab
Click the settings button to configure your preferred backend:
- Ollama (local or remote)
- OpenAI
- OpenRouter
Click "Connect" to establish connection

🎯 Setup & Configuration

Select your preferred vision model from the dropdown list
Configure model settings (optional):
- Adjust temperature for creativity level
- Fine-tune Top P and Top K for response diversity
- Set maximum tokens for response length
- Adjust repeat penalty to prevent repetition
- Set custom seed for reproducible results
Choose your User Prompt:
- Use the default preset
- Select from included presets
- Create and manage custom presets

📸 Image Analysis

Load your image:
- Quick Paste: Click paste button + CTRL+V
- File Upload: Click upload button to select local file
- Drag and Drop: Drag and Drop your image directly into the preview area.
Image preview will appear
Click "Analyze Image" to begin processing

⚠️ Processing time varies based on your setup. If no error appears, analysis is in progress.

🎨 Using the Results

Once analysis completes, click "Send to Prompt"
The AI-generated description will appear in the Generate tab
Use the description as-is or customize it for your needs directly inside OllamaVision

🔑 Quick Tips

If you're using local LLM ensure Ollama is running BEFORE trying connect
Larger images may take longer to process use compression if running into memory errors
Custom presets are saved between sessions
You can edit descriptions before generating images directly in the Analysis Results text area
For best results in LLM toys keep MAXTOKENS at -1 (set by default)

🎮 LLM Toys Guide

🎨 Image Fusion

Load your images using paste, upload, or drag & drop
Analyze each image separately
Edit the descriptions to your liking
Click "Combine Analyses" to create a single prompt
Edit the prompt to your liking
Click "Send to Prompt" to generate an image
Perfect for creating rich, multi-layered image generation prompts

🔄 Object + Subject Fusion

Click the "Fusion" button
Select "Object + Subject" mode
Load your object image (paste, upload, or drag & drop)
Analyze and edit result as needed
Load your subject image
Analyze and edit result as needed
Click "Combine" to generate fusion prompt
Edit final prompt if desired
Click "Send to Prompt" to generate

Perfect for:

Creating custom designs on products (t-shirts, mugs, skateboards)
Transforming furniture into character-themed pieces
Designing custom figurines, sculptures, or plush toys

🎭 Character Creator

Create detailed characters with customizable attributes:

Name, Sex, Species, Setting, Alignment, Class/Role
Editable input fields for custom characters
Editable response field to edit character before saving
Smart controls with field locking and randomization
Detailed output including personality, physical description, abilities, and backstory
Export characters to text files
Creates an AI image prompt to create a profile picture for your character
An "Export Prompt" button that will extract the image prompt from the results and send it to the generation page for instant generation of your new character

NOTE: If your creations are getting cut off make sure MAXTOKENS is set to -1 (set by default)

📚 Story Time

Load your image using paste, upload, or drag & drop
Click "Tell me a story"
Stories are displayed in a wide-format reading area for comfort
For best results ensure MAXTOKENS is set to -1 (set by default)

🎯 Example Outputs

Here's a showcase of OllamaVision's capabilities using different presets. Each example includes the source image, AI-generated description, and the final generated output.

🌈 Color Palette Analysis

View Example

Source Image

AI-Generated Description

This image features a vibrant array of rainbow-colored umbrellas suspended in the air, creating a visually stunning display against the backdrop of a clear blue sky. The color palette consists of:

Red

Orange

Yellow

Green

Blue

Purple

Each umbrella is distinctly colored, with no discernible pattern. The umbrellas appear evenly spaced throughout the frame, creating a sense of harmony and balance within the composition.

Generated Result

👤 Facial Features Analysis

View Example

Source Image

AI-Generated Description

Facial Characteristics:

Eyes: Brown

Eyebrows: Thick and well-groomed

Nose: Straight and moderately sized

Mouth: Shaped like a smile with full teeth showing

Chin: Rounded in shape

Skin tone: Light brown

Facial hair: Well-groomed beard

Hair color: Brown

Ears: Small, located just below the jawline

Distinguishing features:

Numerous freckles across face and neck

2 small moles under eyes

Generated Result

🔮 Planned Features

Lora captioner that saves captions in a .txt file with name of image for datasets

🙏 Acknowledgments

mcmonkey for making OllamaVision official and for giving us SwarmUI
SouthbayJay for testing and feedback and all the late nights!

Name		Name	Last commit message	Last commit date
Latest commit History 135 Commits
.github		.github
Assets		Assets
WebAPI		WebAPI
.gitignore		.gitignore
BackendSchema.cs		BackendSchema.cs
LICENSE		LICENSE
OllamaVisionExtension.cs		OllamaVisionExtension.cs
README.md		README.md

License

Urabewe/OllamaVision

Folders and files

Latest commit

History

Repository files navigation

OllamaVision

🌟 Table of Contents

🌟 Features

📋 Prerequisites

First and foremost:

Make sure you have SwarmUI installed and setup on your system.

For Ollama:

For OpenAI:

For OpenRouter:

🛠️ Installation

💡 Usage Guide

🚀 Getting Started

🎯 Setup & Configuration

📸 Image Analysis

🎨 Using the Results

🔑 Quick Tips

🎮 LLM Toys Guide

🎨 Image Fusion

🔄 Object + Subject Fusion

🎭 Character Creator

📚 Story Time

🎯 Example Outputs

🌈 Color Palette Analysis

Source Image

AI-Generated Description

Generated Result

👤 Facial Features Analysis

Source Image

AI-Generated Description

Generated Result

🔮 Planned Features

🙏 Acknowledgments

About

Resources

License

Stars

Watchers

Forks

Releases 12

Sponsor this project

Packages 0

Contributors 2

Languages

Packages