Skip to content

An extension for SwarmUI that allows you to connect to Ollama, OpenAI, and OpenRouter to use vision models for image analysis to create image prompts.

License

Notifications You must be signed in to change notification settings

Urabewe/OllamaVision

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

OllamaVision

An AI-powered image analysis extension for SwarmUI that generates detailed image descriptions for your prompts. Options to use local LLM with Ollama, OpenAI API, or OpenRouter API.

logo

๐ŸŒŸ Table of Contents

๐ŸŒŸ Features

  • Multiple backend options:
    • Local LLM with Ollama (including remote Ollama installations)
    • OpenAI API integration
    • OpenRouter API support
  • Advanced model settings:
    • Ollama: temperature, top_p, top_k, max_tokens, repeat_penalty, seed
    • OpenAI: temperature, max_tokens, top_p, frequency_penalty, presence_penalty
    • OpenRouter: temperature, max_tokens, top_p, frequency_penalty, presence_penalty, repetition_penalty, top_k, min_p, top_a, seed
  • Multiple preset User Prompts (Artistic Style, Facial Features, Color Palette, etc.)
  • Creative LLM Toys:
    • Image Fusion: Combine separate analyses of style, subject, and setting into cohesive prompts
    • Object + Subject Fusion: Transform objects with character designs and create unique combinations
    • Story Time: Transform images into detailed narratives with beginning, middle, and end
    • Character Creator: Generate detailed character profiles for stories, games, or roleplay
  • System Prompt support for all backends to customize model behavior
  • Prompt Prepending for adding instructions in the front of all requests
  • Custom preset support with reordering capability
  • Direct-to-prompt generation
  • Zero impact on VRAM when not in use (when using unload model setting)
  • Image paste/upload support
  • Image Drag and drop support
  • Remote Ollama server connection support
  • Image compression option to prevent memory issues
  • Analysis history with thumbnails and parameter reuse
  • Enhanced error handling with helpful troubleshooting suggestions

๐Ÿ“‹ Prerequisites

First and foremost:

  • Make sure you have SwarmUI installed and setup on your system.

For Ollama:

  • Ollama with a vision model installed
  • For remote connections:
    • Ollama server must be accessible on your network
    • Port 11434 must be open on the server
    • Server must be properly configured for remote access

For OpenAI:

  • Valid OpenAI API key with access to vision models. Sign up and create an API key here: OpenAI

For OpenRouter:

  • Valid OpenRouter API key. Sign up and create an API key here: OpenRouter
  • Optional: Custom site name for API requests

๐Ÿ› ๏ธ Installation

  1. Follow the Prerequisites section for your chosen backend
  2. Open SwarmUI
  3. Click on "Server" at the top of the page
  4. Click on "Extensions"
  5. Find "OllamaVision" in the list of available extensions
  6. Click the "Install" button
  7. A message will appear and click on "Restart Now"
  8. SwarmUI will restart and OllamaVision will be installed into the Utilities tab

๐Ÿ’ก Usage Guide

๐Ÿš€ Getting Started

  1. Open SwarmUI and navigate to the "Utilities" tab
  2. Click the "OllamaVision" tab
  3. Click the settings button to configure your preferred backend:
    • Ollama (local or remote)
    • OpenAI
    • OpenRouter
  4. Click "Connect" to establish connection

๐ŸŽฏ Setup & Configuration

  1. Select your preferred vision model from the dropdown list
  2. Configure model settings (optional):
    • Adjust temperature for creativity level
    • Fine-tune Top P and Top K for response diversity
    • Set maximum tokens for response length
    • Adjust repeat penalty to prevent repetition
    • Set custom seed for reproducible results
  3. Choose your User Prompt:
    • Use the default preset
    • Select from included presets
    • Create and manage custom presets

๐Ÿ“ธ Image Analysis

  1. Load your image:
    • Quick Paste: Click paste button + CTRL+V
    • File Upload: Click upload button to select local file
    • Drag and Drop: Drag and Drop your image directly into the preview area.
  2. Image preview will appear
  3. Click "Analyze Image" to begin processing

    โš ๏ธ Processing time varies based on your setup. If no error appears, analysis is in progress.

๐ŸŽจ Using the Results

  1. Once analysis completes, click "Send to Prompt"
  2. The AI-generated description will appear in the Generate tab
  3. Use the description as-is or customize it for your needs directly inside OllamaVision

๐Ÿ”‘ Quick Tips

  • If you're using local LLM ensure Ollama is running BEFORE trying connect
  • Larger images may take longer to process use compression if running into memory errors
  • Custom presets are saved between sessions
  • You can edit descriptions before generating images directly in the Analysis Results text area
  • For best results in LLM toys keep MAXTOKENS at -1 (set by default)

๐ŸŽฎ LLM Toys Guide

๐ŸŽจ Image Fusion

  1. Load your images using paste, upload, or drag & drop
  2. Analyze each image separately
  3. Edit the descriptions to your liking
  4. Click "Combine Analyses" to create a single prompt
  5. Edit the prompt to your liking
  6. Click "Send to Prompt" to generate an image
  7. Perfect for creating rich, multi-layered image generation prompts

๐Ÿ”„ Object + Subject Fusion

  1. Click the "Fusion" button
  2. Select "Object + Subject" mode
  3. Load your object image (paste, upload, or drag & drop)
  4. Analyze and edit result as needed
  5. Load your subject image
  6. Analyze and edit result as needed
  7. Click "Combine" to generate fusion prompt
  8. Edit final prompt if desired
  9. Click "Send to Prompt" to generate

Perfect for:

  • Creating custom designs on products (t-shirts, mugs, skateboards)
  • Transforming furniture into character-themed pieces
  • Designing custom figurines, sculptures, or plush toys

๐ŸŽญ Character Creator

Create detailed characters with customizable attributes:

  • Name, Sex, Species, Setting, Alignment, Class/Role
  • Editable input fields for custom characters
  • Editable response field to edit character before saving
  • Smart controls with field locking and randomization
  • Detailed output including personality, physical description, abilities, and backstory
  • Export characters to text files
  • Creates an AI image prompt to create a profile picture for your character
  • An "Export Prompt" button that will extract the image prompt from the results and send it to the generation page for instant generation of your new character

NOTE: If your creations are getting cut off make sure MAXTOKENS is set to -1 (set by default)

๐Ÿ“š Story Time

  1. Load your image using paste, upload, or drag & drop
  2. Click "Tell me a story"
  3. Stories are displayed in a wide-format reading area for comfort
  4. For best results ensure MAXTOKENS is set to -1 (set by default)

๐ŸŽฏ Example Outputs

Here's a showcase of OllamaVision's capabilities using different presets. Each example includes the source image, AI-generated description, and the final generated output.

๐ŸŒˆ Color Palette Analysis

View Example

Source Image

Rainbow Umbrellas

AI-Generated Description

This image features a vibrant array of rainbow-colored umbrellas suspended in the air, creating a visually stunning display against the backdrop of a clear blue sky. The color palette consists of:

  • Red
  • Orange
  • Yellow
  • Green
  • Blue
  • Purple

Each umbrella is distinctly colored, with no discernible pattern. The umbrellas appear evenly spaced throughout the frame, creating a sense of harmony and balance within the composition.

Generated Result

Generated Umbrellas

๐Ÿ‘ค Facial Features Analysis

View Example

Source Image

Portrait

AI-Generated Description

Facial Characteristics:

  • Eyes: Brown
  • Eyebrows: Thick and well-groomed
  • Nose: Straight and moderately sized
  • Mouth: Shaped like a smile with full teeth showing
  • Chin: Rounded in shape
  • Skin tone: Light brown
  • Facial hair: Well-groomed beard
  • Hair color: Brown
  • Ears: Small, located just below the jawline
  • Distinguishing features:
    • Numerous freckles across face and neck
    • 2 small moles under eyes

Generated Result

Generated Portrait

๐Ÿ”ฎ Planned Features

  • Lora captioner that saves captions in a .txt file with name of image for datasets

๐Ÿ™ Acknowledgments

  • mcmonkey for making OllamaVision official and for giving us SwarmUI
  • SouthbayJay for testing and feedback and all the late nights!

About

An extension for SwarmUI that allows you to connect to Ollama, OpenAI, and OpenRouter to use vision models for image analysis to create image prompts.

Resources

License

Stars

Watchers

Forks

Packages

No packages published