# SJTU: Spatial Judgments in Multimodal Models - Towards Unified Segmentation through Coordinate Detection
This project implements image segmentation using multiple vision-language models (Qwen2-VL, LLaVA-Next, GPT-4V) combined with SAM2. It provides interfaces for processing both static images and real-time webcam feeds.
- Support for multiple vision-language models:
  - Qwen2-VL
  - LLaVA-Next
  - GPT-4V (via API)
- Integration with SAM2 for precise segmentation
- Multiple interfaces:
  - Static image processing
  - Folder batch processing
  - Real-time webcam segmentation
- Memory-optimized implementation
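At its core, the pipeline asks a vision-language model for the bounding-box coordinates of the requested object, then hands that box to SAM2 as a prompt. The sketch below illustrates this flow; `detect_box` is a hypothetical stand-in for the VLM call (the actual model logic lives in `core.py`), and the config/checkpoint paths are assumptions following the SAM2 repository's conventions:

```python
# Minimal sketch of the coordinate-detection -> SAM2 flow, not the project's
# exact code. Assumes the sam2 package (facebookresearch/sam2) is installed
# and checkpoints were fetched via download_ckpts.sh.
import numpy as np
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

def detect_box(image: Image.Image, prompt: str) -> list[float]:
    """Hypothetical stand-in for the VLM call in core.py: ask a model such
    as Qwen2-VL for the [x1, y1, x2, y2] pixel box of the prompted object."""
    return [100.0, 100.0, 400.0, 400.0]  # placeholder; replace with model output

image = Image.open("example.jpg")
box = np.array(detect_box(image, "segment the dog in the image"))

# Config/checkpoint names follow the SAM2 repo layout (an assumption here).
sam2_model = build_sam2("configs/sam2.1/sam2.1_hiera_l.yaml",
                        "checkpoints/sam2.1_hiera_large.pt")
predictor = SAM2ImagePredictor(sam2_model)
predictor.set_image(np.array(image))

# One box prompt -> segmentation mask(s) plus a confidence score.
masks, scores, _ = predictor.predict(box=box, multimask_output=False)
```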
Prompt: "segment all cars in this image"
Input | Output |
---|---|
Prompt: "segment groceries items and products on the shelves"
Input | Output |
---|---|
Prompt: "segment the Google logo"
Input | Output |
---|---|
Prompt: "segment the cognac bottle"
Input | Output |
---|---|
Prompt: "segment the dog in the image"
Input | Output |
---|---|
```bash
# Clone the repository
git clone [repository-url]
cd [repository-name]

# Create conda environment from yml file
conda env create -f py311_environment.yml
conda activate [environment-name]
```
Alternatively, install the dependencies with pip:

```bash
# Clone the repository
git clone [repository-url]
cd [repository-name]

# Install requirements
pip install -r requirements.txt
```
Run the following script to download the required SAM2 model checkpoints:

```bash
bash /your/path/checkpoints/download_ckpts.sh
```
```bash
python run_image.py
```
This will launch a Gradio interface where you can upload and process individual images.
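For reference, an interface of this shape can be wired up in a few lines of Gradio; `segment` is a hypothetical name for the image-plus-prompt helper built on top of `core.py`:

```python
# Minimal sketch of a Gradio app like run_image.py's; `segment` is a
# hypothetical helper mapping (PIL image, prompt) -> segmented PIL image.
import gradio as gr
from core import segment  # hypothetical import

demo = gr.Interface(
    fn=segment,
    inputs=[gr.Image(type="pil", label="Input"),
            gr.Textbox(label="Prompt", placeholder="segment the dog in the image")],
    outputs=gr.Image(type="pil", label="Output"),
    title="VLM + SAM2 segmentation",
)
demo.launch()
```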
```bash
python run_folder.py
```
This will process all supported images in the specified folder.
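A batch pass of this kind reduces to a loop over image files; the sketch below assumes the same hypothetical `segment` helper, and the extension whitelist is an assumption:

```python
# Sketch of a folder batch loop; writes segmented copies to out_dir.
from pathlib import Path
from PIL import Image
from core import segment  # hypothetical import

def process_folder(folder: str, prompt: str, out_dir: str = "outputs") -> None:
    Path(out_dir).mkdir(exist_ok=True)
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in {".jpg", ".jpeg", ".png", ".bmp"}:
            segment(Image.open(path), prompt).save(Path(out_dir) / path.name)

process_folder("images", "segment all cars in this image")
```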
```bash
python webcam.py
```
This launches a real-time webcam interface for immediate segmentation.
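A real-time loop of this kind typically follows the OpenCV pattern below; `segment` is again hypothetical, and throughput is bounded by VLM and SAM2 latency. Press "q" to quit:

```python
# Sketch of a webcam loop like webcam.py's: grab a frame, segment it, show it.
import cv2
import numpy as np
from PIL import Image
from core import segment  # hypothetical import

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    rgb = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    out = np.array(segment(rgb, "segment the dog in the image"))
    cv2.imshow("segmentation", cv2.cvtColor(out, cv2.COLOR_RGB2BGR))
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```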
- The core implementation (`core.py`) supports all three models (Qwen2-VL, LLaVA-Next, GPT-4V), but the demo interfaces are optimized for single-model usage due to memory constraints.
- For GPT-4V usage, an OpenAI API key is required.
- Memory management has been optimized for typical computer configurations.
- SAM2 model checkpoints must be downloaded via `download_ckpts.sh` prior to use.
- CUDA-capable GPU (recommended)
- Minimum 8GB RAM
- Python 3.11
- CUDA toolkit (if using GPU)