A simplified visual backbone for feature extraction, bounding boxes, and object detection using VinVL.

Mahmood-Anaam/vinvl

VinVL VisualBackbone

Overview

VinVL Visual Backbone provides a simplified API for extracting region features, bounding boxes, and object detections from images with minimal code. The implementation is based on microsoft/scene_graph_benchmark; refer to that repository for additional details.

Installation

Option 1: Install via Colab

package_name = "vinvl-0.1.0-cp310-cp310-linux_x86_64.whl"
!pip install https://github.com/Mahmood-Anaam/vinvl/raw/main/{package_name} --quiet
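Under the standard wheel naming convention, the filename above carries the tags `cp310-cp310-linux_x86_64`, which means the wheel targets CPython 3.10 on 64-bit Linux. The following is a minimal sketch for checking those tags against the running interpreter before installing. The helper names `wheel_tags` and `matches_this_interpreter` are illustrative and not part of vinvl, and the platform comparison is only a rough approximation of what pip does.

```python
import platform
import sys

package_name = "vinvl-0.1.0-cp310-cp310-linux_x86_64.whl"


def wheel_tags(filename):
    """Split a wheel filename into (python_tag, abi_tag, platform_tag)."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    # The last three dash-separated fields are the compatibility tags.
    python_tag, abi_tag, platform_tag = stem.split("-")[-3:]
    return python_tag, abi_tag, platform_tag


def matches_this_interpreter(filename):
    """Rough check: same CPython major/minor, OS, and machine as the wheel."""
    python_tag, _, platform_tag = wheel_tags(filename)
    current_python = f"cp{sys.version_info.major}{sys.version_info.minor}"
    current_platform = f"{platform.system().lower()}_{platform.machine().lower()}"
    return (
        platform.python_implementation() == "CPython"
        and python_tag == current_python
        and platform_tag == current_platform
    )


print(wheel_tags(package_name))
print(matches_this_interpreter(package_name))
```

If the check returns False, one of the source-based options below is likely the better route.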

Option 2: Install Directly via pip

pip install git+https://github.com/Mahmood-Anaam/vinvl.git

Option 3: Clone Repository and Install in Editable Mode

!git clone https://github.com/Mahmood-Anaam/vinvl.git
%cd vinvl
!pip install -e .

Option 4: Use Conda Environment

git clone https://github.com/Mahmood-Anaam/vinvl.git
cd vinvl
conda env create -f environment.yml
conda activate sg_benchmark
pip install -e .

Features

  • Simplified feature extraction with pretrained VinVL models.
  • Support for multiple input types (file path, URL, PIL.Image, NumPy array, or tensor).
  • Scalable batch processing and seamless PyTorch integration.
  • Predefined configurations for fast setup and customization.
  • Runs on GPU or CPU; the device is chosen when the backbone is constructed.
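For long image lists, it may help to feed the extractor fixed-size chunks so that memory use stays bounded. The following is a minimal sketch, assuming the extractor accepts a list of inputs and returns one result per input, as the batch example in Quick Start suggests. The names `chunks`, `extract_in_chunks`, and `fake_extractor` are hypothetical helpers, not part of vinvl, and the default chunk size is arbitrary.

```python
def chunks(items, size):
    """Yield consecutive slices of `items` containing at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def extract_in_chunks(extractor, images, size=8):
    """Call `extractor` on each chunk and concatenate the per-image results."""
    results = []
    for chunk in chunks(images, size):
        results.extend(extractor(chunk))
    return results


# Hypothetical stand-in so the sketch runs without the model installed.
def fake_extractor(batch):
    return [{"source": item} for item in batch]


paths = [f"image_{i}.jpg" for i in range(5)]
print(len(extract_in_chunks(fake_extractor, paths, size=2)))  # 5
```

In real use, `fake_extractor` would be replaced by the `VinVLVisualBackbone` instance, with the chunk size tuned to the available memory.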

Quick Start

Code Example

import torch
from PIL import Image
import requests
from vinvl.scene_graph_benchmark.wrappers import VinVLVisualBackbone

# Set device
device = "cuda" if torch.cuda.is_available() else "cpu"
feature_extractor = VinVLVisualBackbone(device=device, config_file=None, opts=None)

# Single Image Feature Extraction
img_url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(img_url, stream=True).raw)

image_features = feature_extractor(image)
# Output: List of dictionaries with keys:
# "boxes", "classes", "scores", "img_feats", "spatial_features".

# Batch Image Feature Extraction
batch = [
    "http://images.cocodataset.org/val2017/000000039769.jpg",
    "https://farm1.staticflickr.com/26/53573290_1d167223e8_z.jpg"
]

batch_features = feature_extractor(batch)
for feature in batch_features:
    print("\n", feature['classes'])
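The output dictionaries can be post-processed with ordinary Python. The following is a minimal sketch that keeps only the detections above a score threshold. It assumes `classes`, `scores`, and `boxes` are parallel sequences with one entry per detected region; this is an assumption about the output layout, not something confirmed here. The helper `confident_detections` and the sample values are illustrative only.

```python
def confident_detections(feature, min_score=0.5):
    """Return (class, score, box) triples whose score is at least min_score."""
    kept = []
    for cls, score, box in zip(feature["classes"], feature["scores"], feature["boxes"]):
        if float(score) >= min_score:
            kept.append((cls, float(score), list(box)))
    return kept


# Stand-in for one element of the extractor output; the values are made up.
example_feature = {
    "classes": ["cat", "remote", "couch"],
    "scores": [0.92, 0.41, 0.77],
    "boxes": [[10, 20, 200, 220], [30, 40, 60, 70], [0, 100, 640, 480]],
}

for cls, score, box in confident_detections(example_feature, min_score=0.5):
    print(f"{cls}: {score:.2f} at {box}")
```

With real output, `example_feature` would be replaced by an element of `batch_features`.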
