ViTB16 Clustering

Explores K-Means clustering and cosine similarity matrices on image features extracted by a vision transformer (DINO ViT-B/16).
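As a minimal sketch, assume the image features are already stacked into an (N, D) NumPy array with one embedding per image. The pairwise cosine similarity matrix is then the product of the L2-normalised feature matrix with its own transpose. The function name and the random stand-in features (768 dimensions, the ViT-B hidden size) are illustrative assumptions, not code from this repository.

```python
import numpy as np


def cosine_similarity_matrix(features: np.ndarray) -> np.ndarray:
    """Return the (N, N) matrix of pairwise cosine similarities between rows."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    # Clip to avoid division by zero for an all-zero feature vector.
    normalized = features / np.clip(norms, 1e-12, None)
    return normalized @ normalized.T


# Hypothetical stand-in for ViT image embeddings: 4 images, 768-dim features.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 768))
sim = cosine_similarity_matrix(features)
print(sim.shape)  # (4, 4)
```

Every diagonal entry is 1 (each image compared with itself) and the matrix is symmetric, which is what makes it convenient to render as a heatmap.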

Requirements

  • CUDA 11.7
  • Python 3.10

Getting Started

git clone https://github.com/tsugg/ViTB16-Clustering.git
cd ViTB16-Clustering
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Extra

Visualization and cosine similarity code taken from this Roboflow notebook: https://colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/image_embeddings_analysis_part_1.ipynb
ViTB16 model card on Hugging Face: https://huggingface.co/facebook/dino-vitb16
K-Means strategy from the DINOv2 paper: https://arxiv.org/pdf/2304.07193v1.pdf
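The clustering step can be sketched with plain Lloyd's K-Means on L2-normalised features. This is a hedged illustration, not the repository's code or the procedure from the DINOv2 paper. The function names, the farthest-point initialisation and the toy two-blob data are assumptions chosen to keep the example small and deterministic.

```python
import numpy as np


def _sq_dists(x: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    # Squared Euclidean distance from every row of x to every centroid: shape (N, k).
    return ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)


def _farthest_point_init(x: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    # Start from one random point, then repeatedly add the point farthest from
    # all centroids chosen so far (a deterministic relative of k-means++).
    centroids = [x[rng.integers(len(x))]]
    for _ in range(k - 1):
        nearest = _sq_dists(x, np.array(centroids)).min(axis=1)
        centroids.append(x[nearest.argmax()])
    return np.array(centroids)


def kmeans(features, k: int, n_iters: int = 100, normalize: bool = True, seed: int = 0):
    """Lloyd's K-Means; returns (labels, centroids) in the (optionally normalised) space."""
    x = np.asarray(features, dtype=float)
    if normalize:
        # Between unit vectors ||a - b||^2 = 2 - 2 cos(a, b), so Euclidean
        # distance on normalised features is monotone in cosine similarity.
        x = x / np.clip(np.linalg.norm(x, axis=1, keepdims=True), 1e-12, None)
    rng = np.random.default_rng(seed)
    centroids = _farthest_point_init(x, k, rng)
    for _ in range(n_iters):
        labels = _sq_dists(x, centroids).argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment against the final centroids.
    labels = _sq_dists(x, centroids).argmin(axis=1)
    return labels, centroids


# Hypothetical toy "features": two well-separated blobs of 20 points each.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=5.0, scale=0.1, size=(20, 8))
blob_b = rng.normal(loc=-5.0, scale=0.1, size=(20, 8))
x = np.vstack([blob_a, blob_b])
labels, centroids = kmeans(x, k=2)
```

In practice scikit-learn's `KMeans(n_clusters=k)` does the same job with k-means++ initialisation by default. The hand-written version only makes each step (assign, update, check convergence) explicit.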