Exploring K-Means clustering and cosine similarity matrices computed on image features extracted with a ViT-B/16 vision transformer.
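
As a rough illustration of the cosine similarity part, here is a minimal NumPy sketch. It is not the code from this repository: the function name `cosine_similarity_matrix` is hypothetical, and the random array only stands in for real embeddings (768 is the hidden size of a ViT-B/16 model).

```python
import numpy as np


def cosine_similarity_matrix(features: np.ndarray) -> np.ndarray:
    """Return the (n, n) matrix of pairwise cosine similarities between rows."""
    # Clip the norms so an all-zero row never causes a division by zero.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    # The dot product of unit vectors is their cosine similarity.
    return unit @ unit.T


# Hypothetical stand-in for image embeddings: 4 images, 768 dimensions each.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 768))
similarity = cosine_similarity_matrix(features)
print(similarity.shape)  # (4, 4)
```

Each entry `similarity[i, j]` lies in [-1, 1], and the diagonal is 1 because every image is identical to itself.
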
- CUDA 11.7
- Python 3.10
git clone https://github.com/tsugg/ViTB16-Clustering.git
cd ViTB16-Clustering
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
- Visualization and cosine similarity code taken from: https://colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/image_embeddings_analysis_part_1.ipynb
- ViT-B/16 model card on Hugging Face: https://huggingface.co/facebook/dino-vitb16
- K-Means strategy found in the DINOv2 paper: https://arxiv.org/pdf/2304.07193v1.pdf
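
For readers who want to see the clustering step in isolation, below is a minimal sketch of K-Means (plain Lloyd's algorithm) on L2-normalized feature vectors. It is a generic illustration under stated assumptions, not this repository's implementation and not the exact procedure from the paper; the function name `kmeans`, its parameters, and the toy data are all hypothetical.

```python
import numpy as np


def kmeans(features, k, n_iters=50, seed=0):
    """Cluster rows of `features` into k groups with Lloyd's algorithm (n_iters >= 1)."""
    rng = np.random.default_rng(seed)
    # L2-normalize: for unit vectors, squared Euclidean distance equals
    # 2 - 2 * cosine similarity, so both measures rank neighbors the same way.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    x = features / np.clip(norms, 1e-12, None)
    # Initialize centroids from k distinct data points.
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: mean of assigned points; keep the old centroid if a cluster is empty.
        new_centroids = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Toy data standing in for image features: two well-separated groups.
rng = np.random.default_rng(1)
group_a = rng.normal(loc=[5.0, 0.0], scale=0.1, size=(20, 2))
group_b = rng.normal(loc=[0.0, 5.0], scale=0.1, size=(20, 2))
labels, centroids = kmeans(np.vstack([group_a, group_b]), k=2)
print(labels)
```

In practice a library implementation such as scikit-learn's `KMeans` is usually a better choice; the hand-written loop is only meant to make the assignment and update steps visible.
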