The Vision Transformer (ViT), introduced in the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" by Dosovitskiy et al., applies the Transformer architecture, which has been highly successful in natural language processing, directly to image recognition.
In this repository, we delve into the implementation and experimentation of Vision Transformer models for image classification. The Vision Transformer breaks from the traditional Convolutional Neural Network (CNN) paradigm by treating an image as a sequence of patches rather than a grid of pixels. Instead of using convolutions, it splits the image into fixed-size patches, embeds each patch as an input token, and applies self-attention so that every patch can attend to every other, capturing both global and local dependencies within the image.
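As a rough illustration of this patch-tokenization step, here is a minimal PyTorch sketch (hypothetical code, not the repository's implementation; the image size, patch size, and embedding dimension follow the common ViT-Base defaults):

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into fixed-size patches and project each patch to a token embedding."""

    def __init__(self, image_size=224, patch_size=16, in_channels=3, embed_dim=768):
        super().__init__()
        self.num_patches = (image_size // patch_size) ** 2
        # A strided convolution is equivalent to flattening each patch
        # and applying a shared linear projection.
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                     # x: (batch, 3, 224, 224)
        x = self.proj(x)                      # (batch, embed_dim, 14, 14)
        x = x.flatten(2).transpose(1, 2)      # (batch, 196, embed_dim) -- one token per patch
        return x

patches = PatchEmbedding()(torch.randn(1, 3, 224, 224))
print(patches.shape)  # torch.Size([1, 196, 768])
```

In the full model, a learnable [CLS] token and position embeddings are added to this sequence before it enters the Transformer encoder.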
Vision Transformer Architecture: Explore the inner workings of the Vision Transformer model, including its self-attention mechanism and multi-head attention layers (a minimal encoder-block sketch follows this list).
Training Pipelines: Dive into training pipelines tailored for Vision Transformer models, including data preprocessing, augmentation, and fine-tuning strategies (see the fine-tuning skeleton below).
Evaluation Metrics: Evaluate model performance using standard image classification metrics such as accuracy, precision, recall, and F1-score (a short metrics example closes this list).
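To make the architecture item concrete, the following sketch shows one pre-norm Transformer encoder block of the kind ViT stacks: multi-head self-attention followed by an MLP, each with a residual connection. The layer sizes are illustrative defaults, not the repository's configuration:

```python
import torch
import torch.nn as nn

class ViTBlock(nn.Module):
    """One pre-norm encoder block: multi-head self-attention + MLP, each with a residual connection."""

    def __init__(self, embed_dim=768, num_heads=12, mlp_ratio=4.0, dropout=0.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads,
                                          dropout=dropout, batch_first=True)
        self.norm2 = nn.LayerNorm(embed_dim)
        hidden = int(embed_dim * mlp_ratio)
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, embed_dim),
        )

    def forward(self, x):                     # x: (batch, num_tokens, embed_dim)
        # Self-attention lets every patch token attend to every other token.
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)
        x = x + attn_out
        x = x + self.mlp(self.norm2(x))
        return x

tokens = torch.randn(1, 197, 768)  # 196 patch tokens + 1 [CLS] token
print(ViTBlock()(tokens).shape)    # torch.Size([1, 197, 768])
```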
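The fine-tuning skeleton below sketches one possible training pipeline: standard preprocessing and light augmentation, a pretrained torchvision ViT with its classification head replaced, and a plain training loop. The dataset path, model choice, and hyperparameters are placeholders, not the repository's actual settings:

```python
import torch
from torch import nn, optim
from torchvision import datasets, transforms, models

# ImageNet-style preprocessing with light augmentation (illustrative values).
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

train_set = datasets.ImageFolder("data/train", transform=train_tf)  # hypothetical dataset path
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

# Start from a pretrained ViT-B/16 and replace the classification head for fine-tuning.
model = models.vit_b_16(weights=models.ViT_B_16_Weights.IMAGENET1K_V1)
model.heads = nn.Linear(model.hidden_dim, len(train_set.classes))

optimizer = optim.AdamW(model.parameters(), lr=3e-5, weight_decay=0.05)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```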
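For the evaluation item, the listed metrics can be computed with scikit-learn once predictions have been collected from a validation loop; the small label arrays below are placeholders for illustration only:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# y_true / y_pred would normally come from running the model over a validation set.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)
# Macro averaging weights every class equally, a common choice for multi-class classification.
precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="macro")

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```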