Skip to content

Project developed during my postdoctoral fellowship at the University of Oxford.

Notifications You must be signed in to change notification settings

harshit4311/FineGrained-ImageClassification

Repository files navigation

Topic: Fine-grained Image Classification using ConvNext

Abstract:

Image classification is the task of categorizing images into predefined classes or categories based on their visual content. In this study, we focus on fine-grained image classification, which involves distinguishing between visually similar subcategories within a broader class. Specifically, we address the problem of classifying dog breeds using the ‘ConvNext’ architecture - a convolutional neural network (CNN) model optimized for fine-grained classification tasks.

The problem of fine-grained image classification is very important in various fields, including biology, agriculture, and surveillance. However, it poses challenges due to the subtle differences and intricate details that define each category, making it harder for traditional classification models to achieve accurate results.

To overcome this challenge, we employ ‘ConvNext’, which is a CNN architecture designed to capture fine-grained features effectively. We preprocess the images to standardize their size and enhance their features, and then train the ‘ConvNext’ model using the Stanford Dogs dataset. Additionally, we utilize data augmentation techniques to augment the training dataset, improving the model's ability to generalize unseen data.

Our experiments demonstrate the effectiveness of ‘ConvNext’ in accurately classifying dog breeds, achieving high classification accuracy on the test dataset. Also, we observe that the model can identify subtle visual cues unique to each breed, showcasing its robustness in fine-grained classification tasks.

To summarize, the study highlights the importance of ‘ConvNext’ in addressing the challenges of fine-grained image classification and highlights its relevance in various real-world applications.

The observed performance improvements signify the potential of deep learning models in advancing the field of computer vision, particularly in tasks requiring precise classification of visually similar categories.

Introduction and Background:

Fine-grained image classification involves categorizing images into highly specific classes, often within a broader category. This task holds significance in various domains such as wildlife conservation, medical imaging, and fashion.

Previous research in this area has predominantly relied on convolutional neural networks (CNNs) due to their effectiveness in learning intricate features from images.

In this study, we focus on the ConvNext architecture for fine-grained image classification and utilize the Stanford Dogs dataset for evaluation.

Materials and Methods:

In our study, we begin by leveraging the ‘EfficientNet’ architecture, a state-of-the-art CNN model known for its efficiency and effectiveness in image classification tasks. Initially, we explore the capabilities of ‘EfficientNet’ in classifying dog breeds using the Stanford Dogs dataset, which provides a diverse collection of images representing various breeds.

While EfficientNet demonstrates notable performance in general image classification tasks, we find that it may not fully capture the intricate details necessary for fine-grained classification, especially when distinguishing between visually similar dog breeds.
To address this limitation, we transition to the ‘ConvNext’ architecture--a specialized CNN model explicitly designed for fine-grained classification tasks.

‘ConvNext’ comprises a series of convolutional layers followed by max-pooling and dense layers, enabling it to extract and learn intricate features inherent to fine-grained categories.
By leveraging ConvNext, we aim to enhance our model's ability to discern subtle visual cues unique to each dog breed, thereby improving classification accuracy.

In addition to adopting ConvNext, we employ various preprocessing techniques, including resizing and normalization, to standardize the input images and facilitate effective feature extraction.

To summarize, this transition from EfficientNet to ConvNext and the incorporation of preprocessing and augmentation techniques, we aim to overcome the challenges associated with fine-grained image classification and achieve more accurate and robust classification results, particularly in the context of classifying diverse dog breeds.

Experiments and Results:

The experimental setup involved partitioning the dataset into training, validation, and test sets. The model was trained on the training set, and its performance was evaluated on the validation and test sets to assess its generalization capability.

We report several key metrics to evaluate the classification performance of the ConvNext model, including accuracy, precision, recall, and F1 score. These metrics provide insights into the model's ability to correctly classify images into their respective dog breeds.

Discussion and Conclusion:

The results of our experiments highlight the efficiency of the ConvNext architecture in fine-grained image classification tasks. We discuss the implications of our findings in the context of existing literature and identify areas for future research.
Despite achieving promising results, our study is not without limitations, such as dataset bias and model complexity.

We conclude by emphasizing the importance of continued research in fine-grained image classification to address real-world challenges effectively.

References:

1.) TensorFlow Datasets @https://www.tensorflow.org/datasets

2.) Deep Learning for Computer Vision by Smith, J.D

3.) Stack Overflow

4.) https://www.mdpi.com/2076-3417/12/18/9016

5.) Cornell University @https://arxiv.org/abs/2001.04732

About

Project developed during my postdoctoral fellowship at the University of Oxford.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published