This repo contains my final project for CSCI 2952-C, "Semi-supervised Learning and Robustness on a Smaller Scale".
As of 2020, the best image recognition models are exceptionally accurate. Since 2010, the premier benchmark for evaluating image classification performance has been ImageNet-1k. Traditionally, researchers have trained their models to classify images from ImageNet via supervised learning: the model predicts each image's label and adjusts its parameters based on the correct label.
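For readers unfamiliar with this setup, here is a minimal sketch of one supervised training epoch in PyTorch; `model`, `loader`, and `optimizer` are illustrative placeholders, not objects defined in this repo:

```python
# Minimal sketch of one supervised training epoch in PyTorch.
# `model`, `loader`, and `optimizer` are illustrative placeholders,
# not objects taken from this repo's code.
import torch
import torch.nn as nn

def train_one_epoch(model, loader, optimizer, device="cpu"):
    criterion = nn.CrossEntropyLoss()
    model.train()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        logits = model(images)            # predict a label for each image
        loss = criterion(logits, labels)  # compare against the correct labels
        optimizer.zero_grad()
        loss.backward()                   # compute gradients of the loss
        optimizer.step()                  # adjust the model's parameters

# Typical usage (names again hypothetical):
#   optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
#   train_one_epoch(model, train_loader, optimizer)
```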
Increasingly, however, the best models have relied on semi-supervised learning, which leverages hundreds of millions or billions of unlabeled images as extra training data to boost performance. These studies have largely come out of Google Brain and Facebook AI, which have access to storage and computational resources most researchers lack. In this study, I therefore examine whether the approach from one of these papers, "Self-training with Noisy Student" (Xie et al., 2020), can improve a model's classification accuracy and robustness to noise even when applied at a smaller scale.
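At a high level, Noisy Student alternates between a teacher that pseudo-labels the unlabeled data and a noised student trained on the combined data. A rough sketch of that loop is below; `train`, `pseudo_label`, and `make_model` are hypothetical stand-ins for this repo's actual training code, shown only to convey the algorithm:

```python
# Rough sketch of the Noisy Student self-training loop (Xie et al., 2020).
# `train`, `pseudo_label`, and `make_model` are hypothetical helpers that
# stand in for the actual training code in src/.
from torch.utils.data import ConcatDataset

def noisy_student(labeled_ds, unlabeled_ds, rounds=3):
    # 1. Train an initial teacher on the labeled data only, without noise.
    teacher = train(make_model(noised=False), labeled_ds)
    for _ in range(rounds):
        # 2. The teacher assigns pseudo-labels to the unlabeled images.
        pseudo_ds = pseudo_label(teacher, unlabeled_ds)
        # 3. A noised student (data augmentation, dropout, stochastic depth)
        #    trains on the labeled and pseudo-labeled data combined; in the
        #    original paper the student is at least as large as the teacher.
        student = train(make_model(noised=True),
                        ConcatDataset([labeled_ds, pseudo_ds]))
        # 4. The student becomes the teacher for the next round.
        teacher = student
    return teacher
```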
All code used to train and evaluate models is in the `src/` folder. Training run output files are saved in the `runs/` folder. Experimental exploration and other notebooks are kept in the `experiments/` and `notebooks/` folders.
The report can be found here.
Credit to the Brown University Center for Computation and Visualization (CCV) for providing the computational resources for this project.