Official Codebase for A Tale Of Two Long Tails
If you use this software, please consider citing:
title={A Tale Of Two Long Tails},
author={D'souza, Daniel and Nussbaum, Zach and Agarwal, Chirag and Hooker, Sara},
journal={arXiv preprint arXiv:2107.13098},
year={2021}}
🚨 tldr: Examples of Atypical and Noisy Error. The former is reducible by introducing additional information; the latter is not! 🚨
This repository is built using PyTorch 🔥. Install the required libraries with:
pip install -r requirements.txt
- Download CIFAR-10/CIFAR-100 LongTail Datasets
- Unzip the downloaded files into a folder named "datasets" in the main directory
The scripts for training CIFAR-10 and CIFAR-100 models on all datasets are train_c10.py and train_c100.py, respectively.
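The unzip step above can also be done from Python. The following is a minimal sketch, not part of the repository: the helper name `unzip_into_datasets` and the archive file name in the usage comment are hypothetical, and it only assumes that the downloaded files are ordinary zip archives.

```python
import zipfile
from pathlib import Path


def unzip_into_datasets(archive_path, repo_root="."):
    """Extract a zip archive into <repo_root>/datasets (hypothetical helper).

    Returns the sorted relative paths of all files found under the
    datasets folder after extraction.
    """
    target = Path(repo_root) / "datasets"
    # Create the "datasets" folder if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target)
    return sorted(
        p.relative_to(target).as_posix()
        for p in target.rglob("*")
        if p.is_file()
    )


# Usage (the archive name is a placeholder, not the real file name):
# unzip_into_datasets("downloaded_longtail_dataset.zip")
```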
Training
- Set the variable MSP_AUG_PCT to a value in the range (0, 1). This controls what fraction of the dataset is augmented based on the MSP. The default is 0.2 (Targeted Augment variant).
- Set the variable TRAIN_DATASET to one of 'cifar10' (Original), 'N20_A20_T60' (C-Score), or 'N20_A20_TX2' (Frequency).
- Run the following to train CIFAR-10 models:
python train_c10.py
The same steps apply to CIFAR-100, using train_c100.py instead.
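To illustrate what a fraction such as MSP_AUG_PCT could control, here is a minimal sketch of choosing a subset of examples to augment. It rests on two assumptions not confirmed by this README: that MSP means the maximum softmax probability of a model's prediction, and that the examples with the lowest MSP are the ones targeted. The function name `select_indices_to_augment` is hypothetical, and the training scripts may implement the selection differently.

```python
import numpy as np


def select_indices_to_augment(softmax_probs, msp_aug_pct=0.2):
    """Pick the fraction of examples with the lowest MSP (illustrative only).

    softmax_probs: array of shape (n_examples, n_classes).
    msp_aug_pct:   fraction of examples to select, strictly between 0 and 1.
    """
    if not 0.0 < msp_aug_pct < 1.0:
        raise ValueError("msp_aug_pct must lie strictly between 0 and 1")
    # Assumption: MSP is the maximum softmax probability per example.
    msp = softmax_probs.max(axis=1)
    n_select = int(round(msp_aug_pct * len(msp)))
    # Assumption: low-MSP (low-confidence) examples are augmented first.
    order = np.argsort(msp, kind="stable")
    return order[:n_select]


# Toy example with five examples and two classes.
probs = np.array([
    [0.90, 0.10],
    [0.60, 0.40],
    [0.55, 0.45],
    [0.80, 0.20],
    [0.70, 0.30],
])
chosen = select_indices_to_augment(probs, msp_aug_pct=0.4)
```

With the toy inputs above, 40% of five examples is two examples, so `chosen` holds the indices of the two least confident predictions.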
Visualization Code will be added shortly.
Note that the code in this repository is licensed under the MIT License. Please review the license terms carefully before use.
If you have questions or suggestions, please feel free to email the authors or open a GitHub issue.