by Rahul Vigneswaran, Marc T. Law, Vineeth N. Balasubramanian, Makarand Tapaswi
[arXiv] [Code] [pip Package] [Video]
- Easy Usage (Recommended way to use our method)
- Advanced Usage
- Trained weights
- Results on a Toy Dataset
- Directory Tree
- Citation
- Contributing
- About me
- Extras
- License
Caution: TailCalibX is just TailCalib applied multiple times. Specifically, we generate a set of features once every epoch and use them to train the classifier. To mimic that, three things must be done at every epoch, in the following order (see the sketch after this list):
- Collect all the features from your dataloader.
- Use the `tailcalib` package to balance the features by generating samples.
- Train the classifier.
- Repeat.
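A minimal sketch of that per-epoch loop is given below. `backbone`, `classifier`, `train_loader`, `train_classifier`, and `num_epochs` are hypothetical stand-ins for your own training setup; only the `tailcalib` calls come from the package itself.

```python
# Minimal sketch of the per-epoch loop described above.
# `backbone`, `classifier`, `train_loader`, `train_classifier`, and
# `num_epochs` are hypothetical stand-ins for your own training code.
import numpy as np
from tailcalib import tailcalib

balancer = tailcalib(base_engine="numpy")

for epoch in range(num_epochs):
    # 1. Collect all features (and labels) from your dataloader.
    feats, labels = [], []
    for x, y in train_loader:
        feats.append(backbone(x).detach().cpu().numpy())
        labels.append(y.numpy())
    X, y = np.concatenate(feats), np.concatenate(labels)

    # 2. Balance the feature set by generating samples for tail classes.
    X_bal, y_bal, _ = balancer.generate(X=X, y=y)

    # 3. Train the classifier on the balanced features.
    train_classifier(classifier, X_bal, y_bal)
```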
Use the package manager pip to install `tailcalib`.

```bash
pip install tailcalib
```

Check the instructions here for more detailed Python package information.
```python
# Import
from tailcalib import tailcalib

# Initialize
a = tailcalib(base_engine="numpy")  # Options: "numpy", "pytorch"

# Imbalanced random fake data
import numpy as np
X = np.random.rand(200, 100)
y = np.random.randint(0, 10, (200,))

# Balance the data using tailcalib
feat, lab, gen = a.generate(X=X, y=y)

# Compare class counts before and after balancing
print(f"Before: {np.unique(y, return_counts=True)}")
print(f"After: {np.unique(lab, return_counts=True)}")
```
- Change the `data_root` for your dataset in `main.py`.
- If you are using wandb logging (Weights & Biases), make sure to change the `wandb.init` in `main.py` accordingly (see the sketch below).
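For reference, an edited `wandb.init` call might look like this sketch; the `project` and `entity` values are placeholders, not the repository's actual settings:

```python
import wandb

# Placeholder values: point these at your own wandb entity and project.
wandb.init(project="TailCalibX", entity="your-username")
```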
- For just the methods proposed in this paper:
  - For CIFAR100-LT: `run_TailCalibX_CIFAR100-LT.sh`
  - For mini-ImageNet-LT: `run_TailCalibX_mini-ImageNet-LT.sh`
- For all the results shown in the paper:
  - For CIFAR100-LT: `run_all_CIFAR100-LT.sh`
  - For mini-ImageNet-LT: `run_all_mini-ImageNet-LT.sh`
Check `Notebooks/Create_mini-ImageNet-LT.ipynb` for the script that generates the mini-ImageNet-LT dataset with varying imbalance ratios and train-test-val splits.
- `--seed` : Seed for reproducibility.
  - Default: `1`
- `--gpu` : GPUs to be used.
  - Default: `"0,1,2,3"`
- `--experiment` : Experiment number (check `libs/utils/experiment_maker.py`).
  - Default: `0.1`
- `--dataset` : Dataset number.
  - Choices: `0 - CIFAR100, 1 - mini-imagenet`
  - Default: `0`
- `--imbalance` : Imbalance factor.
  - Choices: `0: 1, 1: 100, 2: 50, 3: 10`
  - Default: `1`
- `--type_of_val` : Which dataset split to use.
  - Choices: `"vt": val_from_test, "vtr": val_from_train, "vit": val_is_test`
  - Default: `"vit"`
- `--cv1` to `--cv9` : Custom variables used in experiments; their purpose changes according to the experiment.
  - Default: `"1"`
- `--train` : Run the training sequence.
  - Default: `False`
- `--generate` : Run the generation sequence.
  - Default: `False`
- `--retraining` : Run the retraining sequence.
  - Default: `False`
- `--resume` : Resume from `latest_model_checkpoint.pth` and wandb if applicable.
  - Default: `False`
- `--save_features` : Collect feature representations.
  - Default: `False`
- `--save_features_phase` : Dataset split whose representations are collected.
  - Choices: `"train", "val", "test"`
  - Default: `"train"`
- `--config` : Path to a YAML file with an appropriate config; overrides the `experiment_maker`.
  - Default: `None`
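For example, a single run combining several of these flags might look like the line below. This is a hypothetical invocation (assuming the boolean flags are plain switches), not one of the repository's provided scripts:

```bash
# Hypothetical invocation: CIFAR100-LT (dataset 0) at imbalance factor 100
# (choice 1), running the train, generate, and retraining sequences on two GPUs.
python main.py --experiment 0.1 --dataset 0 --imbalance 1 \
               --train --generate --retraining --gpu "0,1" --seed 1
```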
| Experiment | CIFAR100-LT (ResNet32, seed 1, Imb 100) | mini-ImageNet-LT (ResNeXt50) |
|---|---|---|
| TailCalib | Git-LFS | Git-LFS |
| TailCalibX | Git-LFS | Git-LFS |
| CBD + TailCalibX | Git-LFS | Git-LFS |
The higher the `Imb ratio`, the more imbalanced the dataset is. `Imb ratio = maximum_sample_count / minimum_sample_count`.
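As a quick worked example, the ratio can be computed from per-class counts; `y` here is any 1-D array of class labels, such as the one from the package example above:

```python
import numpy as np

y = np.random.randint(0, 10, (200,))  # any 1-D array of class labels

# Per-class sample counts, then Imb ratio = max count / min count.
_, counts = np.unique(y, return_counts=True)
imb_ratio = counts.max() / counts.min()
print(f"Imb ratio: {imb_ratio:.2f}")
```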
Check this notebook to play with the toy example from which the plot below was generated.
```
TailCalibX
├── libs
│   ├── core
│   │   ├── ce.py
│   │   ├── core_base.py
│   │   ├── ecbd.py
│   │   ├── modals.py
│   │   ├── TailCalib.py
│   │   └── TailCalibX.py
│   ├── data
│   │   ├── dataloader.py
│   │   ├── ImbalanceCIFAR.py
│   │   └── mini-imagenet
│   │       ├── 0.01_test.txt
│   │       ├── 0.01_train.txt
│   │       └── 0.01_val.txt
│   ├── loss
│   │   ├── CosineDistill.py
│   │   └── SoftmaxLoss.py
│   ├── models
│   │   ├── CosineDotProductClassifier.py
│   │   ├── DotProductClassifier.py
│   │   ├── ecbd_converter.py
│   │   ├── ResNet32Feature.py
│   │   ├── ResNext50Feature.py
│   │   └── ResNextFeature.py
│   ├── samplers
│   │   └── ClassAwareSampler.py
│   └── utils
│       ├── Default_config.yaml
│       ├── experiments_maker.py
│       ├── globals.py
│       ├── logger.py
│       └── utils.py
├── LICENSE
├── main.py
├── Notebooks
│   ├── Create_mini-ImageNet-LT.ipynb
│   └── toy_example.ipynb
├── readme_assets
│   ├── method.svg
│   └── toy_example_output.svg
├── README.md
├── run_all_CIFAR100-LT.sh
├── run_all_mini-ImageNet-LT.sh
├── run_TailCalibX_CIFAR100-LT.sh
└── run_TailCalibX_mini-imagenet-LT.sh
```
Ignored `tailcalib_pip` as it is for the `tailcalib` pip package.
```bibtex
@inproceedings{rahul2021tailcalibX,
  title     = {{Feature Generation for Long-tail Classification}},
  author    = {Rahul Vigneswaran and Marc T. Law and Vineeth N. Balasubramanian and Makarand Tapaswi},
  booktitle = {ICVGIP},
  year      = {2021}
}
```
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Long-tail buzz: If you are interested in deep learning research involving long-tailed / imbalanced datasets, take a look at Long-tail buzz to learn about the recent trending papers in this field.