cleanlab/label-error-detection-benchmarks

Label Error Detection Benchmarks

Code to reproduce results from the paper:

Model-Agnostic Label Quality Scoring to Detect Real-World Label Errors. ICML DataPerf Workshop 2022

This repository is intended only for scientific purposes. To find label issues in your own classification data, use the official cleanlab library instead.

Download Datasets

Datasets and links:

  1. roman-numeral
     Dataset: Codalab
     Verified Labels: andrew-ng-dcai-comp-2021-manual-review-for-label-errors.xlsx
  2. food-101n
     Dataset and Verified Labels: https://kuanghuei.github.io/Food-101N/
     File: Food-101N_release.zip
     Training dataset: ./Food-101N_release/train
     Verified training labels (subset of training dataset): ./Food-101N_release/meta/verified_train.tsv
  3. cifar-10n-agg, cifar-10n-worst
     https://github.com/UCSC-REAL/cifar-10-100n
     http://ucsc-real.soe.ucsc.edu:1995/Home.html
  4. cifar-10s
     Dataset: Download Cifar as PNG files
     Noisy Labels: cifar10_train_dataset_noise_amount_0.2_sparsity_0.4_20220326055753.csv

The roman-numeral dataset contains duplicate images (the exact same image under different file names). We use the following script to remove the duplicates: src/preprocess/remove_dupes.py
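As a rough illustration of this kind of deduplication, the sketch below groups files by a hash of their bytes and deletes all but one file per group. It is not the contents of src/preprocess/remove_dupes.py. The function names are hypothetical, and it assumes that "exact same image" means byte-identical files, so images with identical pixels but different encodings would not be matched.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicate_files(root):
    """Group files under `root` whose bytes are identical (hypothetical helper)."""
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    # Only groups with more than one file contain duplicates.
    return [paths for paths in by_digest.values() if len(paths) > 1]


def remove_duplicates(root):
    """Delete all but the first file (in sorted path order) of each duplicate group."""
    removed = []
    for paths in find_duplicate_files(root):
        for extra in paths[1:]:
            extra.unlink()
            removed.append(extra)
    return removed
```

Calling remove_duplicates on a dataset folder returns the list of deleted paths, which can be logged for review. Running find_duplicate_files first gives a dry run that deletes nothing.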

(Optional) Run cross-validation for each dataset to train models and generate predicted probabilities

Running cross-validation is optional because we've conveniently provided pre-computed out-of-sample predicted probabilities for each dataset and model.

Prerequisite

1. Run docker-compose to build the docker image and run the container

Clone this repo and run the commands below:

sudo docker-compose build
sudo docker-compose run --rm --service-ports dcai

2. Start Jupyter Lab

make jupyter-lab

3. Run training notebooks for each dataset

Each dataset has its own folder in ./src/experiments with two notebooks:

  1. 1_Run_Cross_Val_Noisy_Labels.ipynb: For each model, run k-fold cross-validation with noisy labels to generate out-of-sample predicted probabilities.
  2. 2_Save_Cross_Val_Results_To_Numpy.ipynb: For each model, save predicted probabilities to a Numpy file.
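The idea behind these two notebooks can be sketched in a few lines of NumPy: every example is scored by a model that never saw it during training, and the resulting matrix is what gets saved. This is a minimal sketch, not the notebooks' code. The nearest-centroid classifier is a toy stand-in for the actual models, and all function names here are hypothetical.

```python
import numpy as np


def fit_centroids(X, y, num_classes):
    """Toy stand-in for model training: one mean feature vector per class."""
    return np.stack([X[y == k].mean(axis=0) for k in range(num_classes)])


def predict_proba(centroids, X):
    """Softmax over negative squared distances to each class centroid."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def stratified_folds(y, k, rng):
    """Assign each example to one of k folds, spreading every class across folds."""
    folds = np.empty(len(y), dtype=int)
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        folds[idx] = np.arange(len(idx)) % k
    return folds


def cross_val_pred_probs(X, y, num_classes, k=5, seed=0):
    """Return an (n_examples, num_classes) matrix of out-of-sample probabilities."""
    rng = np.random.default_rng(seed)
    folds = stratified_folds(y, k, rng)
    pred_probs = np.zeros((len(y), num_classes))
    for fold in range(k):
        held_out = folds == fold
        # Train on the (possibly noisy) labels of the other k-1 folds only.
        centroids = fit_centroids(X[~held_out], y[~held_out], num_classes)
        pred_probs[held_out] = predict_proba(centroids, X[held_out])
    return pred_probs
```

The returned array can then be written with np.save("pred_probs.npy", pred_probs) and read back with np.load. The sketch assumes labels are integers 0..num_classes-1 and that every class has at least two examples.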

Evaluate Label Quality Scores

The cross-validation step above is optional because pre-computed predicted probabilities from all of our models are available in the /src/experiments folder (except for Food-101n, due to its large file size). For Food-101n, download the pre-computed predicted probabilities (pred_probs.npy) here.

Once we have the out-of-sample predicted probabilities for all datasets and models, we evaluate their performance for detecting label errors using the following notebook:

src/experiments/Evaluate_All_Experiments.ipynb
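To make the evaluation step concrete, the sketch below scores each example from its predicted probabilities and measures how well low scores line up with verified label errors. It is an illustration only, not the notebook's code: the choice of score (the predicted probability of the given label) and of metric (AUROC) are assumptions made for this example, and the function names are hypothetical.

```python
import numpy as np


def self_confidence(labels, pred_probs):
    """Score each example by the predicted probability of its given label.

    Lower scores suggest the given label is more likely to be wrong.
    """
    return pred_probs[np.arange(len(labels)), labels]


def auroc(is_error, error_scores):
    """Probability that a true label error outranks a correctly labeled example.

    Computed by direct pairwise comparison, with ties counted as half.
    """
    pos = error_scores[is_error]
    neg = error_scores[~is_error]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

Here labels are the noisy given labels, pred_probs is the out-of-sample matrix loaded from the Numpy file, and is_error is a boolean array derived from the verified labels. Passing 1 - self_confidence(labels, pred_probs) as error_scores makes higher values mean "more likely mislabeled". The pairwise comparison uses memory proportional to the number of errors times the number of non-errors, so it suits small illustrations rather than full datasets.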

Raw tables of all performance numbers for each method+dataset can be found in this Google sheet.
