# Deep Active Learning

Deep neural networks are popular in the machine learning community. Many researchers focus on how to utilize AL specifically with neural networks or deep models across many tasks.

## Survey/Review

There are already several surveys on DeepAL.

## What to study on AL with DNN?

1. To perform active learning, a model has to be able to learn from small amounts of data, while it is hard to train a deep model with a small amount of labeled data.
2. Many AL acquisition functions rely on model uncertainty, but deep models rarely represent such uncertainty explicitly.
3. Current semi/self-supervised learning already achieves strong performance improvements; is it still necessary to use AL?

Furthermore, several studies examine the effectiveness of AL on DNNs and raise several criticisms. The related works can be found here.

## Current works

There are many works on Deep AL in the literature. Here we only list the ones focused on the strategies or the framework design.

The taxonomy here is similar to the taxonomy in pool-based classification. However, due to the outstanding performance of semi/self-supervised learning in the deep learning literature, there are works that incorporate semi/self-supervised learning into the AL framework.

| SL or SSL | Strategy Types | Description | Famous Works |
| --- | --- | --- | --- |
| SL-based | Informativeness | Measure the informativeness of each instance | EGL/MC-Dropout/ENS/BAIT |
|  | Representativeness-impart | Represent the underlying distribution | Coreset/BatchBALD/BADGE/VAAL |
|  | Learn to score | Learn an evaluation function directly. | LL |
| SemiSL-based | (Strategy types not specified here.) | The unlabeled instances are handled by SemiSL. |  |
| SelfSL-based | (Strategy types not specified here.) | The unlabeled instances are handled by SelfSL. |  |

### Supervised Learning Based

Supervised-learning-based methods mean that only the labeled instances are used to train the model. Here the taxonomy is similar to the one for methods without neural networks.

#### 1. Informativeness

The works in this category focus on how to evaluate the uncertainty of an instance for a neural network; a minimal MC-Dropout-style sketch is given below.
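As a concrete illustration, here is a minimal sketch of MC-Dropout-style predictive entropy, assuming a PyTorch classifier that contains dropout layers; the function and variable names are illustrative and this is not any paper's original implementation.

```python
import torch

def mc_dropout_entropy(model, x, n_passes=20):
    """Predictive entropy under MC-Dropout: keep dropout stochastic at
    inference time and average the softmax outputs over several forward
    passes. Rough sketch only; real code usually enables only the
    Dropout modules instead of calling model.train() (which also
    switches BatchNorm to training mode)."""
    model.train()
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(n_passes)]
        ).mean(dim=0)                                   # (batch, n_classes)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)

# Querying: pick the unlabeled samples with the highest entropy, e.g.
# scores = mc_dropout_entropy(model, unlabeled_batch)
# query_idx = scores.topk(k=budget).indices
```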

Uncertainty-based:

Disagreement-based:

- The power of ensembles for active learning in image classification [2018, CVPR]: ENS.
- ST-CoNAL: Consistency-Based Acquisition Criterion Using Temporal Self-Ensemble for Active Learning [2022]
- Temporal Output Discrepancy for Loss Estimation-Based Active Learning [2022, TNNLS]: Estimates the sample uncertainty by measuring the difference of model outputs between two consecutive active learning cycles (see the sketch after this list).
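A loose NumPy sketch of the output-discrepancy idea above (not the paper's exact loss-estimation formulation; the softmax outputs from the two cycles are assumed to be precomputed):

```python
import numpy as np

def output_discrepancy_scores(probs_prev, probs_curr):
    """Score each unlabeled sample by how much the model output changed
    between two consecutive AL cycles.

    probs_prev, probs_curr: (n_unlabeled, n_classes) softmax outputs of
    the model after the previous and the current cycle."""
    return np.linalg.norm(probs_curr - probs_prev, axis=1)

# query_idx = np.argsort(-output_discrepancy_scores(p_prev, p_curr))[:budget]
```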

Fisher information:

Performance gain:

- Deep Active Learning by Leveraging Training Dynamics [2021]
- Boosting Active Learning via Improving Test Performance [2022, AAAI]: Selects through the expected loss or the output entropy of the instances. This work is still in the form of gradient length and is similar to EGL. The difference is that it calculates the gradients of all the parameters instead of only the last FC layer. Besides, it computes the gradient of the expectation of losses, while EGL takes the expectation of gradients (an EGL-style sketch follows this list).
- Making Look-Ahead Active Learning Strategies Feasible with Neural Tangent Kernels [2022]
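For reference, below is a sketch of the classic last-layer EGL score that the note above contrasts with, assuming a linear output layer trained with softmax cross-entropy; the names are illustrative.

```python
import numpy as np

def egl_scores(hidden, probs):
    """Expected gradient length over the last FC layer only.

    hidden: (n, d) penultimate-layer features of the unlabeled samples
    probs:  (n, c) softmax outputs
    For softmax cross-entropy, the gradient w.r.t. the output-layer
    weights under a hypothetical label y is (p - e_y) h^T, whose
    Frobenius norm is ||p - e_y|| * ||h||; EGL averages this norm over
    the predicted label distribution."""
    h_norm = np.linalg.norm(hidden, axis=1)
    n, c = probs.shape
    scores = np.zeros(n)
    for y in range(c):
        grad_norm = np.linalg.norm(probs - np.eye(c)[y], axis=1) * h_norm
        scores += probs[:, y] * grad_norm      # expectation over hypothetical labels
    return scores
```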

#### 2. Representativeness-impart

These methods take the representativeness of the data into account, which can be evaluated with respect to the data distribution. Besides, the diversity of a batch can also make the selection more representative (batch-mode selection).

There are the following sub-categories:

- Density-based sampling
- Diversity-based sampling (batch mode)
- Discriminator guided Sampling
- Expected loss on unlabeled data
- Graph-based

Density-based sampling:

- Active learning for convolutional neural networks: A core-set approach [ICLR, 2018]: Defines active learning as core-set selection, i.e., choosing a set of points such that a model learned over the selected subset is competitive on the remaining data points. The empirical analysis suggests that Deep Bayesian Active Learning with Image Data [ICML, 2017] does not scale to large-scale datasets because of batch sampling. Provides a rigorous bound between the average loss over any given subset of the dataset and the remaining data points via the geometry of the data points, and chooses a subset that minimizes this bound (loss minimization). In other words, it tries to find a set of points to query such that the model's performance on the labeled subset is as close as possible to its performance on the whole dataset. Batch active learning. (A greedy k-Center sketch follows this list.)
- Ask-n-Learn: Active Learning via Reliable Gradient Representations for Image Classification [2020, Arxiv]: Uses k-means++ on the learned gradient embeddings to select instances.
- Deep Active Learning by Model Interpretability [2020, Arxiv]: Inspired by the piece-wise linear interpretability of DNNs, they introduce the linearly separable regions of samples into active learning and propose Deep Active learning by Model Interpretability (DAMI). They use the local piece-wise interpretation in an MLP as the representation of each sample and directly run K-Center clustering to select and label samples.
- Diffusion-based Deep Active Learning [2020, Arxiv]: Builds a graph from the first hidden layer of the DNN. The selection is performed on the graph. Considers a random walk as a means to assign labels.
- DeepCore: A Comprehensive Library for Coreset Selection in Deep Learning [2022]: From their experiments on CIFAR10, with small fractions of 0.1%-1%, the advantage of submodular-function-based methods is obvious. However, this superiority disappears when the coreset size increases, especially when selecting more than 30% of the training data.
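A plain NumPy sketch of the greedy k-Center selection used in the core-set line of work above; the authors' implementation differs in details (e.g., the robust mixed-integer-program variant), and `features` would typically be penultimate-layer embeddings.

```python
import numpy as np

def k_center_greedy(features, labeled_idx, budget):
    """Repeatedly pick the unlabeled point farthest from the current set
    of centers (labeled + already selected)."""
    dists = np.linalg.norm(
        features[:, None, :] - features[labeled_idx][None, :, :], axis=2
    )
    min_dist = dists.min(axis=1)              # distance to the nearest center
    selected = []
    for _ in range(budget):
        i = int(np.argmax(min_dist))          # farthest point becomes a new center
        selected.append(i)
        min_dist = np.minimum(
            min_dist, np.linalg.norm(features - features[i], axis=1)
        )
    return selected
```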

Diversity-based sampling (batch mode):

- BatchBALD: Efficient and Diverse Batch Acquisition for Deep Bayesian Active Learning [NeurIPS, 2019]: Batch acquisition with diverse but informative instances for deep Bayesian networks. Uses a tractable approximation of the mutual information between a batch of points and the model parameters as the acquisition function.
- Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds [2020, ICLR]: BADGE. A good paper that compares many previous Deep AL methods and is very representative of Deep AL. Captures both uncertainty and diversity: uncertainty is measured through the magnitude of the resulting gradient with respect to the parameters of the final (output) layer, and diversity is captured by collecting a batch of examples whose gradients span a diverse set of directions, selected with k-means++ (designed to produce a good initialization for k-means clustering). A sketch follows this list.
- Density Weighted Diversity Based Query Strategy for Active Learning [2021, CSCWD]
- Bayesian Estimate of Mean Proper Scores for Diversity-Enhanced Active Learning [2023, TPAMI]
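A hedged sketch of the BADGE idea above: gradient embeddings w.r.t. the last layer under the model's own prediction, then k-means++ seeding over them. This is not the official implementation, and the helper names are made up.

```python
import numpy as np

def badge_embeddings(hidden, probs):
    """BADGE-style gradient embedding: gradient of the cross-entropy
    loss w.r.t. the last-layer weights, taking the model's own
    prediction as the hypothetical label."""
    pred = probs.argmax(axis=1)
    one_hot = np.eye(probs.shape[1])[pred]
    # (p - e_yhat) outer h, flattened per sample -> shape (n, c * d)
    return ((probs - one_hot)[:, :, None] * hidden[:, None, :]).reshape(len(hidden), -1)

def kmeanspp_select(emb, budget, seed=0):
    """k-means++ seeding over the gradient embeddings: points with large
    gradient magnitude (uncertain) and diverse directions tend to be chosen."""
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(emb)))]
    d2 = np.sum((emb - emb[idx[0]]) ** 2, axis=1)
    for _ in range(budget - 1):
        idx.append(int(rng.choice(len(emb), p=d2 / d2.sum())))
        d2 = np.minimum(d2, np.sum((emb - emb[idx[-1]]) ** 2, axis=1))
    return idx

# query_idx = kmeanspp_select(badge_embeddings(hidden, probs), budget)
```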

Discriminator guided Sampling:

Expected loss on unlabeled data:

Graph-based:

Submodular-based:

- SIMILAR: Submodular Information Measures Based Active Learning In Realistic Scenarios [2021, Arxiv]

#### 3. Learn to score

### Semi-Supervised Learning Imparted

Most works only use the labeled instances to train the model. Several works utilize the unlabeled instances and adopt a semi-supervised training paradigm in the framework. Besides, semi-supervised learning itself can bring a large performance improvement. Here the works are categorized by how the unlabeled instances are used.

General work:

Pseudo-labels:

Data Augmentation:

Labeled-unlabeled data indistinguishable:

Consistency (stay same after a distortion):

Graph-based:

- Active Learning and Uncertainty in Graph-Based Semi-Supervised Learning [2022, PhD Dissertation]

### Self-Supervised Learning Imparted

Self-supervised learning is popular for extracting good feature representations. Its effectiveness has been demonstrated in many fields.

Contrastive Loss:

- Mitigating Sampling Bias and Improving Robustness in Active Learning [2021, Arxiv]: Selects the instances least similar to the labeled ones of the same class (see the sketch after this list).
- Highly Efficient Representation and Active Learning Framework and Its Application to Imbalanced Medical Image Classification [2021]: Self-supervised representation learning with a multi-class GP classifier.
- One-bit Active Query with Contrastive Pairs [2022, CVPR]
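A minimal sketch of the "least similar to the labeled ones of the same class" selection, assuming contrastive features and predicted classes are precomputed; the function name and exact scoring are illustrative, not the paper's code.

```python
import numpy as np

def least_similar_to_class(unlab_feat, unlab_pred, lab_feat, lab_y, budget):
    """Compare each unlabeled sample (cosine similarity of contrastive
    features) with the labeled samples of its predicted class and query
    the least similar ones."""
    def normed(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)
    u, l = normed(unlab_feat), normed(lab_feat)
    scores = np.empty(len(u))
    for i, (f, c) in enumerate(zip(u, unlab_pred)):
        same = l[lab_y == c]
        # highest similarity to any labeled sample of the same class
        scores[i] = (same @ f).max() if len(same) else -np.inf
    return np.argsort(scores)[:budget]        # lowest similarity first
```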

Pretrain:

- PT4AL: Using Self-Supervised Pretext Tasks for Active Learning [2022, ECCV]: Uses Self-SL to pretrain the model, sorts the instances by the pretext-task loss, and splits them into batches. Then uses uncertainty to select from the batches (a sketch follows this list).
- NTKCPL: Active Learning on Top of Self-Supervised Model by Estimating True Coverage [2023]
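A rough sketch of the PT4AL-style selection described above, assuming the pretext-task losses and uncertainty scores on the unlabeled pool are precomputed (the original paper handles the first cycle differently, since no task model exists yet).

```python
import numpy as np

def pt4al_cycle(pretext_loss, uncertainty, cycle, n_cycles, budget):
    """Sort the unlabeled pool by self-supervised pretext loss, split it
    into one batch per AL cycle, then pick the most uncertain samples
    inside the current batch."""
    order = np.argsort(-pretext_loss)               # hardest pretext samples first
    batch = np.array_split(order, n_cycles)[cycle]
    top = np.argsort(-uncertainty[batch])[:budget]  # most uncertain within the batch
    return batch[top]
```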

Both self and semi-supervised imparted:

- Self-supervised Semi-supervised Learning for Data Labeling and Quality Evaluation [2021, Arxiv]: Uses random selection with BYOL and label propagation.
- Representation-Based Time Series Label Propagation for Active Learning [2023, CSCWD]