Skip to content

tdhock/max-generalized-auc

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Jonathan Hillman and Toby Dylan Hocking, Optimizing ROC Curves with a Sort-Based Surrogate Loss Function for Binary Classification and Changepoint Detection, arXiv:2107.01285

Replication materials

This paper proposed a new surrogate loss function, Area Under Min of FP and FN (AUM). Minimizing AUM was shown to result in maximizing Area Under the ROC Curve (AUC). Reference implementation of Algorithm 1 (AUM and directional derivative computation) in C++ can be found in R package aum. AUM implementation in pytorch can be found in figure-aum-neural-networks-data.py.

Before running some of the replication scripts below, you will need to clone the data repositories: https://github.com/tdhock/feature-learning-benchmark and https://github.com/tdhock/neuroblastoma-data.

Interactive figures

figure-aum-convexity-interactive.png

Interactive version: http://ml.nau.edu/viz/2021-11-12-aum-convexity/

Source: figure-aum-convexity-interactive.R

figure-auc-improved-interactive.R

Title, abstract, slides

Title: Efficient line search for optimizing Area Under the ROC Curve in gradient descent

Abstract: Receiver Operating Characteristic (ROC) curves are useful for evaluation in binary classification and changepoint detection, but difficult to use for learning since the Area Under the Curve (AUC) is piecewise constant (gradient zero almost everywhere). Recently the Area Under Min (AUM) of false positive and false negative rates has been proposed as a differentiable surrogate for AUC. In this paper we study the piecewise linear/constant nature of the AUM/AUC, and propose new efficient path-following algorithms for choosing the learning rate which is optimal for each step of gradient descent (line search), when optimizing a linear model. Remarkably, our proposed line search algorithm has the same log-linear asymptotic time complexity as gradient descent with constant step size, but it computes a complete representation of the AUM/AUC as a function of step size. In our empirical study of binary classification problems, we verify that our proposed algorithm is fast and exact; in changepoint detection problems we show that the proposed algorithm is just as accurate as grid search, but faster.

Slides PDF

See also https://github.com/tdhock/two-new-algos-sci-ml

5 Aug 2024

data_Classif_batchtools_best_valid.R makes

data_Classif_batchtools_best_valid.png

7 May 2024

data_Classif.R makes data_Classif.RData

data_Classif_figure.R makes

data_Classif_figure_units.png

data_Classif_figure_subtrain_validation.png

6 May 2024

figure-aum-convexity-new.R makes

figure-aum-convexity-new-profiles.png

figure-aum-convexity-new.png

figure-line-search-example-binary.R makes

figure-line-search-example-binary-roc.png

figure-line-search-example-binary-error.png

figure-line-search-example-binary.png

3 May 2024

figure-more-than-one-new.R makes new figures for AUM line search paper:

figure-more-than-one-new-binary-heat.png

figure-more-than-one-new-binary-heat-segs.png

figure-more-than-one-new-binary-aum-rate.png

figure-line-search-example.R make figure/demo for new paper/slides.

figure-line-search-example.png

21 Nov 2023

figure-more-than-one.R modified to make heat map version of SM figure, and rate version of AUM:

figure-more-than-one-binary-heat.png

figure-more-than-one-binary-aum-rate.png

4 May 2023

JSM Slides “Efficient line search optimization of penalty functions in supervised changepoint detection,” PDF, LaTeX source.

figure-line-search-example.R make figure/demo for new paper/slides.

figure-line-search-interactive.R makes new interactive figure showing line search.

figure-line-search-complexity-compare.R studies asymptotic complexity of different versions of line search.

figure-line-search-complexity-compare-steps-refs.png

figure-line-search-complexity-compare-iterations-refs.png

figure-line-search-complexity-compare-iterations.png

figure-line-search-complexity-compare-seconds-refs.png

figure-line-search-complexity-compare-seconds.png

figure-line-search-complexity-compare-H3K4me3_TDH_immune-equal_labels-rate-IntervalRegressionCV.png

figure-line-search-complexity-compare-H3K4me3_TDH_immune-equal_labels-rate-IntervalRegressionCV-initial.png

figure-line-search-complexity-compare-H3K4me3_TDH_immune-equal_labels-rate-IntervalRegressionCV-seconds.png

10 Apr 2023

figure-line-search-complexity.R is a more systematic version of this experiment, computing how many iterations it takes to get to a min aum step size, tdhock/aum#5

result CSV: figure-line-search-complexity.csv and figure below,

figure-line-search-complexity.png

figure-line-grid-search-interactive.R compares grid search to line search with linear number of iterations (same as number of input diffs/breakpoints B).

7 Nov 2022

figure-auc-improved.R makes figure below which shows one way to measure irregularity of ROC curves, via difference with AUC of monotonic version.

figure-auc-improved.png

19 Aug 2022

HOCKING-slides-prescott.tex makes HOCKING-slides-prescott.pdf

with new figure code figure-aum-grad-speed-binary.R which makes

figure-aum-grad-speed-binary.png

19 July 2022

figure-compare-hinge-loss.R makes

figure-compare-hinge-loss-pairwise-line.png

figure-compare-hinge-loss-squared-pairwise-relative.png

figure-compare-hinge-loss-squared-pairwise.png

figure-compare-hinge-loss-hinge-pairwise-relative.png

figure-compare-hinge-loss-hinge-pairwise.png

19 May 2022

New image classification experiment figure-aum-neural-networks-data.py adapted from torch AUM code, https://tdhock.github.io/blog/2022/aum-learning/

figure-aum-neural-networks.R makes

figure-aum-neural-networks-test-auc.png

2 May 2022

Slides for London tex, pdf.

Additional figures in figure-more-than-one.R

figure-more-than-one-binary-errors.png

figure-more-than-one-binary-dots.png

figure-more-than-one-binary-aum.png

3 Feb 2022

Figure below from aum package accuracy comparison vignette suggests that experiments on sonar data could provide convincing evidence of superior accuracy.

figure-from-vignette.png

figure-sonar-comparisons-data.R makes figure-sonar-comparisons.csv

figure-sonar-comparisons.R reads that and makes

figure-sonar-comparisons.png

figure-sonar-comparisons-iterations.png

12 Nov 2021

figure-aum-convexity-interactive.R makes interactive figure

figure-aum-convexity-interactive.png

Interactive versions:

3 Nov 2021

HOCKING-slides.tex makes HOCKING-slides.pdf for ML lab / Math colloq.

24 June 2021

figure-aum-grad-speed-binary-cpp-data.R makes binary classification timing data, figure-aum-grad-speed-binary-cpp-data.csv

figure-aum-grad-speed-binary-cpp.R makes

figure-aum-grad-speed-binary-cpp-algos.png

figure-aum-grad-speed-binary-cpp.png

figure-aum-grad-speed.R updated to make

figure-aum-grad-speed-both.png

16 June 2021

figure-unbalanced-grad-desc.R updated to make new figure (useful for slides probly)

figure-unbalanced-grad-desc-logistic.png

11 June 2021

Updated figure-aum-convexity.R new figures

figure-aum-convexity-thresholds.png

figure-aum-convexity-emph.png

Updated figure-aum-grad-speed.R new figure

figure-aum-grad-speed-random.png

7 June 2021

figure-aum-grad-speed-binary.R makes

figure-aum-grad-speed-binary.png

figure above shows time differences between sorted (linear) and unsorted (log-linear) predictions.

figure below shows differences between algos (aum comparable to logistic, whether or not predictions are sorted).

figure-aum-grad-speed-binary-algos.png

31 May 2021

figure-aum-grad-speed-data.R makes figure-aum-grad-speed-data.csv

figure-aum-grad-speed.R reads that and makes

figure-aum-grad-speed.png

26 May 2021

figure-unbalanced-grad-desc-data.R makes figure-unbalanced-grad-desc-data.rds

figure-unbalanced-grad-desc.R reads that and makes

figure-unbalanced-grad-desc-aum.png

The figure above shows that the AUM variant which uses total number of errors (count) is more accurate than the AUM variant which uses the normalized error (rate).

figure-unbalanced-grad-desc.png

The figure above shows that the AUM is at least as accurate as squared.hinge.all.pairs, whereas logistic.weighted is less accurate.

25 May 2021

figure-logistic-weights.R makes

figure-logistic-weights.png

This figure shows that cv.glmnet does fine with 5% positive labels, but stops learning when we get down to 1% positive labels. This suggests that we should try 1% for comparing aum.rate and aum.count.

10 Mar 2021

figure-DNA-Sonar-subtrain-valid-data.R makes

figure-DNA-Sonar-subtrain-valid-data.csv.gz

figure-DNA-Sonar-subtrain-valid.R analyzes those data.

9 Mar 2021

figure-binary-test-auc-data.R makes figure-binary-test-auc-data.rds

figure-binary-test-auc.R makes

figure-binary-test-auc.png

3 Jan 2021

figure-test-fold-monotonic.R makes

> meta.dt[, .(data.name, test.fold, features, n.train, mean.breaks)]
                  data.name test.fold features n.train mean.breaks
1:          ATAC_JV_adipose         4       29     341    6.665689
2: H3K27ac-H3K4me3_TDHAM_BP         2       26    1865    4.145845
3:        H3K4me3_XJ_immune         2       28     216    5.902778
4:        H3K4me3_XJ_immune         4       28     216    6.134259
5:               systematic         1      117    3322    1.010235
> (meta.stats <- meta.tall[, .(
+   min=min(value),
+   max=max(value)
+ ), by=variable])
      variable        min         max
1:    features  26.000000  117.000000
2:     n.train 216.000000 3322.000000
3: mean.breaks   1.010235    6.665689

21 Jan 2021

figure-aum-train-both.R makes

figure-aum-train-both.png

figure-aum-train-data.R makes figure-aum-train-data.rds

figure-aum-train.R makes

figure-aum-train-iterations.png

figure-aum-train.png

figure-aum-optimized-data.R makes figure-aum-optimized-data.rds

figure-aum-optimized.R reads those data and makes

figure-aum-optimized.png

figure-aum-optimized-iterations.png

This shows N=54 predicted values with min error, then predicted values optimized via aum gradient descent.

  • TODO do same with linear model, train error/auc.
  • TODO aum figs?

13 Jan 2021

figure-binary-class.R makes a figure showing what fp/fn curves look like for binary class,

figure-binary-class.png

12 Jan 2021

figure-aum-convexity.R makes

figure-aum-convexity.png

figure-aum-convexity-profiles.png

8 Jan 2021

figure-fn-not-monotonic.R makes

figure-fn-not-monotonic.png

figure-fn-not-monotonic-error.png

figure-more-than-one.R makes

figure-more-than-one-less-aum.png

figure-more-than-one-less-auc.png

figure-more-than-one-more-aum.png

figure-more-than-one-more-auc.png

2 Sept 2020

figure-linear-model-test-analyze.R makes

figure-linear-model-test-analyze.png

25 Aug 2020

Some R scripts for interactive experimentation with grad desc algo for learning linear model that minimizes AUM:

R script with OneFold function that computes train/valid/test error, can be parallelized over 198 test folds on the cluster:

Initial results on two data sets (ATAC, CTCF) show that

  • Train AUM decreases as a function of iterations (each iteration does line search so that is expected).

figure-linear-model-test-aum-train-decreases.png

  • IntervalRegressionCV init is much more accurate (in terms of test AUM, AUC, errors) than zero init. Best linear model is not as accurate as best predictions, after running gradient descent on just the predicted values (without linear model).

figure-linear-model-test-compare-init.png

  • Using early stopping regularization (select number of iterations with min AUM on validation set) does not decrease test AUM using IntervalRegressionCV initialization.

figure-linear-model-test-initial-selected.png

  • The linear model which is best in terms of test AUM, over all iterations, is not much better than the initial iteration, for these two data sets.

figure-linear-model-test-initial-best.png

  • Do we see any improvement on other test folds / data sets?

16 June 2020

figure-compare-hinge-loss-data.R makes figure-compare-hinge-loss-data.csv

figure-compare-hinge-loss.R makes

figure-compare-hinge-loss.png

figure-compare-hinge-loss-contours.png

18 May 2020

figure-neuroblastomaProcessed-combinations.R makes new figure that highlights counter-examples for the proposition (AUC=1 implies AUM=0) and shows that there are no counter-examples for the converse.

figure-neuroblastomaProcessed-combinations-points.png

2 Oct 2019

auc.improved.R copied from https://github.com/tdhock/feature-learning-benchmark/blob/master/auc.improved.R

19 Aug 2019

figure-curveAlignment.R computes derivative of area under min(fp,fn), updated viz: http://ml.nau.edu/viz/2019-08-19-curveAlignment-aub-deriv/

16 Aug 2019

figure-neuroblastomaProcessed-combinations-interactive.R makes

http://ml.nau.edu/viz/2019-08-16-generalized-roc/

6 June 2019

curveAlignment.R and figure-curveAlignment.R

http://members.cbio.mines-paristech.fr/~thocking/figure-max-auc/

4 June 2019

figure-aub-convexity.R creates figures which show that the aub function is continuous but not convex:

figure-aub-convexity-heatmap.png

figure-aub-convexity.png

3 June 2019

figure-neuroblastomaProcessed-complex-loon.R has code for an interactive plot using loon.

31 May 2019

figure-neuroblastomaProcessed-combinations.R creates the following figure which plots auc vs aub:

figure-neuroblastomaProcessed-combinations-scatter.png

Note that the min AUM=0 has AUC=1, and the points with AUC>1 have AUM>0. Thus minimizing AUM seems like a reasonable criterion.

30 May 2019

figure-neuroblastomaProcessed-complex.R creates http://members.cbio.mines-paristech.fr/~thocking/figure-neuroblastomaProcessed-complex/ which shows 8 labeled neuroblastoma data sequences with two different ROC curves / predictions. Strangely both achieve 0 errors, but the one with predictions in the finite interval has a highly non-monotonic ROC curve, and much smaller area inside the ROC polygon.

figure-neuroblastomaProcessed-combinations.R creates the following figure which shows the auc values for all of the 2^8 unique combinations of predicted values for 8 labeled profiles.

figure-neuroblastomaProcessed-combinations.png

Each labeled profiles has two minima: one in an infinite interval, and one in a finite interval. The panel titles show the difference d from the infinite interval limit to the predicted value, e.g. (-Inf, 1.2) with d=1 results in a predicted value of 0.2. The overall pattern is that d is relevant for AUC, in a range 0.001 to 10, but it has no effect outside that range. Surprisingly there are AUC values greater than zero, which happens when there are cycles. One example is highlighted with a circle in the plot above, and the ROC curves are shown below.

figure-neuroblastomaProcessed-combinations-worst.png

29 May 2019

https://github.com/tdhock/neuroblastoma-data/blob/master/figure-max-auc.R creates http://members.cbio.mines-paristech.fr/~thocking/figure-max-auc/

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published