For reproducibility, a Makefile is included. Typing the following commands should download/clone this repo and then re-create the figures from the paper.
git clone https://github.com/tdhock/changepoint-data-structure
cd changepoint-data-structure
rm -f *.rds *.csv
make
The resulting figures will be compiled into figures.pdf, whose corresponding source file is figures.tex.
- Figure 1: demonstration of iterations 2 and 3 of proposed algorithm.
- Figure 2: empirical number of iterations.
- Figure 3: left, empirical timings for linear/quadratic algorithms; right, empirical timings for linear/quadratic/grid search.
- Figure 4: empirical timings in different simulations.
- Figure 5: accuracy in 4-fold cross-validation experiments.
- Figure 6: timing comparison with CROPS and PDPA.
- Figure 7: Kmeans, PCA, Regression.
Note that each of the figure scripts above depends on csv/rds data files which are pre-computed/cached by other R scripts (see details below or in the Makefile). To redo all computations from scratch, you can delete these files via "rm *.csv *.rds" and then re-run "make".
Also note that there is a packages.R script which takes care of installing/attaching packages; it is run at the start of each of these R scripts.
Efficient data structure for computing Optimal Partitioning models.
figure-four-models.R and figure-three-iterations-new.R have some ideas for a figure for this.
The basic structure is a binary search tree, probably best implemented on top of a C++ STL map (std::map).
For related work, see http://wcipeg.com/wiki/Convex_hull_trick#Fully_dynamic_variant
Assume a data set of size d.
Assume we have a solver OP(lambda) -> segments, loss which runs in O(d log d) time.
Let N be the number of elements in the data structure.
Public Methods
- getModelForPenalty(lambda): find stored penalties l1<=lambda<l2 in O(log N). If l1==lambda then return the stored segments for l1 in O(1). If the stored models satisfy segments2=segments1-1, then no intermediate model exists, so return the answer from the stored models in O(1). Otherwise run OP(lambda) and store the result for lambda in O(1).
Private Methods
- insertModel(lambda, segments, loss): O(log N) – useful if we have a bunch of pre-computed models on disk. A sketch of both methods follows this list.
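A minimal C++ sketch of this interface, assuming a hypothetical Model struct (segment count plus loss) and an OP solver passed in as a callback; these names are illustrations, not the repo's actual API, and the segments2=segments1-1 case is resolved here by comparing penalized costs loss + lambda * segments.

#include <functional>
#include <iterator>
#include <map>

// Hypothetical record of one stored OP solution (not the repo's
// actual API): number of segments and total loss for one penalty.
struct Model {
  int segments;
  double loss;
};

// Sketch of the proposed structure on top of std::map, which keeps
// penalties sorted and gives O(log N) search/insert.
class PenaltyModelMap {
public:
  // The OP solver (assumed O(d log d) per call) is passed in as a
  // callback so the sketch stays self-contained.
  explicit PenaltyModelMap(std::function<Model(double)> solver)
      : OP(solver) {}

  // getModelForPenalty(lambda): O(log N) search for stored
  // penalties l1 <= lambda < l2, then O(1) decisions.
  Model getModelForPenalty(double lambda) {
    std::map<double, Model>::iterator it2 = stored.upper_bound(lambda); // l2
    if (it2 != stored.begin()) {
      std::map<double, Model>::iterator it1 = std::prev(it2); // l1 <= lambda
      if (it1->first == lambda) {
        return it1->second; // exact hit: stored segments for l1.
      }
      if (it2 != stored.end() &&
          it2->second.segments == it1->second.segments - 1) {
        // segments2 = segments1 - 1: no intermediate model exists,
        // so pick the stored model with smaller penalized cost
        // loss + lambda * segments (one reading of the O(1) rule).
        const Model &m1 = it1->second;
        const Model &m2 = it2->second;
        return (m1.loss + lambda * m1.segments <=
                m2.loss + lambda * m2.segments) ? m1 : m2;
      }
    }
    Model result = OP(lambda);   // otherwise run the solver,
    insertModel(lambda, result); // and store the result for lambda.
    return result;
  }

private:
  // insertModel: O(log N); also useful for loading models that were
  // pre-computed on disk.
  void insertModel(double lambda, const Model &model) {
    stored[lambda] = model;
  }

  std::function<Model(double)> OP;
  std::map<double, Model> stored;
};

Since std::map keeps its keys sorted in a balanced binary search tree, upper_bound finds the l1<=lambda<l2 neighbors in O(log N), matching the complexities listed above.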
Application: compute the full path on the benchmark data, and check whether the min error we get from the target interval algorithm is the same as the actual min error. Analyze computation time: using the data structure we should be log-linear rather than quadratic.
Application: compare the O(N^2) Segment Neighborhood algorithm (jointseg::Fpsn) with our algorithm. (1) We can do parallel evaluation with OP. (2) Only 73% of SN models are actually computable via OP. (3) The modelSelection code is linear time.
Parallel full path algorithm.
figure-crops-compare-data.R makes figure-crops-compare-data.rds
figure-crops-compare.R makes
??? these files are old…
figure-regression-simple-data.R computes figure-regression-simple-data.csv
figure-regression-simple.R makes
figure-pca-simple-data.R computes figure-pca-simple-data.csv
figure-pca-simple.R makes
figure-kmeans-simple-data.R makes figure-kmeans-simple-data.csv
figure-kmeans-simple.R makes
The Advanced R chapter on Rcpp shows the following example of an exported Rcpp function that returns a C++ std::map:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
std::map<double, int> tableC(NumericVector x) {
std::map<double, int> counts;
int n = x.size();
for (int i = 0; i < n; i++) {
counts[x[i]]++;
}
return counts;
}
but what is returned to R in this case? Presumably Rcpp's wrap() converts the std::map<double, int> into a named integer vector, with the map keys as the names, but this is worth verifying interactively.
Rcpp modules vignette explains how to expose a C++ class/methods to R (implemented internally using an external pointer to an instance of the class). Get started via
Rcpp::Rcpp.package.skeleton("testmod", module=TRUE)
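As a concrete reminder of the pattern, here is a minimal RCPP_MODULE sketch; the Counter class and counter_module names are made up for illustration and are not part of this repo.

#include <Rcpp.h>

// Toy class to expose to R; R manipulates it through an external
// pointer to an instance, as noted above.
class Counter {
public:
  Counter() : total(0) {}
  void add(double x) { total += x; }
  double get() { return total; }
private:
  double total;
};

// RCPP_MODULE generates the glue code that registers the class,
// its constructor, and its methods with R.
RCPP_MODULE(counter_module) {
  Rcpp::class_<Counter>("Counter")
    .constructor()
    .method("add", &Counter::add)
    .method("get", &Counter::get);
}

From R, once the module is loaded, usage should look roughly like ctr <- new(Counter); ctr$add(3); ctr$get().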
figure-three-iterations-new.R makes figure-three-iterations-new.tex. TODO: highlight sure/unsure regions and what is stored by the algorithm; see also figure-four-models.R.
- binseg.timing.R: simulations.
- fullpath.db.binseg.R: binseg on loss values from simulated and real data.
- figure-fullpath-db-binseg.R makes
figure-chipseq-cv.R makes
figure-fullpath-grid-timing.R makes
figure-fullpath-db-timing.R makes
figure-loss-small-evals.tex and figures.pdf
no.intermediates.selected.R exhibits a set of valid loss/complexity values for which no intermediates are selected – how many pops does this cause?
loss.small.R computes the full path of loss values for all 13,000+ neuroblastoma data sets with fewer than 1000 data points.
figure-loss-small.R visualizes the corresponding model selection functions. viz
figure-loss-small-data.R also shows the data sets and segmentation models. viz