Skip to content

tdhock/changepoint-data-structure

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

87 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Source code for figures in paper

For reproducibility purposes a Makefile has been included. Typing the following commands should download/clone this repo, then re-create the figures from the paper.

git clone https://github.com/tdhock/changepoint-data-structure
cd changepoint-data-structure
rm -f *.rds *.csv
make

Result figures will be compiled into figures.pdf which has corresponding source file figures.tex.

  • Figure 1: demonstration of iterations 2 and 3 of proposed algorithm.
  • Figure 2: empirical number of iterations.
  • Figure 3 left: empirical timings for linear/quadratic algorithms, right: empirical timings for linear/quadratic/grid search.
  • Figure 4: empirical timings in different simulations.
  • Figure 5: accuracy in 4-fold cross-validation experiments.
  • Figure 6: timing comparison with CROPS and PDPA.
  • Figure 7: Kmeans, PCA, Regression.

Note that each figure script linked above depends on csv/rds data files which are pre-computed/cached by other R scripts (see details below or in Makefile). To redo all computations from scratch you can delete these files via “rm *.csv *.rds” and then re-run “make”.

Also note there is a packages.R script which takes care of installing/attaching packages, and is run at the start of each of these R scripts.

TODOs

Efficient data structure for computing Optimal Partitioning models.

figure-four-models.R and figure-three-iterations-new.R have some ideas for a figure for this.

Basic structure is a binary search tree, probably best to implement on top of a C++ STL map.

See for related work http://wcipeg.com/wiki/Convex_hull_trick#Fully_dynamic_variant

Assume data set of size d.

Assume we have a solver OP(lambda) -> segments, loss. O(d log d)

Let N be the number of elements in the data structure.

Public Methods

  • getModelForPenalty(lambda): find l1<=lambda<l2 in O(log N). If l1==lambda then return stored segments for l1 in O(1). If stored segments2=segments1-1 then return segments in O(1). Otherwise run OP(lambda) and store result for lambda in O(1).

Private Methods

  • insertModel(lambda, segments, loss): O(log N) – useful if we have a bunch of pre-computed models on disk.

Application: compute the full path on the benchmark data, see if the min error that we get from the target interval algo is the same as the actual min error. Analyze computation time: using data structure we should be log-linear rather than quadratic.

Application: compare O(N^2) Segment Neighborhood algo (jointseg::Fpsn) with our algo. 1-we can do parallel eval with OP. 2- only 73% of SN models are actually computable via OP. 3-modelSelection code is linear time.

Parallel full path algo.

14 Apr 2021

figure-crops-compare-data.R makes figure-crops-compare-data.rds

figure-crops-compare.R makes

figure-crops-compare.png

??? these files are old…

binseg.only.timings.R

tikzTest.R

figure-loss-changepoint.R

13 Apr 2021

figure-regression-simple-data.R computes figure-regression-simple-data.csv

figure-regression-simple.R makes

figure-regression-simple-loss.png

figure-regression-simple-loss-selected.png

figure-regression-simple-size.png

figure-regression-simple.png

figure-pca-simple-data.R computes figure-pca-simple-data.csv

figure-pca-simple.R makes

figure-pca-simple-loss.png

figure-pca-simple-size.png

figure-pca-simple.png

figure-kmeans-simple-data.R makes figure-kmeans-simple-data.csv

figure-kmeans-simple.R makes

figure-kmeans-simple-loss.png

figure-kmeans-simple-size.png

21 Apr 2020

Advanced R chapter on Rcpp shows the following example of an exported Rcpp function that returns a C++ std::map,

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
std::map<double, int> tableC(NumericVector x) {
  std::map<double, int> counts;

  int n = x.size();
  for (int i = 0; i < n; i++) {
    counts[x[i]]++;
  }

  return counts;
}

but what is returned to R in this case?

Rcpp modules vignette explains how to expose a C++ class/methods to R (implemented internally using an external pointer to an instance of the class). Get started via

Rcpp::Rcpp.package.skeleton("testmod", module=TRUE)

27 Mar 2020

figure-three-iterations-new.R makes figure-three-iterations-new.tex TODO highlight sure/unsure regions and stuff that is stored by algo. see also figure-four-models.R

11 Sep 2019

figure-fullpath-db-binseg.png

22 May 2019

figure-chipseq-cv.R makes

figure-chipseq-cv-diff.png

10 May 2019

figure-fullpath-grid-timing.R makes figure-fullpath-grid-timing.png

3 May 2019

figure-fullpath-db-timing.R makes

figure-fullpath-db-timing.png

figure-loss-small-evals.R

figure-loss-small-evals.tex and figures.pdf

6 Feb 2019

no.intermediates.selected.R exhibits a set of valid loss/complexity values for which no intermediates are selected – how many pops does this cause?

4 Feb 2019

loss.small.R computes full path of loss values for all 13,000+ neuroblastoma data sets with less than 1000 data points.

figure-loss-small.R visualizes the corresponding model selection functions. viz

figure-loss-small.png

figure-loss-small-data.R also shows the data sets and segmentation models. viz

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published