Contributing a distance measure

Thanks to the proxy package, implementing a custom distance, either entirely in R or using C/C++, is relatively easy. Thanks to RcppParallel, implementing a C++ distance with support for multi-threading is also possible. This document explains the framework used by dtwclust, assuming familiarity with R's C/C++ interface and with C++ itself.

Foreword

One of the packages that dtwclust links against is RcppArmadillo. When working with it, the Rcpp header should not be included directly, since RcppArmadillo already includes its functionality. However, the RcppArmadillo header increases compilation times considerably, so it should only be included when necessary. This is why many dtwclust source files define several classes in the same file, and it also explains some of the design choices below.

Stand-alone function

Stand-alone means that the function can be used directly, without going through proxy. On the R side there are practically no restrictions: the function can have any number of parameters, although consistency checks are best done in R. See for example dtw_basic.

On the C++ side the distance should be declared in the corresponding header, registered in the initialization, and defined. Importantly, if the stand-alone version will serve as a basis for the proxy version, the core calculations should be done independently of any R/Rcpp API, depending either on raw pointers, or on some kind of wrapper; such a wrapper is available in this template. These core functions should be declared in the internal distances header. Since they don't include many third-party headers, they can be implemented in different files; see for example dtw_basic.
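The separation between core calculations and the R-facing layer can be sketched as follows. This is only an illustration under stated assumptions: `euclidean_core` and `euclidean_standalone` are hypothetical names, not dtwclust functions, and `std::vector` stands in for the objects that would normally arrive through R's C/C++ interface.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical core routine: it depends only on raw pointers and a length,
// so it needs no R/Rcpp headers and can be reused by a proxy version.
double euclidean_core(const double* x, const double* y, std::size_t n)
{
    double sum = 0.0;
    for (std::size_t k = 0; k < n; ++k) {
        const double d = x[k] - y[k];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Hypothetical stand-alone entry point. In a real package this layer would
// unpack the R objects and forward pointers; std::vector is a stand-in here.
// Consistency checks (e.g. equal lengths) are assumed to be done in R.
double euclidean_standalone(const std::vector<double>& x,
                            const std::vector<double>& y)
{
    return euclidean_core(x.data(), y.data(), x.size());
}
```

Because the core routine only sees pointers, the same function could later be called from a thread-safe calculator without modification.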

proxy function

A DistanceCalculator has to be implemented. The concrete class should be declared there, and added to the factory. Since all time series are passed from R as a list of vectors, matrices, or complex vectors, there is the TSTSList templated class that works with Rcpp's NumericVector, NumericMatrix and ComplexVector, saving them as Armadillo's mat and cx_mat so that they are thread-safe. Univariate series are saved as matrices with 1 column.

Constructor

It is expected that the DistanceCalculator's constructor will take 3 SEXP parameters that will all contain Rcpp::Lists with: any arguments for the distance, the time series in x (from proxy), and the time series in y (from proxy). See for example the DtwBasicCalculator, and note that x and y are simply given to the TSTSList template.

Important: if the concrete class has any private members that should be unique to each thread, they should not be set up in the constructor. See the next section.
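A minimal sketch of this constructor contract, assuming simplified stand-ins for everything that comes from R: `DistanceArgs`, `SeriesList` and `ExampleCalculator` are hypothetical names, and plain C++ containers replace the SEXP parameters and the series wrapper used by dtwclust.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the three inputs: distance arguments, x and y.
using SeriesList = std::vector<std::vector<double>>;
struct DistanceArgs { double p; };

class ExampleCalculator {
public:
    // Same order as described above: arguments for the distance, then x, then y.
    ExampleCalculator(const DistanceArgs& args, SeriesList x, SeriesList y)
        : p_(args.p), x_(std::move(x)), y_(std::move(y)), max_len_(0)
    {
        // Read-only state shared by all threads can be computed once here.
        for (const auto& s : x_) max_len_ = std::max(max_len_, s.size());
        for (const auto& s : y_) max_len_ = std::max(max_len_, s.size());
        // buffer_ is deliberately left empty: it must be unique to each thread.
    }

    double p() const { return p_; }
    std::size_t max_len() const { return max_len_; }
    bool has_buffer() const { return buffer_ != nullptr; }

private:
    double p_;
    SeriesList x_, y_;
    std::size_t max_len_;
    std::unique_ptr<double[]> buffer_; // per-thread scratch space, allocated later
};
```

The constructor only records what every thread can safely share; anything that is written to during a calculation is left for a later step.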

Clone

All concrete implementations should have a clone method that returns a pointer to a new instance. This method will be called from the different threads, and each thread will delete its clone when it is done. If there are private members that should be unique to each thread (e.g. dynamically allocated memory), they should be set up during cloning. For example, see what DtwBasicCalculator does.
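The cloning pattern could look roughly like this. `BufferedCalculator` is a hypothetical class written for illustration, not the DtwBasicCalculator implementation, and the two clones are created sequentially here instead of from real worker threads.

```cpp
#include <cstddef>

class BufferedCalculator {
public:
    // The prototype built on the main thread owns no scratch memory.
    explicit BufferedCalculator(std::size_t max_len)
        : max_len_(max_len), buffer_(nullptr) {}

    ~BufferedCalculator() { delete[] buffer_; }

    // Copying is disabled so that two instances can never share a buffer.
    BufferedCalculator(const BufferedCalculator&) = delete;
    BufferedCalculator& operator=(const BufferedCalculator&) = delete;

    // Each thread calls clone() and later deletes the returned pointer.
    BufferedCalculator* clone() const
    {
        BufferedCalculator* ptr = new BufferedCalculator(max_len_);
        // Thread-specific memory is set up here, not in the constructor.
        ptr->buffer_ = new double[max_len_]();
        return ptr;
    }

    const double* buffer() const { return buffer_; }

private:
    std::size_t max_len_;
    double* buffer_;
};
```

Since every clone allocates its own buffer, concurrent calculations never write to the same memory, and deleting the clone releases the buffer through the destructor.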

Calculate

The calculate method takes 2 integers i and j, and is expected to return the distance between x[i] and y[j]. Some calculators dispatch to the appropriate method directly, but others can pass more parameters as needed. Also note that in most cases the core calculations are further delegated to the core functions described above, although this is not always necessary. For instance, the SbdCalculator does the calculations directly, since the stand-alone version is implemented entirely in R.
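The delegation from calculate to a core function can be illustrated with a small sketch. All names (`manhattan_core`, `ManhattanCalculator`, `calculate_pair`) are hypothetical, the series are stored in plain `std::vector`s instead of the wrapper dtwclust uses, and indices are assumed to be 0-based.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using SeriesList = std::vector<std::vector<double>>;

// Hypothetical core function, free of any R/Rcpp types.
double manhattan_core(const double* x, const double* y, std::size_t n)
{
    double sum = 0.0;
    for (std::size_t k = 0; k < n; ++k) sum += std::fabs(x[k] - y[k]);
    return sum;
}

class ManhattanCalculator {
public:
    ManhattanCalculator(SeriesList x, SeriesList y)
        : x_(std::move(x)), y_(std::move(y)) {}

    // Public entry point: distance between x[i] and y[j].
    double calculate(int i, int j) const
    {
        return calculate_pair(x_[i], y_[j]);
    }

private:
    // Internal dispatch: extra parameters could be passed along from here.
    // Equal lengths are assumed to have been checked beforehand.
    double calculate_pair(const std::vector<double>& x,
                          const std::vector<double>& y) const
    {
        return manhattan_core(x.data(), y.data(), x.size());
    }

    SeriesList x_, y_;
};
```

Keeping the index lookup in calculate and the arithmetic in the core function means the latter can be shared with a stand-alone version.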

R side

All functions have a similar structure (dtw_basic used as an example):

They are registered with proxy during loading, and unregistered during unload.

Final details

The distance should be added to the internal globals, where it can also be specified whether it supports series of different lengths and/or multivariate series.

There should be unit tests for stand-alone functions, as well as integration tests for proxy distances.