DSArray ("desiree") provides efficient in-memory representation of 3-dimensional arrays that contain many duplicate slices via the DSArray (Duplicate Slice Array) S4 class. A basic array-like API is provided for instantiating, subsetting, and combining DSArray objects.
You can get the development version from GitHub:
devtools::install_github("PeteHaitch/DSArray")
This package serves a niche purpose. However, since I've found it useful, I'm making it publicly available. Here is the problem in words and a picture illustrating the solution that DSArray offers.
Suppose you have data on a set of n
samples where each sample's data can be represented as a matrix (x1
, ..., xn
) where dim(x1) = ... = dim(xn) = c(nrow, ncol)
. We can combine these matrices along a given dimension to form a 3-dimensional array, x
. DSArray is designed for the special case where there are many duplicate slices of x
. Continuing our example, if each of the x1
, ..., xn
have duplicate rows and we combine x1
, ..., xn
to form x
such that x[, j, ]
represents xj
, then for this special case we can efficiently represent the data by storing only the unique rows of the x1
, ..., xn
and an associated index. A picture will hopefully help make this clearer:
In this example we have n = 3
matrices, each shown as a slice of x
(x[, 1, ]
, x[, 2, ]
, x[, 3, ]
) with nrow = 20
and ncol = 8
, where the colour of the row identifies identical rows. Note that the same row may be found multiple times within a sample and may also be common to multiple samples. We can construct the DSArray representation of x
by calling DSArray(x)
. The DSArray representation has a key and a val, much like an associative array, map, or dictionary. The j-th column of the key is the key for the j-th sample (note the colour ordering of each sample). The val contains all unique rows found in the n
samples.
We can reconstruct the data for a particular sample by expanding the val by the relevant column of the key. We can often compute the required summaries of the data while retaining this sparse representation. In this way, a DSArray is similar to using a run length encoding of a vector or a sparse matrix representation to leverage the additional structure in the object.
The aim is to allow a DSArray to be used as a drop-in replacement for an array from the base package when the need arises. The DSArray API is therefore written to mimic the array API so that DSArray objects behave as if they were 3-dimensional array objects.
DSArray includes extensive documentation available through the R help system:
# See all documentation for the package
help(package = "DSArray")
# See documentation for the DSArray class
?`DSArray-class`
While DSArray implements many methods that allow DSArray objects to be used as drop-in replacements for array objects, the coverage is not 100% complete. I am adding these as needed, so if something you require is missing then please get in touch by filing a feature request at https://github.com/PeteHaitch/DSArray/issues.
Of course, code contributions and bug reports (and fixes!) are most welcome. Please make any pull requests against the master branch at https://github.com/PeteHaitch/DSArray and file issues at https://github.com/PeteHaitch/DSArray/issues.