tatami_r is a header-only library for reading abstract R matrices in tatami. This allows tatami-based C++ functions to accept and operate on any matrix-like R object containing numeric data. Usage is as simple as:
#include "tatami_r/tatami_r.hpp"
#include <memory>

SEXP some_typical_rcpp_function(Rcpp::RObject x) {
    auto ptr = std::make_shared<tatami_r::UnknownMatrix<double, int> >(x);

    // Do stuff with the tatami::Matrix.
    ptr->nrow();
    auto row_extractor = ptr->dense_row();
    auto first_row = row_extractor->fetch(0);

    // A function declared to return SEXP must return a value.
    return R_NilValue;
}
And that's it, really. If you want more details, you can check out the reference documentation.
tatami_r assumes that the hosting R instance has loaded the DelayedArray package.
The UnknownMatrix getters will then use the extract_array() and extract_sparse_array() R functions to retrieve data from the abstract R matrix.
Note that this involves calling into R from C++, so high performance should not be expected here.
Rather, the purpose of tatami_r is to ensure that tatami-based functions keep working when a native representation cannot be found for a particular matrix-like object.
It is worth mentioning that the UnknownMatrix will always call the extract_*_array() functions, even when a native representation exists in tatami or one of its extension libraries.
R package developers should use the initializeCpp() function from the beachmat package to map an arbitrary matrix to its appropriate representation.
When such mappings exist, this allows the C++ code to operate without calling back into R for maximum efficiency.
If no mapping is known, beachmat will gracefully fall back to an UnknownMatrix to keep things running.
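The "native mapping if known, otherwise fall back" behaviour described above can be pictured with a small standalone sketch. Everything in it is hypothetical: the `Sketch*` types, `sketch_initialize()` and the table of class names are illustrations only, not code from beachmat or tatami_r, and the real dispatch happens on the R side via initializeCpp().

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

// All names here are hypothetical; this is not beachmat's or tatami_r's code.
struct SketchMatrix {
    virtual ~SketchMatrix() = default;
    virtual bool calls_back_into_r() const = 0;
};

// Stand-in for a native C++ representation (no R calls needed to read data).
struct SketchNativeMatrix : public SketchMatrix {
    bool calls_back_into_r() const override { return false; }
};

// Stand-in for an UnknownMatrix-style wrapper that reads data through R.
struct SketchUnknownMatrix : public SketchMatrix {
    bool calls_back_into_r() const override { return true; }
};

using SketchFactory = std::function<std::shared_ptr<SketchMatrix>()>;

// Look up a native mapping by R class name; fall back to the R-backed wrapper.
std::shared_ptr<SketchMatrix> sketch_initialize(
    const std::string& r_class,
    const std::map<std::string, SketchFactory>& known)
{
    assert(!r_class.empty());
    auto it = known.find(r_class);
    if (it != known.end()) {
        return it->second();
    }
    return std::make_shared<SketchUnknownMatrix>();
}

// Illustrative table of classes with a native mapping (an assumption for this sketch).
std::map<std::string, SketchFactory> sketch_known_classes() {
    SketchFactory native = []() -> std::shared_ptr<SketchMatrix> {
        return std::make_shared<SketchNativeMatrix>();
    };
    return { { "matrix", native }, { "dgCMatrix", native } };
}
```

Calling `sketch_initialize("dgCMatrix", sketch_known_classes())` yields the native stand-in, while an unrecognised class name yields the R-backed stand-in. Downstream code only sees the common base class, which is why the fallback keeps things running.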
Given a tatami_r::UnknownMatrix or a tatami::Matrix* that might refer to one, we can easily parallelize operations with the tatami_r::parallelize() function.
This accepts a lambda/functor that takes the thread ID, the index of the first job and the number of jobs (in the example below, rows) to be processed by that thread.
tatami_r::parallelize([&](size_t thread_id, int start, int len) -> void {
    // Do something with the UnknownMatrix.
    auto ext = ptr->dense_row();
    std::vector<double> buffer(ptr->ncol());

    // Loop over the rows [start, start + len) assigned to this thread.
    for (int r = start, end = start + len; r < end; ++r) {
        auto out = ext->fetch(r, buffer.data());
        // Do something with each row.
    }
}, ptr->nrow(), num_threads);
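To make the meaning of the `start` and `len` arguments concrete, here is a minimal standalone sketch of how a parallelize()-style helper might split jobs into contiguous blocks. It is an assumption for illustration only: `sketch_parallelize()` and `sketch_row_sums()` are hypothetical names, the code uses plain `std::thread`, and it does not reproduce how tatami_r::parallelize() is actually implemented (in particular, it does nothing to protect calls into R).

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a parallelize()-style helper; NOT the tatami_r implementation.
// Splits `njobs` jobs into at most `nthreads` contiguous blocks and calls
// fun(thread_id, start, len) once per block, each on its own std::thread.
template<class Function>
void sketch_parallelize(Function fun, int njobs, int nthreads) {
    assert(njobs >= 0 && nthreads > 0);
    int per_thread = njobs / nthreads;
    int remainder = njobs % nthreads;

    std::vector<std::thread> workers;
    int start = 0;
    for (int t = 0; t < nthreads && start < njobs; ++t) {
        // The first `remainder` threads take one extra job each.
        int len = per_thread + (t < remainder ? 1 : 0);
        workers.emplace_back(fun, static_cast<std::size_t>(t), start, len);
        start += len;
    }
    for (auto& w : workers) {
        w.join();
    }
}

// Example job: row sums of a dense row-major matrix. Each row has its own
// output slot, so threads never write to the same element.
std::vector<double> sketch_row_sums(const std::vector<double>& values, int nrow, int ncol, int nthreads) {
    std::vector<double> sums(nrow);
    sketch_parallelize([&](std::size_t, int start, int len) -> void {
        for (int r = start, end = start + len; r < end; ++r) {
            double total = 0;
            for (int c = 0; c < ncol; ++c) {
                total += values[static_cast<std::size_t>(r) * ncol + c];
            }
            sums[r] = total;
        }
    }, nrow, nthreads);
    return sums;
}
```

With 5 rows and 3 threads, this sketch hands out blocks of 2, 2 and 1 rows. The loop inside the lambda has the same shape as the one used with the UnknownMatrix: iterate from `start` up to, but not including, `start + len`.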
Any calls to the extract_*_array() R functions are made thread-safe by the manticore library.
Developers can also access the manticore executor to safely perform their own R API calls from each thread.
auto& mexec = tatami_r::executor();
tatami_r::parallelize([&](size_t thread_id, int start, int len) -> void {
    mexec.run([&]() -> void {
        // Do something that touches the R API.
    });
}, ptr->nrow(), num_threads);
Check out the comments about safe parallelization for more gory details.
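For readers unfamiliar with the idea of funnelling calls through one thread, the following standalone sketch shows one way such an executor could work: worker threads hand a task to the main thread and block until it has run. This is an assumption for illustration, not manticore's implementation. `SketchMainExecutor`, its `run()`, `finish()` and `listen()` methods, and `sketch_demo()` are hypothetical names, and error handling (e.g. a task that throws) is deliberately left out.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical sketch of a "run this on the main thread" executor.
// NOT manticore's implementation; the names and design are assumptions.
class SketchMainExecutor {
public:
    // Called from a worker: queue `task` for the main thread and block until it has run.
    void run(std::function<void()> task) {
        bool done = false;
        std::unique_lock<std::mutex> lock(mut_);
        queue_.push_back(Request{ std::move(task), &done });
        cv_.notify_all();
        cv_.wait(lock, [&] { return done; });
    }

    // Called from a worker once it has no more work to do.
    void finish() {
        std::lock_guard<std::mutex> lock(mut_);
        ++finished_;
        cv_.notify_all();
    }

    // Called from the main thread: serve requests until `nworkers` workers have finished.
    void listen(int nworkers) {
        assert(nworkers >= 0);
        std::unique_lock<std::mutex> lock(mut_);
        while (true) {
            cv_.wait(lock, [&] { return !queue_.empty() || finished_ == nworkers; });
            if (queue_.empty()) {
                break; // every worker has finished and nothing is pending.
            }
            Request req = std::move(queue_.front());
            queue_.pop_front();
            lock.unlock();
            req.task(); // executed on the main thread, outside the lock.
            lock.lock();
            *req.done = true; // the requesting worker is still blocked in run().
            cv_.notify_all();
        }
        finished_ = 0;
    }

private:
    struct Request {
        std::function<void()> task;
        bool* done;
    };
    std::mutex mut_;
    std::condition_variable cv_;
    std::deque<Request> queue_;
    int finished_ = 0;
};

// Example: each worker asks the main thread to append its ID to `log`.
// Returns true if every task ran on the calling (main) thread.
bool sketch_demo(int nworkers, std::vector<int>& log) {
    SketchMainExecutor exec;
    const auto main_id = std::this_thread::get_id();
    bool all_on_main = true;

    std::vector<std::thread> workers;
    for (int w = 0; w < nworkers; ++w) {
        workers.emplace_back([&, w]() -> void {
            exec.run([&, w]() -> void {
                // Only ever executed by the main thread, so no extra locking is needed.
                if (std::this_thread::get_id() != main_id) {
                    all_on_main = false;
                }
                log.push_back(w);
            });
            exec.finish();
        });
    }

    exec.listen(nworkers);
    for (auto& t : workers) {
        t.join();
    }
    return all_on_main;
}
```

In this sketch, `sketch_demo(4, log)` leaves four entries in `log`, all appended by the calling thread. The design point it illustrates is that code touching a single-threaded resource, such as the R API, is executed by one thread only, while the workers continue to do the rest of their work in parallel.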
tatami_r is intended to be compiled with other relevant C++ code inside an R package using Rcpp.
This is most easily done by modifying the package DESCRIPTION with:
LinkingTo: beachmat, assorthead, Rcpp
which will automatically use the vendored copies of tatami_r (and tatami) inside the assorthead package, along with some pre-configured macro definitions for safe parallelization in beachmat's Rtatami.h header.
Note that C++17 is required.
If assorthead or beachmat cannot be used, the R package developer will need to copy the tatami_r and tatami include/ directories into the package's inst/include, and then add a Makevars file like:
PKG_CPPFLAGS = -I../inst/include