Jan Gorecki edited this page Apr 17, 2020 · 5 revisions

This wiki page collects information that is useful for efficient use of the R C API, which itself is not very well documented.

finding C source of base R function

The pryr package does this very well.

#install.packages("pryr")

print(sum) # take the body and paste into pryr::show_c_source
pryr::show_c_source(.Primitive("sum"))

difference between length and truelength (by @mattdowle)

truelength is the allocated length; length is the amount in use. truelength was an unused field in R until recently. Now, finally, truelength is used as Ross originally intended: as the allocated length.

releasing memory after setting new truelength (by @mattdowle)

You can't set a new truelength. It is the actual allocation, made either on R's heap or by R using malloc (R can do both, depending on the size of the vector and how R has been configured/compiled). If length is set smaller than truelength, though, which we do in data.table (e.g. at the end of fread), then the memory leak can be solved. I was told by an R core member that there is a new 'growable' bit that can be set. When growable is set, gc() releases truelength rather than length, so the workarounds at the top of assign.c can be removed. It should have been like that in R in the first place, but for whatever reason they didn't use truelength at all.

lazy evaluation handling in C (by @2005m)

A very good example is the code contributed in fcaseR by @2005m; the related lines are https://github.com/Rdatatable/data.table/pull/4021/files#diff-25cd0b0c089d5976de15097388ff5683R153-R162

when should I NOT use restrict when declaring C pointer? (by @mattdowle)

We should not use restrict when two threads update a shared variable, for example from within an atomic or critical section, if I understand correctly. I also found a claim online that combining const with restrict is beneficial as well.
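
As a rough illustration of the distinction, here is a minimal sketch in plain C. The function names axpy and scale are hypothetical and are not taken from data.table. restrict is a promise that, within the function, the pointed-to memory is reached only through that pointer. The first function can make that promise; the second cannot, because its caller may pass the same buffer twice.

```c
#include <stddef.h>

/* Hypothetical helper: out[i] = a * x[i] + y[i].
 * restrict promises the compiler that x, y and out never overlap;
 * const additionally documents that x and y are only read. */
void axpy(size_t n, double a,
          const double *restrict x,
          const double *restrict y,
          double *restrict out)
{
  for (size_t i = 0; i < n; i++)
    out[i] = a * x[i] + y[i];
}

/* No restrict here: the caller is allowed to pass x == out
 * (in-place scaling), so non-aliasing must not be promised. */
void scale(size_t n, double a, const double *x, double *out)
{
  for (size_t i = 0; i < n; i++)
    out[i] = a * x[i];
}
```

The same reasoning applies to a variable shared between threads: it is reachable by more than one path, so the promise made by restrict would not hold.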

[x]length vs [x]LENGTH (by @mattdowle)

Use LENGTH only when you're sure the object is a vector. Use xlength rather than length to support long vectors, as intended by the int64_t type. xlength returns R_xlen_t; length returns R_len_t.
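
To see why the return type matters, here is a small sketch in plain C that does not use R's headers. The typedefs demo_len_t and demo_xlen_t are stand-ins introduced for illustration only, assuming that R_len_t is an int and that R_xlen_t is a 64-bit signed integer on 64-bit builds. A long vector has more elements than an int can represent, so its length has to be carried in the wider type.

```c
#include <limits.h>
#include <stdint.h>

/* Stand-ins for illustration only (assumed, not R's own definitions). */
typedef int     demo_len_t;   /* plays the role of R_len_t  */
typedef int64_t demo_xlen_t;  /* plays the role of R_xlen_t */

/* Returns 1 when a length can be represented by the short type. */
int fits_in_len_t(demo_xlen_t n)
{
  return n >= 0 && n <= INT_MAX;
}

/* Narrows a length to the short type, or returns -1 when it does not fit. */
demo_len_t checked_len(demo_xlen_t n)
{
  return fits_in_len_t(n) ? (demo_len_t)n : -1;
}
```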

printing and raising exceptions from openmp parallel region (by @jangorecki)

We use OpenMP to parallelize many routines. Special care has to be taken inside regions that use OpenMP. One of the restrictions is that you must not print to the console or raise exceptions there. One way to deal with this is to defer them and emit them outside of the parallel region. If an exception occurs, you can set your own flag variable and, based on that flag, skip all further computation (in all threads). Once outside of the parallel region, raise the exception or emit the output.
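
The flag approach described above could look roughly like the following sketch. The function reciprocal_all is hypothetical, and fprintf to stderr stands in for the R-level error that a package would raise after the parallel region has ended. The code also compiles without OpenMP, in which case the pragmas are ignored and the loop runs serially.

```c
#include <stdio.h>

/* Hypothetical worker: out[i] = 1 / x[i]; a zero in x counts as an error.
 * Returns 0 on success and 1 on failure. */
int reciprocal_all(const double *x, double *out, int n)
{
  int failed = 0; /* shared flag, only accessed through atomics */

  #pragma omp parallel for shared(failed)
  for (int i = 0; i < n; i++) {
    int stop;
    #pragma omp atomic read
    stop = failed;
    if (stop)
      continue; /* break is not allowed in an OpenMP loop, so skip the work */
    if (x[i] == 0.0) {
      #pragma omp atomic write
      failed = 1; /* record the problem, do not report it here */
      continue;
    }
    out[i] = 1.0 / x[i];
  }

  /* Outside the parallel region: now it is safe to report. */
  if (failed) {
    fprintf(stderr, "reciprocal_all: zero found in input\n");
    return 1;
  }
  return 0;
}
```

Iterations that start after the flag is set do no work, which keeps the cost of a failure low even though the loop itself cannot be exited early.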

In data.table we have a dedicated structure that is meant to carry the results of a computation together with console output, messages, warnings and errors. It is then easy to pass all of this information between functions as a single object. This structure, named ans_t and defined in src/types.h, has been used in the rolling function and the NA fill function. If you would like to use ans_t, please see how it is used in those functions.
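
The idea of carrying a result together with its deferred messages can be sketched as below. demo_ans_t, demo_mean and demo_report are hypothetical names, and the struct is a simplified stand-in rather than the actual ans_t definition from src/types.h. fprintf to stderr again stands in for the warning or error a package would raise.

```c
#include <stdio.h>

#define DEMO_MSG_SIZE 256

/* Hypothetical, simplified result carrier; NOT the real ans_t. */
typedef struct {
  double value;                  /* computed result */
  int    status;                 /* 0 = ok, 1 = warning, 2 = error */
  char   message[DEMO_MSG_SIZE]; /* text to emit later, empty when ok */
} demo_ans_t;

/* Can be called from a worker thread: it only writes into its own ans. */
void demo_mean(const double *x, int n, demo_ans_t *ans)
{
  ans->value = 0.0;
  ans->status = 0;
  ans->message[0] = '\0';
  if (n <= 0) {
    ans->status = 2;
    snprintf(ans->message, DEMO_MSG_SIZE, "demo_mean: empty input (n=%d)", n);
    return;
  }
  double sum = 0.0;
  for (int i = 0; i < n; i++)
    sum += x[i];
  ans->value = sum / n;
}

/* Called on the main thread, outside of any parallel region. */
int demo_report(const demo_ans_t *ans)
{
  if (ans->status != 0)
    fprintf(stderr, "%s\n", ans->message);
  return ans->status;
}
```

Because each call writes only into the object it was given, one such object per thread (or per column) can be filled in parallel and inspected afterwards.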

measure time (by @st-pasha)

An easy way to measure time in a platform-independent way is to use the OpenMP omp_get_wtime function.

const bool verbose = GetVerbose();
double tic = 0.0; // initialized so the compiler does not warn about a possibly uninitialized value
if (verbose)
  tic = omp_get_wtime();
/* my processing */
if (verbose)
  Rprintf("My processing took %.3fs\n", omp_get_wtime() - tic);

code coverage

We use the codecov R package for code coverage. It works for C code as well, but not as precisely as for R code. Because of that, to get proper code coverage, the body of an if branch should be on a new line.

if (verbose) Rprintf("this message will never be properly checked");
if (verbose)
  Rprintf("but this message will");

This is because if (verbose) already marks the first line as covered.

using nocov in C code

An internal error is an error that should not be reachable through normal use of the package, so we are unlikely to be able to test it. In that case the nocov keyword should be used to exclude the specific line from the code coverage report. To make it work in C, we still have to keep R's comment sign # before the nocov keyword.

error("the comment on the right is not the proper one"); // nocov
error("this is the proper way to use nocov in C"); // # nocov