Skip to content

Latest commit

 

History

History
367 lines (300 loc) · 12.3 KB

README.org

File metadata and controls

367 lines (300 loc) · 12.3 KB

rnim - A bridge between R ⇔ Nim

Currently this is a barely working prototype.

Calling R functions from Nim works reasonably well, if basic Nim types are used. Both named and unnamed function arguments are supported.

The R SEXP object can be converted into all Nim types, which are supported in the other direction.

Interfacing with shared libraries written in Nim works for basic types. See the tNimFromR.nim and tCallNimFromR.R files for an example in tests.

Basic syntax to call R from Nim

Intefacing with R from Nim works by making use of the Rembedded.h functionality, which effectively launches a silent, embedded R repl.

This repl is then fed with S expressions to be evaluated. The S expression is the basic data type on the C side of R. Essentially everything is mapped to different kinds of S expressions, be it symbols, functions, simple data types, vectors etc.

This library aims to hide both the data conversions and memory handling from the user.

This means that typically one sets up the R repl, does some calls to R and finally shuts down the R repl again:

let R = setupR()
# some or many calls to R functions
teardown(R)

The returned R object is essentially just a dummy object, which is used to help with overload resolution (we want untyped templates to allow calling and R function by ident without having to manually wrap them) and it keeps track of the state of the repl.

In order to not have to call the teardown procedure manually, there are two options:

  • a withR template, which takes a block of code and injects a variable R into its calling scope. The repl will be shut down when leaving its scope
  • by compiling with --gc:arc or --gc:orc. In that case we can define a proper destructor, which will be automatically called when the R variable runs out of scope and is destroyed.

Note two things:

  1. in principle there is a finalizer defined for the non ARC / ORC case, which performs the same duty. However, at least according to my understanding, it’s run whenever the GC decides to collect the R variable. This might not be very convenient.
  2. I don’t know whether it’s an inherent limitation of the embedded R repl, but it seems like one cannot destroy an R repl and construct a new one. If one tries, one is greeted by
R is already initialized

message.

Simple usage example

The above out of the way, let’s look at the basic things currently possible.

For clarity I will annotate the types even where not required.

import rnim
let R = setupR()
# perform a call to the R stdlib function `sum`, by using 
# the `.()` dot call template and handing a normal Nim seq
let res: SEXP = R.sum(@[1, 2, 3])
# the result is a `SEXP`, the basic R data type. We can now
# use the `to` proc to get a Nim type from it:
doAssert res.to(int) == 6

Some functions, which have atypical names may not be possible to call via the dot call template. In that case, we can call the underlying macro directly, called callEval (possibly name change incoming…):

doAssert callEval(`+`, 4.5, 10.5).to(float) == 15.0

This also showcases that functions taking multiple arguments work as expected. At the moment we’re limited to 6 arguments (there’s specific C functions to construct calls up to 6 arguments. Need to implement arbitrary numbers manually).

Also named arguments are supported. Let’s use the seq function as an example, the more general version of the : operator in R (e.g. 1:5):

check R.seq(1, 10, by = 2).to(seq[int]) == toSeq(countup(1, 10, 2))

As we can see, we can also convert SEXPs containing vectors back to Nim sequences.

Finally, we can also source from arbitrary R files. Assuming we have some R file foo.R:

hello <- function(name) {
  return(paste(c("Hello", name), sep = " ", collapse = " "))
}

From Nim we can then call it via:

import rnim
# first set up an R interpreter
let R = setupR()
# now source the file
R.source("foo.R")
# and now we can call R functions defined in the sourced file
doAssert R.hello("User").to(string) == "Hello User"

That covers the most basic functionality in place so far.

Vectors (data arrays)

Arrays are always a special case, as they are usually the main source of computational work. Avoiding unnecessary copies of arrays is important to keep performance high.

To provide a no-copy interface to data arrays (R vectors) from R, there are two types to help: NumericVector[T] and RawVector[T]. They provide a nice Nim interface to work with such numerical data.

Any R SEXP can be converted to either of these two types. If the corresponding SEXP does not correspond to a vector, an exception will be thrown at runtime.

These types internally simply keep a copy of the underlying data array in the SEXP.

From a usability standpoint NumericVector[T] is the main type that should be used. RawVector[T] simply provides a slightly lower wrapper, which is however more restrictive.

A RawVector[T] can only be constructed for: cint, int32, float, cdouble. This is because the underlying R SEXP come only in two types: INTSXP and REALSXP, the former stores 32-bit integers and the latter 64-bit floats (technically afaik the platform specific size, so 32-bit floats on a 32-bit machine. The inverse is not the case for INTSXP though!). There is no way to treat a REALSXP vector as a RawVector[int32] for instance.

This is where NumericVector[T] comes in. It can be constructed for all numerical types larger or equal to 32-bit in size (to avoid loss of information when constructing from a SEXP). Unsigned integers so far are also not supported.

A short example:

import rnim
let R = setupR()

let x = @[1, 2, 3]
let xR: SEXP = x.nimToR # types for clarity
var nv = initNumericVector[int](xR)
# `nv` is now a vector pointing to the same data as `xR`
# we can access individual elements:
echo nv[1] # 2
# modify elements:
nv[2] = 5
# check its length
doAssert nv.len == 3
# iterate over it
for i in 0 .. nv.high:
  echo nv[i]
for x in nv:
  echo x
for i, x in nv:
  echo "Index ", i, " contains ", x
# compare them:
doAssert nv == nv
# and print them:
echo nv # NumericVector[int](len: 3, kind: vkFloat, data: [1, 2, 5])
# as `xR` contains the same memory location, constructing another vector
# and comparing them yields `true`, even though we modified `nv`
let nv2 = initNumericVector[int](xR)
doAssert nv == nv2
# finally we can also construct a `NumericVector` straight from a Nim sequence
let nv3 = @[1.5, 2.5, 3.5].toNumericVector()
echo nv3

If you ran this code you will see a message:

Interpreting input vector of type `REALSXP` as int loses information!

This is because we first constructed a SEXP from a 64-bit integer sequence in Nim. As mentioned before, 64-bit integers do not exist. Therefore, the xR SEXP above is actually stored in a REALSXP. By constructing a NumericVector[int] we tell the Nim compiler we wish to convert from and to int, no matter the underlying type of the SEXP array, i.e. INTSXP or REALSXP. The message simply makes you aware that this is happening (it may be taken out in the future).

The fact that this conversion happens internally is the reason for the existence of RawVector, which explicitly disallows this.

Further, NumericVector is actually a variant object. Depending on the runtime type of the SEXP from which we construct a SEXP the correct branch of the variant object will be filled. For extremely performance sensitive application it may thus be preferable to have a type where variant kind checks and possible type conversions do not happen.

Rctx macro

As mentioned in the previous secton, some function names are weird and require the user to use callEval directly.

To make calling such functions a bit nicer, there is an Rctx macro, which allows for directly calling R functions with e.g. dots in their names, and also allows for assignments.

let x = @[5, 10, 15]
let y = @[2.0, 4.0, 6.0]

var df: SEXP
Rctx:
  df = data.frame(Col1 = x, Col2 = y)
  let df2 = data.frame(Col1 = x, Col2 = y)
  print("Hello from R")

where both df as well as df2 will then store an equivalent data frame. The last line shows that it’s also possible to use this macro to avoid the need to discard all R calls.

Calling Nim code from R

Nim can be used to write extensions for R. This is done by compiling a Nim file as a shared library and calling it in R using the .Call interface.

An example can be seen from the tests:

In the near future the latter R file will be auto generated by the Nim code at compile time.

The basic idea is as follows. Assume you want to write an extension that adds two numbers in Nim to be called from R.

You write a Nim file with the desired procedure and attach the {.exportR.} pragma as follows:

myRmodule.nim:

import rnim

proc addNumbers*(x, y: SEXP): SEXP {.exportR.} =
  ## adds two numbers. We will treat them as floats
  let xNim = x.to(float)
  let yNim = y.to(float)
  result = (x + y).nimToR

Note the usage of SEXP as the input and output types. In the future the conversions (and possibly non copy access) will be automated. For now we have to convert manually to and from Nim types.

This file is compiled as follows:

nim c (-d:danger) --app:lib (--gc:arc) myRModule.nim

where the danger and ARC usage are of course optional (but ARC/ORC is recommended).

This will generate a libmyRmodule.so. The resulting shared library in principle needs to be manually loaded via dyn.load in R and each procedure in it needs to be called using the .Call interface.

Fortunately, this can be automated easily. Therefore, when compiling such a shared library, we automatically emit an R wrapper, that has the same name as the input Nim file. So the following file is generated:

myRmodule.R:

dyn.load("libmyRmodule.so")

addNumbers <- function(a, b) {
    return(.Call("addNumbers", a, b))
}

This file can now be sourced from the R interpreter (using the source function) or in an R script and then addNumbers is usable and will execute the compiled Nim code!

Note that the autogeneration logic assumes the shared library and the generated R script will live in the same directory. If you wish to move one, you might have to adjust the paths that perform the dyn.load command!

Trying it out

To try out the functionality of calling R from Nim, you need to meet a few prerequisites.

Setup on Linux

  • a working R installation with a libR.so shared library
  • the shell environment variable R_HOME needs to be defined and has to point to the directory which contains the full R directory structure. That is not the path where the R binary lies! Finally, the libR.so has to be findable for dynamic loading. On my machine the path of it by default isn’t added to ld via /etc/ld.so.conf.d (for the time being I just define LD_LIBRARY_PATH Setup on my machine:
    which R
    echo $R_HOME
    echo $LD_LIBRARY_PATH
        
    /usr/bin/R
    /usr/lib/R
    /usr/lib/R/lib
        

An easy way to set the R_HOME variable is by asking R about it:

R RHOME

returns the correct path. We can use that to set the R_HOME variable:

export R_HOME=`R RHOME`
export LD_LIBRARY_PATH=$R_HOME/lib # maybe not required on your system

Setup on Windows

  • a working R installation with a R.dll shared library
  • the shell environment variable R_HOME needs to be defined and has to point to the directory which contains the full R directory structure. That is not the path where the R binary lies! Example setup:
    where R.dll
    set R_HOME
        
    C:\Program Files\R\R-4.0.4\bin\x64\R.dll
    R_HOME=C:\Program Files\R\R-4.0.4
        

Test your setup

Run the test file:

nim c -r tests/tRfromNim.nim