Romain François
R is an amazing interpreted language, giving a flexible and agile foundation for data science. Efforts such as Rcpp and reticulate have established that it can be an advantage to pair R with another programming language. Sometimes for speed, but most importantly to have alternative options of expression.
Go is an open source programming language that makes it easy to build simple, reliable and efficient software. It is sometimes said to be the language C++ should have been, in particular if it did not carry a strong commitment to backwards compatibility to C and a taste for complexity.
Go is a beautiful and simple, its standard library is one of the most impressive for a programming language. It comes with concurrency built in, which makes it trivial to write concurrent code, that includes (but is not limited to) running code in parallel. The static site generator hugo and the containerization plaform docker are examples of systems that are built on Go.
There currently is no end to end solution to easily connect R and Go,
i.e. invoke Go code from R, and this is what the ergo
project is
about, the ability for R packages to leverage existing or original Go
code. There admittedly are no specific use cases in mind, but at the
same time it would have been impossible to imagine the importance of
Rcpp when it was first developped.
Having Go as an alternative high performance language will open interesting avenues for R package developpment.
As opposed to Rcpp
which is a dependency in all stages (development,
build, runtime), ergo
will only be development time dependency that
facilitates the generation of code to interface R and Go via their
respective C apis. From the point of view of the user of ergo
, the
workflow will be:
- write code in idiomatic Go
- generate boiler plate with functions from
ergo
- call that code from R
The role of ergo
is to hide the C layer entirely, so that users can
focus on writing Go and R code. In a way, this is similar to the feature
of Rcpp
attributes
but it goes further. Once ergo
has generated the code, the target
package is autonomous.
Prior work is our proof of concept, and
even though the code in the blog posts has been written manually, it is
explicitely divided in two different categories: - code that the user
would write. This is typical Go code using Go data structures such as Go
strings and slices. - code that uses both cgo
and R
internal C
apis. This code eventually is supposed to be generated automatically at
development time. This previous effort gives us enough confidence about
the feasability of the project.
The first step: Automatic generation of boiler plate code to connect all basic R vector types and their associated scalar types in both function inputs and return types. The Go standard library includes a parser package, this gives an abstract syntax tree of Go code that can drive the code generation.
At that stage, the project will need community adoption. The second step
will involve promotion and development of use cases that demonstrate the
use of ergo
, this will without doubt reveal needs that were not
planned for.
A dedicated blogdown/hugo powered site (initially https://go.rbind.io
)
will be used throughout the various phases of the project, perhaps with
separate sections to isolate the technical issues and feedback related
to the development of ergo
itself, from use case material, perhaps
featuring invited posts from the community.
Go is currently not one of the languages supported by R, which might
create friction down the line. In early phases of development, we can
get away with having installation instructions about the tools needed to
use ergo
.
But ultimately a package with Go code should be as easy to install as
any other R package. The third step of the plan will consider the
distribution of such packages, can we use CRAN? If not, what else? Do we
need code inside the base R distribution, i.e. something similar to R CMD javareconf
to help mitigate these issues.
I am a co-author of Rcpp that now has more than 1000 downstream dependencies, the experience of having written Rcpp is extremely valuable in the pursuit of this project. Curiosity about Go led me to inception this project in the summer of 2017. I have more than 12 years professional experience with R and a strong commitment to open source development.
I will take the lead on the project and spend a one day per week on this project (typically Friday).
The time will be used for implementation, testing, documentation and community engagement.
This is the main lump of technical work on this project. This is when
ergo
starts to materialise as a development time R package.
By design, ergo
is only needed by package developers. End users of a
package that contains code generated by ergo
do not need to have
ergo
installed.
The second iteration is about reacting to community feedback, apply
lessons learned, and tighten the distribution of packages with code
generated by ergo
.
The main resource I need is funding for my time. With the milestones described above, I can serenely spend the time this project needs.
This probably will require promotion in various meetups and conferences for which I’d like to request funding.
Go is distributed under a BSD-style
license, it makes sense to release ergo
under a similar license, if not exactly the same. Feedback from the
consortium on this matter is welcome.
The project will most likely require several public github repositories. The rstats-go organisation is setup to manage these. The community will be encouraged to engage with these repos.
A blogdown/hugo powered website (https://go.rbind.io
) will be setup to
host blog posts related to the development, case studies, and
documentation. In addition, I plan to document the progress from a
bird’s eye view on the consortium’s blog. Depending on the community’s
need for instant interraction, we can setup a slack team, or a gitter
community.
It is unclear at the time of writing this proposal if ergo
and the
packages containing ergo
generated code can be hosted on CRAN. Both
situations can be considered, but CRAN delivered packages are
preferable.