Skip to content

Latest commit

 

History

History
165 lines (126 loc) · 6.67 KB

proposal.md

File metadata and controls

165 lines (126 loc) · 6.67 KB

ergo proposal

Romain François

The problem

R is an amazing interpreted language, giving a flexible and agile foundation for data science. Efforts such as Rcpp and reticulate have established that it can be an advantage to pair R with another programming language. Sometimes for speed, but most importantly to have alternative options of expression.

Go is an open source programming language that makes it easy to build simple, reliable and efficient software. It is sometimes said to be the language C++ should have been, in particular if it did not carry a strong commitment to backwards compatibility to C and a taste for complexity.

Go is a beautiful and simple, its standard library is one of the most impressive for a programming language. It comes with concurrency built in, which makes it trivial to write concurrent code, that includes (but is not limited to) running code in parallel. The static site generator hugo and the containerization plaform docker are examples of systems that are built on Go.

There currently is no end to end solution to easily connect R and Go, i.e. invoke Go code from R, and this is what the ergo project is about, the ability for R packages to leverage existing or original Go code. There admittedly are no specific use cases in mind, but at the same time it would have been impossible to imagine the importance of Rcpp when it was first developped.

Having Go as an alternative high performance language will open interesting avenues for R package developpment.

The plan

As opposed to Rcpp which is a dependency in all stages (development, build, runtime), ergo will only be development time dependency that facilitates the generation of code to interface R and Go via their respective C apis. From the point of view of the user of ergo, the workflow will be:

  • write code in idiomatic Go
  • generate boiler plate with functions from ergo
  • call that code from R

The role of ergo is to hide the C layer entirely, so that users can focus on writing Go and R code. In a way, this is similar to the feature of Rcpp attributes but it goes further. Once ergo has generated the code, the target package is autonomous.

Prior work is our proof of concept, and even though the code in the blog posts has been written manually, it is explicitely divided in two different categories: - code that the user would write. This is typical Go code using Go data structures such as Go strings and slices. - code that uses both cgo and R internal C apis. This code eventually is supposed to be generated automatically at development time. This previous effort gives us enough confidence about the feasability of the project.

The first step: Automatic generation of boiler plate code to connect all basic R vector types and their associated scalar types in both function inputs and return types. The Go standard library includes a parser package, this gives an abstract syntax tree of Go code that can drive the code generation.

At that stage, the project will need community adoption. The second step will involve promotion and development of use cases that demonstrate the use of ergo, this will without doubt reveal needs that were not planned for.

A dedicated blogdown/hugo powered site (initially https://go.rbind.io) will be used throughout the various phases of the project, perhaps with separate sections to isolate the technical issues and feedback related to the development of ergo itself, from use case material, perhaps featuring invited posts from the community.

Failure modes

Go is currently not one of the languages supported by R, which might create friction down the line. In early phases of development, we can get away with having installation instructions about the tools needed to use ergo.

But ultimately a package with Go code should be as easy to install as any other R package. The third step of the plan will consider the distribution of such packages, can we use CRAN? If not, what else? Do we need code inside the base R distribution, i.e. something similar to R CMD javareconf to help mitigate these issues.

About the author

I am a co-author of Rcpp that now has more than 1000 downstream dependencies, the experience of having written Rcpp is extremely valuable in the pursuit of this project. Curiosity about Go led me to inception this project in the summer of 2017. I have more than 12 years professional experience with R and a strong commitment to open source development.

Commitment

I will take the lead on the project and spend a one day per week on this project (typically Friday).

The time will be used for implementation, testing, documentation and community engagement.

Project Milestones

First iteration. $25k + expenses

This is the main lump of technical work on this project. This is when ergo starts to materialise as a development time R package.

By design, ergo is only needed by package developers. End users of a package that contains code generated by ergo do not need to have ergo installed.

Second iteration. $10k + expenses

The second iteration is about reacting to community feedback, apply lessons learned, and tighten the distribution of packages with code generated by ergo.

How Can the ISC help

Funding

The main resource I need is funding for my time. With the milestones described above, I can serenely spend the time this project needs.

Travel

This probably will require promotion in various meetups and conferences for which I’d like to request funding.

Advice on Licensing

Go is distributed under a BSD-style license, it makes sense to release ergo under a similar license, if not exactly the same. Feedback from the consortium on this matter is welcome.

Dissemination

The project will most likely require several public github repositories. The rstats-go organisation is setup to manage these. The community will be encouraged to engage with these repos.

A blogdown/hugo powered website (https://go.rbind.io) will be setup to host blog posts related to the development, case studies, and documentation. In addition, I plan to document the progress from a bird’s eye view on the consortium’s blog. Depending on the community’s need for instant interraction, we can setup a slack team, or a gitter community.

It is unclear at the time of writing this proposal if ergo and the packages containing ergo generated code can be hosted on CRAN. Both situations can be considered, but CRAN delivered packages are preferable.