---
title: "ergo: high level interface between R and Go"
author: "Romain François"
date: 2018/03/31
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# The problem
R is an amazing interpreted language, giving a flexible and agile foundation for Data Science.
Efforts such as Rcpp and reticulate have established that it can be an advantage
to pair R with another programming language. Sometimes for speed, sometimes to
have alternative options of expression, sometimes to have access to existing
libraries.
Go (`https://golang.org`) is an open source programming language that makes it easy to
build simple, reliable and efficient software. It is sometimes said to be the language
C++ should have been, in particular if it did not carry a strong commitment to backwards
compatibility to C and a taste for complexity.
Go is beautiful and simple, and its standard library is one of the most impressive
for a programming language. It comes with concurrency built in, which includes
(but is not limited to) running code in parallel. The static site generator [hugo](https://gohugo.io),
the containerization platform [docker](https://www.docker.com/), and the profiling
utility [pprof](https://github.com/google/pprof) are examples
of systems that are built with Go.
There is currently no end-to-end solution to easily connect R and Go, i.e. to invoke Go code
from R. This is what the `ergo` project is about: the ability for R packages to
leverage existing or original Go code. Having Go as an alternative high performance language will open
interesting avenues for R package development.
# Prior art
## Rcpp
The Rcpp package (`https://github.com/RcppCore/Rcpp`) is the current state of the practice to connect C++ code with R,
used by over 1300 CRAN packages. With Rcpp it is very easy to make a C++ function callable from R: mark the function with [attributes](https://cran.r-project.org/web/packages/Rcpp/vignettes/Rcpp-attributes.pdf), and Rcpp generates the glue code that takes care of converting input and output data and propagating errors.
Rcpp offers classes such as `NumericVector` and `List`
that allow accessing most R data structures in a way that is idiomatic to C++.
## rmq
The rmq package (`https://github.com/glycerine/rmq`) is scoped
as a proof of concept to embed a Go library in an R package.
The code uses msgpack, a serialization protocol, to pass data between R and Go,
and implements a client-server system using websockets.
## r-go proof of concept
Independently of rmq, I have written a series of blog posts at `https://purrple.cat/tags/go/` that describe how to
embed Go code in an R package so that it is compiled when the R package is installed.
I show how to call functions in the library and how to pass data to and from Go.
Even though the code in the blog posts has been written manually, it is explicitly
divided into two different categories:
- Code that the user would write. This is typical Go code using Go data structures such
as Go strings and slices.
- Code that uses both `cgo` and R's internal C APIs. This code is eventually supposed to
be generated automatically at development time.
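To make the first category concrete, here is a minimal sketch of the kind of code a package author might write. The function names `Mean` and `Shout` are hypothetical and are not taken from the blog posts. Nothing in the sketch refers to R or C, which is the point: glue code of the second category would wrap functions like these.

```go
package main

import (
	"fmt"
	"strings"
)

// Mean returns the arithmetic mean of xs, or 0 for an empty slice.
// Hypothetical user function: plain Go, no reference to R or C.
func Mean(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// Shout returns a new slice holding the upper-cased elements of words.
// Hypothetical user function working on Go strings and slices.
func Shout(words []string) []string {
	out := make([]string, len(words))
	for i, w := range words {
		out[i] = strings.ToUpper(w)
	}
	return out
}

func main() {
	fmt.Println(Mean([]float64{1, 2, 3, 4}))
	fmt.Println(Shout([]string{"r", "go"}))
}
```

The second category is deliberately left out of this sketch, since it needs the R headers and a C toolchain to compile.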
# The plan
The previous efforts to connect C++ and Go to R give me enough confidence about the feasibility of the project.
As opposed to `Rcpp`, which is a dependency in all stages (development, build, runtime),
`ergo` will only be a *development time dependency* that facilitates the generation of
code to interface R and Go via their respective C APIs. From the point of view of the user of `ergo`,
the workflow will be:
- Write code in idiomatic Go somewhere standard in the package
- Generate boilerplate (Go, R, C) with functions from `ergo`
- Call that code from R functions of the package
The role of `ergo` is to hide the C layer entirely, so that users can focus on writing Go and R code.
In a way, this is similar to the feature of Rcpp attributes,
but it goes further. Once `ergo` has generated the code, the target package is autonomous. This gives
more control to the package authors and eliminates a set of issues related to dependency.
## First iteration
The first iteration will deliver automatic generation
of boilerplate code to connect all basic R vector types and their associated
scalar types in both function inputs and return types.
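As an illustration of what connecting vector and scalar types could mean for a generator, the sketch below pairs Go types with R vector types. The pairing itself is an assumption made for this example, not a decision of the project, and the names `rVectorType` and `RTypeFor` are hypothetical. It relies only on the fact that R stores integers as 32-bit values and numerics as 64-bit doubles.

```go
package main

import "fmt"

// rVectorType pairs a Go slice type with an R vector type.
// ASSUMPTION: this pairing is illustrative only.
var rVectorType = map[string]string{
	"[]float64":    "numeric",
	"[]int32":      "integer",
	"[]string":     "character",
	"[]bool":       "logical",
	"[]byte":       "raw",
	"[]complex128": "complex",
}

// RTypeFor returns the R vector type for goType, and false when the
// type is not supported. A scalar such as "float64" is treated as a
// length-one vector of the same element type.
func RTypeFor(goType string) (string, bool) {
	if r, ok := rVectorType[goType]; ok {
		return r, true
	}
	r, ok := rVectorType["[]"+goType]
	return r, ok
}

func main() {
	for _, t := range []string{"[]float64", "int32", "map[string]int"} {
		r, ok := RTypeFor(t)
		fmt.Println(t, r, ok)
	}
}
```

A real mapping would also have to decide how missing values are represented, since a Go `bool` has two states while an R logical has three.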
The Go standard library
includes a [parser](https://golang.org/pkg/go/parser/)
package, which gives an abstract syntax tree of Go code that
can drive the code generation.
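The sketch below shows how the syntax tree could be walked to find the functions that need a wrapper. It uses only `go/parser`, `go/ast` and `go/token` from the standard library. The source string, the helper `typeString` and the function `ExportedSignatures` are hypothetical names introduced for this example, and only identifiers and slices are handled as parameter types.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// src stands in for a Go file shipped inside an R package (hypothetical).
const src = `package demo

func Mean(xs []float64) float64 { return 0 }

func helper(n int) int { return n }
`

// typeString renders the small subset of type expressions handled here.
func typeString(e ast.Expr) string {
	switch t := e.(type) {
	case *ast.Ident:
		return t.Name
	case *ast.ArrayType:
		if t.Len == nil { // a slice, not a fixed-size array
			return "[]" + typeString(t.Elt)
		}
	}
	return "?"
}

// ExportedSignatures parses source and describes each exported
// top-level function. Methods and unnamed parameters are skipped.
func ExportedSignatures(source string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", source, 0)
	if err != nil {
		return nil, err
	}
	var sigs []string
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Recv != nil || !fn.Name.IsExported() {
			continue
		}
		sig := fn.Name.Name + "("
		first := true
		for _, field := range fn.Type.Params.List {
			for _, name := range field.Names {
				if !first {
					sig += ", "
				}
				sig += name.Name + " " + typeString(field.Type)
				first = false
			}
		}
		sig += ")"
		if fn.Type.Results != nil {
			for _, field := range fn.Type.Results.List {
				sig += " " + typeString(field.Type)
			}
		}
		sigs = append(sigs, sig)
	}
	return sigs, nil
}

func main() {
	sigs, err := ExportedSignatures(src)
	if err != nil {
		panic(err)
	}
	for _, s := range sigs {
		fmt.Println(s)
	}
}
```

A generator could feed each collected signature into templates for the Go, C and R sides of the wrapper. The unexported `helper` is ignored, which mirrors Go's own visibility rule.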
The details of the connection implemented by the boilerplate are subject to
further research. I will also consider a msgpack-based connection like in rmq.
At that stage, the project will need community adoption.
## Second iteration
The second step will
involve promotion and development of use cases that demonstrate the use of `ergo`.
This will without doubt reveal needs that were not planned for.
A dedicated blogdown/hugo powered site (initially `https://go.rbind.io`)
will be used throughout the various phases of the project. It may have
separate sections, one for the technical
issues and feedback related to the development of `ergo` itself and one for use case
material, possibly featuring invited posts from the community.
The third step of the plan will consider the distribution of packages containing `ergo` generated code.
Can we use CRAN? If not, what else? Do we need code inside the base R distribution,
i.e. something similar to `R CMD javareconf`, to help mitigate these issues?
## Failure modes
Go is currently not one of the languages supported by R, which might create friction
down the line. In early phases of development, we can get away with
having installation instructions about the tools needed to use `ergo`.
But ultimately a package with Go code should be as easy to install as any other R package,
on all the supported platforms.
Admittedly, this project does not have
specific use cases in mind, but at the same time it would have been impossible
to imagine the importance of Rcpp when it was first developed.
## About the author
I have more than 12 years of professional experience with R and a strong commitment
to open source development.
I am a co-author of Rcpp, which now has more than 1000 downstream dependencies.
The experience of having written Rcpp is extremely valuable in the pursuit of this project.
Curiosity about Go led me to start this project in the summer of 2017.
## Commitment
I will take the lead on the project and spend one day per week on it (typically Friday).
The time will be used for design, implementation, testing, documentation and community engagement.
# Project Milestones
## First iteration. $25k
This is the main lump of technical work on this project. This is when `ergo` starts to
materialise as a development time R package. I will take the necessary time to
produce frequent updates on the progress as blog articles, and engage the
community as much as possible to ensure that `ergo` is aligned with the
expectations of features and simplicity.
Sometimes the development will have to pause so that example use cases can
be discussed broadly with the community. Go is a simple language, and I don't want
`ergo` to be complicated to use.
## Second iteration. $15k
The second iteration is about reacting to community feedback, applying lessons learned,
and tightening the distribution of packages with code generated by `ergo`.
# How Can the ISC help
## Funding
The main resource I need is funding for my time. With the milestones described above,
I can serenely spend the time this project needs to be a success.
## Advice on Licensing
Go is distributed under a [BSD-style license](https://golang.org/LICENSE), so it makes sense
to release `ergo` under a similar license, if not exactly the same. Feedback from the
consortium on this matter is welcome.
Since `ergo` is a development time dependency, are there issues to be concerned about
in terms of the generated code? Is it licensed the same as its host project, or licensed
as the `ergo` package?
# Dissemination
The project will most likely require several public GitHub repositories.
I have set up the [rstats-go](https://github.com/rstats-go) organisation to manage these.
The community will be encouraged to engage with these repos.
A blogdown/hugo website (`https://go.rbind.io`) is in place
to host blog posts related to the development, case studies, and documentation.
In addition, I plan to document the progress from a bird's eye view on the consortium's blog.
Depending on the community's need for instant interaction, we can set up
a Slack team or a Gitter community.
A more formal scientific article, e.g. in the R Journal or the Journal of Statistical Software,
will be considered once the project is stable enough.
It is unclear at the time of writing this proposal if `ergo` and the packages
containing `ergo` generated code can be hosted on CRAN. Both situations
can be considered, but CRAN delivered packages are preferable.