Establish syntax for defining a translation/binding #194

paleolimbot · 2022-08-31T19:07:43Z

For #182, we are going to need to define mappings between R syntax and functions that are part of the Substrait spec. The functions we have now are defined in an ad-hoc way to make the tests pass, but the current syntax is not well-suited to defining a lot of these.

The Arrow package uses syntax like:

```r
register_binding("base::as.logical", function(x) {
  # return an Arrow Expression that calls "some_fun" with all of the arguments
  build_expr("some_fun", x)
})
```

...we should maximize compatibility with that syntax because we'll need to copy those bindings at some point. On the flip side, we will shortly have the ability to use bindings within functions, so that we can do things like:

```r
register_binding_that_contains_bindings("some_binding", function(x) {
  nchar(x) + 1L
})
```
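A minimal sketch of how such a registry could work, assuming bindings are stored in an environment and translated expressions are plain R lists. `substrait_registry`, `register_binding()`, `call_fn()` and the list format are hypothetical names for illustration (not the package's or Arrow's actual API), and `char_length`/`add` are used only as illustrative Substrait function names. Setting each binding's enclosing environment to the registry is what lets the body of `some_binding` call the `nchar` and `+` bindings:

```r
# Minimal sketch of a binding registry. register_binding(), call_fn() and the
# list-based expression format are hypothetical, not a real package API.

# Bindings live in an environment whose parent is baseenv(), so names without
# a binding (list(), base::nchar(), ...) still resolve to base R.
substrait_registry <- new.env(parent = baseenv())

# Hypothetical stand-in for building a substrait.Expression.ScalarFunction
substrait_registry$call_fn <- function(name, ...) {
  list(fn = name, args = list(...))
}

# Arrow-style registration: "pkg::fun" names are accepted, but this sketch
# only keeps the unqualified "fun" part.
register_binding <- function(fun_name, fun, registry = substrait_registry) {
  name <- sub("^.*::", "", fun_name)
  # Evaluating the body inside the registry lets bindings call other bindings
  environment(fun) <- registry
  assign(name, fun, envir = registry)
  invisible(fun)
}

# Primitive bindings; "char_length" and "add" are illustrative function names
register_binding("base::nchar", function(x) call_fn("char_length", x))
register_binding("base::+", function(e1, e2) call_fn("add", e1, e2))

# A binding defined in terms of other bindings
register_binding("some_binding", function(x) {
  nchar(x) + 1L
})

expr <- substrait_registry$some_binding("col1")
```

Here `expr` is a nested list equivalent to `add(char_length("col1"), 1L)`. One trade-off of this design: a binding can't call the base R function of the same name unqualified (`nchar()` inside the `nchar` binding would recurse), so it would need `base::nchar()`.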

Along with the ability to define bindings/translations, we also need a way to perform the translation. Our current evaluation strategy is somewhat complicated, and this issue is a good opportunity to revisit it.

paleolimbot · 2022-09-12T14:09:52Z

I had to look into this a bit in #181 because the substrait.Expression.ScalarFunction definition changed slightly.

In Arrow, we define alternative versions of functions in an environment and use that environment to "mask" calls to functions we support. A simplified version is here:

```r
library(rlang)

substrait_funcs <- new.env(parent = emptyenv())
substrait_funcs$nchar <- function(x) {
  # something returning a substrait.Expression
  message("The special function is getting called")
  nchar(x)
}

current_columns <- list(col1 = "some value")

# without our function redefines
eval_tidy(quo(nchar(col1)), data = current_columns)
#> [1] 10

# with our function redefines
eval_mask <- c(current_columns, as.list(substrait_funcs))
eval_tidy(quo(nchar(col1)), data = eval_mask)
#> The special function is getting called
#> [1] 10
```
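The same masking idea can be sketched in base R (without rlang), with the masked functions returning expression objects instead of values. In this sketch, columns are replaced by field references before evaluation, which is why field references end up needing type resolution. `field_ref()`, `scalar_fn()`, `translations` and `translate()` are hypothetical helpers, and the S3 lists only stand in for substrait.Expression messages:

```r
# Hypothetical stand-ins for substrait.Expression messages
field_ref <- function(name) {
  structure(list(field = name), class = "field_ref")
}
scalar_fn <- function(name, ...) {
  structure(list(fn = name, args = list(...)), class = "scalar_fn")
}

# Translations that mask the R functions of the same name
# ("char_length" and "add" are illustrative function names)
translations <- list(
  nchar = function(x) scalar_fn("char_length", x),
  `+` = function(e1, e2) scalar_fn("add", e1, e2)
)

# Evaluate an R expression with columns and translations as the mask;
# anything not in the mask falls through to base R.
translate <- function(expr, columns, funcs = translations) {
  refs <- lapply(columns, field_ref)
  names(refs) <- columns
  eval(expr, envir = c(refs, funcs), enclos = baseenv())
}

result <- translate(quote(nchar(col1) + 1L), columns = "col1")
```

R's evaluator does the tree walking here: `+` and `nchar` are found in the mask, `col1` becomes a field reference, and constants like `1L` pass through untouched.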

Currently in Substrait, we use a custom evaluation strategy in which we walk the syntax tree ourselves while carefully replacing function calls. This is much more complicated and error-prone on our end, and I'd like to move to something closer to what Arrow does. Every time I've tried refactoring in this direction I've run into problems. I think these are solvable, but at the time I didn't have a good picture of the overall design. With the new function work that @thisisnic is doing, I think there's a better way.
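For comparison, a stripped-down version of the tree-walking strategy might look like the following sketch (again with hypothetical helpers, not the package's real code). Even this minimal walker has to handle symbols, calls and constants itself and reject call heads such as `pkg::fun`, all of which the mask approach delegates to R's own evaluator:

```r
# Hypothetical stand-ins for substrait.Expression messages
field_ref <- function(name) structure(list(field = name), class = "field_ref")
scalar_fn <- function(name, ...) {
  structure(list(fn = name, args = list(...)), class = "scalar_fn")
}

translations <- list(
  nchar = function(x) scalar_fn("char_length", x),
  `+` = function(e1, e2) scalar_fn("add", e1, e2)
)

# Walk the call tree ourselves, replacing symbols and function calls
walk_translate <- function(expr, columns, funcs = translations) {
  if (is.symbol(expr)) {
    name <- as.character(expr)
    if (!name %in% columns) stop("Unknown column: ", name)
    return(field_ref(name))
  }
  if (is.call(expr)) {
    # Only plain function names are handled; heads like pkg::fun are not
    if (!is.symbol(expr[[1]])) stop("Unsupported function call")
    fn_name <- as.character(expr[[1]])
    if (!fn_name %in% names(funcs)) stop("No translation for ", fn_name, "()")
    # Translate the arguments first, then call the translation on the results
    args <- lapply(as.list(expr)[-1], walk_translate,
                   columns = columns, funcs = funcs)
    return(do.call(funcs[[fn_name]], args))
  }
  # Constants such as 1L or "a" pass through unchanged
  expr
}

result <- walk_translate(quote(nchar(col1) + 1L), columns = "col1")
```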

Part of what the custom evaluation does is keep track of which functions are actually used. The substrait.Plan contains a manifest of all the kernels (i.e., function name + input argument combinations) that are used in the plan (as opposed to all the functions that are available), so we need some way of keeping track of that. Instead of keeping track as we go, we could either use an active binding or post-process the substrait.Expression to trim the list before we send the Plan to the consumer.
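Both options can be sketched in base R, assuming the same hypothetical list-based expressions. `makeActiveBinding()` lets each translation record its own name when R looks it up during evaluation; `collect_functions()` (a hypothetical helper) instead post-processes the finished expression. Both sketches record function names only; a real kernel manifest would also need the input argument types:

```r
# Hypothetical stand-ins for substrait.Expression messages
field_ref <- function(name) structure(list(field = name), class = "field_ref")
scalar_fn <- function(name, ...) {
  structure(list(fn = name, args = list(...)), class = "scalar_fn")
}

translations <- list(
  nchar = function(x) scalar_fn("char_length", x),
  `+` = function(e1, e2) scalar_fn("add", e1, e2)
)

# Option 1: active bindings record each translation when R looks it up
make_tracking_mask <- function(funcs) {
  used <- character(0)
  mask <- new.env(parent = baseenv())
  getter <- function(name, fun) {
    force(name)
    force(fun)
    function() {
      used <<- union(used, name)  # record the lookup, then return the function
      fun
    }
  }
  for (name in names(funcs)) {
    makeActiveBinding(name, getter(name, funcs[[name]]), mask)
  }
  list(mask = mask, used = function() used)
}

tracker <- make_tracking_mask(translations)
data_env <- list2env(list(col1 = field_ref("col1")), parent = tracker$mask)
result <- eval(quote(nchar(col1) + 1L), envir = data_env)
tracker$used()

# Option 2: post-process the finished expression to list the functions used
collect_functions <- function(expr) {
  if (!inherits(expr, "scalar_fn")) return(character(0))
  nested <- unlist(lapply(expr$args, collect_functions), use.names = FALSE)
  unique(c(expr$fn, nested))
}

collect_functions(result)
```

The active binding records R-level names (`nchar`, `+`), while the post-processing pass sees the Substrait-level names (`char_length`, `add`) that the manifest actually needs.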

Another part of what the custom evaluation does is enable type resolution. When function arguments are evaluated using regular R evaluation, we sometimes get an object (notably, a field reference) whose type can't be resolved without access to a SubstraitCompiler. We could provide access to the compiler via some global state (e.g., by exposing substrait::current_compiler() or something similar). However, I think the reason we needed access to the types was to calculate the output type, which is a property of the expression that our special translation of the function has to return. Alternatively, we could leave the output type blank and walk the expression after evaluation to fill in the output types.
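The second option (leave output types blank during translation, then fill them in afterward) might look like this sketch. The named `schema` list stands in for whatever a SubstraitCompiler would know about fields, and the type rules and helper names (`output_type_rules`, `type_of()`, `resolve_types()`) are hypothetical simplifications chosen only to illustrate a bottom-up pass:

```r
# Hypothetical stand-ins for substrait.Expression messages; output_type is
# left blank (NULL) during translation
field_ref <- function(name) {
  structure(list(field = name, output_type = NULL), class = "field_ref")
}
scalar_fn <- function(name, ...) {
  structure(list(fn = name, args = list(...), output_type = NULL),
            class = "scalar_fn")
}

# Simplified, hypothetical output-type rules keyed by function name
output_type_rules <- list(
  char_length = function(arg_types) "i64",
  add = function(arg_types) if (all(arg_types == "i32")) "i32" else "i64"
)

# Type of an already-resolved argument or an R constant
type_of <- function(x) {
  if (inherits(x, c("field_ref", "scalar_fn"))) return(x$output_type)
  if (is.integer(x)) return("i32")
  if (is.double(x)) return("fp64")
  if (is.character(x)) return("string")
  stop("Can't determine the type of this argument")
}

# Walk the translated expression bottom-up, filling in output types
resolve_types <- function(expr, schema) {
  if (inherits(expr, "field_ref")) {
    if (is.null(schema[[expr$field]])) stop("Unknown field: ", expr$field)
    expr$output_type <- schema[[expr$field]]
  } else if (inherits(expr, "scalar_fn")) {
    expr$args <- lapply(expr$args, resolve_types, schema = schema)
    arg_types <- vapply(expr$args, type_of, character(1))
    expr$output_type <- output_type_rules[[expr$fn]](arg_types)
  }
  expr
}

expr <- scalar_fn("add", scalar_fn("char_length", field_ref("col1")), 1L)
typed <- resolve_types(expr, schema = list(col1 = "string"))
```

Because arguments are resolved before their parent call, each function's output type can be computed from already-known argument types, so translations never need the compiler while they run.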

paleolimbot self-assigned this Aug 31, 2022

thisisnic mentioned this issue Sep 1, 2022

Implement translations for Substrait primitive functions #182 (Open, 12 tasks)

paleolimbot mentioned this issue Sep 8, 2022

Update .proto files to the same version that Arrow is using #181 (Merged)
