You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
For #182, we are going to need to define mappings between R syntax and functions that are part of the Substrait spec. The functions we have now are defined in an ad-hoc way to make the tests pass, but the current syntax is not well-suited to defining a lot of these.
The Arrow package uses syntax like:
register_binding("base::as.logical", function(x) {
build_expr("some_fun", all, the, args)
})
...we should maximize compatibility with that syntax because we'll need to copy those bindings at some point. On the flip side, we now have (will shortly have) the ability to use bindings within functions so that we can do stuff like:
With the ability to define bindings/translations, we also need the ability to do the translation. Right now our evaluation strategy is a bit complicated and that could be revisited here.
The text was updated successfully, but these errors were encountered:
I had to look into this a little in #181 because the substrait.Expression.ScalarFunction definition changed a little.
In Arrow, we define alternative versions of functions in an environment and use that environment to "mask" calls to functions we support. A simplified version is here:
library(rlang)
substrait_funcs<- new.env(parent= emptyenv())
substrait_funcs$nchar<-function(x) {
# something returning a substrait.Expression
message("The special function is getting called")
nchar(x)
}
current_columns<-list(col1="some value")
# without our function redefines
eval_tidy(quo(nchar(col1)), data=current_columns)
#> [1] 10# with our function redefineseval_mask<- c(current_columns, as.list(substrait_funcs))
eval_tidy(quo(nchar(col1)), data=eval_mask)
#> The special function is getting called#> [1] 10
Currently in Substrait, we do a custom evaluation strategy where we walk the syntax tree ourselves whilst carefully replacing function calls. This is much more complicated and more error-prone on our end, and I'd like to move to something more like Arrow does. Every time I've tried refactoring in this direction I run into some problems. I think these are solvable but at the time I didn't have a good idea of the big picture. With the new function work that @thisisnic is working on, I think there's a better way.
Part of what the custom evaluation thing does is keep track of which functions are actually used. The substrait.Plan contains a manifest of all the kernels (i.e., function name + input argument combinations) that are used in the plan (as opposed to which functions are available), so we need some way of keeping track of that. Instead of keeping track as we go, we can either use an active binding or post-process the substrait.Expression to make the list smaller before we send the Plan to the consumer.
Another part of what custom evaluation does is enable type resolution. When function arguments are evaluated using regular R evaluation, sometiemes we get an object (notably, the field reference) whose type can't be resolved unless we have access to a SubstraitCompiler. We can provide access to the compiler via some global variable (e.g., expose substrait::current_compiler() or something), although I think the reason that we needed access to the types was to be able to calculate the output type, which is a property of the expression that we have to return from our special translation of the function. Another thing we could do is leave this blank and walk the expression after evaluation and fill in the output types.
For #182, we are going to need to define mappings between R syntax and functions that are part of the Substrait spec. The functions we have now are defined in an ad-hoc way to make the tests pass, but the current syntax is not well-suited to defining a lot of these.
The Arrow package uses syntax like:
...we should maximize compatibility with that syntax because we'll need to copy those bindings at some point. On the flip side, we now have (will shortly have) the ability to use bindings within functions so that we can do stuff like:
With the ability to define bindings/translations, we also need the ability to do the translation. Right now our evaluation strategy is a bit complicated and that could be revisited here.
The text was updated successfully, but these errors were encountered: