rebase_expr
should probably use transform_down
and not transform_up
#8280
Labels
bug
Something isn't working
Describe the bug
While working with
first_value
aggregation function I noticed what seems to be a minute, albeit general, deficiency in therebase_expr
function, which in the case below leads to an error.Namely, when re-basing an expression against a list of provided base expressions transformations are done in a post-order fashion:
https://github.com/apache/arrow-datafusion/blob/2156dde54623d26635d4388d161d94ac79918cdc/datafusion/sql/src/utils.rs#L64-L69
The issue below arises when this list of base expressions contains overlapping elements, e.g. the first base expression (group by clause) is a sub-expression of the second base expression (the
first_value
aggregation). Since the transformation is done inside-out, once the first sub-expression is columnized, the original expression gets changed and so it can't be re-based anymore (even when the original root expression is also an available base expression as is the case below).However, even regardless of that error, I think it could be argued that
rebase_expr
should perform pre-order traversal in general:rebase_expr
should strive to re-base on the outermost available base expression in order to minimize (however negligible) the computation that needs to be done on top of it. The only reservation I have here is if somehow there is some logic that relies on repeated inside-out columnization which can only be achieved viatransform_up
(this doesn't seem to be the case judging by the tests).transform_up
transforms the current sub-expression only after visiting it (and it's children),transform_down
transforms it before traversing any child expressions. Since the transformation involved is columnization, which effectively prunes that branch of the expression tree, this leads to strictly less-or-equal number of node visits thantransform_up
.To Reproduce
Expected behavior
Additional context
No response
The text was updated successfully, but these errors were encountered: