Refactor TreeNode recursions #7942

Closed
wants to merge 11 commits into from

peter-toth
Contributor

@peter-toth peter-toth commented Oct 26, 2023

Which issue does this PR close?

This PR is a proof of concept to refactor TreeNode recursions and offer better alternatives to the current tree visit and transform/rewrite functions. Currently the PR contains multiple related ideas that can be split into smaller changes if any of them look reasonable to the community.

Rationale for this change

  1. This PR introduces TreeNodeTransformer trait (to replace TreeNodeRewriter):

    pub trait TreeNodeTransformer: Sized {
        /// The node type which is visitable.
        type Node: TreeNode;
    
        /// Invoked before any inner children or children of a node are modified.
        fn pre_transform(&mut self, node: &mut Self::Node) -> Result<TreeNodeRecursion>;
    
        /// Invoked after all inner children and children of a node are modified.
        fn post_transform(&mut self, node: &mut Self::Node) -> Result<TreeNodeRecursion>;
    }

    The main change in the behavior of TreeNodeTransformer compared to the old TreeNodeRewriter is that the pre_transform() and post_transform() methods mutate the nodes in place (node: &mut Self::Node).
    Compared to the value consuming and producing fn mutate(&mut self, node: Self::N) -> Result<Self::N>, the self mutating behaviour encourages developers to reuse existing objects / memory allocations and so write more efficient transformation closures.
    The current implementation of the fn map_children<F>(self, transform: F) -> Result<Self> method of Expr is a good example of the issue:
    https://github.com/apache/arrow-datafusion/blob/4578f3daeefd84305d3f055c243827856e2c036c/datafusion/expr/src/tree_node/expr.rs#L153-L425
    An Expr tree uses Vecs and Boxes. The problem is that a TreeNode.rewrite() call on an expression tree basically creates a whole new tree, regardless of whether any node was changed, due to how transform_boxed(), transform_option_box(), transform_option_vec() and transform_vec() work.
    As the type of the tree node can't change during a rewrite and Rust prevents data races at compile time, an in place mutation seems more reasonable for such transformations. Also, a reference to a tree node is usually smaller than the node itself, so deeper recursion can be supported with the same stack size.

    Please note that not all TreeNode trees suffer from the above issue. E.g. the LogicalPlan tree uses Vecs and Arcs. Cloning an Arc is cheap compared to a Box. Actually, in this case the proposed self mutating transform doesn't bring much improvement, as an Arc can't be mutated and a new one needs to be created anyway, but Vecs can be reused.

    Update: The above analysis about memory reuse is not correct. @sadboy showed in "Perf: avoid unnecessary allocations when transforming Expr" (#8591) that the current expression transformation functions do reuse memory due to Rust compiler optimizations.
    But the suggested &mut Self::Node based transform functions still seem to make sense, as the mini benchmarks in #7942 (comment) and #7942 (comment) show a considerable performance improvement.
    Let's wait for "Benchmarks for planning queries" (#8638) to measure more concrete effects of the suggested changes.

  2. This PR unifies the 2 recursion related enums (VisitRecursion and RewriteRecursion) as they are a bit confusing.
    Currently VisitRecursion controls TreeNode.apply() and TreeNode.visit(), and RewriteRecursion controls TreeNode.rewrite(). The Stop element behaves differently in the two: it fully stops the recursion in case of visit, but it doesn't do so in case of rewrite. Also, the Skip element prevents recursion into children in case of visit, but it doesn't in case of rewrite.
    In this PR I'm proposing to use a new TreeNodeRecursion that can be used with both visit and transform/rewrite:

    pub enum TreeNodeRecursion {
        /// Continue the visit to the next node.
        Continue,
    
        /// Prune the current subtree.
        /// If a preorder visit of a tree node returns [`TreeNodeRecursion::Prune`] then inner
        /// children and children will not be visited and postorder visit of the node will not
        /// be invoked.
        Prune,
    
        /// Stop recursion on current tree.
        /// If recursion runs on an inner tree then returning [`TreeNodeRecursion::Stop`] doesn't
        /// stop recursion on the outer tree.
        Stop,
    
        /// Stop recursion on all (including outer) trees.
        StopAll,
    }

    This PR also proposes to remove RewriteRecursion::Mutate as it doesn't seem to add any value. The pre_visit() method during rewrite could return the modified node (or mutate the node in place as suggested in 1.) and return how the recursion should continue.

  3. This PR adds a new default value Nop to Expr enum.
    This new expression does nothing and will not occur in any valid plan, but it is sometimes useful to be able to replace an expression with a dummy one. Please see Expr.unalias() for an example.

    Update: This is not needed. See discussion: Refactor TreeNode recursions #7942 (comment)

  4. This PR proposes to add transform_down_with_payload(), transform_up_with_payload() and transform_with_payload() methods to TreeNodes to be able to propagate additional payloads down/up during transformation. These new methods make EnforceSorting, EnforceDistribution and similar rules much simpler, as there is no need to create special tree nodes like SortPushDown and PlanWithKeyRequirements.

    Update: This idea has moved to the issue "Get rid of special TreeNodes" (#8663) and the PR "Transform with payload" (#8664).
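To make the payload idea in item 4 concrete, here is a minimal sketch on a toy `Node` type rather than on DataFusion's `TreeNode`. The free function `transform_down_with_payload` below and its signature are assumptions made for illustration only; the actual proposal is in #8664. The closure mutates each node in place and returns one payload per child, so no wrapper tree node is needed to carry the extra state.

```rust
// Toy tree; a stand-in for a plan node, not a DataFusion type.
#[derive(Debug, PartialEq)]
struct Node {
    name: String,
    depth: usize, // filled in by the transformation below
    children: Vec<Node>,
}

// Hypothetical helper: `f` mutates the node in place and returns one payload
// per child; each payload is handed to the matching child on the way down.
fn transform_down_with_payload<P, F>(node: &mut Node, payload: P, f: &mut F)
where
    F: FnMut(&mut Node, P) -> Vec<P>,
{
    let child_payloads = f(node, payload);
    assert_eq!(child_payloads.len(), node.children.len());
    for (child, p) in node.children.iter_mut().zip(child_payloads) {
        transform_down_with_payload(child, p, f);
    }
}

fn leaf(name: &str) -> Node {
    Node { name: name.to_string(), depth: 0, children: vec![] }
}

// Example use: the payload is the depth of the node being visited.
fn annotate_depths(root: &mut Node) {
    transform_down_with_payload(root, 0usize, &mut |n: &mut Node, depth: usize| {
        n.depth = depth;
        vec![depth + 1; n.children.len()]
    });
}

fn main() {
    let mut root = Node {
        name: "sort".to_string(),
        depth: 0,
        children: vec![
            Node { name: "filter".to_string(), depth: 0, children: vec![leaf("scan")] },
            leaf("values"),
        ],
    };
    annotate_depths(&mut root);
    println!("{:#?}", root);
}
```

In a rule like sort pushdown the payload would be something like the ordering required by the parent instead of a depth counter.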

What changes are included in this PR?

This PR:

  • Adds TreeNodeTransformer and the TreeNode.transform() method as a better alternative to TreeNodeRewriter and TreeNode.rewrite(). Some of the TreeNodeRewriter usages are refactored to TreeNodeTransformer as examples; the remaining occurrences can be refactored in follow-up PRs if this PR gets accepted.
  • Modifies the TreeNode.transform_up() and TreeNode.transform_down() methods to be self mutating and refactors a few usages as examples. (The old methods are still kept as TreeNode.transform_up_old() and TreeNode.transform_down_old()).
  • Adds TreeNodeRecursion enum to control tree recursions. Modifies TreeNode methods to use the new enum.
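The following is a rough sketch of how an in-place, pre-order transform could be driven by a unified recursion enum. It uses a toy `Expr` type and a free function `transform_down`, both assumptions for illustration and not the code in this PR. `StopAll` is left out because the toy tree has no inner trees.

```rust
// Toy expression tree; a stand-in, not DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Lit(i64),
    Neg(Box<Expr>),
    Add(Box<Expr>, Box<Expr>),
}

// Mirrors three of the variants proposed in the description.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TreeNodeRecursion {
    Continue,
    Prune,
    Stop,
}

// Pre-order, in-place traversal: `f` mutates the node and says how to go on.
fn transform_down<F>(node: &mut Expr, f: &mut F) -> TreeNodeRecursion
where
    F: FnMut(&mut Expr) -> TreeNodeRecursion,
{
    match f(node) {
        // Skip this node's children, but keep visiting its siblings.
        TreeNodeRecursion::Prune => return TreeNodeRecursion::Continue,
        TreeNodeRecursion::Stop => return TreeNodeRecursion::Stop,
        TreeNodeRecursion::Continue => {}
    }
    match node {
        Expr::Lit(_) => TreeNodeRecursion::Continue,
        Expr::Neg(inner) => transform_down(inner, f),
        Expr::Add(l, r) => {
            if transform_down(l, f) == TreeNodeRecursion::Stop {
                return TreeNodeRecursion::Stop;
            }
            transform_down(r, f)
        }
    }
}

// Doubles every literal except those underneath a `Neg` node.
fn double_unnegated_literals(expr: &mut Expr) {
    transform_down(expr, &mut |e: &mut Expr| match e {
        Expr::Neg(_) => TreeNodeRecursion::Prune,
        Expr::Lit(v) => {
            *v *= 2;
            TreeNodeRecursion::Continue
        }
        _ => TreeNodeRecursion::Continue,
    });
}

fn main() {
    let mut e = Expr::Add(
        Box::new(Expr::Lit(3)),
        Box::new(Expr::Neg(Box::new(Expr::Lit(5)))),
    );
    double_unnegated_literals(&mut e);
    println!("{:?}", e);
}
```

The same enum could drive a read-only visit by swapping `&mut Expr` for `&Expr`, which is the point of unifying the two existing enums.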

Are these changes tested?

Using existing tests.

Are there any user-facing changes?

No.

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions optimizer Optimizer rules core Core DataFusion crate labels Oct 26, 2023
@peter-toth
Contributor Author

#5609 has been closed so I can open a new issue for the PR if needed.

@peter-toth peter-toth marked this pull request as draft October 27, 2023 09:31
@alamb
Contributor

alamb commented Oct 27, 2023

It might help to open a new ticket with a description of what you hope to do, but if you already have a PR that is probably good enough to get feedback

Thank you for helping to make DataFusion better 🙏

@peter-toth
Contributor Author

Thanks @alamb! I will open a ticket and describe my goals. I will also update this PR with a few fixes in a few days before review can start.
One question, is there a code style guide for this project?

@alamb
Contributor

alamb commented Oct 29, 2023

One question, is there a code style guide for this project?

@peter-toth what we have is here: https://arrow.apache.org/datafusion/contributor-guide/index.html

I would say we follow clippy and rustfmt and in general try to use a style consistent with existing code, but there is nothing more formal that I know of

@peter-toth peter-toth force-pushed the refactor-treenode-apply branch 5 times, most recently from 606a6b0 to 36e36fd Compare December 15, 2023 11:44
@peter-toth
Contributor Author

@alamb, I haven't created any issues yet, but wanted to put together a POC PR with some of the changes I would like to propose.

I think this is a bit related to #7775 and to this planning performance epic: #5637.

If any of the above make sense I'm happy to create dedicated issues and then update or split this PR.

@peter-toth peter-toth changed the title Refactor TreeNode::apply and its relatives Refactor TreeNode recursions Dec 16, 2023
@alamb
Contributor

alamb commented Dec 18, 2023

Thanks @peter-toth -- I will try and review this proposal later today or tomorrow

@github-actions github-actions bot added the sql SQL Planner label Dec 19, 2023
… `TreeNode`s and use them in a few examples

- add `transform_down_with_payload()`, `transform_up_with_payload()`, `transform_with_payload()` and use it in `EnforceSorting` as an example
@peter-toth peter-toth force-pushed the refactor-treenode-apply branch 2 times, most recently from 5044144 to 09bbb6b Compare December 19, 2023 15:53
Contributor

@alamb alamb left a comment

Thank you very much for this PR @peter-toth. While I likely did not grok all the nuances of this PR and its implications, I really like where it is headed.

Thoughts about API breakages

Insofar as we can minimize, or spread out in time, the breaking API changes required, I think that would help roll out these changes in a way that users can accept. I also think we can take most/all of the ideas in this PR and minimize the breaking changes.

Some potential ways to keep the API change smaller:

  1. keep transform_up called transform_up rather than renaming it to transform_up_old
  2. Add a typedef like type VisitRecursion = TreeNodeRecursion so users don't have to change their code
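A minimal sketch of the type alias suggestion, assuming the new enum keeps the shape shown in the PR description. The function `legacy_visitor` is a hypothetical piece of downstream code written against the old name.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TreeNodeRecursion {
    Continue,
    Prune,
    Stop,
    StopAll,
}

// The old name is kept as an alias, so existing signatures keep compiling.
pub type VisitRecursion = TreeNodeRecursion;

// Hypothetical downstream code that still uses the old name.
fn legacy_visitor(depth: usize) -> VisitRecursion {
    if depth > 3 {
        VisitRecursion::Stop
    } else {
        VisitRecursion::Continue
    }
}

fn main() {
    // The alias and the new enum are the same type, so no conversion is needed.
    let r: TreeNodeRecursion = legacy_visitor(5);
    println!("{:?}", r);
}
```

An alias only covers the type name. Code that refers to a renamed variant, such as the existing Skip if it becomes Prune, would still need to change.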

Things I am not sure about

I am not sure about introducing Expr::Nop -- the thinking is that then one has to check at runtime that no Expr::Nop is left, rather than using Option<Expr> where you can have the compiler check for you

I think it would be good to get some feedback from the broader community as well

cc @liukun4515 and @yahoNanJing who were instrumental in implementing the current system. cc @sadboy who has been looking into improving planning performance as well - this could be part of the story

@mustafasrepo and @metesynnada and @crepererum perhaps you have insights to share as well

AggregateMode::Partial
) && can_combine(
plan.transform_down(&mut |plan| {
plan.clone()
Contributor

it is certainly nice to avoid the clone

/// computational cost by pushing down `SortExec`s through some executors.
///
/// [`EnforceSorting`]: crate::physical_optimizer::enforce_sorting::EnforceSorting
#[derive(Debug, Clone)]
Contributor

This was inlined into pushdown_sort via transform_down_with_payload which I think is a nice change 👍

pub enum Expr {
#[default]
Contributor

Can you explain a bit about the need / usecase for Expr::Nop? Could the same be accomplished with Option<Expr>?

Contributor Author

@peter-toth peter-toth Dec 20, 2023

The reason why I added Expr::Nop is to be able to refactor unalias() to be a self mutating one: https://github.com/apache/arrow-datafusion/pull/7942/files#diff-204cfc4f999c3d12dc065f323cb952fb0ecb33c5570eed8dc1fb52b806e87004R960. I needed a dummy Expr for mem::take(), but as you showed in https://github.com/apache/arrow-datafusion/pull/7942/files#r1431799873 unalias() doesn't need to be self mutating.

But maybe having a default dummy value of Expr is still useful in some cases. For example, in @sadboy's PR (https://github.com/apache/arrow-datafusion/pull/8591/files#diff-6515fda3c67b9d487ab491fd21c27e32c411a8cbee24725d737290d19c10c199R388-R389, https://github.com/apache/arrow-datafusion/pull/8591/files#diff-6515fda3c67b9d487ab491fd21c27e32c411a8cbee24725d737290d19c10c199R427-R428) Expr::Wildcard { qualifier: None } is used for such purposes.

Contributor

Clarification: is nop "no op(eration)"? If so, could we use the more industry standard Noop? A quick Google search seems to be evidence for its prevalence:

https://www.google.com/search?client=firefox-b-1-d&q=abbreviate+no+operation

Contributor Author

Yes, that was my intention. Noop sounds good to me.

Contributor

needed a dummy Expr

If it's just for this purpose, then I think the Null literal should serve it well enough:

impl Default for Expr {
    fn default() -> Self {
        Expr::Literal(ScalarValue::Null)
    }
}

As @alamb mentioned above, there is a (relatively high) cost to introducing new Expr variants, as it increases potential invalid states along every step of the analyzer/optimizer pipeline.
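A small sketch of how a `Default` impl lets `std::mem::take` move an inner expression out of a `Box` without cloning it. The `Expr` type here is a toy stand-in, with a `Null` variant playing the role of `Expr::Literal(ScalarValue::Null)`, and the free function `unalias` is an illustration, not the method from the PR.

```rust
use std::mem;

// Toy stand-in for DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Null, // plays the role of Expr::Literal(ScalarValue::Null)
    Column(String),
    Alias(Box<Expr>, String),
}

impl Default for Expr {
    fn default() -> Self {
        Expr::Null
    }
}

// In-place unalias: the inner expression is moved out, not cloned.
// `mem::take` leaves `Expr::default()` behind in the old box, which is then dropped.
fn unalias(expr: &mut Expr) {
    if let Expr::Alias(inner, _) = expr {
        let inner = mem::take(&mut **inner);
        *expr = inner;
    }
}

fn main() {
    let mut e = Expr::Alias(Box::new(Expr::Column("a".to_string())), "x".to_string());
    unalias(&mut e);
    println!("{:?}", e);
}
```

The placeholder value never escapes `unalias`, so no new invalid state becomes visible to the rest of the pipeline.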

Contributor Author

Using Null literal sounds good to me.

Contributor Author

Fixed in 6cd5d39.

match self {
Expr::Alias(alias) => alias.expr.as_ref().clone(),
_ => self,
pub fn unalias(&mut self) -> &mut Self {
Contributor

Nice spot -- I think we can avoid a copy like this too: #8588

/// children.
pub fn inspect_expressions<F, E>(self: &LogicalPlan, mut f: F) -> Result<(), E>
/// Apply `f` on expressions of the plan node.
/// `f` is not allowed to return [`TreeNodeRecursion::Prune`].
Contributor

why can't it return Prune?

Contributor Author

@peter-toth peter-toth Dec 20, 2023

Here we basically iterate over the expressions in a logical plan tree node and apply f on each. Those expressions are the root nodes of expression trees and the trees have no connection with each other. (Maybe we can think of them as siblings?)

So actually, I'm a bit uncertain about what we should do if f returns Prune here. (For the other TreeNodeRecursion elements it is clear how to proceed.) Shall we handle Prune as Continue and proceed to the next expression?

.node
.expressions()
.iter()
.for_each_till_continue(f)
Contributor

this for_each_till_continue is an interesting concept

Contributor Author

@peter-toth peter-toth Dec 20, 2023

I was trying to define clear APIs on TreeNodes. After f4d28e0 we have

  • visit(), visit_down(),
  • transform(), transform_down(), transform_up(),
  • transform_with_payload(), transform_down_with_payload() and transform_up_with_payload()

functions on TreeNode and all can be controlled with TreeNodeRecursion.
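For readers who have not opened the diff, a helper like `for_each_till_continue` could look roughly like the sketch below. Only the name comes from the PR. The trait, its body, and the choice to short-circuit on anything other than `Continue` are assumptions. How `Prune` should behave across sibling expression roots is still an open question in this thread.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum TreeNodeRecursion {
    Continue,
    Prune,
    Stop,
}

// Hypothetical extension trait: apply `f` to each item while it keeps
// returning `Continue`, and return the first non-`Continue` result.
trait ForEachTillContinue: Iterator + Sized {
    fn for_each_till_continue<F>(self, mut f: F) -> TreeNodeRecursion
    where
        F: FnMut(Self::Item) -> TreeNodeRecursion,
    {
        for item in self {
            match f(item) {
                TreeNodeRecursion::Continue => {}
                other => return other,
            }
        }
        TreeNodeRecursion::Continue
    }
}

impl<I: Iterator> ForEachTillContinue for I {}

// Example: stop at the first negative value and report how many items were seen.
fn first_negative(roots: &[i64]) -> (TreeNodeRecursion, usize) {
    let mut visited = 0;
    let r = roots.iter().for_each_till_continue(|v| {
        visited += 1;
        if *v < 0 {
            TreeNodeRecursion::Stop
        } else {
            TreeNodeRecursion::Continue
        }
    });
    (r, visited)
}

fn main() {
    println!("{:?}", first_negative(&[1, 2, -3, 4]));
    // `Prune` also ends the iteration in this sketch.
    let r = [1, 2].iter().for_each_till_continue(|_| TreeNodeRecursion::Prune);
    println!("{:?}", r);
}
```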

/// If a preorder visit of a tree node returns [`TreeNodeRecursion::Prune`] then inner
/// children and children will not be visited and postorder visit of the node will not
/// be invoked.
Prune,
Contributor

Is this equivalent to RewriteRecursion::Skip? If so, perhaps we can use the same terminology

Contributor Author

We can keep Skip if you prefer. (To me Prune better describes that children should not be visited.)

}

impl TreeNodeRecursion {
pub fn and_then_on_continue<F>(self, f: F) -> Result<TreeNodeRecursion>
Contributor

These are neat helpers, it would be useful to document their intended usecases if possible

Contributor Author

Added a comment to and_then_on_continue() in 8882285. I will add more details and comments to the other helpers later. Let's first see in #7942 (comment) whether we need fail_on_prune() at all.
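As a sketch of the intended use case, `and_then_on_continue` can be read as "run the next step only if the previous one said Continue". The method name and return type follow the diff. The body, the `FnOnce` bound, the `String` error type and the `run_steps` driver are assumptions for illustration.

```rust
// Stand-in for DataFusion's `Result`; the error type is an assumption.
type Result<T> = std::result::Result<T, String>;

#[derive(Debug, Clone, Copy, PartialEq)]
enum TreeNodeRecursion {
    Continue,
    Prune,
    Stop,
}

impl TreeNodeRecursion {
    // Runs `f` only if `self` is `Continue`; otherwise passes `self` through.
    fn and_then_on_continue<F>(self, f: F) -> Result<TreeNodeRecursion>
    where
        F: FnOnce() -> Result<TreeNodeRecursion>,
    {
        match self {
            TreeNodeRecursion::Continue => f(),
            other => Ok(other),
        }
    }
}

// Chains a list of steps and reports the final state plus how many steps ran.
fn run_steps(steps: &[TreeNodeRecursion]) -> Result<(TreeNodeRecursion, usize)> {
    let mut executed = 0;
    let mut state = TreeNodeRecursion::Continue;
    for step in steps {
        state = state.and_then_on_continue(|| {
            executed += 1;
            Ok(*step)
        })?;
    }
    Ok((state, executed))
}

fn main() {
    let steps = [
        TreeNodeRecursion::Continue,
        TreeNodeRecursion::Prune,
        TreeNodeRecursion::Stop,
    ];
    // The third step is never executed because the second one returned `Prune`.
    println!("{:?}", run_steps(&steps));
}
```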

})
}

pub fn fail_on_prune(self) -> Result<TreeNodeRecursion> {
Contributor

I don't understand the usecase for this method -- if there is going to be a panic, perhaps it would be clearer to put the check directly at the callsite with an explanation for why the situation warrants a panic

@sadboy
Contributor

sadboy commented Dec 20, 2023

Rust is somehow smart enough to optimize away the memory allocations already.

Playing around in godbolt pointed me to this -- https://doc.rust-lang.org/src/alloc/vec/in_place_collect.rs.html :)
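The effect can be observed with a small standalone program. This is only a sketch: whether the output `Vec` reuses the input buffer is an optimization in the standard library, not a documented guarantee, so the program reports the result instead of asserting it. The function name `increment_all` is made up for the example.

```rust
// Maps a Vec by value. The standard library may reuse the input buffer for
// the output when the element layouts are compatible.
fn increment_all(v: Vec<u64>) -> Vec<u64> {
    v.into_iter().map(|x| x + 1).collect()
}

fn main() {
    let input: Vec<u64> = (0..1_000).collect();
    let before = input.as_ptr();
    let output = increment_all(input);
    let after = output.as_ptr();
    // Buffer reuse is not guaranteed, so this only reports what happened.
    println!("same allocation: {}", before == after);
    println!("first three: {:?}", &output[..3]);
}
```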

@peter-toth
Contributor Author

peter-toth commented Dec 20, 2023

I wrote a small benchmark test:

#[cfg(test)]
mod test {
    use crate::{and, lit, Expr};
    use datafusion_common::tree_node::{Transformed, TreeNode, TreeNodeRecursion};
    use std::time::Instant;

    fn create_and_tree(level: u32) -> Expr {
        if level == 0 {
            lit(true)
        } else {
            and(create_and_tree(level - 1), create_and_tree(level - 1))
        }
    }

    #[test]
    fn transform_test() {
        let now = Instant::now();
        let mut and_tree = create_and_tree(25);
        println!("create_and_tree: {}", now.elapsed().as_millis());

        let now = Instant::now();
        and_tree = and_tree
            .transform_down_old(&mut |e| Ok(Transformed::No(e)))
            .unwrap();
        println!("and_tree.transform_down_old: {}", now.elapsed().as_millis());

        let now = Instant::now();
        let mut and_tree_clone = and_tree.clone();
        println!("and_tree.clone: {}", now.elapsed().as_millis());

        let now = Instant::now();
        and_tree_clone
            .transform_down(&mut |_e| Ok(TreeNodeRecursion::Continue))
            .unwrap();
        println!(
            "and_tree_clone.transform_down: {}",
            now.elapsed().as_millis()
        );

        println!("results: {}", and_tree == and_tree_clone);

        let now = Instant::now();
        and_tree = and_tree
            .transform_down_old(&mut |e| match e {
                Expr::Literal(_) => Ok(Transformed::Yes(lit(false))),
                o => Ok(Transformed::No(o)),
            })
            .unwrap();
        println!(
            "and_tree.transform_down_old 2: {}",
            now.elapsed().as_millis()
        );

        let now = Instant::now();
        and_tree_clone
            .transform_down(&mut |e| match e {
                Expr::Literal(_) => {
                    *e = lit(false);
                    Ok(TreeNodeRecursion::Continue)
                }
                _ => Ok(TreeNodeRecursion::Continue),
            })
            .unwrap();
        println!(
            "and_tree_clone.transform_down 2: {}",
            now.elapsed().as_millis()
        );

        println!("results: {}", and_tree == and_tree_clone);
    }
}

It is available here: https://github.com/peter-toth/arrow-datafusion/commits/refactor-treenode-benchmark/. I ran it with --release as cargo test --color=always --lib tree_node::expr::test::transform_test --release -- --show-output and this is what I got (times in milliseconds):

---- tree_node::expr::test::transform_test stdout ----
create_and_tree: 8912
and_tree.transform_down_old: 6129
and_tree.clone: 12670
and_tree_clone.transform_down: 2137
results: true
and_tree.transform_down_old 2: 6507
and_tree_clone.transform_down 2: 2734
results: true

So transform_down() seems to be about 2.4-2.9x faster than transform_down_old() (6507 vs. 2734 ms and 6129 vs. 2137 ms).
The above results already contain @sadboy's #8591 improvement to the current code (transform_down_old() in this PR).
I'm fairly new to DataFusion and Rust, so please let me know if you would suggest a different benchmark.

@Dandandan
Contributor

I like where this is going 🚀

I suggest also adding some benchmarking. We could take, for example, TPC-H and TPC-DS (which we already have in the benchmarks / tests) and benchmark the time it takes to plan/optimize the queries rather than execute them. It seems it might not be much work to add an option to the benchmark code to only perform the planning rather than executing the queries.

…visit()`, `visit_down()`, `transform()`, `transform_down()`, `transform_up()`, `transform_with_payload()`, `transform_down_with_payload()` and `transform_up_with_payload()` functions on `TreeNode`, others can be deprecated and removed once no longer used
…yload()` in its pre-order transform (`f_down`) function
@peter-toth
Contributor Author

peter-toth commented Dec 20, 2023

I like where this is going 🚀

I suggest also adding some benchmarking. We could take, for example, TPC-H and TPC-DS (which we already have in the benchmarks / tests) and benchmark the time it takes to plan/optimize the queries rather than execute them. It seems it might not be much work to add an option to the benchmark code to only perform the planning rather than executing the queries.

I like this idea, but I'm not sure that this PR itself can bring much improvement yet. This PR only refactors a few transform/rewrite operations, and the old methods are still kept and used in many places.
Also, some trees like LogicalPlan use Arcs, and their new in place mutation method (transform_children() in this PR: https://github.com/apache/arrow-datafusion/pull/7942/files#diff-9619441d9605f143a911319cea75ae5192e6c5b17acfcbc17a3c73a9e32a8e61R62-R80) is not yet better than their old map_children() (https://github.com/apache/arrow-datafusion/pull/7942/files#diff-9619441d9605f143a911319cea75ae5192e6c5b17acfcbc17a3c73a9e32a8e61R39).
Actually, I'm not sure yet whether it's possible to do in place mutation on Arcs at all.

BTW, can anyone explain to me why there is this difference between TreeNodes in DataFusion? Why do some of them use Boxes while others use Arcs? Do we share subtrees between threads?
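On the question of mutating through an `Arc`: the standard library does offer `Arc::make_mut`, which hands out a `&mut T` directly when the `Arc` is the only pointer to the value and clones the value first when it is shared. The sketch below uses a toy `Plan` type, not `LogicalPlan`, and a made-up `rename_all` function. Whether this approach suits DataFusion's plan trees is not settled anywhere in this thread.

```rust
use std::sync::Arc;

// Toy stand-in for a plan node whose children sit behind `Arc`s.
#[derive(Debug, Clone, PartialEq)]
struct Plan {
    name: String,
    inputs: Vec<Arc<Plan>>,
}

// Appends `suffix` to every node name. Uniquely owned nodes are mutated in
// place; shared nodes are cloned first (copy-on-write) by `Arc::make_mut`.
fn rename_all(plan: &mut Arc<Plan>, suffix: &str) {
    let p = Arc::make_mut(plan);
    p.name.push_str(suffix);
    for input in p.inputs.iter_mut() {
        rename_all(input, suffix);
    }
}

fn main() {
    let shared = Arc::new(Plan { name: "scan".to_string(), inputs: vec![] });
    let mut root = Arc::new(Plan {
        name: "union".to_string(),
        inputs: vec![Arc::clone(&shared), Arc::clone(&shared)],
    });
    rename_all(&mut root, "_v2");
    println!("root: {:?}", root);
    // The original shared subtree is untouched because it was cloned on write.
    println!("shared: {:?}", shared);
}
```

The `Clone` bound matters here: cloning a `Plan` copies its name and bumps the reference counts of its inputs, but does not deep-copy the subtrees.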

# Conflicts:
#	datafusion/expr/src/expr.rs
#	datafusion/expr/src/utils.rs
@peter-toth
Contributor Author

peter-toth commented Dec 21, 2023

I've updated https://github.com/peter-toth/arrow-datafusion/commits/refactor-treenode-benchmark/ with a LogicalPlan benchmark to see how the proposed PR affects trees with Arcs.

The benchmark is very similar to the previous Expr based one but uses Union and EmptyRelation to build up a tree: peter-toth@6d8ad17#diff-9619441d9605f143a911319cea75ae5192e6c5b17acfcbc17a3c73a9e32a8e61R83-R153

It is interesting to see that the new transform_down() is better than the old transform_down_old() on LogicalPlan trees as well, but the improvement is not that significant:

---- tree_node::plan::test::transform_test stdout ----
create_union_tree: 8481
union_tree.transform_down_old: 6406
union_tree.clone: 0
union_tree_clone.transform_down: 3861
results: true
union_tree.transform_down_old 2: 11855
union_tree_clone.transform_down 2: 9479
results: true
results: false

I think the key takeaway of these 2 benchmarks is that I had to scale down the LogicalPlan based one to run on a binary tree of height 23 (peter-toth@75cde35#diff-9619441d9605f143a911319cea75ae5192e6c5b17acfcbc17a3c73a9e32a8e61R102) vs. the Expr based one that ran on a tree of height 25 (peter-toth@75cde35#diff-6515fda3c67b9d487ab491fd21c27e32c411a8cbee24725d737290d19c10c199R498) to get roughly similar transform down numbers. I think this is the cost of using Arcs vs. Boxes and the fact that we can't mutate in place. (Although there might be good reasons for using Arcs, please see my question above.)

@alamb
Contributor

alamb commented Dec 23, 2023

I think this is the cost of using Arcs vs. Boxes and the fact that we can't mutate in place. (Although there might be good reasons for using Arcs, please see my question above.)

I think the particular choice of Boxes vs Arcs does not have a well thought out rationale, or if there is one, I do not know of it.

@alamb
Contributor

alamb commented Dec 23, 2023

Here is my suggestion in how to proceed with this PR

  1. Create some basic end to end planning performance benchmarks (I elaborated on @Dandandan's idea from #7942 (comment) in "Benchmarks for planning queries" #8638)

  2. Use that information to guide which part(s) of this PR are the most valuable for increasing performance.

@sadboy, do you have any benchmarks you could share that model your existing workload?

+1 to the importance of this -- our workloads involve lots of analysis/transformations on the Datafusion LogicalPlan, so any perf improvements in this department would be extremely beneficial to us.

It would be great if there's some kind of benchmark to demonstrate the concrete effects of this change -- perf-related impacts can often times be counter-intuitive and surprising.

100% agree

@ozankabak
Contributor

I like this general effort and we will be happy to help. The main challenge I see is that this touches many files and procedures, and we may lose/break certain behaviors that are not adequately tested. Therefore, IMO it makes sense to first clean-up some of the tree traversal logic in our planner/optimization rules as a stepping stone to this.

We will submit a cleanup PR early next week to simplify around half of the usages in physical planning/optimization so the job here will be easier.

@sadboy
Contributor

sadboy commented Dec 23, 2023

@sadboy, do you have any benchmarks you could share that model your existing workload?

Not readily, ours is all production queries that we can not share. But I can certainly synthesize some test cases from the more "pathological" cases we've encountered, e.g. large WITHs, deep nested IFs, 1000+ columns, etc. Would be a good augment to what you described in #8638.

@peter-toth
Contributor Author

Thank you all for the feedback! I've updated the PR description with the latest findings.

…ransform_up_with_payload` related changes
# Conflicts:
#	datafusion/common/src/tree_node.rs
#	datafusion/core/src/datasource/physical_plan/parquet/row_groups.rs
#	datafusion/core/src/physical_optimizer/enforce_distribution.rs
#	datafusion/core/src/physical_optimizer/enforce_sorting.rs
#	datafusion/core/src/physical_optimizer/pipeline_checker.rs
#	datafusion/core/src/physical_optimizer/replace_with_order_preserving_variants.rs
#	datafusion/core/src/physical_optimizer/sort_pushdown.rs
#	datafusion/expr/src/tree_node/expr.rs
#	datafusion/expr/src/tree_node/plan.rs
#	datafusion/optimizer/src/analyzer/count_wildcard_rule.rs
#	datafusion/optimizer/src/analyzer/type_coercion.rs
#	datafusion/optimizer/src/push_down_filter.rs
#	datafusion/physical-expr/src/equivalence.rs
#	datafusion/physical-expr/src/sort_properties.rs
#	datafusion/physical-expr/src/utils/mod.rs

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale PR has not had any activity for some time label Apr 24, 2024
@peter-toth peter-toth closed this Apr 24, 2024
Labels
core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Physical Expressions sql SQL Planner Stale PR has not had any activity for some time
6 participants