Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace Filter: Boolean(false) with EmptyRelation #3864

Closed
andygrove opened this issue Oct 17, 2022 · 6 comments · Fixed by #4192
Closed

Replace Filter: Boolean(false) with EmptyRelation #3864

andygrove opened this issue Oct 17, 2022 · 6 comments · Fixed by #4192
Labels
enhancement New feature or request optimizer Optimizer rules

Comments

@andygrove
Copy link
Member

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I have a query that looks like this after optimization:

Union 
  Projection
    ...
  Projection
    Filter: Boolean(false)
      Aggregate
        Inner Join
          Inner Join

Describe the solution you'd like
Remove everything under the Filter that is always false and replace with EmptyRelation so that we avoid executing the join and aggregate.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

@andygrove andygrove added enhancement New feature or request optimizer Optimizer rules labels Oct 17, 2022
@mingmwang
Copy link
Contributor

We can have two general rules: FoldFilters and EmptyRelationPropagation.
FoldFilters can remove the filters which can be evaluated trivially. If Filter condition is alway false, replace it with an EmptyRelation

EmptyRelationPropagation can propagate the empty relation up to the tree, for example, inner join with EmptyRelation can return EmptyRelation.

@mingmwang
Copy link
Contributor

@andygrove Looks like there is already a rule EliminateFilter for this purpose.

@alamb
Copy link
Contributor

alamb commented Oct 18, 2022

Perhaps the changes in #3841 from @Dandandan made it possible to simplify the filter

Perhaps we could solve this issue by adding another run of EliminateFilter at the end of the list in datafusion/optimizer/src/optimizer.rs

This was referenced Oct 18, 2022
@alamb
Copy link
Contributor

alamb commented Oct 22, 2022

@andygrove can you share a reproducer? We can then add it to the regression tests to make sure the empty filter is indeed eliminated and doesn't come back in

@jackwener
Copy link
Member

jackwener commented Nov 12, 2022

Datafusion only can propagate EmptyRelation just optimizer_config.max_passes times. Because we depend on multi-passes optimization to multiply elimination.

In other words, we currently don't have really EmptyRelationPropagation.

We can reproduce it.

After 3 optimizations, the plan still can continue to be eliminated.

 CREATE TABLE IF NOT EXISTS t1 AS VALUES(1,'HELLO'),(12,'DATAFUSION');
 CREATE TABLE IF NOT EXISTS t2 AS VALUES(1,'HELLO'),(12,'DATAFUSION');
 CREATE TABLE IF NOT EXISTS t3 AS VALUES(1,'HELLO'),(12,'DATAFUSION');

explain verbose  select column1 from t1 join ( select column1 from t2 join (select column1 from t3 where false ) as ta2 on t2.column1 = ta2.column1 ) as ta1 on t1.column1 = ta1.column1;

@andygrove
Copy link
Member Author

FWIW, This is no longer an issue for me in Dask SQL. I am not sure what fixed it but maybe the multiple optimizer passes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request optimizer Optimizer rules
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants