Skip to content

BinaryExpr evaluate lacks optimization for Or and And scenarios #11212

@acking-you

Description

@acking-you

Describe the bug

As shown in the code link, BinaryExpr directly calculates the left and right operands without considering the possibility that left might already be false and op is And, or that left might be true and op is Or.

For example, in this complex filtering statement, unnecessary calculations will be performed:

SELECT 
    customer_id,
    first_name,
    last_name,
    email,
    total_purchases
FROM 
    customers
WHERE 
    1 = 0  -- This condition is always false
    AND (
        (total_purchases > 1000 AND EXTRACT(YEAR FROM registration_date) > 2018)
        OR (LOWER(email) LIKE '%@gmail.com' AND loyalty_points > 500)
        OR (
            LENGTH(first_name) + LENGTH(last_name) > 15
            AND SUBSTR(last_name, 1, 1) IN ('M', 'N', 'O', 'P')
            AND DATE_PART('month', last_purchase_date) BETWEEN 6 AND 8
        )
    )
    AND (
        customer_id % 3 = 0
        OR EXISTS (
            SELECT 1 
            FROM orders 
            WHERE orders.customer_id = customers.customer_id
            AND orders.order_status = 'Completed'
            AND orders.order_total > 250
        )
    );

To Reproduce

When evaluating AND or OR expressions

Expected behavior

When an And expression is executed, a false occurs to stop the computation of the result.
When an Or expression is executed, a true occurs and the result is stopped.

Additional context

No

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingperformanceMake DataFusion faster

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions