20234: Remove recursive const check in simplify_const_expr#232

Open
martin-augment wants to merge 1 commit into main from pr-20234-2026-02-09-12-50-17

Conversation

@martin-augment
Owner

20234: To review by AI

@gemini-code-assist

Summary of Changes

Hello @martin-augment, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the performance and maintainability of DataFusion's physical expression simplifier by refining how constant expressions are identified and evaluated. The primary goal is to make the simplification process more efficient by eliminating unnecessary recursive traversals and improving the modularity of internal components. This leads to a cleaner and potentially faster expression optimization pipeline.

Highlights

  • Optimized Constant Expression Simplification: The simplify_const_expr function has been refactored to check only the immediate children of an expression for literal status, rather than performing a recursive check on the entire subtree. This change leverages the bottom-up traversal of the expression simplifier, significantly reducing redundant work during constant evaluation.
  • Removal of Redundant Recursive Checks: The helper functions can_evaluate_as_constant and has_column_references, which previously performed recursive checks on expression trees, have been removed. Their logic is now implicitly handled by the optimized simplify_const_expr approach, streamlining the codebase.
  • Restricted Visibility of Internal Components: Several functions (simplify_const_expr, simplify_not_expr) and modules (const_evaluator, not, unwrap_cast) have had their visibility reduced from pub to pub(crate). This improves encapsulation by explicitly marking them as internal components of the simplifier module, preventing external direct usage.
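
To make the first highlight concrete, here is a minimal sketch of folding with an immediate-children check. The `Expr` enum, `fold_node`, and `simplify` are hypothetical stand-ins invented for illustration; they are not DataFusion's `PhysicalExpr` API or the code in this PR.

```rust
/// Toy expression tree (an assumption for illustration, not DataFusion's types).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Literal(i64),
    Column(String),
    Add(Box<Expr>, Box<Expr>),
}

/// Fold a single node by inspecting only its direct children.
/// This is only correct if the children were already simplified,
/// which is what a bottom-up traversal guarantees.
fn fold_node(expr: Expr) -> Expr {
    match expr {
        Expr::Add(l, r) => match (*l, *r) {
            // Both direct children are literals: evaluate now.
            (Expr::Literal(a), Expr::Literal(b)) => Expr::Literal(a + b),
            // Otherwise rebuild the node unchanged.
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        // Literals and columns are left as they are.
        other => other,
    }
}

/// Bottom-up traversal: simplify the children first, then the node itself.
fn simplify(expr: Expr) -> Expr {
    let rebuilt = match expr {
        Expr::Add(l, r) => Expr::Add(Box::new(simplify(*l)), Box::new(simplify(*r))),
        leaf => leaf,
    };
    fold_node(rebuilt)
}
```

In this sketch, `(1 + 2) + c1` becomes `3 + c1`: the inner addition is folded when it is visited, and the outer node is left alone because one of its direct children is a column.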
Changelog
  • datafusion/physical-expr/src/simplifier/const_evaluator.rs
    • Refactored simplify_const_expr to check only immediate children for literal status, removing recursive subtree checks.
    • Merged simplify_const_expr_with_dummy functionality directly into simplify_const_expr.
    • Removed can_evaluate_as_constant and has_column_references functions.
    • Updated documentation for simplify_const_expr to reflect the new simplification strategy.
  • datafusion/physical-expr/src/simplifier/mod.rs
    • Updated import paths for simplify_const_expr.
    • Changed module visibility for const_evaluator, not, and unwrap_cast from pub mod to mod.
    • Updated the call site for simplify_const_expr within the PhysicalExprSimplifier.
  • datafusion/physical-expr/src/simplifier/not.rs
    • Changed simplify_not_expr function visibility from pub fn to pub(crate) fn.
Activity
  • The pull request description indicates that it is 'To review by AI', suggesting no human review activity has occurred yet.

@coderabbitai

coderabbitai bot commented Feb 9, 2026

Walkthrough

This pull request refactors the const expression evaluator in the physical expression simplifier. The simplify_const_expr function is updated to accept a RecordBatch parameter and evaluate expressions only when all direct children are literals, following a bottom-up traversal approach. The wrapper function simplify_const_expr_with_dummy is removed along with helper functions has_column_references and can_evaluate_as_constant. The public API surface is reduced by making three submodules (const_evaluator, not, unwrap_cast) and the simplify_not_expr function crate-private instead of public.
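
As a rough illustration of why a batch is passed in at all: physical expressions are evaluated against a batch of rows, so a constant expression still needs some batch to run on. The sketch below uses a hypothetical `DummyBatch` and toy `Expr` type (neither is Arrow's `RecordBatch` nor DataFusion's API) and assumes a one-row batch that is built once and shared by reference.

```rust
/// Hypothetical stand-in for a one-row placeholder batch.
struct DummyBatch {
    num_rows: usize,
}

/// Toy constant-only expression tree (an assumption for illustration).
enum Expr {
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
}

/// Batch-oriented evaluation: one output value per row of the batch.
fn evaluate(expr: &Expr, batch: &DummyBatch) -> Vec<i64> {
    match expr {
        Expr::Literal(v) => vec![*v; batch.num_rows],
        Expr::Add(l, r) => evaluate(l, batch)
            .iter()
            .zip(evaluate(r, batch))
            .map(|(a, b)| a + b)
            .collect(),
    }
}

/// Fold many constant expressions against the same shared batch,
/// taking the single row as the resulting literal value.
/// Assumes `batch.num_rows >= 1`.
fn fold_all(exprs: &[Expr], batch: &DummyBatch) -> Vec<i64> {
    exprs.iter().map(|e| evaluate(e, batch)[0]).collect()
}
```

The point of the design is only that the caller constructs the placeholder once and every per-node call borrows it, instead of each call building its own.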


No actionable comments were generated in the recent review. 🎉



@claude

claude bot commented Feb 9, 2026

Code Review for PR #232: Remove recursive const check in simplify_const_expr

Summary

This PR optimizes the constant expression simplifier by removing recursive tree traversal in favor of a linear check of immediate children. This is a performance improvement that reduces complexity from O(subtree) to O(children).


✅ Code Quality & Best Practices

Excellent improvements:

  1. Algorithm optimization: The change from recursive can_evaluate_as_constant() (which traversed the entire subtree) to checking only immediate children is a smart optimization. Since the simplifier uses bottom-up traversal (transform), children are already simplified when the parent is visited.

  2. Better separation of concerns: The function now correctly assumes bottom-up traversal context, as documented in the updated comment at datafusion/physical-expr/src/simplifier/const_evaluator.rs:34-36.

  3. Code cleanup: Removed dead code:

    • can_evaluate_as_constant() function (replaced by simpler logic)
    • has_column_references() function (unused)
    • Public simplify_const_expr() wrapper (consolidated into internal version)
  4. Improved documentation: The function doc comment now clearly states the bottom-up traversal assumption.

Minor suggestions:

  1. Visibility changes (datafusion/physical-expr/src/simplifier/mod.rs:33-35): Changed from pub mod to mod (private). This is good for encapsulation, but ensure this doesn't break any external consumers of these modules. Based on the grep search showing only 2 files using simplify_const_expr, this appears safe.

🐛 Potential Issues

Medium priority:

  1. Edge case: expressions with volatile children - The current code checks expr.is_volatile_node() but only for the current node, not children. However, if a child is volatile, it won't be a Literal (since volatile expressions can't be const-evaluated), so all_children_literal will be false. This is correct, but subtle. Consider adding a comment explaining this invariant at datafusion/physical-expr/src/simplifier/const_evaluator.rs:57-58.

  2. Error handling silently swallows errors (datafusion/physical-expr/src/simplifier/const_evaluator.rs:87-92): The Err(_) case keeps the original expression. While the comment explains this is intentional for short-circuit evaluation, consider logging these errors at debug/trace level for troubleshooting purposes.
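
Both points can be sketched with a toy model. Everything below (`Expr`, `Random`, `try_eval`, `fold_node`) is hypothetical and written only to illustrate the invariant; it is not the code under review.

```rust
/// Toy expression tree with a volatile leaf (an assumption for illustration).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Literal(i64),
    Random, // volatile: yields a different value on every evaluation
    Add(Box<Expr>, Box<Expr>),
    Div(Box<Expr>, Box<Expr>),
}

fn is_volatile_node(expr: &Expr) -> bool {
    matches!(expr, Expr::Random)
}

/// Returns `None` when some direct child is not a literal,
/// otherwise the result of evaluating the node.
fn try_eval(expr: &Expr) -> Option<Result<i64, String>> {
    let (l, r) = match expr {
        Expr::Add(l, r) | Expr::Div(l, r) => (&**l, &**r),
        _ => return None,
    };
    let (a, b) = match (l, r) {
        (Expr::Literal(a), Expr::Literal(b)) => (*a, *b),
        _ => return None,
    };
    Some(match expr {
        Expr::Add(..) => a.checked_add(b).ok_or_else(|| "overflow".to_string()),
        // Only `Div` can reach this arm; leaves returned `None` above.
        _ => a
            .checked_div(b)
            .ok_or_else(|| "division by zero or overflow".to_string()),
    })
}

fn fold_node(expr: Expr) -> Expr {
    // Only the current node is checked for volatility. A volatile child is
    // never folded into a literal, so the literal check in `try_eval`
    // already rejects parents of volatile nodes.
    if is_volatile_node(&expr) {
        return expr;
    }
    match try_eval(&expr) {
        Some(Ok(v)) => Expr::Literal(v),
        // Evaluation failed: keep the original expression. A real
        // implementation might log the error at debug/trace level here.
        Some(Err(_)) => expr,
        None => expr,
    }
}
```

In this model `Random + 1` and `1 / 0` both come back unchanged, while `1 + 2` folds to `3`.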

Low priority:

  1. Unused imports cleanup: The removal of TreeNodeRecursion import is good, but verify that Transformed is still needed (it is - used in return types).

⚡ Performance Considerations

Major improvements:

  1. Complexity reduction: The per-node constness check drops from O(n), where n is the subtree size, to O(k), where k is the number of direct children. For deeply nested expressions, this is a significant win.

  2. Reduced allocations: No longer creating closures and intermediate state for apply() traversal.

  3. Better cache locality: Linear iteration over children vs recursive tree traversal.
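
A back-of-the-envelope way to see the first point is to count node inspections on a toy tree. The sketch below is an assumption-laden model (hypothetical `Expr`, a left-deep chain with a column at the bottom, a short-circuiting recursive check); it says nothing about measured DataFusion performance.

```rust
/// Toy expression tree (an assumption for illustration).
enum Expr {
    Literal(i64),
    Column,
    Add(Box<Expr>, Box<Expr>),
}

/// Recursive check: walks the subtree until it finds a non-constant node.
fn subtree_is_constant(expr: &Expr, visits: &mut usize) -> bool {
    *visits += 1;
    match expr {
        Expr::Literal(_) => true,
        Expr::Column => false,
        Expr::Add(l, r) => subtree_is_constant(l, visits) && subtree_is_constant(r, visits),
    }
}

/// Immediate check: looks only at the direct children.
fn children_are_literals(expr: &Expr, visits: &mut usize) -> bool {
    match expr {
        Expr::Add(l, r) => {
            *visits += 2;
            matches!(**l, Expr::Literal(_)) && matches!(**r, Expr::Literal(_))
        }
        _ => false,
    }
}

/// Run `check` once per `Add` node in bottom-up order, accumulating inspections.
fn count_checks(expr: &Expr, check: fn(&Expr, &mut usize) -> bool, visits: &mut usize) {
    if let Expr::Add(l, r) = expr {
        count_checks(l, check, visits);
        count_checks(r, check, visits);
        check(expr, visits);
    }
}

/// Left-deep chain `(((c + 1) + 1) + ...)` with `depth` additions.
fn chain(depth: usize) -> Expr {
    let mut e = Expr::Column;
    for _ in 0..depth {
        e = Expr::Add(Box::new(e), Box::new(Expr::Literal(1)));
    }
    e
}
```

For `chain(100)` the recursive check inspects 5150 nodes in total (it walks down to the column from every level), while the immediate check inspects 200. Note that for fully constant trees the gap largely disappears, because bottom-up folding has already collapsed the children into literals; the saving shows up when a non-constant node sits deep in the tree.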

Measurement suggestion:

Consider adding a benchmark or performance test for deeply nested constant expressions to demonstrate the improvement. The existing test at datafusion/physical-expr/src/simplifier/mod.rs:543-564 tests correctness but not performance.


🔒 Security Concerns

No security issues identified. The change is purely an optimization and maintains the same security properties:

  • Still prevents evaluation of volatile expressions
  • Still handles evaluation errors safely
  • Still prevents evaluation of column references

✅ Test Coverage

Strengths:

  1. Comprehensive existing tests: The mod.rs file has excellent test coverage including:

    • Simple literal expressions (datafusion/physical-expr/src/simplifier/mod.rs:496-506)
    • Nested literal expressions (datafusion/physical-expr/src/simplifier/mod.rs:529-541)
    • Deeply nested literals (datafusion/physical-expr/src/simplifier/mod.rs:543-564)
    • Mixed column/literal expressions (datafusion/physical-expr/src/simplifier/mod.rs:567-601)
  2. Tests cover the critical assumption: Test test_partial_simplify_with_column (line 583) verifies that bottom-up traversal works correctly - the inner (1 + 2) is simplified to 3 before the outer 3 + c1 is evaluated.

Recommendations:

  1. Add a test for volatile expressions: Verify that expressions like RANDOM() + 1 are not simplified. This would ensure the is_volatile_node() check works correctly.

  2. Add a test for evaluation errors: Create a test case where evaluation might fail (e.g., division by zero in a context where it's safe to keep the original expression) to verify the error handling path.

Example test to add:

#[test]
fn test_no_simplify_volatile() {
    let schema = Schema::empty();
    let simplifier = PhysicalExprSimplifier::new(&schema);
    
    // Assume we have a volatile expression like RandomExpr
    // let volatile_expr = Arc::new(RandomExpr::new());
    // let expr = Arc::new(BinaryExpr::new(volatile_expr, Operator::Plus, lit(1i32)));
    // let result = simplifier.simplify(expr.clone()).unwrap();
    // Should not be simplified
    // assert!(Arc::ptr_eq(&result, &expr));
}

📝 Overall Assessment

Rating: ✅ Approve with minor suggestions

This is a well-thought-out optimization that:

  • ✅ Improves performance significantly
  • ✅ Reduces code complexity
  • ✅ Maintains correctness through existing comprehensive tests
  • ✅ Has clear documentation explaining the approach

Before merging:

  1. Consider adding debug logging for evaluation errors
  2. Consider adding tests for volatile expressions (if not covered elsewhere)
  3. Verify no external consumers rely on the now-private modules

Nice work on this optimization! The code is cleaner and faster. 🚀


Review generated with analysis of code diff, existing tests, and DataFusion codebase patterns.


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request removes a recursive check in simplify_const_expr for constant evaluation, replacing it with a more efficient check on immediate children. This is a good performance optimization that leverages the bottom-up traversal of the expression simplifier. The changes also improve encapsulation by making several simplifier modules crate-private. The logic appears sound, and I have one minor suggestion to improve code conciseness.

Comment on lines +46 to +59
// Already a literal - nothing to do
if expr.as_any().is::<Literal>() {
    return Ok(Transformed::no(expr));
}

// Column references cannot be evaluated at plan time
if expr.as_any().is::<Column>() {
    return Ok(Transformed::no(expr));
}

// Volatile nodes cannot be evaluated at plan time
if expr.is_volatile_node() {
    return Ok(Transformed::no(expr));
}

medium

For conciseness, the series of if statements to check for non-simplifiable expressions can be combined into a single if statement using || operators.

    // Expressions that are already literals, column references, or volatile
    // can't be simplified further at this stage.
    if expr.as_any().is::<Literal>()
        || expr.as_any().is::<Column>()
        || expr.is_volatile_node()
    {
        return Ok(Transformed::no(expr));
    }

Owner Author

value:valid-but-wont-fix; category:bug; feedback: The Gemini AI reviewer is correct but the current way is also fine - it is the same performance-wise and clearer due to the local comments.

@augmentcode
Copy link

augmentcode bot commented Feb 9, 2026

🤖 Augment PR Summary

Summary: Refactors physical constant folding to avoid recursive subtree checks.

Changes:

  • Fold nodes only when all direct children are Literals (relies on bottom-up traversal)
  • Remove recursive constant-check helpers and pass a shared dummy RecordBatch into the folder
  • Tighten visibility of simplifier internals (const_evaluator/not/unwrap_cast)


@augmentcode augmentcode bot left a comment

Review completed. 1 suggestion posted.


- pub mod const_evaluator;
- pub mod not;
- pub mod unwrap_cast;
+ mod const_evaluator;

Switching these simplifier submodules/functions to crate-private removes previously public APIs (e.g. simplify_const_expr, simplify_not_expr) for downstream users of datafusion_physical_expr::simplifier. Can you confirm this visibility/API break is intended and aligned with semver/release notes?

Severity: medium

Other Locations
  • datafusion/physical-expr/src/simplifier/const_evaluator.rs:42
  • datafusion/physical-expr/src/simplifier/not.rs:46

Owner Author

value:valid-but-wont-fix; category:bug; feedback: The Augment AI reviewer is correct but the author intentionally removes those methods from the public API because: 1) they were not supposed to be public; 2) they are just building blocks, not a complete solution; 3) less public API = easier to evolve the implementation in the future.
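
The visibility pattern being discussed can be sketched as follows. The module layout mirrors the names in this thread, but the function names, signatures, and bodies are hypothetical placeholders, not the real simplifier code.

```rust
mod simplifier {
    // Private submodules: reachable only from inside `simplifier`.
    mod const_evaluator {
        // `pub(crate)` caps visibility at the crate boundary; placeholder body.
        pub(crate) fn simplify_const(x: i64, y: i64) -> i64 {
            x + y
        }
    }

    mod not {
        // Placeholder for a NOT-simplification building block.
        pub(crate) fn simplify_not(b: bool) -> bool {
            !b
        }
    }

    /// The only public entry point; the building blocks stay internal.
    pub struct Simplifier;

    impl Simplifier {
        pub fn simplify_sum(&self, x: i64, y: i64) -> i64 {
            const_evaluator::simplify_const(x, y)
        }

        pub fn simplify_negation(&self, b: bool) -> bool {
            not::simplify_not(b)
        }
    }
}
```

Callers outside `simplifier` can use `simplifier::Simplifier`, but a path such as `simplifier::const_evaluator::simplify_const` no longer compiles for them because the submodule is private. That is the trade-off described above: a smaller public surface in exchange for freedom to change the internals later.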
