Skip to content

18481: feat: Add selectivity metric to NestedLoopJoinExec for EXPLAIN ANALYZE#1

Open
martin-augment wants to merge 7 commits intomainfrom
pr-18481-2025-11-04-11-27-07
Open

18481: feat: Add selectivity metric to NestedLoopJoinExec for EXPLAIN ANALYZE#1
martin-augment wants to merge 7 commits intomainfrom
pr-18481-2025-11-04-11-27-07

Conversation

@martin-augment
Copy link
Owner

@martin-augment martin-augment commented Nov 4, 2025

18481: To review by AI


Note

Adds a selectivity ratio metric to NestedLoopJoinExec and validates it via EXPLAIN ANALYZE tests.

  • Physical plan / metrics:
    • Introduce NestedLoopJoinMetrics with selectivity (RatioMetrics) for NestedLoopJoinExec.
    • Compute totals per right batch (left_rows * right_rows) and increment parts on output emission; integrate with existing BuildProbeJoinMetrics and timers.
    • Wire metric creation in execute and propagate through NestedLoopJoinStream; update metric recording points accordingly.
  • Tests:
    • Add nested_loop_join_selectivity in core/tests/sql/explain_analyze.rs asserting selectivity for INNER, LEFT, RIGHT, and FULL joins via EXPLAIN ANALYZE.

Written by Cursor Bugbot for commit 08f8bc7. This will update automatically on new commits. Configure here.

@coderabbitai
Copy link

coderabbitai bot commented Nov 4, 2025

Walkthrough

Introduces NestedLoopJoinMetrics struct to consolidate BuildProbeJoinMetrics and selectivity tracking in the nested loop join implementation. Updates the join stream to use this new metrics container and adds a test validating selectivity calculation across different join types.

Changes

Cohort / File(s) Summary
Test Addition
datafusion/core/tests/sql/explain_analyze.rs
Adds async test nested_loop_join_selectivity that validates selectivity metrics (1%, 10%, 19%) across INNER, LEFT, RIGHT, and FULL join types using EXPLAIN ANALYZE queries with cross-joins and ON conditions.
Metrics Refactoring
datafusion/physical-plan/src/joins/nested_loop_join.rs
Introduces NestedLoopJoinMetrics struct combining BuildProbeJoinMetrics and RatioMetrics for selectivity. Replaces direct BuildProbeJoinMetrics field in NestedLoopJoinStream with nested metrics container. Updates selectivity tracking through batch completion, polling, and row/batch combination processing. Refactors stream initialization and all downstream metrics references to use the new container.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch pr-18481-2025-11-04-11-27-07

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between db5f47c and 08f8bc7.

📒 Files selected for processing (2)
  • datafusion/core/tests/sql/explain_analyze.rs (1 hunks)
  • datafusion/physical-plan/src/joins/nested_loop_join.rs (18 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
datafusion/core/tests/sql/explain_analyze.rs (2)
datafusion/core/src/execution/context/mod.rs (1)
  • sql (597-599)
datafusion/core/tests/sql/mod.rs (1)
  • execute_to_batches (124-131)
datafusion/physical-plan/src/joins/nested_loop_join.rs (2)
datafusion/physical-plan/src/repartition/mod.rs (3)
  • metrics (806-808)
  • new (546-575)
  • partition (348-356)
datafusion/physical-plan/src/joins/utils.rs (5)
  • new (738-749)
  • new (1354-1389)
  • new (1511-1513)
  • new (1539-1545)
  • batch (2751-2754)
🔇 Additional comments (2)
datafusion/core/tests/sql/explain_analyze.rs (1)

1107-1134: Selectivity regression coverage looks solid

Nice touch exercising INNER/LEFT/RIGHT/FULL joins against the deterministic generate_series setup; the expected 1/10/19 row counts line up with the cross-product denominator and should keep the new metric honest across join variants.

datafusion/physical-plan/src/joins/nested_loop_join.rs (1)

1126-1134: Selectivity bookkeeping fits the state machine

Updating the total in the Ok(false) branch—right after a probe batch is fully compared against the buffered left side—keeps the denominator aligned with the actual Cartesian work, and the maybe_flush_ready_batch hook makes sure every emitted batch (including unmatched spill) contributes to the numerator. Clean integration.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. No suggestions at this time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants