Skip to content

Add try_new for LogicalPlan::Join #15757

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 21, 2025

Conversation

kumarlokesh
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

This PR adds a new Join::try_new() constructor method that:

  • Automatically computes the correct schema based on join type

  • Leverages the existing build_join_schema() function

  • Handles all join types correctly (Inner, Left, Right, Full, Semi, Anti, and LeftMark)

  • Additionally, it updates the LogicalPlanBuilder:

    • cross_join()
    • join_with_expr_keys()
    • join_using()

to use the new try_new method instead of manual schema computation.

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Apr 18, 2025
@ctsk
Copy link
Contributor

ctsk commented Apr 20, 2025

The implementation looks good to me. Two points:

  1. The testing seems quite verbose - I briefly browsed around and think DFSchema::matches_arrow_schema can help you shorten it.
  2. There are might be some more places where this is applicable:
    let schema =
    build_join_schema(left.schema(), right.schema(), &join_type)?;
    let new_on: Vec<_> = on
    .into_iter()
    .map(|equi_expr| {
    // SimplifyExpression rule may add alias to the equi_expr.
    (equi_expr.0.unalias(), equi_expr.1.unalias())
    })
    .collect();
    Ok(LogicalPlan::Join(Join {
    left,
    right,
    join_type,
    join_constraint,
    on: new_on,
    filter,
    schema: DFSchemaRef::new(schema),
    null_equals_null,
    }))

    let join_schema =
    build_join_schema(left.schema(), right.schema(), &original_join.join_type)?;
    Ok(Join {
    left,
    right,
    on,
    filter: original_join.filter.clone(),
    join_type: original_join.join_type,
    join_constraint: original_join.join_constraint,
    schema: Arc::new(join_schema),
    null_equals_null: original_join.null_equals_null,
    })

    let join_schema = Arc::new(build_join_schema(
    left_input.schema(),
    right_input.schema(),
    &JoinType::Inner,
    )?);
    return Ok(LogicalPlan::Join(Join {
    left: Arc::new(left_input),
    right: Arc::new(right_input),
    join_type: JoinType::Inner,
    join_constraint: JoinConstraint::On,
    on: join_keys,
    filter: None,
    schema: join_schema,
    null_equals_null: false,
    }));

    let join_schema = Arc::new(build_join_schema(
    left_input.schema(),
    right.schema(),
    &JoinType::Inner,
    )?);
    Ok(LogicalPlan::Join(Join {
    left: Arc::new(left_input),
    right: Arc::new(right),
    schema: join_schema,
    on: vec![],
    filter: None,
    join_type: JoinType::Inner,
    join_constraint: JoinConstraint::On,
    null_equals_null: false,
    }))

Could be also applied here, but would change the behaviour in the error case - I don't know if that matters.

let join_schema =
build_join_schema(self.plan.schema(), right.schema(), &join_type)?;
// Inner type without join condition is cross join
if join_type != JoinType::Inner && on.is_empty() && filter.is_none() {
return plan_err!("join condition should not be empty");
}
Ok(Self::new(LogicalPlan::Join(Join {
left: self.plan,
right: Arc::new(right),
on,
filter,
join_type,
join_constraint: JoinConstraint::On,
schema: DFSchemaRef::new(join_schema),
null_equals_null,
})))

Note that all of these are mere suggestions, I am not very familiar with this part of datafusion and apologize if any of them are wrong.

Cheers!

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm thanks @kumarlokesh

@comphead comphead merged commit 9730404 into apache:main Apr 21, 2025
27 checks passed
@kumarlokesh kumarlokesh deleted the try-new-method-for-logical-plan branch April 21, 2025 17:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add try_new for LogicalPlan::Join Join and others
3 participants