Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove logical cross join in planning #12985

Merged
merged 23 commits into from
Oct 18, 2024

Conversation

Dandandan
Copy link
Contributor

@Dandandan Dandandan commented Oct 17, 2024

Which issue does this PR close?

Closes #13007

Rationale for this change

This simplifies (optimization) code dealing with logical plan, as it only has to deal with LogicalPlan::Join.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions optimizer Optimizer rules labels Oct 17, 2024
@github-actions github-actions bot added core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Oct 17, 2024
@Dandandan Dandandan changed the title Remove logical cross join in planning WIP: Remove logical cross join in planning Oct 17, 2024
@github-actions github-actions bot added sql SQL Planner substrait labels Oct 17, 2024
@Dandandan Dandandan marked this pull request as ready for review October 18, 2024 09:52
@Dandandan Dandandan changed the title WIP: Remove logical cross join in planning Remove logical cross join in planning Oct 18, 2024
@Dandandan
Copy link
Contributor Author

This is now ready for review

Arc::new(CrossJoinExec::new(physical_left, physical_right))
} else {
// there is no equal join condition, use the nested loop join
// TODO optimize the plan, and use the config of `target_partitions` and `repartition_joins`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we create a separate ticket on "optimize the plan, and use the config of target_partitions and repartition_joins" and remove the comments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what the comment means, it was copied. I think repartition_joins is not needed for NestedLoopJoin and target_partitions should be already used on child plans...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can just remove the comment

@@ -1873,6 +1874,11 @@ impl LogicalPlan {
.as_ref()
.map(|expr| format!(" Filter: {expr}"))
.unwrap_or_else(|| "".to_string());
let join_type = if filter.is_none() && keys.is_empty() && matches!(join_type, JoinType::Inner) {
"Cross".to_string()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we extend JoinType enum to support Cross?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it’s better not to, otherwise it will be similar to having LogicalPlan::CrossJoin (I.e. unnecesary).

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm thanks @Dandandan some comments and the PR is good to go

Co-authored-by: Oleks V <comphead@users.noreply.github.com>
Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm thanks @Dandandan

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Arc::new(CrossJoinExec::new(physical_left, physical_right))
} else {
// there is no equal join condition, use the nested loop join
// TODO optimize the plan, and use the config of `target_partitions` and `repartition_joins`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can just remove the comment

@Dandandan Dandandan merged commit 34bd823 into apache:main Oct 18, 2024
24 checks passed
universalmind303 added a commit to Eventual-Inc/Daft that referenced this pull request Oct 28, 2024
still todo: 

- [x] add tests



## Notes for reviewers:

This does not actually implement a physical cross join, but just
implements the logical cross join as well as cross join to inner join
optimization `eliminate_cross_join.rs`

This treats an inner join with no join conditions as cross join.
(inspired by a recent [change in
datafusion](apache/datafusion#12985)).

If the cross join can not be optimized away, an error will be raised
when attempting to execute the plan.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules sql SQL Planner sqllogictest SQL Logic Tests (.slt) substrait
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove logical cross join in planning
3 participants