-
Notifications
You must be signed in to change notification settings - Fork 163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEAT]: Sql joins with duplicate cols #3241
[FEAT]: Sql joins with duplicate cols #3241
Conversation
CodSpeed Performance ReportMerging #3241 will improve performances by ×2Comparing Summary
Benchmarks breakdown
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@universalmind303 I think the Union/Except
set operations support could be in a separate PR, and if wanted, I can help to contribute these support.
yeah that makes sense, will split them out! |
EXCEPT
set operation
@universalmind303 I feel like join behavior diverge quite a bit between SQL and dataframe. Should we somehow split them up instead of sort of tacking the SQL parser on top of the dataframe one like we are currently doing? Right now I create a project on the right table before the join to handle common column names for dataframe, we could move that out of the actual join op creation and have separate implementations of that project for each API. |
yeah I'm not a huge fan of all of the extra work needed to replicate sql style joins in our engine. I think this likely involves a larger refactor to how we handle joins and LogicalPlans though. The logicalPlan doesn't have a concept of a relation/table name which causes us to do all of this extra work. |
Would you like to start on that refactor in this PR or do you think it's worth merging this fix in for now? If the latter I can give this a review. |
I'm feeling the same way. The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I think one thing we should do at some point is actually move the project/renaming logic out of Join::try_new, so that it can be tested independently and other places that create joins, such as LogicalPlan::with_new_children
, do not need worry about that. But the code here is fine to merge in.
adds support for
union
union all
, andexcept
set operations, as well as fixes an issue when performing joins with duplicate columns #3194