Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gen4: Semi join #9039

Merged
merged 10 commits into from
Oct 22, 2021
Merged

Gen4: Semi join #9039

merged 10 commits into from
Oct 22, 2021

Conversation

GuptaManan100
Copy link
Member

@GuptaManan100 GuptaManan100 commented Oct 20, 2021

Description

This PR adds a new primitive SemiJoin which is used to execute correlated subqueries occurring in the exists clause. An example of a query which will work now is select id from t1 where exists(select 1 from t2 where t1.col = t2.tcol1) order by id

The new primitive SemiJoin has two children, called LHS and RHS. It executes the LHS first. For each result of LHS, it runs the RHS. If there are any returned rows from RHS then it adds the LHS row to the output.

Essentially, SemiJoin filters rows from LHS based on whether there were any returned rows from the right side.

Related Issue(s)

Checklist

  • Should this PR be backported?
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

GuptaManan100 and others added 8 commits October 19, 2021 11:57
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Co-authored-by: Andres Taylor <andres@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
@GuptaManan100 GuptaManan100 marked this pull request as ready for review October 21, 2021 13:17
@GuptaManan100 GuptaManan100 added Component: Query Serving Type: Enhancement Logical improvement (somewhere between a bug and feature) release notes labels Oct 21, 2021
@GuptaManan100 GuptaManan100 changed the title Semi join Gen4: Semi join Oct 21, 2021
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
@systay systay merged commit 93fbfca into vitessio:main Oct 22, 2021
@systay systay deleted the semi-join branch October 22, 2021 11:56
Comment on lines +126 to +136
sqltypes.MakeTestResultNoFields(
rightFields,
),
sqltypes.MakeTestResultNoFields(
rightFields,
"4|d|dd",
),
sqltypes.MakeTestResultNoFields(
rightFields,
),
sqltypes.MakeTestResultNoFields(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use MakeTestStreamingResults method for streaming tests.

Comment on lines +79 to +88
// MakeTestResultNoFields builds a *sqltypes.Result object for testing.
// result := sqltypes.MakeTestResult(
// fields,
// " 1|a",
// "10|abcd",
// )
// The field type values are set as the types for the rows built.
// Spaces are trimmed from row values. "null" is treated as NULL.
func MakeTestResultNoFields(fields []*querypb.Field, rows ...string) *Result {
result := &Result{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might not be required if using MakeTestStreamingResults

Comment on lines +66 to +72
func (c *concatenateTree) pushPredicate(ctx *planningContext, expr sqlparser.Expr) error {
return vterrors.New(vtrpc.Code_UNIMPLEMENTED, "pushPredicate does not work on concatenate trees")
}

func (c *concatenateTree) removePredicate(ctx *planningContext, expr sqlparser.Expr) error {
return vterrors.New(vtrpc.Code_UNIMPLEMENTED, "removePredicate does not work on concatenate trees")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like an INTERNAL error to me than an UNIMPLEMENTED error.

Comment on lines +77 to +84
func (d *derivedTree) pushPredicate(ctx *planningContext, expr sqlparser.Expr) error {
return vterrors.New(vtrpcpb.Code_UNIMPLEMENTED, "pushPredicate does not work on derivedTrees")
}

func (d *derivedTree) removePredicate(ctx *planningContext, expr sqlparser.Expr) error {
return vterrors.New(vtrpcpb.Code_UNIMPLEMENTED, "removePredicate does not work on derivedTrees")
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these needs to be marked as INTERNAL error

Comment on lines +299 to +302
passDownReuseCol := reuseCol
if !reuseCol {
passDownReuseCol = expr.As.IsEmpty()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure when we would need this special handling, we can just pass reuseCol as is

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the same as we do for joins. What this code accomplishes is that even if the reuseCol is set false, we do not need to add it the children if the alias is empty. We can just add it to the output columns from this semi-join

Comment on lines +102 to +111
return vterrors.New(vtrpc.Code_UNIMPLEMENTED, "pushPredicate does not work on joinTrees with predicates having dependencies from both the sides")
}

func (jp *joinTree) removePredicate(ctx *planningContext, expr sqlparser.Expr) error {
if ctx.semTable.RecursiveDeps(expr).IsSolvedBy(jp.lhs.tableID()) {
return jp.lhs.removePredicate(ctx, expr)
} else if ctx.semTable.RecursiveDeps(expr).IsSolvedBy(jp.rhs.tableID()) {
return jp.rhs.removePredicate(ctx, expr)
}
return vterrors.New(vtrpc.Code_UNIMPLEMENTED, "removePredicate does not work on joinTrees with predicates having dependencies from both the sides")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we should re-phrase this error message to something like add/remove '%s' predicate not supported on cross-shard join query

Comment on lines +210 to +233
// get the bindVariable for that column name and replace it in the predicate
bindVar := ctx.reservedVars.ReserveColName(node)
cursor.Replace(sqlparser.NewArgument(bindVar))
// check whether the bindVariable already exists in the map
_, alreadyExists := vars[bindVar]
if alreadyExists {
return false
}
// if it does not exist, then push this as an output column in the outerTree and add it to the joinVars
columnIndexes, err := outerTree.pushOutputColumns([]*sqlparser.ColName{node}, ctx.semTable)
if err != nil {
rewriteError = err
return false
}
columnIndex := columnIndexes[0]
vars[bindVar] = columnIndex
return false
}
}
return true
}, nil)
if rewriteError != nil {
return nil, rewriteError
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should add cases for multiple co-related predicates
like

  1. select 1 from user u1, user u2 where exists (select 1 from user_extra ue where ue.col = u1.col and ue.col = u2.col

  2. select 1 from user u where exists (select 1 from user_extra ue where ue.col = u.col and u.col = ue.col2)
    the best plan is to supply u.col once but looks like it will be supplied twice as we reserve the column at the top and do not check.

@GuptaManan100
Copy link
Member Author

Review comments handled in #9083

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Query Serving Type: Enhancement Logical improvement (somewhere between a bug and feature)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants