Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[opt] merge filters on composite primary keys in plan (1.2-dev) #16693

Merged
merged 2 commits into from
Jun 5, 2024

Conversation

aunjgr
Copy link
Contributor

@aunjgr aunjgr commented Jun 5, 2024

User description

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #16146

What this PR does / why we need it:

(a,b,c) are composite keys

a=? and b in (?,?...) ==> __mo_cpkey_col prefix_in (?,?...) a=? and b = ? and c in (?,?...) ==> __mo_cpkey_col in (?,?...)


PR Type

Enhancement, Bug fix


Description

  • Enhanced expression handling and function binding in various parts of the SQL planner.
  • Improved handling of in and not_in functions to safely cast expressions.
  • Fixed typos in function names.
  • Updated type assignments to use specific types instead of T_tuple.
  • Added constant folding for partition expressions.
  • Improved selectivity estimation for in and prefix_in functions.
  • Updated test cases to reflect changes in type assignment.

Changes walkthrough 📝

Relevant files
Enhancement
13 files
apply_indices.go
Enhance expression handling and function binding in apply indices.

pkg/sql/plan/apply_indices.go

  • Added support for additional expression types in isRuntimeConstExpr.
  • Replaced bindFuncExprAndConstFold with BindFuncExprImplByPlanExpr for
    prefix_between and prefix_in.
  • +22/-8   
    base_binder.go
    Improve handling of `in` and `not_in` functions in base binder.

    pkg/sql/plan/base_binder.go

  • Added handling for in and not_in functions to safely cast expressions.
  • Removed redundant code for in and not_in functions.
  • +71/-62 
    build_insert.go
    Update type assignment for `in` expressions in build insert.

    pkg/sql/plan/build_insert.go

  • Updated type assignment for in expressions to match primary key
    expression type.
  • +2/-6     
    expr_opt.go
    Enhance merging of filters on composite keys in expression
    optimization.

    pkg/sql/plan/expr_opt.go

  • Enhanced merging of filters on composite keys to handle in
    expressions.
  • +87/-29 
    func_get_admin.go
    Change admin name retrieval to use byte slices.                   

    pkg/sql/plan/function/func_get_admin.go

    • Changed handling of admin name retrieval to use byte slices.
    +3/-4     
    list_operator.go
    Update supported operators to use specific types.               

    pkg/sql/plan/function/list_operator.go

  • Updated supported operators to use specific types instead of T_tuple.
  • +48/-48 
    make.go
    Update vector creation functions to use specific types.   

    pkg/sql/plan/make.go

  • Updated vector creation functions to use specific types instead of
    T_tuple.
  • +3/-7     
    partition_list.go
    Add constant folding for partition expressions.                   

    pkg/sql/plan/partition_list.go

    • Added constant folding for partition expressions.
    +12/-0   
    query_builder.go
    Move mergeFiltersOnCompositeKey call to earlier stage in query
    creation.

    pkg/sql/plan/query_builder.go

  • Moved mergeFiltersOnCompositeKey call to an earlier stage in query
    creation.
  • +1/-1     
    runtime_filter.go
    Update type handling for runtime filters.                               

    pkg/sql/plan/runtime_filter.go

  • Updated type handling for runtime filters to use specific types
    instead of T_tuple.
  • +2/-2     
    stats.go
    Improve selectivity estimation for `in` and `prefix_in` functions.

    pkg/sql/plan/stats.go

  • Improved selectivity estimation for in and prefix_in functions.
  • +12/-9   
    utils.go
    Update MakeInExpr to use specific types.                                 

    pkg/sql/plan/utils.go

    • Updated MakeInExpr to use specific types instead of T_tuple.
    +1/-5     
    util.go
    Update type assignment for primary key expressions.           

    pkg/vm/engine/disttae/util.go

  • Updated type assignment for primary key expressions to use specific
    types instead of T_tuple.
  • +1/-3     
    Bug fix
    3 files
    build_constraint_util.go
    Fix typo in function name in build constraint utility.     

    pkg/sql/plan/build_constraint_util.go

  • Fixed typo in function name from appendPrimaryConstrantPlan to
    appendPrimaryConstraintPlan.
  • +1/-1     
    build_dml_util.go
    Fix typo in function name in build DML utility.                   

    pkg/sql/plan/build_dml_util.go

  • Fixed typo in function name from appendPrimaryConstrantPlan to
    appendPrimaryConstraintPlan.
  • +1/-1     
    constant_fold.go
    Fix retrieval of string values in constant folding.           

    pkg/sql/plan/rule/constant_fold.go

    • Fixed retrieval of string values from vectors in constant folding.
    +1/-1     
    Tests
    1 files
    util_test.go
    Update test cases for primary key expression type assignment.

    pkg/vm/engine/disttae/util_test.go

  • Updated test cases to reflect changes in type assignment for primary
    key expressions.
  • +7/-6     

    💡 PR-Agent usage:
    Comment /help on the PR to get a list of all available PR-Agent tools and their descriptions

    …6510)
    
    (a,b,c) are composite keys
    
    a=? and b in (?,?...) ==> __mo_cpkey_col prefix_in (?,?...)
    a=? and b = ? and c in (?,?...) ==> __mo_cpkey_col in (?,?...)
    
    Approved by: @m-schen, @badboynt1, @XuPeng-SH, @ouyuanning
    @mergify mergify bot requested a review from sukki37 June 5, 2024 10:35
    @mergify mergify bot added the kind/feature label Jun 5, 2024
    @aunjgr aunjgr changed the title [opt] merge filters on composite primary keys in plan (#16510) [opt] merge filters on composite primary keys in plan (1.2-dev) Jun 5, 2024
    @matrix-meow matrix-meow added the size/M Denotes a PR that changes [100,499] lines label Jun 5, 2024
    Copy link

    PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

    PR Review 🔍

    ⏱️ Estimated effort to review [1-5]

    3, because the PR involves multiple changes across various files with complex logic related to SQL planning and optimization. The changes include enhancements to expression handling, function binding, and type assignments, as well as improvements to selectivity estimation and filter merging on composite keys. The complexity of the SQL planner logic and the potential impact on query performance and correctness necessitate a thorough review.

    🧪 Relevant tests

    No

    ⚡ Possible issues

    Possible Bug: The changes in function binding and expression handling could introduce bugs if not all edge cases are handled. For example, the modifications in apply_indices.go and base_binder.go involve complex logic that could potentially misinterpret or mishandle SQL expressions under certain conditions.

    Performance Concern: The modifications to how filters are applied and how expressions are evaluated could potentially alter the performance characteristics of SQL queries. It's crucial to ensure that these changes do not degrade performance, especially for complex queries involving multiple joins and subqueries.

    🔒 Security concerns

    No

    Copy link

    PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

    PR Code Suggestions ✨

    CategorySuggestion                                                                                                                                    Score
    Best practice
    Add unit tests to verify the correct behavior of the Overloads slice

    To ensure consistency and avoid potential issues, consider adding unit tests that verify
    the correct behavior of the Overloads slice, especially after the changes made in this PR.

    pkg/sql/plan/function/list_operator.go [480-486]

    -Overloads: []overload{
    -    {
    -        overloadId: 0,
    -        args:       []types.T{types.T_uint8, types.T_uint8},
    -        retType: func(parameters []types.Type) types.Type {
    -            return types.T_bool.ToType()
    -        },
    -    },
    -    // ... more overloads
    -},
    +// Add unit tests to verify the correct behavior of the Overloads slice
    +func TestOverloads(t *testing.T) {
    +    overloads := getOverloads() // Assuming getOverloads() returns the Overloads slice
    +    for _, overload := range overloads {
    +        if len(overload.args) != 2 {
    +            t.Errorf("Expected 2 args, got %d", len(overload.args))
    +        }
    +        if overload.retType(nil) != types.T_bool.ToType() {
    +            t.Errorf("Expected return type T_bool, got %v", overload.retType(nil))
    +        }
    +    }
    +}
     
    Suggestion importance[1-10]: 9

    Why: Adding unit tests is crucial for ensuring the reliability and correctness of the code, especially after significant modifications. This suggestion addresses a fundamental aspect of software development.

    9
    Possible bug
    Add a nil check before accessing the Expr field to prevent potential nil pointer dereference

    To avoid potential nil pointer dereference, add a check to ensure exprImpl.F.Args[1] is
    not nil before accessing its Expr field.

    pkg/sql/plan/stats.go [510-511]

    +if exprImpl.F.Args[1] == nil {
    +    return 0.5
    +}
     switch arg1Impl := exprImpl.F.Args[1].Expr.(type) {
     
    Suggestion importance[1-10]: 8

    Why: This suggestion correctly identifies a potential nil pointer dereference, which is a critical issue that could lead to runtime panics.

    8
    Add a validation check to ensure the args slice contains exactly two elements

    Consider adding a validation check to ensure that the args slice contains exactly two
    elements before using it. This will prevent potential runtime errors if the args slice is
    incorrectly populated.

    pkg/sql/plan/function/list_operator.go [483]

    -args:       []types.T{types.T_uint8, types.T_uint8},
    +args: func() []types.T {
    +    if len(args) != 2 {
    +        panic("args slice must contain exactly two elements")
    +    }
    +    return []types.T{types.T_uint8, types.T_uint8}
    +}(),
     
    Suggestion importance[1-10]: 7

    Why: Adding a validation check is a good practice to prevent runtime errors, though the suggestion's implementation using a panic might not be the best approach for production code.

    7
    Enhancement
    Use a loop to initialize the Overloads slice, making the code more concise and easier to extend

    Consider using a loop to initialize the Overloads slice, which will make the code more
    concise and easier to extend in the future.

    pkg/sql/plan/function/list_operator.go [480-486]

    -Overloads: []overload{
    -    {
    -        overloadId: 0,
    -        args:       []types.T{types.T_uint8, types.T_uint8},
    -        retType: func(parameters []types.Type) types.Type {
    -            return types.T_bool.ToType()
    -        },
    -    },
    -    {
    -        overloadId: 1,
    -        args:       []types.T{types.T_uint16, types.T_uint16},
    -        retType: func(parameters []types.Type) types.Type {
    -            return types.T_bool.ToType()
    -        },
    -    },
    -    // ... more overloads
    -},
    +Overloads: func() []overload {
    +    var overloads []overload
    +    typesList := []types.T{types.T_uint8, types.T_uint16, types.T_uint32, types.T_uint64, types.T_int8, types.T_int16, types.T_int32, types.T_int64, types.T_float32, types.T_float64, types.T_decimal64, types.T_decimal128, types.T_varchar, types.T_char, types.T_date, types.T_datetime, types.T_bool, types.T_timestamp, types.T_blob, types.T_uuid, types.T_text, types.T_time, types.T_binary, types.T_varbinary}
    +    for i, t := range typesList {
    +        overloads = append(overloads, overload{
    +            overloadId: i,
    +            args:       []types.T{t, t},
    +            retType: func(parameters []types.Type) types.Type {
    +                return types.T_bool.ToType()
    +            },
    +        })
    +    }
    +    return overloads
    +}(),
     
    Suggestion importance[1-10]: 8

    Why: Using a loop to initialize the Overloads slice is a significant improvement for scalability and maintainability of the code.

    8
    Extract repeated logic into a helper function to improve readability and maintainability

    To improve readability and maintainability, consider extracting the repeated logic of
    creating a plan.Expr with Expr_List or Expr_Vec into a helper function.

    pkg/sql/plan/build_insert.go [760-764]

    -{
    -    Typ: pkExpr.Typ,
    -    Expr: &plan.Expr_List{
    -        List: &plan.ExprList{
    -            List: colExprs[0],
    -        },
    -    },
    -}
    +createExprWithList(pkExpr.Typ, colExprs[0])
     
    Suggestion importance[1-10]: 6

    Why: The suggestion to extract repeated logic into a helper function is valid and would improve code maintainability and readability.

    6
    Performance
    Optimize isRuntimeConstExpr by using a single loop for both Expr_F and Expr_List cases

    The isRuntimeConstExpr function can be optimized by using a single loop to handle both
    Expr_F and Expr_List cases, reducing code duplication.

    pkg/sql/plan/apply_indices.go [32-47]

    -case *plan.Expr_F:
    -    for _, subExpr := range exprImpl.F.Args {
    +case *plan.Expr_F, *plan.Expr_List:
    +    var subExprs []*plan.Expr
    +    if exprImplF, ok := exprImpl.(*plan.Expr_F); ok {
    +        subExprs = exprImplF.F.Args
    +    } else if exprImplList, ok := exprImpl.(*plan.Expr_List); ok {
    +        subExprs = exprImplList.List.List
    +    }
    +    for _, subExpr := range subExprs {
             if !isRuntimeConstExpr(subExpr) {
                 return false
             }
         }
         return true
     
    -case *plan.Expr_List:
    -    for _, subExpr := range exprImpl.List.List {
    -        if !isRuntimeConstExpr(subExpr) {
    -            return false
    -        }
    -    }
    -    return true
    -
    Suggestion importance[1-10]: 8

    Why: This suggestion effectively reduces code duplication by handling both Expr_F and Expr_List in a single case block, improving the performance and readability of the function. The proposed change correctly implements the logic for checking runtime constant expressions in a more concise manner.

    8
    Possible issue
    Ensure data is properly initialized and handle potential errors from MarshalBinary

    To avoid potential issues with uninitialized data, ensure data is properly initialized
    before using it.

    pkg/sql/plan/utils.go [206]

    -data, _ := vec.MarshalBinary()
    +data, err := vec.MarshalBinary()
    +if err != nil {
    +    return nil // or handle the error appropriately
    +}
     
    Suggestion importance[1-10]: 8

    Why: Proper error handling for MarshalBinary is crucial to avoid using uninitialized data, which can lead to undefined behavior or crashes.

    8
    Add a nil check before accessing cols[0] to handle potential errors more gracefully

    To handle potential errors more gracefully, add a check to ensure cols[0] is not nil
    before calling GetBytesAt.

    pkg/sql/plan/function/func_get_admin.go [56]

    -adminNme = cols[0].GetBytesAt(0)
    +if cols[0] != nil {
    +    adminNme = cols[0].GetBytesAt(0)
    +}
     
    Suggestion importance[1-10]: 7

    Why: Adding a nil check before accessing cols[0] is a good practice to prevent runtime errors, making the suggestion valuable for robustness.

    7
    Maintainability
    Move currColPos initialization and check into a helper function to improve code clarity

    The variable currColPos is initialized to -1 and checked multiple times. Consider moving
    the initialization and check into a helper function to improve code clarity.

    pkg/sql/plan/expr_opt.go [111-115]

    -if currColPos == -1 {
    -    currColPos = mergedFn.Args[0].GetCol().ColPos
    -} else if currColPos != mergedFn.Args[0].GetCol().ColPos {
    +if !updateCurrColPos(&currColPos, mergedFn.Args[0].GetCol().ColPos) {
         newOrArgs = append(newOrArgs, subExpr)
         continue
     }
     
    +// Helper function
    +func updateCurrColPos(currColPos *int32, newColPos int32) bool {
    +    if *currColPos == -1 {
    +        *currColPos = newColPos
    +        return true
    +    }
    +    return *currColPos == newColPos
    +}
    +
    Suggestion importance[1-10]: 7

    Why: The suggestion to encapsulate the logic of currColPos into a helper function is valid and would improve the maintainability and clarity of the code. The provided helper function correctly implements the described functionality and maintains the logic of the original code.

    7
    Define a helper function to construct the args slice, reducing code duplication

    To improve readability and maintainability, consider defining a helper function that
    constructs the args slice for each overload, reducing code duplication.

    pkg/sql/plan/function/list_operator.go [483]

    -args:       []types.T{types.T_uint8, types.T_uint8},
    +args:       createArgs(types.T_uint8, types.T_uint8),
     
    Suggestion importance[1-10]: 6

    Why: The suggestion to reduce code duplication through a helper function is valid and improves maintainability, but it's a relatively minor enhancement.

    6
    Refactor repeated nil checks into a single check to improve readability

    The check for mergedFn.Args[0].GetCol() being nil is repeated multiple times. This can be
    refactored into a single check at the beginning of the loop to improve readability and
    reduce redundancy.

    pkg/sql/plan/expr_opt.go [106-109]

    -if mergedFn == nil || mergedFn.Args[0].GetCol() == nil || !isRuntimeConstExpr(mergedFn.Args[1]) {
    +if mergedFn == nil || mergedFn.Args[0].GetCol() == nil {
    +    newOrArgs = append(newOrArgs, subExpr)
    +    continue
    +}
    +if !isRuntimeConstExpr(mergedFn.Args[1]) {
         newOrArgs = append(newOrArgs, subExpr)
         continue
     }
     
    Suggestion importance[1-10]: 6

    Why: The suggestion correctly identifies redundancy in the nil checks and proposes a refactoring that could improve code readability. However, the suggested code introduces a logical change by separating the checks for mergedFn.Args[0].GetCol() == nil and !isRuntimeConstExpr(mergedFn.Args[1]) into two different conditions, which might not be equivalent to the original logic depending on the context.

    6
    Consolidate inArgs append operations into a single function to reduce redundancy

    The inArgs slice is appended with elements in multiple places. Consider consolidating
    these append operations into a single function to reduce redundancy.

    pkg/sql/plan/expr_opt.go [128-131]

    -inArgs = append(inArgs, inArg)
    -inArgs = append(inArgs, mergedFn.Args[1])
    +appendInArgs(&inArgs, inArg)
    +appendInArgs(&inArgs, mergedFn.Args[1])
     
    +// Helper function
    +func appendInArgs(inArgs *[]*plan.Expr, arg *plan.Expr) {
    +    *inArgs = append(*inArgs, arg)
    +}
    +
    Suggestion importance[1-10]: 6

    Why: The suggestion to consolidate the append operations into a helper function is a good practice for reducing redundancy and improving code modularity. However, the impact on overall code maintainability is moderate, as the original code is not overly complex or repetitive.

    6

    @mergify mergify bot merged commit 32e1e53 into matrixorigin:1.2-dev Jun 5, 2024
    17 of 18 checks passed
    @aunjgr aunjgr deleted the cp-1.2 branch June 11, 2024 09:05
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    7 participants