Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cp to 1.2-dev 'fix a bug that cause hashmap data wrong' #18779

Merged
merged 1 commit into from
Sep 13, 2024

Conversation

badboynt1
Copy link
Contributor

@badboynt1 badboynt1 commented Sep 13, 2024

User description

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #https://github.com/matrixorigin/MO-Cloud/issues/4049

What this PR does / why we need it:

cp to 1.2-dev 'fix a bug that cause hashmap data wrong'


PR Type

Bug fix, Tests


Description

  • Fixed a bug in the hashmap data handling by introducing a needDupVec flag to manage vector duplication when evaluating join conditions.
  • Enhanced the container struct to include logic for freeing duplicated vectors, ensuring proper memory management.
  • Added new test cases to verify the delete operation on a newly created table t3, ensuring the correctness of the delete functionality.

Changes walkthrough 📝

Relevant files
Bug fix
build.go
Fix hashmap data issue by handling vector duplication       

pkg/sql/colexec/hashbuild/build.go

  • Added import for pbplan.
  • Introduced needDupVec flag to handle condition expressions.
  • Modified logic to duplicate vectors based on needDupVec.
  • +14/-1   
    Enhancement
    types.go
    Enhance container struct with vector duplication handling

    pkg/sql/colexec/hashbuild/types.go

  • Added needDupVec field to container struct.
  • Implemented logic to free duplicated vectors.
  • +9/-2     
    Tests
    delete.result
    Add test case for delete operation on table t3                     

    test/distributed/cases/dml/delete/delete.result

  • Added test case for deleting entries in table t3.
  • Verified count after deletion.
  • +7/-0     
    delete.test
    Implement delete operation test for table t3                         

    test/distributed/cases/dml/delete/delete.test

    • Added setup and execution for delete test on table t3.
    +5/-0     

    💡 PR-Agent usage:
    Comment /help on the PR to get a list of all available PR-Agent tools and their descriptions

    Copy link

    PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

    PR Reviewer Guide 🔍

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Key issues to review

    Memory Management
    The introduction of needDupVec flag and vector duplication may lead to increased memory usage. Ensure proper memory management and cleanup.

    Performance Impact
    The addition of vector duplication and cleanup logic may impact performance. Consider profiling to ensure minimal performance overhead.

    @mergify mergify bot requested a review from sukki37 September 13, 2024 08:46
    @mergify mergify bot added the kind/bug Something isn't working label Sep 13, 2024
    Copy link

    PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

    PR Code Suggestions ✨

    CategorySuggestion                                                                                                                                    Score
    Error handling
    Improve error handling for vector duplication operation

    Consider adding error handling for the vec.Dup() operation. While the error is being
    returned, it's good practice to log or handle the error explicitly before returning.

    pkg/sql/colexec/hashbuild/build.go [471-474]

     ctr.vecs[idx1][idx2], err = vec.Dup(proc.Mp())
     if err != nil {
    -    return err
    +    // Log the error or perform any necessary cleanup
    +    return fmt.Errorf("failed to duplicate vector: %w", err)
     }
     
    • Apply this suggestion
    Suggestion importance[1-10]: 8

    Why: Adding error handling for the vector duplication operation enhances robustness by ensuring that any errors are logged or handled, providing more context for debugging.

    8
    Maintainability
    Use a constant for the threshold that determines when to use strhashmap

    Consider using a constant or a configuration parameter for the threshold that
    determines when to use strhashmap. This would make the code more maintainable and
    easier to adjust in the future.

    pkg/sql/colexec/hashbuild/build.go [61-62]

    -// todo : for varlena type, always go strhashmap
    -if types.T(typ.Id).FixedLength() < 0 {
    +const varlenaThreshold = 0
    +// Use strhashmap for varlena types
    +if types.T(typ.Id).FixedLength() <= varlenaThreshold {
     
    • Apply this suggestion
    Suggestion importance[1-10]: 7

    Why: Using a constant for the threshold improves maintainability by making it easier to adjust the threshold value in the future without modifying the code logic.

    7
    Resource management
    Use defer to ensure proper resource cleanup

    Consider using a defer statement to ensure that resources are properly freed, even
    if an error occurs during the execution of the function.

    pkg/sql/colexec/hashbuild/types.go [117-123]

     if ctr.needDupVec {
    -    for i := range ctr.vecs {
    -        for j := range ctr.vecs[i] {
    -            ctr.vecs[i][j].Free(proc.Mp())
    +    defer func() {
    +        for i := range ctr.vecs {
    +            for j := range ctr.vecs[i] {
    +                ctr.vecs[i][j].Free(proc.Mp())
    +            }
             }
    -    }
    +    }()
     }
     
    • Apply this suggestion
    Suggestion importance[1-10]: 6

    Why: Using defer for resource cleanup is a good practice, but in this context, it may not be necessary as the cleanup is already conditional and straightforward.

    6
    Documentation
    Add a comment explaining the purpose of the needDupVec flag

    Consider adding a comment explaining the purpose and implications of the needDupVec
    flag. This will improve code readability and maintainability.

    pkg/sql/colexec/hashbuild/build.go [52-58]

    +// needDupVec indicates whether vector duplication is required.
    +// It's set to true if any condition is not a simple column expression,
    +// as complex expressions may modify the original vector.
     ctr.needDupVec = false
     ctr.executor = make([]colexec.ExpressionExecutor, len(ap.Conditions))
     ctr.keyWidth = 0
     for i, expr := range ap.Conditions {
         if _, ok := ap.Conditions[i].Expr.(*pbplan.Expr_Col); !ok {
             ctr.needDupVec = true
         }
     
    • Apply this suggestion
    Suggestion importance[1-10]: 5

    Why: Adding comments improves code readability and maintainability by providing context and explanations for future developers, but it is not a critical change.

    5

    @matrix-meow matrix-meow added the size/S Denotes a PR that changes [10,99] lines label Sep 13, 2024
    @mergify mergify bot merged commit 7e9a980 into matrixorigin:1.2-dev Sep 13, 2024
    18 checks passed
    @badboynt1 badboynt1 deleted the 1.2-dev branch September 13, 2024 12:26
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    Bug fix kind/bug Something isn't working Review effort [1-5]: 3 size/S Denotes a PR that changes [10,99] lines Tests
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    6 participants