Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flush object dedup 1.2 #16860

Merged
merged 3 commits into from
Jun 13, 2024
Merged

flush object dedup 1.2 #16860

merged 3 commits into from
Jun 13, 2024

Conversation

aptend
Copy link
Contributor

@aptend aptend commented Jun 13, 2024

User description

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #16630

What this PR does / why we need it:

reduce the size of tombstone files


PR Type

Bug fix, Enhancement


Description

  • Added deduplication logic for row IDs in foreachPersistedDeletes to reduce the size of tombstone files.
  • Improved the NewFlushTableTailTask function to avoid processing duplicate objects by using a map to track seen objects.
  • Introduced logic to collect and log tombstone statistics for specific tables in flushAllDeletesFromDelSrc.
  • Enhanced the GetLatestDeltaloc method in ObjectMVCCHandle to handle cases where deltaloc or its latest node is nil.

Changes walkthrough 📝

Relevant files
Bug fix
base.go
Add deduplication logic for row IDs in persisted deletes 

pkg/vm/engine/tae/tables/base.go

  • Added deduplication logic for row IDs during the processing of
    persisted deletes.
  • Introduced a prev variable to track the previous row ID and avoid
    duplicates.
  • +6/-2     
    mvcc.go
    Improve robustness of `GetLatestDeltaloc` method                 

    pkg/vm/engine/tae/tables/updates/mvcc.go

  • Enhanced GetLatestDeltaloc method to handle cases where deltaloc or
    its latest node is nil.
  • +2/-1     
    Enhancement
    flushTableTail.go
    Improve object deduplication and tombstone statistics in flush tasks

    pkg/vm/engine/tae/tables/jobs/flushTableTail.go

  • Added a map to track seen objects and avoid processing duplicates.
  • Introduced logic to collect and log tombstone statistics for specific
    tables.
  • Enhanced the handling of delete intents and tombstone locations.
  • +36/-0   

    💡 PR-Agent usage:
    Comment /help on the PR to get a list of all available PR-Agent tools and their descriptions

    Copy link

    PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

    PR Reviewer Guide 🔍

    ⏱️ Estimated effort to review [1-5]

    3

    🧪 Relevant tests

    No

    🔒 Security concerns

    No

    ⚡ Key issues to review

    Possible Bug:
    The deduplication logic in foreachPersistedDeletes uses a prev variable to avoid processing duplicate row IDs. However, the prev variable is not updated in all code paths, potentially leading to incorrect deduplication.

    Performance Concern:
    In NewFlushTableTailTask, the use of a map objSeen to track seen objects might not be efficient if the number of objects is large. Consider optimizing this part if performance issues arise.

    Logic Issue:
    In flushAllDeletesFromDelSrc, the logic to handle tombstone statistics is only triggered if the table name matches "bmsql_stock". This could be problematic if the table name changes or if similar functionality is needed for other tables.

    @matrix-meow matrix-meow added the size/S Denotes a PR that changes [10,99] lines label Jun 13, 2024
    @mergify mergify bot requested a review from sukki37 June 13, 2024 05:51
    @mergify mergify bot added the kind/bug Something isn't working label Jun 13, 2024
    Copy link

    PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

    PR Code Suggestions ✨

    CategorySuggestion                                                                                                                                    Score
    Possible bug
    Add a nil check for objData to avoid potential nil pointer dereferences

    To avoid potential nil pointer dereferences, add a check to ensure objData is not nil
    before accessing its methods.

    pkg/vm/engine/tae/tables/jobs/flushTableTail.go [700]

     objData := obj.GetObjectData()
    +if objData == nil {
    +    continue
    +}
     
    • Apply this suggestion
    Suggestion importance[1-10]: 7

    Why: Adding a nil check before accessing methods on objData is a good practice to prevent runtime errors due to nil pointer dereferences. This is a valuable suggestion that enhances the robustness of the code.

    7
    Readability
    Combine conditions in the if statement into a single line for better readability

    To improve readability, consider combining the conditions in the if statement into a
    single line.

    pkg/vm/engine/tae/tables/updates/mvcc.go [540-541]

    -if mvcc == nil || mvcc.deltaloc == nil || mvcc.deltaloc.GetLatestNodeLocked() == nil {
    -    return nil
    -}
    +if mvcc == nil || mvcc.deltaloc == nil || mvcc.deltaloc.GetLatestNodeLocked() == nil { return nil }
     
    • Apply this suggestion
    Suggestion importance[1-10]: 4

    Why: While combining the conditions into a single line might slightly enhance readability, it's a minor stylistic change and does not significantly impact the overall quality or functionality of the code.

    4
    Possible issue
    Initialize prev with a valid types.Rowid value to avoid potential issues

    To avoid potential issues with uninitialized variables, consider initializing prev with a
    valid types.Rowid value instead of an empty struct.

    pkg/vm/engine/tae/tables/base.go [517]

    -prev := types.Rowid{}
    +prev := types.Rowid{ID: [16]byte{}}
     
    • Apply this suggestion
    Suggestion importance[1-10]: 3

    Why: The suggestion to initialize prev with a specific value is valid but not critical, as the default zero value for structs in Go (an empty struct in this case) is a common and acceptable practice unless specific initialization is required by context, which is not clearly indicated here.

    3

    @sukki37 sukki37 merged commit f81ebb8 into matrixorigin:1.2-dev Jun 13, 2024
    14 of 17 checks passed
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    Bug fix Enhancement kind/bug Something isn't working Review effort [1-5]: 3 size/S Denotes a PR that changes [10,99] lines
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    4 participants