GC tombstone object (#16102) 1.2 #16238

Merged
merged 3 commits into from
May 22, 2024

Conversation

LeftHandCold

Expired tombstone objects were not being garbage collected. This PR fixes that.

Approved by: @XuPeng-SH

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #16101 #15114 #15171

What this PR does / why we need it:

Expired tombstone objects were not being garbage collected. This PR fixes that.
@LeftHandCold LeftHandCold changed the title GC tombstone object (#16102) GC tombstone object (#16102) 1.2 May 20, 2024
@matrix-meow matrix-meow added the size/M Denotes a PR that changes [100,499] lines label May 20, 2024
@mergify mergify bot requested a review from sukki37 May 20, 2024 04:45
@matrix-meow

@LeftHandCold Thanks for your contributions!

Here are review comments for file pkg/container/types/rowid.go:

Pull Request Review:

Title:

The title of the pull request is concise and indicates that the changes address garbage collection (GC) of tombstone objects.

Body:

The body of the pull request provides a brief description of the problem being fixed, which is the expired tombstone object not being garbage collected. It also mentions that the issue has been fixed.

Changes in rowid.go:

  • Added Function ObjectNameString():
    • A new method ObjectNameString() has been added to the Blockid type in rowid.go.
    • This function generates a string representation of the object name based on the segment and file number.
    • The function uses the Segment() method to get the segment and Offsets() method to get the file number.
    • The generated string format is segment_fileNum, where fileNum is padded with zeros to a width of 5.
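
The segment_fileNum format described above can be sketched in a few lines of Go. This is a minimal illustration, not the code from the PR: objectName is a hypothetical helper, and the segment is passed as a plain string rather than derived from a Blockid.

```go
package main

import "fmt"

// objectName is a hypothetical stand-in for ObjectNameString. It joins a
// segment identifier and a file number as "segment_fileNum", zero-padding
// the file number to a width of five digits.
func objectName(segment string, fileNum uint16) string {
	// %05d pads with leading zeros up to width 5; a uint16 never needs more.
	return fmt.Sprintf("%s_%05d", segment, fileNum)
}

func main() {
	fmt.Println(objectName("seg", 7))     // padded file number
	fmt.Println(objectName("seg", 12345)) // already five digits wide
}
```

Because a uint16 tops out at 65535, a width of 5 always yields fixed-length file numbers, which keeps object names sortable as strings.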

Suggestions for Improvement:

  1. Error Handling:

    • In the ObjectNameString() function, an underscore _ discards the second return value of b.Offsets(). If that value is an error, handle it instead of ignoring it. If it is an unrelated value that the function does not need, a short comment saying so is enough.
  2. Consistency in Naming:

    • Ensure consistency in naming conventions across the codebase. If the function is named ObjectNameString(), make sure it aligns with the naming conventions used in other parts of the code.
  3. Documentation:

    • Add comments or documentation to the new function ObjectNameString() to explain its purpose, inputs, and outputs. This will help other developers understand the function's functionality without having to dive into the implementation details.
  4. Testing:

    • Consider adding unit tests for the new function ObjectNameString() to ensure its correctness and prevent regressions in the future.
  5. Optimization:

    • Depending on the performance requirements, consider optimizing the implementation of ObjectNameString() for efficiency. Ensure that the function performs well, especially if it is called frequently or in performance-critical paths.
  6. Security Considerations:

    • Review the changes for any potential security implications. Ensure that the new function does not introduce vulnerabilities such as injection attacks or data leaks.

By addressing these suggestions, the codebase can be improved in terms of reliability, maintainability, and performance. Additionally, following best practices for error handling, naming conventions, documentation, and testing will contribute to a more robust and high-quality codebase.

Here are review comments for file pkg/vm/engine/tae/db/gc/types.go:

Pull Request Review:

Title:

The title of the pull request is clear and concise, indicating that it addresses an issue related to GC tombstone objects.

Body:

The body of the pull request provides a brief description of the problem being fixed, which is the issue of expired tombstone objects not being garbage collected. It also mentions the approval by a specific user.

Changes in types.go:

  1. Addition of Constants:

    • The addition of CurrentVersion constant and Versions, ObjectList, TombstoneList enums seems appropriate for managing different types of batches.
    • Suggestion: Consider adding comments to explain the purpose of these new constants and enums for better code readability.
  2. Addition of Tombstone-related Constants:

    • New constants Tombstone, GCAttrTombstone, and GCAttrVersion are added for managing tombstone-related attributes.
    • Suggestion: Ensure that the naming conventions are consistent and clear to avoid confusion in the future.
  3. Schema Attributes and Types for Tombstone:

    • New schema attributes and types for tombstone objects are defined.
    • Suggestion: Consider encapsulating the tombstone schema attributes and types into a struct or a map for better organization and maintenance.
  4. Schema Attributes and Types for Versions:

    • New schema attributes and types for versions are defined.
    • Suggestion: Similar to tombstone schema, consider grouping version schema attributes and types for better code structure.
  5. Overall Suggestions:

    • Add comments to explain the purpose of new additions.
    • Ensure consistent naming conventions for clarity.
    • Consider grouping related constants, attributes, and types for better organization.
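
One way to act on the grouping suggestion is to keep each batch kind's attribute names and types together in a single value. The sketch below is an assumption about what that could look like: batchSchema, the column names, and the string type names are all hypothetical and are not taken from types.go.

```go
package main

import "fmt"

// batchSchema groups the column names and column types of one batch kind.
// Types are plain strings here for illustration; real code would use the
// engine's own type descriptors.
type batchSchema struct {
	Attrs []string
	Types []string
}

// validate reports an error when the names and types do not line up.
func (s batchSchema) validate() error {
	if len(s.Attrs) != len(s.Types) {
		return fmt.Errorf("schema has %d attrs but %d types", len(s.Attrs), len(s.Types))
	}
	return nil
}

// schemas holds hypothetical definitions for two batch kinds.
var schemas = map[string]batchSchema{
	"tombstone": {Attrs: []string{"tombstone", "commit_ts"}, Types: []string{"varchar", "timestamp"}},
	"versions":  {Attrs: []string{"version"}, Types: []string{"uint16"}},
}

func main() {
	for name, s := range schemas {
		if err := s.validate(); err != nil {
			fmt.Println(name, "invalid:", err)
			continue
		}
		fmt.Println(name, s.Attrs)
	}
}
```

Keeping names and types in one struct makes a mismatch between the two slices detectable in a single place instead of relying on parallel package-level variables staying in sync.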

Security Concerns:

  • No apparent security concerns were identified in the provided changes.

General Suggestions:

  • Add comments to explain the purpose of new additions for better code understanding.
  • Consider grouping related constants, attributes, and types for improved code organization.
  • Ensure that naming conventions are consistent and descriptive to enhance code readability.

Overall, the changes seem to address the issue of GC tombstone objects effectively. By incorporating the suggestions provided, the codebase can be further optimized for clarity and maintainability.

Here are review comments for file pkg/vm/engine/tae/db/test/db_test.go:

Pull Request Review:

Title:

The title of the pull request is clear and concise, indicating that it addresses an issue related to GC tombstone object.

Body:

The body of the pull request provides a brief description of the problem being fixed, which is the expired tombstone object not being garbage collected. It also mentions that the issue has been fixed.

Changes in pkg/vm/engine/tae/db/test/db_test.go:

  1. Code Quality:

    • In the TestAppendAndGC function, there is a condition if db.Runtime.Scheduler.GetPenddingLSNCnt() != 0 followed by a return. This pattern can lead to unexpected behavior as the subsequent assertions and cleanup code will not be executed. It is recommended to handle this scenario differently, such as logging a warning or failing the test explicitly.
    • A similar issue exists in the TestAppendAndGC function, where if minMerged == nil is followed by a return statement. This can hide failures or leave the test only partly executed. Consider revising the flow so that all necessary checks and assertions are performed.
    • In the TestSnapshotGC function, there is a condition if db.DiskCleaner.GetCleaner().GetMinMerged() == nil followed by a return. This can result in incomplete test coverage and potential issues being overlooked. It's advisable to handle this scenario more robustly to ensure comprehensive testing.
  2. Optimization:

    • Instead of using multiple if conditions followed by return statements, consider restructuring the code to perform all necessary checks and validations first and then decide on the appropriate action. This will ensure that all relevant assertions and cleanup steps are executed consistently.
  3. Consistency:

    • Ensure consistent error handling and test flow across different test functions to maintain clarity and predictability in the test suite.
  4. Security:

    • While the changes do not introduce any direct security vulnerabilities, the incomplete test execution due to early returns could potentially mask underlying issues or bugs. It's crucial to ensure that all test scenarios are thoroughly covered to maintain the reliability and stability of the codebase.

Suggestions:

  1. Refactor the test functions to avoid early returns after conditional checks. Instead, perform all necessary validations and assertions before deciding on the test outcome.
  2. Consider enhancing error handling and logging to provide better visibility into test failures and potential issues.
  3. Ensure consistency in test structure and flow to improve maintainability and readability of the test suite.
  4. Verify that the changes do not inadvertently impact the overall test coverage and effectiveness.

By addressing the mentioned points, the quality and reliability of the test suite can be improved, leading to a more robust codebase.

Here are review comments for file pkg/vm/engine/tae/logtail/snapshot.go:

Pull Request Review:

Title:

The title of the pull request is clear and concise, indicating that the changes are related to addressing an issue with GC tombstone objects.

Body:

The body of the pull request provides a brief description of the problem being fixed, which is the expired tombstone object not being garbage collected. It also mentions that the issue has been fixed. However, it lacks detailed information on the specific changes made to address the problem.

Changes in pkg/vm/engine/tae/logtail/snapshot.go:

  1. Logging Additions:

    • Lines 201, 222, and 230: New log messages have been added for debugging purposes. While logging can be helpful for troubleshooting, excessive logging in production code can impact performance and may expose sensitive information. It's important to ensure that log messages are necessary and do not reveal sensitive data.
  2. Batch Handling:

    • Lines 383-405: Two new batches (snapTableBat and bat) are created and populated with vectors. It's good practice to ensure that resources associated with these batches are properly managed and closed to prevent memory leaks. Consider using defer statements to close these batches to ensure proper resource cleanup.
  3. Object Writing:

    • Lines 412-428: Data from snapTableBat is written to the object writer. Errors from these writes must be handled to avoid data corruption or loss. Ensure that the error handling is robust and informative enough to diagnose any failure during the write.
  4. Rebuilding Functions:

    • Lines 446-453 and 565-582: New functions RebuildTid and Rebuild have been added. The RebuildTid function seems to handle the rebuilding of TID information. It's important to ensure that the logic in these functions is correct and efficient. Additionally, consider adding comments to explain the purpose and functionality of these functions for better code readability.
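
The point about log verbosity can be illustrated with leveled logging. The sketch below uses the standard library's log/slog package (Go 1.21 or later) purely as a stand-in; it is not the logger the project uses, and the emitted helper and the messages are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
)

// emitted logs one debug record and one info record through a handler whose
// minimum level depends on verbose, and returns how many lines were written.
func emitted(verbose bool) int {
	level := slog.LevelInfo
	if verbose {
		level = slog.LevelDebug
	}
	var buf bytes.Buffer
	logger := slog.New(slog.NewTextHandler(&buf, &slog.HandlerOptions{Level: level}))

	// Debug records are dropped unless the handler's level allows them.
	logger.Debug("rebuilding snapshot table", "tables", 3)
	logger.Info("snapshot checkpoint written")

	// The text handler writes one newline-terminated line per record.
	return bytes.Count(buf.Bytes(), []byte("\n"))
}

func main() {
	fmt.Println("lines at info level:", emitted(false))
	fmt.Println("lines at debug level:", emitted(true))
}
```

Putting diagnostic messages at debug level keeps them available for troubleshooting without adding volume to production logs.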

Suggestions for Improvement:

  1. Logging Improvement:

    • Review the added log messages to ensure they provide meaningful information for debugging purposes without exposing sensitive data. Consider using log levels effectively to control the verbosity of logs.
  2. Resource Management:

    • Utilize defer statements to ensure proper closure of resources like batches (snapTableBat and bat) to prevent memory leaks and improve code readability.
  3. Error Handling:

    • Enhance error handling mechanisms during data writing operations to handle potential errors gracefully and provide informative error messages for debugging purposes.
  4. Code Documentation:

    • Consider adding comments to the newly added functions (RebuildTid and Rebuild) to explain their purpose, input parameters, and expected behavior for better code understanding.
  5. Optimization:

    • Review the code for redundant operations, and for optimizations that improve performance or reduce complexity without compromising functionality.

By addressing the suggestions mentioned above, the code quality, maintainability, and performance of the snapshot.go file can be enhanced, ensuring a more robust and efficient implementation.
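
The resource-management and error-handling suggestions can be combined in one pattern: defer the cleanup right after the resources exist, and wrap each write error with context. The sketch below is an assumption about the shape of such code; batch, writer, and saveSnapshot are hypothetical stand-ins, not the types used in snapshot.go.

```go
package main

import (
	"errors"
	"fmt"
)

// batch is a hypothetical stand-in for an in-memory batch that owns memory.
type batch struct {
	name   string
	closed bool
}

// Close releases the batch; here it only records that it was called.
func (b *batch) Close() { b.closed = true }

// writer is a hypothetical stand-in for the object writer.
type writer struct{ fail bool }

// Write pretends to persist a batch and fails when w.fail is set.
func (w *writer) Write(b *batch) error {
	if w.fail {
		return errors.New("disk full")
	}
	return nil
}

// saveSnapshot writes both batches and closes them on every return path.
func saveSnapshot(w *writer, snapTableBat, bat *batch) error {
	defer snapTableBat.Close()
	defer bat.Close()

	if err := w.Write(snapTableBat); err != nil {
		// %w keeps the original error available to errors.Is / errors.As.
		return fmt.Errorf("write %s: %w", snapTableBat.name, err)
	}
	if err := w.Write(bat); err != nil {
		return fmt.Errorf("write %s: %w", bat.name, err)
	}
	return nil
}

func main() {
	first, second := &batch{name: "snapTableBat"}, &batch{name: "bat"}
	err := saveSnapshot(&writer{fail: true}, first, second)
	fmt.Println("error:", err)
	fmt.Println("closed:", first.closed, second.closed)
}
```

With the deferred calls in place, a failed write no longer needs its own cleanup branch, which is where leaks tend to creep in.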

@mergify mergify bot merged commit 5f5d068 into matrixorigin:1.2-dev May 22, 2024
17 of 19 checks passed
@aylei aylei mentioned this pull request Jun 5, 2024