Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/cache weight allocated size #161

Merged
merged 11 commits into from
Nov 1, 2024
Merged

Conversation

sh-cha
Copy link
Contributor

@sh-cha sh-cha commented Oct 31, 2024

Description

Closes: #XXXX


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • included the correct type prefix in the PR title, you can find examples of the prefixes below:
  • confirmed ! in the type prefix if API or client breaking change
  • targeted the correct branch
  • provided a link to the relevant issue or specification
  • reviewed "Files changed" and left comments if necessary
  • included the necessary unit and integration tests
  • updated the relevant documentation or specification, including comments for documenting Go code
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic, API design and naming, documentation is accurate, tests and test coverage

Summary by CodeRabbit

  • New Features

    • Introduced a custom memory allocator to enhance memory management.
    • Added new structures for wrapping code and module code to improve size tracking.
  • Improvements

    • Enhanced script and module caching mechanisms for better memory allocation management.
    • Updated methods to reflect new structures for handling scripts and modules.
    • Improved the handling of script caching and verification for more effective management of script metadata.
    • Removed outdated testing features from the e2e-move-tests package.
    • Transitioned to using wrapper structures for module and script codes to better manage size and caching.
  • Bug Fixes

    • Resolved issues with script verification and caching logic to ensure accurate handling of script metadata.

@sh-cha sh-cha self-assigned this Oct 31, 2024
@sh-cha sh-cha requested a review from a team as a code owner October 31, 2024 09:43
Copy link

coderabbitai bot commented Oct 31, 2024

Walkthrough

The changes in this pull request involve modifications to several Cargo.toml files and the implementation of new structures and methods in the codebase. Specifically, the initia-move-storage package has had its feature definitions removed, while the e2e-move-tests package has removed a specific testing feature. Additionally, a custom memory allocator has been introduced, and various code structures have been updated to utilize new wrapper types for managing code and module sizes. These alterations enhance the organization and functionality of the code related to memory management and feature handling.

Changes

File Change Summary
crates/e2e-move-tests/Cargo.toml Removed "initia-move-storage/testing" feature from the testing feature list.
crates/storage/Cargo.toml Removed the entire [features] section, including default and testing features; added rand as a dev dependency.
crates/storage/src/allocator.rs Introduced SizeCounterAllocator, a custom memory allocator that tracks allocated memory size.
crates/storage/src/code_scale.rs Updated CodeScale and ModuleCodeScale to use ScriptWrapper and ModuleWrapper types.
crates/storage/src/code_storage.rs Modified deserialize_and_cache_script and verify_and_cache_script methods for script handling.
crates/storage/src/lib.rs Added new module allocator.
crates/storage/src/module_cache.rs Changed module_cache field type to use ModuleWrapper and updated related methods.
crates/storage/src/module_storage.rs Updated access patterns for module code verification and handling.
crates/storage/src/script_cache.rs Replaced Code<CompiledScript, Script> with ScriptWrapper in cache structure and methods.

Possibly related PRs

  • fix: audit #126: Modifies the Cargo.toml file for the initia-move-storage package, which is related to the Cargo.toml changes in the main PR where the "initia-move-storage/testing" feature was removed.
  • feat: impl json object #128: Changes in this PR involve modifications to the Cargo.toml file and tests in the e2e-move-tests package, which may relate to the removal of the "initia-move-storage/testing" feature as it impacts how tests are structured.
  • feat: receive movevm version and use_minlib as new cmd argument #134: This PR introduces new command-line arguments that may indirectly relate to the overall configuration and feature management in the Cargo.toml files across the project, including the e2e-move-tests.
  • fix: metadata to denom #135: The changes in this PR regarding metadata and denom handling may connect with the overall feature management and testing capabilities in the Cargo.toml files.
  • fix: restrict struct creation from the other modules #149: This PR introduces restrictions on struct creation from other modules, which could relate to the feature management in the Cargo.toml files by enforcing encapsulation and module integrity.
  • ripemd160 & bech32 #150: The introduction of new dependencies and modifications in the Cargo.toml files for various modules may relate to the overall dependency management and feature configuration in the main PR.

Suggested reviewers

  • sh-cha

🐇 In the land of code, where features do play,
A rabbit hops softly, in a new, bright way.
With allocators tracking the sizes we need,
And wrappers for modules, our code will succeed!
So let’s celebrate changes, both big and small,
For together in coding, we’ll conquer it all! 🐇


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Outside diff range and nitpick comments (6)
crates/storage/src/allocator.rs (1)

26-32: Add documentation for public functions.

The public functions lack documentation explaining their purpose, thread safety guarantees, and potential race conditions. Consider adding rustdoc comments:

+/// Resets the allocation size counter to zero for the current thread.
+///
+/// # Thread Safety
+/// This function operates on thread-local storage, so it only affects the current thread's counter.
+/// However, be cautious when calling this during active allocations as it may lead to inconsistent counts.
 pub(crate) fn initialize_size() {
     SIZE.with(|size| size.set(0));
 }

+/// Returns the current allocation size counter for the current thread.
+///
+/// # Thread Safety
+/// This function operates on thread-local storage and returns the counter for the current thread only.
+/// The returned value might not reflect concurrent allocations happening on the same thread.
 pub(crate) fn get_size() -> usize {
     SIZE.with(|size| size.get())
 }
crates/storage/src/script_cache.rs (2)

26-26: Clarify units of cache_capacity in new method

Multiplying cache_capacity by 1024 * 1024 suggests that the input is now expected in megabytes. Please update the method documentation to reflect this change for clarity.


53-56: Improve clarity of error message

The error message "Script storage cache eviction error" may not clearly indicate the cause of the failure. Consider updating it to "Script exceeds cache capacity" to better inform users.

crates/storage/src/module_cache.rs (3)

67-67: Clarify cache capacity units in InitiaModuleCache::new

Multiplying cache_capacity by 1024 * 1024 converts it from megabytes to bytes. Ensure that this unit conversion is intentional and that the parameter's expected unit (megabytes) is clearly documented in the method's comments or documentation to avoid confusion for future maintainers.


83-83: Document the allocated_size parameter

The allocated_size parameter has been added to insert_deserialized_module. Please add documentation for this parameter to explain its purpose and how it should be calculated. This will improve code readability and maintainability.


111-111: Document the allocated_size parameter

Similarly, the allocated_size parameter in insert_verified_module should be documented. Providing clear guidance on how to obtain or calculate this size will help maintain consistency and prevent errors.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 02d65ec and 56f0fdb.

📒 Files selected for processing (9)
  • crates/e2e-move-tests/Cargo.toml (0 hunks)
  • crates/storage/Cargo.toml (0 hunks)
  • crates/storage/src/allocator.rs (1 hunks)
  • crates/storage/src/code_scale.rs (1 hunks)
  • crates/storage/src/code_storage.rs (4 hunks)
  • crates/storage/src/lib.rs (1 hunks)
  • crates/storage/src/module_cache.rs (7 hunks)
  • crates/storage/src/module_storage.rs (9 hunks)
  • crates/storage/src/script_cache.rs (5 hunks)
💤 Files with no reviewable changes (2)
  • crates/e2e-move-tests/Cargo.toml
  • crates/storage/Cargo.toml
✅ Files skipped from review due to trivial changes (1)
  • crates/storage/src/lib.rs
🔇 Additional comments (42)
crates/storage/src/allocator.rs (3)

1-8: LGTM! Thread-local storage implementation is correct.

The use of Cell<usize> with thread-local storage is an appropriate choice for tracking per-thread allocation sizes without synchronization overhead.


23-24: LGTM! Global allocator declaration is correct.

The static initialization and global allocator attribute are properly implemented.


1-32: Verify the allocator's usage across the codebase.

Since this is a critical component affecting all memory allocations, let's verify its usage:

✅ Verification successful

Based on the results, I can now provide a final verification response:

Memory allocation tracking implementation is properly integrated

The allocator's usage is well-defined and follows a consistent pattern across the codebase:

  • initialize_size() is called before memory-intensive operations in module/code storage
  • get_size() is used immediately after these operations to measure allocation size
  • The implementation is used in critical paths:
    • Module storage: During module verification and building
    • Code storage: During script deserialization and verification
    • Module cache: During build operations

The pattern shows the allocator is being used as intended to track memory allocations for caching and storage operations, with proper initialization before operations and size collection afterward.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for usages of the size tracking functions
echo "=== Searching for initialize_size usage ==="
rg "initialize_size\(\)" -A 5

echo "=== Searching for get_size usage ==="
rg "get_size\(\)" -A 5

# Look for potential memory allocation patterns that might be affected
echo "=== Searching for Vec, String, and Box allocations ==="
ast-grep --pattern "Vec::$_" 
ast-grep --pattern "String::$_"
ast-grep --pattern "Box::$_"

Length of output: 5767

crates/storage/src/code_scale.rs (3)

16-19: LGTM: Improved weight calculation using actual size.

The implementation now correctly uses the allocated size stored in CodeWrapper instead of a constant value, which better reflects the actual memory usage.


24-28: LGTM: Consistent implementation with CodeWrapper.

The ModuleCodeScale implementation mirrors the CodeWrapper changes, providing consistent behavior across both code types.


Line range hint 1-54: Verify consistent usage of wrapper types across the codebase.

Let's ensure that all code using these wrapper types has been updated to use the new size() methods (after applying the suggested improvements).

✅ Verification successful

Let me examine the specific usage contexts to ensure consistency.


Based on the analysis of the usage patterns, I can now provide the final response:

The size field usage is consistent across the codebase

The verification shows that:

  • All CodeWrapper::new and ModuleCodeWrapper::new calls properly include the size parameter
  • Size field access is used appropriately in:
    • Cache weight calculations
    • Module storage operations
    • Code storage deserialization

The size tracking implementation is complete and consistent with no issues found.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for direct access to size field to ensure all code is updated
rg -l "\.size\s*[^(]" 

# Search for creation of wrapper types to verify size parameter is always provided
ast-grep --pattern 'CodeWrapper::new($_, $_)'
ast-grep --pattern 'ModuleCodeWrapper::new($_, $_)'

Length of output: 1317


Script:

#!/bin/bash
# Check the context of size field usage
rg "\.size\s*[^(]" -C 3

# Check implementation of put_with_weight to understand size usage
ast-grep --pattern 'fn put_with_weight($$$)'

Length of output: 8375

crates/storage/src/script_cache.rs (8)

13-16: Imports updated to include CodeScale and CodeWrapper

The necessary imports for CodeScale, CodeWrapper, and Checksum have been correctly added.


19-19: Update script_cache to use CodeWrapper

The script_cache now stores CodeWrapper, encapsulating code along with its allocated size, which enhances cache management.


40-40: Added allocated_size parameter to insert_deserialized_script

The addition of the allocated_size parameter effectively tracks the memory usage of deserialized scripts.


44-44: Retrieve deserialized script from cache correctly

The code correctly returns the cached deserialized script without unnecessary processing.


50-56: Handle cache insertion with proper error management

The use of put_with_weight along with appropriate error handling ensures that scripts exceeding cache capacity are managed gracefully.


67-67: Added allocated_size parameter to insert_verified_script

Including the allocated_size parameter aids in tracking and managing the memory footprint of verified scripts.


73-78: Ensure script verification status before caching

The logic correctly checks if the script is already verified to prevent unnecessary cache updates.


90-90: Insert verified script with weight management

Using put_with_weight with allocated_size accurately reflects the script's impact on cache capacity.

crates/storage/src/module_cache.rs (8)

14-18: Imports updated for new caching strategy

The necessary modules (get_size, initialize_size, ModuleCodeScale, ModuleCodeWrapper, and Checksum) are imported to support the updated module caching implementation.


61-62: Updated module_cache to use ModuleCodeWrapper

The module_cache field now uses ModuleCodeWrapper, which aligns with the new approach of wrapping module codes for size tracking. This change enhances the caching mechanism by incorporating module size into cache management.


100-100: Handle cache eviction errors consistently

The use of .put_with_weight incorporates the module's allocated size into cache management. Ensure that the error handling for cache eviction is consistent across all cache insertions to maintain reliability.


118-121: Correctly check for verified modules in cache

The condition if code_wrapper.module_code.code().is_verified() ensures that the cache returns a verified module if available. This check is essential to maintain module integrity.


126-129: Ensure consistent error handling during cache insertion

When inserting a new module into the cache with .put_with_weight, the error handling properly addresses potential cache eviction scenarios. Consistent handling of these errors is important for cache reliability.


138-138: Handle cache insertion for new verified modules

In the case where the module is not found in the cache, the insertion logic correctly adds the new verified module along with its allocated size. This ensures that the cache stays updated with the latest verified modules.


157-157: Return type changed to Option<ModuleCodeWrapper>

The get_module_or_build_with method now returns an Option<ModuleCodeWrapper>, aligning with the new caching strategy. This change simplifies the return type and consolidates module code and size information.


167-173: Checksum mismatch error handling is appropriate

The code correctly checks for a checksum mismatch after module building and returns a detailed error if there is a discrepancy. This validation is crucial for maintaining module integrity.

crates/storage/src/code_storage.rs (7)

13-13: Approved: Importing necessary modules

The addition of ModuleBytesStorage, module_linker_error, and sha3_256 is appropriate for the new functionality.


17-17: Approved: Importing allocator functions

Including get_size and initialize_size from the allocator module is necessary for managing allocated sizes.


99-99: Accessing script code correctly

The change to script.code.deserialized().clone() ensures the deserialized script is retrieved properly from the cache.


101-105: Properly initializing and retrieving allocated size

Initializing the size before deserialization and capturing the allocated size afterward is correctly implemented.


152-156: Ensure accurate size calculation after verification

By calling initialize_size() before verification and summing sizes appropriately, the allocated size reflects the total memory usage of the verified script.


179-179: Approved: Correctly asserting script is not verified

The assertion assert!(!script.code.is_verified()) accurately checks the script's verification status before the test operation.


183-183: Approved: Correctly asserting script is verified

The assertion assert!(script.code.is_verified()) confirms that the script is verified after the operation, ensuring the test's integrity.

crates/storage/src/module_storage.rs (13)

5-5: Import get_size and initialize_size from allocator module

The addition of get_size and initialize_size from the allocator module is appropriate and necessary for managing allocated sizes within the module storage.


125-125: Ensure correct verification status in tests

The assertion assert!(!module.module_code.code().is_verified()) checks that the module is not verified in this test scenario. Confirm that this aligns with the intended test logic.


133-133: Ensure correct verification status in tests

The assertion assert!(module.module_code.code().is_verified()) verifies that the module is indeed verified. This is appropriate for the test scenario to ensure modules are correctly marked as verified after the process.


211-211: Retrieve module bytes using module_code.extension().bytes()

The method now retrieves the module bytes via module.module_code.extension().bytes().clone(). This change reflects the updated module structure and seems correct.


227-227: Retrieve module size using module_code.extension().bytes().len()

By accessing the length of the bytes using module.module_code.extension().bytes().len(), the module size is correctly obtained. This change aligns with the new structure for handling module bytes.


243-243: Access module metadata through deserialized code

The metadata is now accessed using module.module_code.code().deserialized().metadata.clone(). This aligns with the structural changes and ensures that the metadata is retrieved from the deserialized module.


259-259: Fetch deserialized module using module_code.code().deserialized()

The deserialized module is retrieved using module.module_code.code().deserialized().clone(). This change is consistent with the new module structure and correctly accesses the deserialized module.


275-275: Use module_code_wrapper for module retrieval

Introducing module_code_wrapper simplifies access to module code and size. This change ensures that both the module code and its allocated size are available for further processing.


283-286: Check module verification status using module_code_wrapper

The verification status is checked with module_code_wrapper.module_code.code().is_verified(), and the verified module is returned accordingly. This is consistent with the updated module structure and correctly handles the verification logic.


294-295: Pass module_code and size to dependency verification

The function visit_dependencies_and_verify now accepts module_code_wrapper.module_code and module_code_wrapper.size, allowing it to properly handle module code and track allocated size during verification.


319-319: Add allocated_size parameter to visit_dependencies_and_verify

Adding allocated_size: usize as a parameter enables the function to accumulate the total allocated size of modules during verification. This change is necessary for accurate size tracking.


342-387: Handle dependency checksums and recursive verification

The changes correctly handle the retrieval of dependency checksums and implement recursive verification of dependencies. The use of the visited HashSet to detect cycles is appropriate and prevents infinite recursion. Error handling for missing dependencies and cyclic dependencies is properly implemented.


390-393: Initialize size tracking and calculate module allocated size

Calling initialize_size() before verification resets the size counter. Calculating module_allocated_size with get_size() + allocated_size accurately accumulates the size of the module and its dependencies. Passing module_allocated_size to insert_verified_module ensures that the total size is tracked during module insertion.

Also applies to: 400-400

crates/storage/src/allocator.rs Outdated Show resolved Hide resolved
crates/storage/src/code_scale.rs Outdated Show resolved Hide resolved
crates/storage/src/code_scale.rs Outdated Show resolved Hide resolved
crates/storage/src/script_cache.rs Outdated Show resolved Hide resolved
crates/storage/src/module_cache.rs Outdated Show resolved Hide resolved
crates/storage/src/module_cache.rs Outdated Show resolved Hide resolved
crates/storage/src/module_cache.rs Outdated Show resolved Hide resolved
crates/storage/src/code_storage.rs Outdated Show resolved Hide resolved
Copy link
Member

@beer-1 beer-1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (3)
crates/storage/src/allocator.rs (2)

44-51: Document size calculation and improve error handling.

Two suggestions for improvement:

  1. Document why size_of::<T>() is added to the final size
  2. Consider handling cases where metering fails to start/finish properly

Add documentation explaining the size calculation:

 #[inline]
+/// Measures the heap memory allocated during the execution of the provided closure.
+/// The returned size includes:
+/// 1. All heap allocations made during closure execution
+/// 2. The size of the return type T to account for the result's memory footprint
 pub(crate) fn get_size<T, O: FnOnce() -> VMResult<T>>(f: O) -> VMResult<(T, usize)>

53-82: Enhance test coverage.

Consider adding the following test cases:

  1. Deallocation tracking (once implemented)
  2. Error handling for allocation failures
  3. Nested metering calls
  4. Edge cases like zero allocations and maximum size allocations

Would you like me to help generate these additional test cases?

crates/storage/src/script_cache.rs (1)

26-27: Consider using a named constant for size conversion

The multiplication by 1024 * 1024 converts MB to bytes, but this could be made clearer with a named constant.

+const BYTES_PER_MB: usize = 1024 * 1024;
+
 pub fn new(cache_capacity: usize) -> Arc<InitiaScriptCache> {
     Arc::new(InitiaScriptCache {
         script_cache: Mutex::new(CLruCache::with_config(
-            CLruCacheConfig::new(NonZeroUsize::new(cache_capacity * 1024 * 1024).unwrap())
+            CLruCacheConfig::new(NonZeroUsize::new(cache_capacity * BYTES_PER_MB).unwrap())
                 .with_scale(ScriptScale),
         )),
     })
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 56f0fdb and f8dd0ef.

📒 Files selected for processing (7)
  • crates/storage/Cargo.toml (1 hunks)
  • crates/storage/src/allocator.rs (1 hunks)
  • crates/storage/src/code_scale.rs (1 hunks)
  • crates/storage/src/code_storage.rs (4 hunks)
  • crates/storage/src/module_cache.rs (6 hunks)
  • crates/storage/src/module_storage.rs (10 hunks)
  • crates/storage/src/script_cache.rs (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/storage/Cargo.toml
  • crates/storage/src/code_storage.rs
🔇 Additional comments (31)
crates/storage/src/code_scale.rs (5)

14-19: LGTM: Clean and correct implementation of ScriptScale.

The weight scaling implementation correctly uses the size field from ScriptWrapper.


22-28: LGTM: Consistent implementation with ScriptScale.

The ModuleScale implementation maintains consistency with ScriptScale and correctly uses the size field.


30-40: Previous suggestions for ScriptWrapper improvements are still applicable.

The previous review comment suggesting improvements to ScriptWrapper (adding Debug derive, size validation, documentation, and private fields with getters) remains valid.


42-54: Previous suggestions for ModuleWrapper improvements are still applicable.

The previous review comment suggesting improvements to ModuleWrapper (adding Debug derive, size validation, documentation, and private fields with getters) remains valid.


37-39: Verify size validation in the allocator.

The wrapper constructors accept any size value without validation. We should verify that the size values are properly validated in the allocator to prevent potential memory issues.

Also applies to: 49-53

crates/storage/src/allocator.rs (3)

1-13: LGTM! Thread-local storage implementation is correct.

The use of thread-local storage with Cell provides safe interior mutability and prevents race conditions in concurrent scenarios.


15-27: Address memory tracking concerns.

The current implementation has the same issues as previously identified:

  1. The size counter never decreases during deallocation
  2. No protection against integer overflow
  3. No handling of allocation failures

29-30: LGTM! Global allocator is correctly defined.

The static instance is properly marked with the #[global_allocator] attribute.

crates/storage/src/script_cache.rs (5)

13-16: LGTM: Import changes are well-organized

The new imports for ScriptScale and ScriptWrapper are properly organized and necessary for the cache implementation.


19-19: LGTM: Improved cache type with weight-based eviction support

The change to use ScriptWrapper and ScriptScale enables proper memory management through weight-based cache eviction.


100-101: LGTM: Simplified get_script implementation

The method has been improved by using cloned() as suggested in the previous review.


105-107: LGTM: Simple and clear accessor method

The num_scripts method is straightforward and correctly implemented.


50-56: Verify error handling for cache eviction

The error handling for cache eviction looks good, but let's verify it's consistent across the codebase.

✅ Verification successful

Error handling for cache eviction is consistent

The error handling pattern for cache eviction is consistent across the codebase:

  • Script cache uses MEMORY_LIMIT_EXCEEDED with message "Script storage cache eviction error" in script_cache.rs
  • Module cache uses the same status code with message "Module storage cache eviction error" in module_cache.rs
  • Both use appropriate location context (Script/Module) in the error
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for similar cache eviction error handling patterns
rg -A 3 "MEMORY_LIMIT_EXCEEDED.*cache eviction error" --type rust

Length of output: 67


Script:

#!/bin/bash
# Let's search for all MEMORY_LIMIT_EXCEEDED usages to understand error handling patterns
rg -B 2 -A 2 "MEMORY_LIMIT_EXCEEDED" --type rust

# Also search for cache-related error handling patterns
rg -B 2 -A 2 "cache.*error|error.*cache" --type rust

Length of output: 7035

crates/storage/src/module_storage.rs (4)

176-180: Improved error handling for checksum mismatch.

The error handling has been enhanced with more detailed context, including the module location and a descriptive message.


219-219: Consistent use of module_code accessor.

The code has been updated to consistently use module.module_code for accessing module properties, improving code consistency and encapsulation.

Also applies to: 235-235, 251-251, 267-267


396-400: Size tracking for verified modules.

The code now properly tracks allocated size for verified modules using the get_size function. This is important for memory management.


283-294: Verify the module verification caching logic.

The code correctly checks if a module is already verified before proceeding with dependency verification. However, we should verify that the caching behavior works as expected.

✅ Verification successful

Let me gather more information about the module verification implementation to ensure the caching logic is sound.


Let me check one more aspect to understand the verification state transitions and thread safety.


Based on the gathered information, I can now provide a final response about the module verification caching logic:

Module verification caching implementation is thread-safe and correctly implemented

The caching implementation is sound for the following reasons:

  • Uses parking_lot::Mutex for thread-safe access to the module cache
  • Verification state transitions are handled through immutable Arc references
  • Cache invalidation and updates are properly synchronized through the mutex guard
  • Multiple verification checks are consistent across the codebase

The code in question correctly implements a thread-safe, two-phase verification pattern:

  1. First check if module is verified in cache
  2. If not verified, proceed with verification while holding proper locks
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for potential race conditions in module verification caching

# Test: Look for other places where module verification status is checked
rg -A 2 "is_verified\(\)"

Length of output: 2258


Script:

#!/bin/bash
# Check the Code struct implementation to understand verification state management
ast-grep --pattern 'struct Code {
  $$$
}'

# Look for functions that modify verification state
ast-grep --pattern 'impl Code {
  $$$
}'

# Check for concurrent access patterns
rg -A 5 "Mutex|RwLock" 

Length of output: 4994


Script:

#!/bin/bash
# Look for Code struct implementation details and state transitions
rg -A 10 "impl.*Code"

# Check for any verification-related tests
rg -l "test.*verif" 

# Look for any atomic operations
rg "atomic|Arc::new"

Length of output: 8034

crates/storage/src/module_cache.rs (14)

14-18: Imports are correctly added

The new imports for get_size, ModuleScale, ModuleWrapper, and Checksum are necessary for the updated functionality and are appropriately included.


61-61: Updated module_cache field aligns with new caching mechanism

Changing the module_cache field to use ModuleWrapper and ModuleScale reflects the new approach to module caching and size tracking, enhancing cache management.


66-69: Cache capacity calculation is accurate

Multiplying cache_capacity by 1024 * 1024 correctly converts the capacity from megabytes to bytes, ensuring the cache size is set appropriately. The use of NonZeroUsize and CLruCacheConfig::with_scale with ModuleScale is appropriate.


82-82: Addition of allocated_size parameter enhances size tracking

Introducing the allocated_size parameter in insert_deserialized_module allows for precise tracking of module sizes within the cache, which aids in effective cache eviction policies.


99-99: Proper insertion with weight into cache

Using .put_with_weight along with ModuleWrapper::new(module, allocated_size) correctly associates the module with its size, supporting size-based cache management.


110-110: allocated_size parameter added for consistency

Adding allocated_size to insert_verified_module maintains consistency in size tracking across different module insertion methods, improving cache accuracy.


116-117: Efficient retrieval of verified modules

The condition checks if the cached module is verified and returns it directly, which optimizes performance by avoiding unnecessary verifications.


123-123: Correct cache insertion of verified modules

Inserting the verified module with its allocated_size ensures that the cache maintains accurate size information, which is crucial for cache eviction logic.


142-142: Return type updated to match new module wrapper

Changing the return type of get_module_or_build_with to VMResult<Option<ModuleWrapper>> aligns with the new ModuleWrapper structure, enhancing cohesiveness in the codebase.


145-146: Cache hit handling is appropriate

Cloning the ModuleWrapper when a cache hit occurs ensures that the caller receives a valid module instance without affecting the cache's internal state.


147-147: Accurate allocation size measurement during module build

Using get_size to measure the allocated_size while building the module ensures that size tracking is precise, which is essential for cache management.


150-156: Checksum validation ensures module integrity

The code correctly verifies that the module's hash matches the expected checksum, preventing potential inconsistencies or security issues due to mismatched modules.


158-158: Wrapping code with ModuleWrapper for size tracking

Creating a ModuleWrapper with the module and its allocated_size facilitates effective cache size management and eviction policies.


160-165: Error handling aligns with existing patterns

The error handling for cache insertion failures is consistent with the established approach in the codebase, providing clear and specific error messages.

crates/storage/src/allocator.rs Outdated Show resolved Hide resolved
crates/storage/src/script_cache.rs Show resolved Hide resolved
crates/storage/src/module_storage.rs Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (1)
crates/storage/src/allocator.rs (1)

55-84: Enhance test coverage.

While the current test verifies basic functionality, consider adding these test cases:

  1. Edge cases (maximum size, overflow scenarios)
  2. Concurrent allocations with varying sizes
  3. Error cases (nested metering, allocation failures)
  4. Mixed allocation sizes

Here's an example test case for nested metering:

#[test]
fn test_nested_metering() {
    let result = get_size(|| {
        // Attempt nested metering
        let nested = get_size(|| Ok(()));
        assert!(nested.is_err()); // Should fail
        Ok(())
    });
    assert!(result.is_ok());
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between f8dd0ef and 0a30558.

📒 Files selected for processing (1)
  • crates/storage/src/allocator.rs (1 hunks)
🔇 Additional comments (1)
crates/storage/src/allocator.rs (1)

8-13: LGTM! Thread-safe implementation using thread-local storage.

The implementation correctly uses thread-local storage with Cell for interior mutability, ensuring thread safety for the metering mechanism.

crates/storage/src/allocator.rs Show resolved Hide resolved
crates/storage/src/allocator.rs Show resolved Hide resolved
crates/storage/src/allocator.rs Show resolved Hide resolved
Copy link
Member

@beer-1 beer-1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (3)
crates/storage/src/script_cache.rs (1)

22-23: Consider using const values for size conversion

The multiplication by 1024 * 1024 would be more maintainable with named constants.

+ const BYTES_PER_MB: usize = 1024 * 1024;
+
  pub fn new(cache_capacity: usize) -> Arc<InitiaScriptCache> {
      Arc::new(InitiaScriptCache {
          script_cache: Mutex::new(CLruCache::with_config(
-             CLruCacheConfig::new(NonZeroUsize::new(cache_capacity * 1024 * 1024).unwrap())
+             CLruCacheConfig::new(NonZeroUsize::new(cache_capacity * BYTES_PER_MB).unwrap())
                  .with_scale(ScriptScale),
          )),
      })
  }
crates/storage/src/module_cache.rs (2)

60-64: Document cache capacity units

The cache capacity parameter is now interpreted in megabytes (MB) rather than entry count. This significant change should be documented in the method's documentation to prevent confusion.

Add documentation like:

+    /// Creates a new module cache with the specified capacity in megabytes.
     pub fn new(cache_capacity: usize) -> Arc<InitiaModuleCache> {

93-97: Enhance cache capacity warning message

The warning message could be more informative by including the module size and cache capacity.

Consider enhancing the warning message:

-    eprintln!("WARNING: failed to insert module {:?} into cache; cache capacity might be too small",
-             module_id.short_str_lossless().to_string());
+    eprintln!(
+        "WARNING: failed to insert module {:?} (size: {} bytes) into cache (capacity: {} bytes); consider increasing cache capacity",
+        module_id.short_str_lossless(),
+        allocated_size,
+        module_cache.capacity()
+    );

Also applies to: 120-124

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 0a30558 and eeb8017.

📒 Files selected for processing (3)
  • crates/storage/src/code_storage.rs (4 hunks)
  • crates/storage/src/module_cache.rs (6 hunks)
  • crates/storage/src/script_cache.rs (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/storage/src/code_storage.rs
🔇 Additional comments (1)
crates/storage/src/script_cache.rs (1)

15-15: LGTM: Type change aligns with weight allocation feature

The change to use ScriptWrapper and ScriptScale in the cache type properly implements the weight-based caching mechanism.

crates/storage/src/script_cache.rs Show resolved Hide resolved
crates/storage/src/module_cache.rs Show resolved Hide resolved
@beer-1 beer-1 merged commit 3b79790 into main Nov 1, 2024
5 of 6 checks passed
@beer-1 beer-1 deleted the feat/cache-weight-allocated-size branch November 1, 2024 08:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants