perf: Enhance Memory Management with Lock-Free Allocator, Preallocation, and Optimized Thread-Local Caching #2825
Open · beats-dh wants to merge 15 commits into main from lockerfree
Conversation
dudantas reviewed Sep 18, 2024
This PR is stale because it has been open 45 days with no activity.
Quality Gate passed
Detailed Description for PR:
1. Introduction of Static Preallocation
preallocate Method: Added functionality to preallocate a fixed number of memory blocks (STATIC_PREALLOCATION_SIZE = 500) during the initialization of the LockfreePoolingAllocator. This minimizes runtime dynamic allocations, boosting overall system performance.
Thread-Safe Initialization: Preallocation runs exactly once, guarded by std::call_once with a std::once_flag, so concurrent initialization remains safe (see the sketch below).
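A minimal sketch of one-time preallocation guarded by std::call_once. The constant value 500 and the use of std::call_once/std::once_flag come from the description above; every other name here (preallocate, g_preallocatedBlocks, blockSize) is an illustrative placeholder rather than the PR's actual code.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

namespace sketch {

// Value quoted in the PR description; the surrounding names are placeholders.
constexpr std::size_t STATIC_PREALLOCATION_SIZE = 500;

std::vector<std::byte*> g_preallocatedBlocks; // backing storage handed to the pool
std::once_flag g_preallocateFlag;

// Preallocate a fixed number of fixed-size blocks exactly once, even if several
// threads race to construct the allocator concurrently.
void preallocate(std::size_t blockSize) {
    std::call_once(g_preallocateFlag, [blockSize] {
        g_preallocatedBlocks.reserve(STATIC_PREALLOCATION_SIZE);
        for (std::size_t i = 0; i < STATIC_PREALLOCATION_SIZE; ++i) {
            g_preallocatedBlocks.push_back(new std::byte[blockSize]);
        }
    });
}

} // namespace sketch
```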
2. Optimization of Thread-Local Cache
Efficient Caching Mechanism:
Introduced thread_local caches so each thread reuses its own free blocks, significantly reducing contention for shared resources.
Configured the thread-local cache size to optimize memory usage while maintaining high throughput, with a default batch size of 128.
Prefetching for Performance: Implemented memory prefetching (PREFETCH) within the caching logic to improve cache-line utilization and reduce latency (see the sketch below).
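A rough sketch of a per-thread cache, assuming a fixed-capacity array of free blocks. The batch size of 128 is the value quoted above; the struct name, the layout, and the __builtin_prefetch intrinsic (a GCC/Clang stand-in for the PR's PREFETCH macro) are assumptions.

```cpp
#include <array>
#include <cstddef>

// Illustrative per-thread cache; 128 matches the default batch size above.
constexpr std::size_t LOCAL_CACHE_BATCH = 128;

struct ThreadLocalCache {
    std::array<void*, LOCAL_CACHE_BATCH> blocks {};
    std::size_t count = 0;

    // Pop a cached block; returns nullptr so the caller can fall back to the
    // shared lock-free list.
    void* pop() {
        if (count == 0) {
            return nullptr;
        }
        void* block = blocks[--count];
#if defined(__GNUC__) || defined(__clang__)
        // Warm the cache line the caller is about to write to.
        __builtin_prefetch(block, 1, 3);
#endif
        return block;
    }

    // Push a freed block; returns false when full so the caller can flush
    // excess blocks back to the shared list.
    bool push(void* block) {
        if (count == LOCAL_CACHE_BATCH) {
            return false;
        }
        blocks[count++] = block;
        return true;
    }
};

// One instance per thread: pops and pushes never contend across threads.
thread_local ThreadLocalCache t_cache;
```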
3. Enhancements to Allocation Process
Streamlined Allocation Logic:
Allocations prioritize the thread-local cache for speed.
If the local cache is empty, memory is fetched in batches from the lock-free shared list; as a fallback, dynamic allocation ensures availability (see the sketch after this list).
Dynamic Growth: The try_grow method dynamically expands capacity when approaching allocation limits, ensuring scalability.
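A minimal, self-contained sketch of that allocation order: thread-local cache first, then a batch refill from the shared lock-free list, then plain dynamic allocation. The Treiber-stack free list, the names, and the std::vector cache are assumptions for illustration; a production lock-free stack also needs ABA protection, and try_grow is not shown.

```cpp
#include <atomic>
#include <cstddef>
#include <new>
#include <vector>

namespace alloc_sketch {

constexpr std::size_t BATCH_SIZE = 128; // batch size quoted in the description

struct FreeNode {
    FreeNode* next = nullptr;
};

std::atomic<FreeNode*> g_sharedHead { nullptr }; // lock-free shared free list
thread_local std::vector<void*> t_localCache;    // per-thread cache of blocks

// Pop one node from the shared list (ABA protection omitted for brevity).
FreeNode* pop_shared() {
    FreeNode* node = g_sharedHead.load(std::memory_order_acquire);
    while (node && !g_sharedHead.compare_exchange_weak(
                       node, node->next,
                       std::memory_order_acq_rel, std::memory_order_acquire)) {
    }
    return node;
}

void* allocate_block(std::size_t blockSize) {
    // 1. Fast path: reuse a block cached by this thread.
    if (!t_localCache.empty()) {
        void* block = t_localCache.back();
        t_localCache.pop_back();
        return block;
    }
    // 2. Refill: pull up to one batch from the shared lock-free list.
    for (std::size_t i = 0; i < BATCH_SIZE; ++i) {
        FreeNode* node = pop_shared();
        if (node == nullptr) {
            break;
        }
        t_localCache.push_back(node);
    }
    if (!t_localCache.empty()) {
        void* block = t_localCache.back();
        t_localCache.pop_back();
        return block;
    }
    // 3. Fallback: plain dynamic allocation keeps requests from failing.
    return ::operator new(blockSize);
}

} // namespace alloc_sketch
```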
4. Improvements to Deallocation
Balanced Deallocation Strategy:
Memory is first returned to the thread-local cache.
If the cache is full, excess memory is flushed back to the lock-free shared list, maintaining a balance between local and global resources.
False Sharing Prevention: Cache-line alignment and proper struct padding minimize false sharing, further optimizing deallocation (see the sketch below).
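A sketch of that deallocation order under the same assumptions as the allocation sketch above. The flush-half policy and the 64-byte alignment value are illustrative choices, not taken from the PR.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

namespace dealloc_sketch {

constexpr std::size_t BATCH_SIZE = 128;

struct FreeNode {
    FreeNode* next = nullptr;
};

// Cache-line alignment keeps the hot atomic head on its own 64-byte line so that
// pushes from one thread do not invalidate unrelated data (false sharing).
struct alignas(64) SharedFreeList {
    std::atomic<FreeNode*> head { nullptr };

    void push(FreeNode* node) {
        FreeNode* expected = head.load(std::memory_order_relaxed);
        do {
            node->next = expected;
        } while (!head.compare_exchange_weak(
            expected, node, std::memory_order_release, std::memory_order_relaxed));
    }
};

SharedFreeList g_sharedList;
thread_local std::vector<void*> t_localCache;

void deallocate_block(void* block) {
    // 1. Prefer the thread-local cache: no atomics, no contention.
    if (t_localCache.size() < BATCH_SIZE) {
        t_localCache.push_back(block);
        return;
    }
    // 2. Cache is full: flush part of it back to the shared lock-free list,
    //    then keep the freed block locally for the next allocation.
    //    (Assumes every block is at least sizeof(FreeNode) bytes.)
    for (std::size_t i = 0; i < BATCH_SIZE / 2; ++i) {
        auto* node = static_cast<FreeNode*>(t_localCache.back());
        t_localCache.pop_back();
        g_sharedList.push(node);
    }
    t_localCache.push_back(block);
}

} // namespace dealloc_sketch
```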
5. Integration with Custom Memory Management
Polymorphic Allocator Support: Enabled integration with std::pmr::memory_resource for flexible custom memory management, allowing LockfreeFreeList to be used seamlessly in modern memory-resource-based code.
Allocator Design: A custom LockfreePoolingAllocator was introduced to replace the standard allocation mechanisms, offering finer control over memory operations (see the sketch below).
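One plausible way to expose such a pool through std::pmr, sketched with stand-in hooks. The class name LockfreePoolResource, the pool_allocate/pool_deallocate helpers, and the block-size cutoff are hypothetical; only the std::pmr::memory_resource interface (do_allocate, do_deallocate, do_is_equal) is standard.

```cpp
#include <cstddef>
#include <memory_resource>
#include <new>

class LockfreePoolResource : public std::pmr::memory_resource {
protected:
    void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        // Serve small, normally aligned requests from the pool; anything else
        // falls through to the default resource.
        if (bytes <= kBlockSize && alignment <= alignof(std::max_align_t)) {
            return pool_allocate(bytes);
        }
        return std::pmr::new_delete_resource()->allocate(bytes, alignment);
    }

    void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override {
        if (bytes <= kBlockSize && alignment <= alignof(std::max_align_t)) {
            pool_deallocate(p);
            return;
        }
        std::pmr::new_delete_resource()->deallocate(p, bytes, alignment);
    }

    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }

private:
    static constexpr std::size_t kBlockSize = 256; // illustrative fixed block size

    void* pool_allocate(std::size_t bytes) {
        // Stand-in: a real implementation would pull from the lock-free pool.
        return ::operator new(bytes);
    }
    void pool_deallocate(void* p) {
        // Stand-in: a real implementation would return the block to the pool.
        ::operator delete(p);
    }
};
```

A pmr container could then draw from the pool, e.g. `LockfreePoolResource resource; std::pmr::vector<int> v { &resource };`.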
Key Benefits of the New Implementation
1. Enhanced Performance in Multithreaded Environments
The lock-free design significantly reduces contention between threads. Thread-local caching ensures low-latency memory allocation and deallocation, critical for high-performance applications.
2. Precise Memory Management
The separation of allocation and deallocation logic between thread-local and global resources allows granular control over memory reuse, reducing fragmentation and improving predictability.
3. Dynamic Adaptability
The implementation scales dynamically with thread count and workload. Adjustments to preallocation size, batch size, and growth behavior ensure the system adapts to varying demands efficiently.
4. Flexibility for Future Extensions
This design provides a robust foundation for further enhancements. Future optimizations (e.g., adaptive batch sizes, priority-based allocation) can be easily incorporated without disrupting the core architecture.
Rationale for Replacing std::make_shared
1. Finer Control Over Memory
Unlike std::make_shared, which fuses the object and reference-counter allocation into a single fixed scheme, the new system separates and optimizes these steps. This is crucial for scenarios demanding thread-local caching and preallocation (see the usage sketch below).
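For context, the standard mechanism for routing shared_ptr allocations through a custom allocator is std::allocate_shared; whether the PR uses exactly this mechanism is an assumption, and the minimal LockfreePoolingAllocator below is only a stand-in that satisfies the allocator interface, not the PR's implementation.

```cpp
#include <cstddef>
#include <memory>

// Minimal stand-in allocator; a real LockfreePoolingAllocator would serve these
// requests from the lock-free pool instead of the global heap.
template <typename T>
struct LockfreePoolingAllocator {
    using value_type = T;

    LockfreePoolingAllocator() = default;
    template <typename U>
    LockfreePoolingAllocator(const LockfreePoolingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p);
    }
};

template <typename T, typename U>
bool operator==(const LockfreePoolingAllocator<T>&, const LockfreePoolingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const LockfreePoolingAllocator<T>&, const LockfreePoolingAllocator<U>&) { return false; }

struct Item {
    int id = 0;
};

int main() {
    // std::allocate_shared routes the shared_ptr's allocation (object plus
    // control block) through the custom allocator instead of the global
    // operator new used by std::make_shared.
    auto item = std::allocate_shared<Item>(LockfreePoolingAllocator<Item> {});
    return item->id;
}
```

With a real pool-backed allocator, the control block's memory would also come from the pool rather than the global heap, which is what gives the finer control described above.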
2. Thread-Specific Optimization
The use of thread-local caches ensures minimal contention and faster memory reuse, which std::make_shared cannot accommodate.
3. Improved Scalability
The lock-free shared list and dynamic growth capabilities enable efficient scaling in high-concurrency environments, outperforming the general-purpose design of std::make_shared.
4. Reduced Overhead
Granular memory management reduces memory fragmentation and overhead, offering predictable performance even under high load.
5. Customizability
The system supports advanced features such as prefetching, cache-line alignment, and integration with polymorphic memory resources, none of which are possible with std::make_shared.
In summary, this implementation introduces a high-performance, scalable memory management solution tailored for multithreaded environments. It replaces std::make_shared to offer greater flexibility, precision, and efficiency in memory allocation and deallocation.