
perf: Enhance Memory Management with Lock-Free Allocator, Preallocation, and Optimized Thread-Local Caching #2825

Open
wants to merge 15 commits into main

Conversation

@beats-dh (Collaborator) commented Aug 17, 2024

Detailed Description for PR:

1. Introduction of Static Preallocation

preallocate Method: Added functionality to preallocate a fixed number of memory blocks (STATIC_PREALLOCATION_SIZE = 500) during the initialization of the LockfreePoolingAllocator. This minimizes runtime dynamic allocations, boosting overall system performance.

Thread-Safe Initialization: Ensured preallocation occurs only once using std::call_once combined with std::once_flag, guaranteeing thread-safe initialization.
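The preallocation pattern described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the class name `Pool`, the block size of 64 bytes, and the `freeBlockCount` helper are assumptions for the example; only `STATIC_PREALLOCATION_SIZE = 500` and the `std::call_once`/`std::once_flag` combination come from the description.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Sketch: preallocate a fixed number of blocks exactly once, thread-safely.
class Pool {
public:
    static constexpr std::size_t STATIC_PREALLOCATION_SIZE = 500;

    // Every constructor call reaches std::call_once, but the preallocation
    // body is guaranteed to run exactly once per process, even if many
    // threads construct Pool objects concurrently.
    Pool() {
        std::call_once(initFlag_, [] { preallocate(); });
    }

    static std::size_t freeBlockCount() { return freeBlocks_.size(); }

private:
    static void preallocate() {
        freeBlocks_.reserve(STATIC_PREALLOCATION_SIZE);
        for (std::size_t i = 0; i < STATIC_PREALLOCATION_SIZE; ++i) {
            // 64-byte blocks chosen arbitrarily for the example;
            // leaked at process exit, which is fine for a sketch.
            freeBlocks_.push_back(new std::byte[64]);
        }
    }

    static inline std::once_flag initFlag_;
    static inline std::vector<std::byte*> freeBlocks_;
};
```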

2. Optimization of Thread-Local Cache

Efficient Caching Mechanism:

Introduced thread_local caches for each thread, significantly reducing contention for shared resources.

Configured the thread-local cache size to optimize memory usage while maintaining high throughput, with a default batch size of 128.

Prefetching for Performance: Implemented memory prefetching (PREFETCH) within caching logic to improve cache line utilization and reduce latency.
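A minimal sketch of the thread-local cache plus prefetch hint described in this section. The batch size of 128 is from the PR; the function and struct names are illustrative, and the prefetch uses the GCC/Clang `__builtin_prefetch` builtin as a stand-in for whatever `PREFETCH` expands to in the actual code.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t CACHE_BATCH_SIZE = 128;

struct ThreadCache {
    std::vector<void*> blocks;
    ThreadCache() { blocks.reserve(CACHE_BATCH_SIZE); }
};

// Each thread gets its own cache instance, so touching it never
// requires locking or atomic operations.
inline ThreadCache& localCache() {
    thread_local ThreadCache cache;
    return cache;
}

inline void* takeFromCache() {
    auto& c = localCache();
    if (c.blocks.empty()) return nullptr;
    void* p = c.blocks.back();
    c.blocks.pop_back();
#if defined(__GNUC__) || defined(__clang__)
    // Hint the CPU to pull the block's first cache line in before the
    // caller writes to it (write intent, high temporal locality).
    __builtin_prefetch(p, 1, 3);
#endif
    return p;
}
```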

3. Enhancements to Allocation Process

Tiered Allocation Logic:

Allocations prioritize the thread-local cache for speed.

If the local cache is empty, memory is fetched in batches from the lock-free shared list. As a fallback, dynamic allocation ensures availability.

Dynamic Growth: The try_grow method dynamically expands capacity when approaching allocation limits, ensuring scalability.
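The three-tier allocation path above (thread-local cache, batch refill from the lock-free shared list, dynamic fallback) can be sketched as below. This is an assumption-laden illustration: the shared list is modeled as a simple Treiber stack that ignores the ABA problem for brevity, and all names are invented for the example.

```cpp
#include <atomic>
#include <cstdlib>
#include <vector>

struct Node { Node* next; };

std::atomic<Node*> sharedHead{nullptr};
thread_local std::vector<Node*> localCache;
constexpr std::size_t BATCH = 128;

// Lock-free pop from the shared list (Treiber stack; ABA ignored here).
Node* sharedPop() {
    Node* head = sharedHead.load(std::memory_order_acquire);
    while (head && !sharedHead.compare_exchange_weak(
               head, head->next,
               std::memory_order_acquire, std::memory_order_relaxed)) {}
    return head;
}

void* allocate(std::size_t size) {
    // 1. Fast path: thread-local cache, no synchronization at all.
    if (!localCache.empty()) {
        Node* n = localCache.back();
        localCache.pop_back();
        return n;
    }
    // 2. Refill the local cache in a batch from the lock-free list.
    for (std::size_t i = 0; i < BATCH; ++i) {
        Node* n = sharedPop();
        if (!n) break;
        localCache.push_back(n);
    }
    if (!localCache.empty()) return allocate(size);
    // 3. Fallback: plain dynamic allocation guarantees availability.
    return std::malloc(size < sizeof(Node) ? sizeof(Node) : size);
}
```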

4. Improvements to Deallocation

Balanced Deallocation Strategy:

Memory is first returned to the thread-local cache.

If the cache is full, excess memory is flushed back to the lock-free shared list, maintaining a balance between local and global resources.

False Sharing Prevention: Cache-line alignment and proper struct padding minimize false sharing, further optimizing deallocation.
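Both points in this section can be sketched together: return to the thread-local cache first, flush a batch to the lock-free shared list when the cache is full, and align the shared head to a cache line. The flush-half policy, the 64-byte alignment value, and all names are assumptions made for the example; `std::hardware_destructive_interference_size` would be the portable constant for the alignment.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

struct Node { Node* next; };

// alignas(64) keeps the hot atomic on its own cache line, so unrelated
// neighboring data cannot cause false sharing on it.
struct alignas(64) SharedList {
    std::atomic<Node*> head{nullptr};
    void push(Node* n) {
        n->next = head.load(std::memory_order_relaxed);
        while (!head.compare_exchange_weak(n->next, n,
                   std::memory_order_release, std::memory_order_relaxed)) {}
    }
};

SharedList shared;
thread_local std::vector<Node*> localCache;
constexpr std::size_t CACHE_LIMIT = 128;

void deallocate(void* p) {
    Node* n = static_cast<Node*>(p);
    // 1. Prefer the thread-local cache: fastest reuse path, no contention.
    if (localCache.size() < CACHE_LIMIT) {
        localCache.push_back(n);
        return;
    }
    // 2. Cache full: flush half of it back to the lock-free shared list
    //    so other threads can reuse the memory, then cache this block.
    for (std::size_t i = 0; i < CACHE_LIMIT / 2; ++i) {
        shared.push(localCache.back());
        localCache.pop_back();
    }
    localCache.push_back(n);
}
```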

5. Integration with Custom Memory Management

Polymorphic Allocators Support: Enabled integration with std::pmr::memory_resource for flexible custom memory management. This allows seamless use of LockfreeFreeList in modern memory resource-based systems.

Allocator Design: A custom LockfreePoolingAllocator was introduced to replace standard allocation mechanisms, offering finer control over memory operations.
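The `std::pmr::memory_resource` integration mentioned above can be sketched like this. The class name `PoolResource` is invented for the example, and the pool logic is stubbed out by delegating to the default resource; only the three virtual overrides are the real, standard `memory_resource` extension points.

```cpp
#include <cstddef>
#include <memory_resource>

// Sketch: exposing a pool through std::pmr::memory_resource lets any
// pmr-aware container or allocate_shared call draw from it.
class PoolResource : public std::pmr::memory_resource {
    void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        // A real implementation would consult the thread-local cache and
        // lock-free list first; we delegate for brevity.
        return std::pmr::get_default_resource()->allocate(bytes, alignment);
    }
    void do_deallocate(void* p, std::size_t bytes,
                       std::size_t alignment) override {
        std::pmr::get_default_resource()->deallocate(p, bytes, alignment);
    }
    bool do_is_equal(
        const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }
};
```

Any `std::pmr` container can then be pointed at the pool, e.g. `std::pmr::vector<int> v{&resource};`.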


Key Benefits of the New Implementation

1. Enhanced Performance in Multithreaded Environments

The lock-free design significantly reduces contention between threads. Thread-local caching ensures low-latency memory allocation and deallocation, critical for high-performance applications.

2. Precise Memory Management

The separation of allocation and deallocation logic between thread-local and global resources allows granular control over memory reuse, reducing fragmentation and improving predictability.

3. Dynamic Adaptability

The implementation scales dynamically with thread count and workload. Adjustments to preallocation size, batch size, and growth behavior ensure the system adapts to varying demands efficiently.

4. Flexibility for Future Extensions

This design provides a robust foundation for further enhancements. Future optimizations (e.g., adaptive batch sizes, priority-based allocation) can be easily incorporated without disrupting the core architecture.


Rationale for Replacing std::make_shared

1. Finer Control Over Memory

Unlike std::make_shared, which fuses the object and its control block (the reference counts) into a single allocation, the new system separates and optimizes these steps. This is crucial for scenarios demanding thread-local caching and preallocation.
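The distinction can be shown with the standard library alone: std::allocate_shared keeps make_shared's semantics but routes the fused allocation through a caller-supplied allocator, which is exactly the hook a pooling allocator needs. The function names below are invented for the illustration.

```cpp
#include <memory>
#include <memory_resource>

// One fused allocation for object + control block; no way to intercept
// where the memory comes from.
std::shared_ptr<int> makeFused() {
    return std::make_shared<int>(42);
}

// Same semantics, but every byte is requested from the supplied
// allocator, so a custom pool (such as a LockfreePoolingAllocator)
// can provide the storage.
std::shared_ptr<int> makeViaAllocator() {
    std::pmr::polymorphic_allocator<int> alloc{
        std::pmr::get_default_resource()};
    return std::allocate_shared<int>(alloc, 42);
}
```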

2. Thread-Specific Optimization

The use of thread-local caches ensures minimal contention and faster memory reuse, which std::make_shared cannot accommodate.

3. Improved Scalability

The lock-free shared list and dynamic growth capabilities enable efficient scaling in high-concurrency environments, outperforming the general-purpose design of std::make_shared.

4. Reduced Overhead

Granular memory management reduces memory fragmentation and overhead, offering predictable performance even under high load.

5. Customizability

The system supports advanced features such as prefetching, cache-line alignment, and integration with polymorphic memory resources, none of which std::make_shared exposes.


In summary, this implementation introduces a high-performance, scalable memory management solution tailored for multithreaded environments. It replaces std::make_shared to offer greater flexibility, precision, and efficiency in memory allocation and deallocation.


@jhogberg jhogberg mentioned this pull request Sep 12, 2024

This PR is stale because it has been open 45 days with no activity.

@github-actions github-actions bot added Stale No activity and removed Stale No activity labels Oct 19, 2024

@github-actions github-actions bot added the Stale No activity label Nov 30, 2024
@github-actions github-actions bot removed the Stale No activity label Dec 7, 2024
sonarqubecloud bot commented Jan 1, 2025
