Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove slabs from the slab memory pool #30732

Merged
merged 1 commit into from
Mar 9, 2021
Merged

Remove slabs from the slab memory pool #30732

merged 1 commit into from
Mar 9, 2021

Conversation

davidfowl
Copy link
Member

@davidfowl davidfowl commented Mar 8, 2021

  • Slabs add an extra complication when trying to free blocks (blocks are attached to slabs and would need to be ref counted). Instead, since we can now directly allocate on the POH, we can see if slabs are still meaningful.
  • Use 4K arrays on the pinned heap
  • Remove the finalizer from MemoryPoolBlock
  • I didn't rename it

Investigating #27394

Looks slightly faster (allocation should be faster in general). I ran the plaintext benchmark.

Before:

| load                   |            |
| ---------------------- | ---------- |
| CPU Usage (%)          | 41         |
| Cores usage (%)        | 494        |
| Working Set (MB)       | 38         |
| Start Time (ms)        | 1          |
| First Request (ms)     | 96         |
| Requests/sec           | 2,293,809  |
| Requests               | 34,634,506 |
| Mean latency (ms)      | 1.25       |
| Max latency (ms)       | 53.75      |
| Bad responses          | 0          |
| Socket errors          | 0          |
| Read throughput (MB/s) | 288.76     |
| Latency 50th (ms)      | 0.98       |
| Latency 75th (ms)      | 1.43       |
| Latency 90th (ms)      | 1.84       |
| Latency 99th (ms)      | 14.78      |

After

| load                   |            |
| ---------------------- | ---------- |
| CPU Usage (%)          | 41         |
| Cores usage (%)        | 494        |
| Working Set (MB)       | 38         |
| Start Time (ms)        | 0          |
| First Request (ms)     | 105        |
| Requests/sec           | 2,343,985  |
| Requests               | 35,392,833 |
| Mean latency (ms)      | 1.19       |
| Max latency (ms)       | 49.37      |
| Bad responses          | 0          |
| Socket errors          | 0          |
| Read throughput (MB/s) | 295.07     |
| Latency 50th (ms)      | 0.96       |
| Latency 75th (ms)      | 1.41       |
| Latency 90th (ms)      | 1.79       |
| Latency 99th (ms)      | 16.58      |

The allocation rate is also lower (for more requests) in this run:

Before

| application                   |                           |
| ----------------------------- | ------------------------- |
| CPU Usage (%)                 | 100                       |
| Cores usage (%)               | 1,197                     |
| Working Set (MB)              | 70                        |
| Build Time (ms)               | 3,801                     |
| Start Time (ms)               | 175                       |
| Published Size (KB)           | 103,420                   |
| .NET Core SDK Version         | 6.0.100-preview.3.21158.6 |
| Max CPU Usage (%)             | 99                        |
| Max Working Set (MB)          | 73                        |
| Max GC Heap Size (MB)         | 10                        |
| Max Number of Gen 0 GCs / sec | 0.00                      |
| Max Number of Gen 1 GCs / sec | 0.00                      |
| Max Number of Gen 2 GCs / sec | 0.00                      |
| Max Time in GC (%)            | 0.00                      |
| Max Gen 0 Size (B)            | 0                         |
| Max Gen 1 Size (B)            | 0                         |
| Max Gen 2 Size (B)            | 0                         |
| Max LOH Size (B)              | 0                         |
| Max Allocation Rate (B/sec)   | 4,314,080                 |
| Max GC Heap Fragmentation     | 0                         |
| # of Assemblies Loaded        | 89                        |
| Max Exceptions (#/s)          | 852                       |
| Max Lock Contention (#/s)     | 337                       |
| Max ThreadPool Threads Count  | 28                        |
| Max ThreadPool Queue Length   | 220                       |
| Max ThreadPool Items (#/s)    | 158,591                   |
| Max Active Timers             | 0                         |
| IL Jitted (B)                 | 167,569                   |
| Methods Jitted                | 1,911                     |

After

| application                   |                           |
| ----------------------------- | ------------------------- |
| CPU Usage (%)                 | 100                       |
| Cores usage (%)               | 1,198                     |
| Working Set (MB)              | 70                        |
| Build Time (ms)               | 3,779                     |
| Start Time (ms)               | 221                       |
| Published Size (KB)           | 103,420                   |
| .NET Core SDK Version         | 6.0.100-preview.3.21158.6 |
| Max CPU Usage (%)             | 99                        |
| Max Working Set (MB)          | 72                        |
| Max GC Heap Size (MB)         | 10                        |
| Max Number of Gen 0 GCs / sec | 0.00                      |
| Max Number of Gen 1 GCs / sec | 0.00                      |
| Max Number of Gen 2 GCs / sec | 0.00                      |
| Max Time in GC (%)            | 0.00                      |
| Max Gen 0 Size (B)            | 0                         |
| Max Gen 1 Size (B)            | 0                         |
| Max Gen 2 Size (B)            | 0                         |
| Max LOH Size (B)              | 0                         |
| Max Allocation Rate (B/sec)   | 4,185,456                 |
| Max GC Heap Fragmentation     | 0                         |
| # of Assemblies Loaded        | 89                        |
| Max Exceptions (#/s)          | 896                       |
| Max Lock Contention (#/s)     | 359                       |
| Max ThreadPool Threads Count  | 32                        |
| Max ThreadPool Queue Length   | 178                       |
| Max ThreadPool Items (#/s)    | 161,368                   |
| Max Active Timers             | 0                         |
| IL Jitted (B)                 | 173,714                   |
| Methods Jitted                | 2,009                     |

- Use 4K arrays on the pinned heap
- I didn't rename it

Memory = MemoryMarshal.CreateFromPinnedArray(slab.PinnedArray, _offset, _length);
var pinnedArray = GC.AllocateUninitializedArray<byte>(length, pinned: true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.microsoft.com/en-us/dotnet/api/system.gc.allocateuninitializedarray?view=net-5.0 as an FYI to others reviewing.

We didn't zero-allocate the array before, correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Memory = MemoryMarshal.CreateFromPinnedArray(slab.PinnedArray, _offset, _length);
var pinnedArray = GC.AllocateUninitializedArray<byte>(length, pinned: true);

Memory = MemoryMarshal.CreateFromPinnedArray(pinnedArray, 0, pinnedArray.Length);
}

/// <summary>
/// Back-reference to the memory pool which this block was allocated from. It may only be returned to this pool.
/// </summary>
public SlabMemoryPool Pool { get; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove? Only spot I see this used is when BLOCK_LEASE_TRACKING is defined.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its used in dispose

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

D'oh

Copy link
Member

@Maoni0 Maoni0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like a good change to me!

@Maoni0
Copy link
Member

Maoni0 commented Mar 8, 2021

CC @cshung who's doing some perf work on POH

@davidfowl davidfowl marked this pull request as ready for review March 9, 2021 00:45
@davidfowl davidfowl merged commit 2e1063e into main Mar 9, 2021
@davidfowl davidfowl deleted the davidfowl/no-slabs branch March 9, 2021 08:22
/// This object cannot be instantiated outside of the static Create method
/// </summary>
internal MemoryPoolBlock(SlabMemoryPool pool, MemoryPoolSlab slab, int offset, int length)
internal MemoryPoolBlock(SlabMemoryPool pool, int length)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SlabMemoryPool -> SlablessMemoryPool

😋

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For real though, name change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'll rename it to PinnedBlockPool or something like that

@davidfowl davidfowl added this to the 6.0-preview3 milestone Mar 17, 2021
@JamesNK JamesNK mentioned this pull request Apr 8, 2021
@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Aug 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants