mem: Optimize buffer object re-use #8784
Conversation
Codecov Report

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master    #8784      +/-   ##
==========================================
+ Coverage   83.22%   83.27%   +0.05%
==========================================
  Files         418      418
  Lines       32385    33005     +620
==========================================
+ Hits        26952    27485     +533
- Misses       4050     4106      +56
- Partials     1383     1414      +31
```
```go
// initialized enables sanity checks without the overhead of atomic
// operations. This field is not safe for concurrent access and is used in a
// best-effort manner for assertion purposes only. It does not play a role
// in the concurrent logic of reference counting.
```
Couple of things here:

- The `Buffer` interface documentation states that a buffer is not safe for concurrent access. Given that, do we need this to be mentioned here?
- Do you have an idea of how much overhead the atomic operation of checking if the ref count is zero causes? The reason I'm asking is because this new field (and the checks associated with it) are sprinkled across multiple methods, and I'm wondering if the code complexity (and the maintenance costs) are worth it?
I'm also a little confused about this line from the docstring:

```go
// Note that a Buffer is not safe for concurrent access and instead each
// goroutine should use its own reference to the data, which can be acquired via
// a call to Ref().
```

A call to `Ref` simply increments the reference count. It does not return a new reference to the existing buffer that can be used concurrently. Do we ever use buffers concurrently?

Also, why did we earlier have a pointer to an atomic and not store the atomic by value?
> The `Buffer` interface documentation states that a buffer is not safe for concurrent access. Given that, do we need to explicitly mention this here?

> A call to `Ref` simply increments the reference count. It does not return a new reference to the existing buffer that can be used concurrently. Do we ever use buffers concurrently?
In the initial design, `buf.Ref()` likely returned a new object intended to be transferred to a separate goroutine:

```go
ref := buf.Ref()
go func() {
	// use ref here
}()
buf.Free()
```

However, in the merged implementation, `Ref` does not return a new object. So, the usage pattern becomes:

```go
buf.Ref()
go func() {
	// use buf here
}()
buf.Free()
```

Technically, this implies `buf` is being accessed concurrently. However, the specific pattern that is unsafe is attempting to reference `buf` in a new goroutine without incrementing the count first:

```go
go func() {
	// Unsafe: race condition with buf.Free() below
	ref := buf.Ref()
}()
buf.Free()
```

Source: #8209 (comment)
Yes, we do follow the safe pattern above by pushing data frame buffers into an unbounded channel to be consumed by another goroutine.
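The safe handoff pattern described above can be sketched with a minimal refcounted type. The names here are illustrative only; this is not grpc-go's actual `mem.Buffer` implementation:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// buffer is an illustrative stand-in for a refcounted buffer: Ref
// increments the count, Free decrements it and releases the data when
// the count reaches zero.
type buffer struct {
	data []byte
	refs atomic.Int32
}

func newBuffer(data []byte) *buffer {
	b := &buffer{data: data}
	b.refs.Store(1)
	return b
}

func (b *buffer) Ref() { b.refs.Add(1) }

func (b *buffer) Free() {
	if b.refs.Add(-1) == 0 {
		b.data = nil // last reference: release the backing slice
	}
}

func main() {
	buf := newBuffer([]byte("hello"))

	var wg sync.WaitGroup
	wg.Add(1)
	buf.Ref() // take the extra reference BEFORE starting the goroutine
	go func() {
		defer wg.Done()
		defer buf.Free()
		fmt.Println("consumer sees", len(buf.data), "bytes")
	}()

	buf.Free() // producer drops its reference; data stays alive for the consumer
	wg.Wait() // prints: consumer sees 5 bytes
}
```

Because the producer's `Free` only drops the count to 1, the data cannot be released while the consumer goroutine still holds its reference, which is exactly why the `Ref` must happen before the `go` statement.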
> Do you have an idea of how much overhead the atomic operation of checking if the ref count is zero causes? The reason I'm asking is because this new field (and the checks associated with it) are sprinkled across multiple methods and I'm wondering if the code complexity (and the maintenance costs) are worth it?
Earlier there was a check `if b.refs == nil`, which is not possible with a non-pointer field. Using `initialized` preserves that test coverage.

There are some methods such as `Ref` and `Free` which perform atomic operations anyway, so we can check the return value for validation. However, for methods like `ReadData` that don't perform atomic operations, the overhead is significant. According to Gemini, an atomic operation is roughly 10x-15x slower than a comparable non-atomic operation under low contention, and the difference becomes orders of magnitude larger under high contention.
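To quantify that trade-off, a rough harness along the following lines (a sketch, not the project's benchmark suite) compares a plain field check against the equivalent atomic load. Actual numbers vary heavily by architecture and contention, so measuring locally is the only reliable answer:

```go
package main

import (
	"fmt"
	"sync/atomic"
	"testing"
)

var (
	initialized bool         // plain field, like the PR's sanity-check flag
	refs        atomic.Int32 // atomic ref count
)

// benchPlain measures a plain (non-atomic) field check in a tight loop.
func benchPlain() testing.BenchmarkResult {
	initialized = true
	return testing.Benchmark(func(b *testing.B) {
		hits := 0
		for i := 0; i < b.N; i++ {
			if initialized {
				hits++
			}
		}
		_ = hits
	})
}

// benchAtomic measures the equivalent check via an atomic load.
func benchAtomic() testing.BenchmarkResult {
	refs.Store(1)
	return testing.Benchmark(func(b *testing.B) {
		hits := 0
		for i := 0; i < b.N; i++ {
			if refs.Load() > 0 {
				hits++
			}
		}
		_ = hits
	})
}

func main() {
	fmt.Println("plain check: ", benchPlain())
	fmt.Println("atomic check:", benchAtomic())
}
```

Note that on x86 an uncontended atomic load compiles to a plain load, so the gap shows up mostly under contention or on weaker memory models; the compiler may also hoist the plain check out of the loop entirely.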
> Also, why did we earlier have a pointer to an atomic and not store the atomic by value?
Previously, the new buffer created by `SplitUnsafe` pointed to the same `atomic.Int32` as the original buffer, which required the field to be a pointer. Now, the new object maintains its own ref count and stores a pointer to the original buffer instead. Therefore, the reference count (`atomic.Uint32`) no longer needs to be a pointer.
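As an illustration of that design, here is a minimal sketch of the child-pointing-to-root scheme with value-typed ref counts. The names and layout are hypothetical, not the PR's actual code:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// buffer holds its own value-typed ref count; a child produced by a
// split additionally holds a pointer to its root.
type buffer struct {
	data []byte
	refs atomic.Int32
	root *buffer // nil for the root itself
}

func newBuffer(data []byte) *buffer {
	b := &buffer{data: data}
	b.refs.Store(1)
	return b
}

// split hands the tail to a new child buffer and pins the root with an
// extra reference so the backing slice outlives every child.
func (b *buffer) split(n int) *buffer {
	root := b
	if b.root != nil {
		root = b.root
	}
	root.refs.Add(1) // the child keeps the root (and its slice) alive
	c := &buffer{data: b.data[n:], root: root}
	c.refs.Store(1)
	b.data = b.data[:n]
	return c
}

func (b *buffer) free() {
	if b.refs.Add(-1) != 0 {
		return
	}
	if b.root != nil {
		// A child returns itself (to a pool, in the real code) and
		// then drops its reference on the root.
		b.root.free()
		return
	}
	b.data = nil // root: release the backing slice for re-use
}

func main() {
	root := newBuffer(make([]byte, 8))
	child := root.split(4)
	root.free() // root's own ref is gone, but the child still pins it
	fmt.Println("root alive:", root.data != nil) // prints: root alive: true
	child.free() // drops the child, then the root's last reference
	fmt.Println("root alive:", root.data != nil) // prints: root alive: false
}
```

Since each `buffer` owns its count by value, no separate `*atomic.Int32` allocation (or pool for it) is needed.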
Thank you for the information. That helps.

I would still like to see if there is actually any significant performance improvement from having the `initialized` field. The `if b.refs == nil` check could also be replaced with `if b.refs.Load() == 0` if there is no significant performance impact.
easwars left a comment:
The only thing I need convincing about is the use of the `initialized` field, as opposed to directly checking the reference count with an atomic read of the value. Otherwise LGTM.
Splitting a `buffer` results in fetching a new `buffer` object from a `sync.Pool`. The `buffer` object is returned back to the pool only once the shared ref count falls to 0. As a result, only one of the `buffer` objects is returned back to the pool for re-use. The "leaked" buffer objects may cause noticeable allocations when buffers are split more frequently. I noticed this when attempting to remove a buffer copy by replacing the `bufio.Reader`.

Solution

This PR introduces a root-owner model for the underlying `*[]byte` within `buffer` objects. The root object manages the slice's lifecycle, returning it to the pool only when its reference count reaches zero.

When a `buffer` is split, the new `buffer` is treated as a child, incrementing the ref counts for both itself and the root. Once a child's ref count hits zero, it returns itself to the pool and decrements the root's count.

Additionally, this PR removes the `sync.Pool` used for `*atomic.Int32` by embedding `atomic.Int32` as a value field within the `buffer` struct. By eliminating the second pool and the associated pointer indirection, we reduce allocation overhead and improve cache locality during buffer lifecycle events.

Benchmarks
A micro-benchmark showing the buffer object leak:
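The benchmark source itself is collapsed in the original page. As a stand-in, the following hypothetical sketch shows how the "leak" surfaces as repeated `sync.Pool` allocations when only one of two fetched objects is ever returned (plain `*int`s stand in for pooled `*buffer` objects):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// newCalls counts how often the pool has to allocate a fresh object.
var newCalls atomic.Int64

var bufPool = sync.Pool{New: func() any {
	newCalls.Add(1)
	return new(int) // stand-in for a pooled *buffer object
}}

// splitOld models the pre-PR behavior: a split fetches a second pooled
// object, but only one of the two is ever returned to the pool.
func splitOld() {
	a := bufPool.Get().(*int)
	b := bufPool.Get().(*int)
	_ = a // the original object is never Put back
	bufPool.Put(b)
}

// splitNew models the root-owner fix: both objects are returned to the
// pool once their ref counts drop to zero.
func splitNew() {
	a := bufPool.Get().(*int)
	b := bufPool.Get().(*int)
	bufPool.Put(a)
	bufPool.Put(b)
}

// allocsOver runs fn iters times and reports how many fresh
// allocations the pool performed.
func allocsOver(fn func(), iters int) int64 {
	newCalls.Store(0)
	for i := 0; i < iters; i++ {
		fn()
	}
	return newCalls.Load()
}

func main() {
	fmt.Println("leaky pattern, pool allocations:", allocsOver(splitOld, 1000))
	fmt.Println("fixed pattern, pool allocations:", allocsOver(splitNew, 1000))
}
```

With the leaky pattern, each iteration loses one object, so the pool keeps allocating; with the fixed pattern the pool reaches a steady state after the first iteration or two.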
Result on master vs this PR.
```
goos: linux
goarch: amd64
pkg: google.golang.org/grpc/mem
cpu: Intel(R) Xeon(R) CPU @ 2.60GHz
                    │   old.txt   │               new.txt               │
                    │   sec/op    │    sec/op     vs base               │
Split/split-48        418.2n ± 0%   263.9n ± 1%  -36.89% (p=0.000 n=10)
Split/no-split-48     221.1n ± 1%   208.5n ± 0%   -5.70% (p=0.000 n=10)
geomean               304.1n        234.6n       -22.86%

                    │   old.txt   │               new.txt                │
                    │    B/op     │    B/op      vs base                 │
Split/split-48         64.00 ± 0%    0.00 ± 0%  -100.00% (p=0.000 n=10)
Split/no-split-48      0.000 ± 0%   0.000 ± 0%         ~ (p=1.000 n=10) ¹
geomean                           ²                    ?              ² ³

                    │   old.txt   │               new.txt                │
                    │  allocs/op  │  allocs/op   vs base                 │
Split/split-48         1.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=10)
Split/no-split-48      0.000 ± 0%   0.000 ± 0%         ~ (p=1.000 n=10) ¹
geomean                           ²                    ?              ² ³

¹ all samples are equal
² summaries must be >0 to compute geomean
³ ratios must be >0 to compute geomean
```

The effect on local gRPC benchmarks is negligible since the `SplitUnsafe` function isn't called very frequently.

```
$ go run benchmark/benchresult/main.go unary-before unary-after
unary-networkMode_Local-bufConn_false-keepalive_false-benchTime_1m0s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_120-reqSize_16000B-respSize_16000B-compressor_off-channelz_false-preloader_false-clientReadBufferSize_-1-clientWriteBufferSize_-1-serverReadBufferSize_-1-serverWriteBufferSize_-1-sleepBetweenRPCs_0s-connections_1-recvBufferPool_simple-sharedWriteBuffer_false
           Title       Before        After Percentage
        TotalOps      2985694      3024364      1.30%
         SendOps            0            0       NaN%
         RecvOps            0            0       NaN%
        Bytes/op     74784.94     74784.99      0.00%
       Allocs/op       133.67       133.89      0.00%
         ReqT/op 6369480533.33 6451976533.33      1.30%
        RespT/op 6369480533.33 6451976533.33      1.30%
        50th-Lat   2.410033ms    2.40116ms     -0.37%
        90th-Lat   3.145118ms   3.081771ms     -2.01%
        99th-Lat   3.563055ms   3.629663ms      1.87%
         Avg-Lat   2.410529ms   2.379513ms     -1.29%
       GoVersion     go1.24.8     go1.24.8
     GrpcVersion   1.78.0-dev   1.78.0-dev
```

RELEASE NOTES:

- mem: optimize re-use of `buffer` objects on using `SplitUnsafe`