
@arjan-bal (Contributor) commented Oct 2, 2025

CPU profiles (pprof) of the unary RPC benchmarks show significant time spent in runtime.mallocgc and runtime.gcBgMarkWorker. In other words, gRPC spends a significant share of its CPU cycles allocating memory and collecting garbage.

This change reduces the number of pointer fields in the structs that represent client and server streams by embedding helper structs by value instead. This reduces the number of heap allocations (faster) and also reduces pressure on the garbage collector (faster collections), since the GC doesn't need to scan non-pointer fields. For structs that were stored as pointers only to ensure their values are not copied, a noCopy struct is embedded instead, which makes go vet fail if a copy is made. Non-pointer fields are also moved to the end of the structs to improve allocation speed, since the runtime only needs to track pointers in the leading part of a struct, up to its last pointer field.
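A minimal sketch of the noCopy pattern described above. recvBuffer and serverStream here are simplified, hypothetical stand-ins, not the actual grpc-go definitions. The no-op Lock/Unlock methods on *noCopy are what let go vet's copylocks check flag copies. sync.WaitGroup uses the same trick internally.

```go
package main

import "fmt"

// noCopy may be embedded in structs that must not be copied after first use.
// go vet's copylocks check reports copies of values whose type has
// Lock/Unlock methods on the pointer receiver, so these no-ops make copies detectable.
type noCopy struct{}

func (*noCopy) Lock()   {}
func (*noCopy) Unlock() {}

// recvBuffer is a hypothetical, simplified buffer that was previously held
// through a pointer only to prevent accidental copies.
type recvBuffer struct {
	_ noCopy
	c chan int
}

// serverStream embeds recvBuffer by value. This avoids the separate heap
// allocation for the buffer, and go vet still flags accidental copies.
type serverStream struct {
	buf recvBuffer
	id  uint32 // non-pointer field placed last
}

func newServerStream(id uint32) *serverStream {
	s := &serverStream{id: id}
	s.buf.c = make(chan int, 1)
	return s
}

func main() {
	s := newServerStream(1)
	s.buf.c <- 42
	fmt.Println(<-s.buf.c)
	// b := s.buf // go vet (copylocks) would report this copy.
}
```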

Results

Unary RPCs show improvements in throughput (TotalOps +3.92%), latency (50th percentile -3.85%, 99th percentile -2.63%) and allocations (Allocs/op -7.89%).

# test command (RUN_NAME names the result file: unary-before, then unary-after)
go run benchmark/benchmain/main.go -benchtime=60s -workloads=unary \
   -compression=off -maxConcurrentCalls=500 -trace=off \
   -reqSizeBytes=100 -respSizeBytes=100 -networkMode=Local \
   -resultFile="${RUN_NAME}" -recvBufferPool=simple

# compare the two result files
go run benchmark/benchresult/main.go unary-before unary-after
               Title       Before        After Percentage
            TotalOps      7690250      7991877     3.92%
             SendOps            0            0      NaN%
             RecvOps            0            0      NaN%
            Bytes/op     10218.14     10084.00    -1.31%
           Allocs/op       164.85       151.85    -7.89%
             ReqT/op 102536666.67 106558360.00     3.92%
            RespT/op 102536666.67 106558360.00     3.92%
            50th-Lat    3.57283ms   3.435143ms    -3.85%
            90th-Lat   5.152403ms   4.979906ms    -3.35%
            99th-Lat   5.985282ms   5.827893ms    -2.63%
             Avg-Lat    3.89872ms   3.750449ms    -3.80%
           GoVersion     go1.24.4     go1.24.4
         GrpcVersion   1.77.0-dev   1.77.0-dev
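The Allocs/op drop comes from embedding helper structs by value instead of allocating them separately. Below is a minimal sketch of that effect. It uses hypothetical types (quota, streamWithPointer, streamWithValue) rather than the real transport structs, and calls testing.AllocsPerRun to count heap allocations per stream construction.

```go
package main

import (
	"fmt"
	"testing"
)

// quota is a hypothetical helper type that every stream needs.
type quota struct {
	ch chan struct{}
	n  int32
}

// streamWithPointer holds its helper through a pointer, so creating one
// stream costs two heap allocations: the stream and the quota.
type streamWithPointer struct {
	q  *quota
	id uint32 // non-pointer fields placed after pointer fields
}

// streamWithValue embeds the helper by value, so one allocation suffices.
type streamWithValue struct {
	q  quota
	id uint32
}

// Package-level sinks force the structs to escape to the heap, so the
// compiler cannot place them on the stack.
var (
	sinkPtr *streamWithPointer
	sinkVal *streamWithValue
)

func allocsWithPointer() float64 {
	return testing.AllocsPerRun(1000, func() {
		sinkPtr = &streamWithPointer{q: &quota{}, id: 1}
	})
}

func allocsWithValue() float64 {
	return testing.AllocsPerRun(1000, func() {
		sinkVal = &streamWithValue{id: 1}
	})
}

func main() {
	fmt.Println("pointer field:", allocsWithPointer())
	fmt.Println("value field:", allocsWithValue())
}
```

On a typical build, the pointer version should report two allocations per stream and the value version one. Fewer pointer fields also means less work for the GC's mark phase, which matches the reduced time in runtime.gcBgMarkWorker.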

Resources

  • go/go/performance?polyglot=open-source#application-spends-too-much-on-gc-or-allocations
  • go/go/performance?polyglot=open-source#memory-optimizations

RELEASE NOTES:

  • transport: Reduce pointer usage to lower garbage collection pressure and improve unary RPC performance.

@arjan-bal arjan-bal added the Type: Performance Performance improvements (CPU, network, memory, etc) label Oct 2, 2025
@arjan-bal arjan-bal added this to the 1.77 Release milestone Oct 2, 2025
@arjan-bal arjan-bal changed the title transport: Reduce the use of pointer fields in Stream structs transport: Reduce pointer usage in Stream structs Oct 2, 2025

codecov bot commented Oct 2, 2025

Codecov Report

❌ Patch coverage is 92.59259% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.15%. Comparing base (d0ebcdf) to head (0a52b64).
⚠️ Report is 4 commits behind head on master.

Files with missing lines          Patch %  Lines
internal/transport/transport.go   50.00%   2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8624      +/-   ##
==========================================
+ Coverage   82.12%   82.15%   +0.03%     
==========================================
  Files         415      415              
  Lines       40701    40711      +10     
==========================================
+ Hits        33425    33447      +22     
+ Misses       5895     5884      -11     
+ Partials     1381     1380       -1     
Files with missing lines              Coverage Δ
internal/transport/client_stream.go   100.00% <ø> (ø)
internal/transport/flowcontrol.go      96.46% <100.00%> (-0.04%) ⬇️
internal/transport/handler_server.go   90.84% <100.00%> (+0.03%) ⬆️
internal/transport/http2_client.go     92.15% <100.00%> (-0.08%) ⬇️
internal/transport/http2_server.go     90.59% <100.00%> (-0.28%) ⬇️
internal/transport/server_stream.go   100.00% <ø> (+4.68%) ⬆️
internal/transport/transport.go        90.80% <50.00%> (+6.06%) ⬆️

... and 19 files with indirect coverage changes


@arjan-bal arjan-bal force-pushed the optimize-heap-allocs branch from a44546b to 4ebd663 October 2, 2025 18:26
@arjan-bal arjan-bal force-pushed the optimize-heap-allocs branch from 4ebd663 to 42b1067 October 2, 2025 18:39
@arjan-bal arjan-bal requested review from easwars and dfawley October 2, 2025 18:48
@arjan-bal arjan-bal added the Area: Transport Includes HTTP/2 client/server and HTTP server handler transports and advanced transport features. label Oct 2, 2025
-func newWriteQuota(sz int32, done <-chan struct{}) *writeQuota {
-	w := &writeQuota{
+func initWriteQuota(wq *writeQuota, sz int32, done <-chan struct{}) {
+	*wq = writeQuota{
Contributor


This syntax does feel a little weird to me. Did you try some of these options to see if they read better (and don't perform worse)?

  • Directly set the fields of wq here instead of assigning wq a completely new instance of writeQuota.
  • Can this initWriteQuota be a method on Stream?
  • Can we make the zero value of writeQuota something that actually works?

This comment applies to other types as well, like recvBuffer.

Thanks.
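Below is a sketch of the first two options from the review. It uses a simplified, hypothetical writeQuota whose field names are assumptions, not the real transport type. The PR's style overwrites *wq with a new value. The alternative is a method (here on writeQuota itself, as a variation on the "method on Stream" suggestion) that sets the fields directly. Both initialize a writeQuota embedded by value in a stream, so neither needs a separate heap allocation.

```go
package main

import "fmt"

// writeQuota is a simplified, hypothetical stand-in; field names are assumptions.
type writeQuota struct {
	ch    chan struct{}   // signalled when quota is replenished
	done  <-chan struct{} // closed when the stream is done
	quota int32           // non-pointer field placed last
}

// initWriteQuota mirrors the PR's style: overwrite *wq with a fresh value.
func initWriteQuota(wq *writeQuota, sz int32, done <-chan struct{}) {
	*wq = writeQuota{
		ch:    make(chan struct{}, 1),
		done:  done,
		quota: sz,
	}
}

// initFields is the reviewer's alternative: set the fields directly.
// (Hypothetical method name.)
func (w *writeQuota) initFields(sz int32, done <-chan struct{}) {
	w.ch = make(chan struct{}, 1)
	w.done = done
	w.quota = sz
}

// stream embeds writeQuota by value, so it is initialized in place.
type stream struct {
	wq writeQuota
	id uint32
}

func main() {
	done := make(chan struct{})
	var a, b stream
	initWriteQuota(&a.wq, 1024, done)
	b.wq.initFields(1024, done)
	fmt.Println(a.wq.quota, b.wq.quota, cap(a.wq.ch), cap(b.wq.ch))
}
```

The third option, a usable zero value, is not shown. It would generally mean creating ch lazily on first use, which adds a nil check to the hot path.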

@easwars easwars assigned arjan-bal and unassigned easwars Oct 3, 2025