Profile Guided Optimization results in high memory usage #6991

Closed
@atollena

We recently started using profile-guided optimization (PGO) for our Go gRPC services, and in some cases saw a significant increase in memory usage from the optimized binaries.

The details of the investigation can be found in golang/go#65532. To summarize, PGO may inline internal/transport.(*loopyWriter).processData, which is called 3 times in internal/transport.(*loopyWriter).run. This is the goroutine that schedules writes of HTTP2 frames on TCP connections. processData allocates a 16KiB array on the stack to construct a frame, so inlining it in loopyWriter.run results in a total of 48KiB of memory allocated per connection, instead of 16KiB (which may even be released if loopy is blocked). When there are many connections, this can be a lot of memory. One of our production services saw a 20% memory increase after building with PGO due to this issue.
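To make the shape of the problem concrete, here is a minimal sketch, not the actual grpc-go code. The names `processData`, `run`, `frameSize` and `inlinedFrameBytes` are hypothetical stand-ins, and the real methods take different arguments. It only shows one function holding a 16KiB stack array and a caller with three call sites, plus the arithmetic for the case where each inlined copy keeps its own array.

```go
package main

import "fmt"

// frameSize is the 16KiB frame buffer size mentioned in the issue.
const frameSize = 16 * 1024

// processData is a hypothetical stand-in for loopyWriter.processData.
// The array lives in this function's stack frame while it runs.
func processData(payload []byte) int {
	var frame [frameSize]byte
	return copy(frame[:], payload)
}

// run is a hypothetical stand-in for loopyWriter.run with three call sites.
// If the compiler inlines processData at each site and gives every copy its
// own array, all three arrays end up in run's stack frame.
func run(a, b, c []byte) int {
	n := processData(a)
	n += processData(b)
	n += processData(c)
	return n
}

// inlinedFrameBytes returns the array bytes held in the caller's frame,
// assuming each inlined call site keeps a separate array.
func inlinedFrameBytes(callSites int) int {
	return callSites * frameSize
}

func main() {
	fmt.Println(run([]byte("x"), []byte("yz"), []byte("abc"))) // prints 6
	fmt.Println(inlinedFrameBytes(3))                          // prints 49152 (48KiB)
}
```

Building with `go build -gcflags=-m` prints the compiler's inlining decisions, which is one way to check whether a given call site was inlined in a particular build.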

There are options to still use PGO (which provides otherwise interesting gains) while avoiding this undesirable side effect, but they require changes to grpc-go:

  1. As suggested in the issue, simply add a go:noinline pragma to loopyWriter.processData to avoid any memory increase.
  2. Move allocation of the local frame byte array directly inside loopyWriter.run. The downside is that when the connection is idle, the 16KiB cannot be reclaimed.
  3. A variation of option 2 where array allocation happens in a subroutine of loopyWriter.run, so that when loopy blocks because the connection is idle, the array is not allocated.
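Options 1 and 3 could look roughly like the sketch below. This is an illustration under stated assumptions, not a proposed patch: every function name here (`processDataNoInline`, `processDataInto`, `writeBatch`, `run`) is hypothetical, and the channel of batches merely stands in for whatever loopy blocks on when the connection is idle.

```go
package main

import "fmt"

// frameSize is the 16KiB frame buffer size mentioned in the issue.
const frameSize = 16 * 1024

// processDataNoInline sketches option 1: the directive stops the compiler
// from inlining this function, so the array stays in its own stack frame.
//
//go:noinline
func processDataNoInline(payload []byte) int {
	var frame [frameSize]byte
	return copy(frame[:], payload)
}

// processDataInto sketches the callee for option 3: it writes into a buffer
// owned by the caller instead of declaring its own array.
func processDataInto(frame []byte, payload []byte) int {
	return copy(frame, payload)
}

// writeBatch sketches the subroutine from option 3: the array is declared
// once here and shared by every processDataInto call in the batch.
func writeBatch(payloads [][]byte) int {
	var frame [frameSize]byte
	total := 0
	for _, p := range payloads {
		total += processDataInto(frame[:], p)
	}
	return total
}

// run blocks on the (hypothetical) work channel while idle. The array is
// declared in writeBatch, not here, so it is only needed while a batch runs.
func run(work <-chan [][]byte) int {
	total := 0
	for batch := range work {
		total += writeBatch(batch)
	}
	return total
}

func main() {
	work := make(chan [][]byte, 1)
	work <- [][]byte{[]byte("ab"), []byte("cde")}
	close(work)

	fmt.Println(processDataNoInline([]byte("hello"))) // prints 5
	fmt.Println(run(work))                            // prints 5
}
```

The sketch does not address whether the compiler might inline `writeBatch` itself into `run`; a real patch for option 3 would need to check that.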

Of those, options 1 and 3 seem the most compelling to me. Would you be willing to accept a patch for this?
