Skip to content

unique: large string still referenced, after interning only a small substring #69370

Closed
@Deleplace

Description

@Deleplace

Go version

go version go1.23.1 darwin/arm64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/deleplace/Library/Caches/go-build'
GOENV='/Users/deleplace/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/deleplace/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/deleplace/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.23.1'
GODEBUG=''
GOTELEMETRY='on'
GOTELEMETRYDIR='/Users/deleplace/Library/Application Support/go/telemetry'
GCCGO='gccgo'
GOARM64='v8.0'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/p4/g7fnpss96g5708_mmv3cxzgh0000gn/T/go-build3365171445=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

  • Create a large string book (300MB)
  • Intern a string word which is a small substring of book, using unique.Make
  • Run the garbage collector with runtime.GC, after the last use of book, before the last use of word
  • Print the memory currently allocated by the program, using runtime.ReadMemStats

What did you see happen?

Program is still using 300MB

What did you expect to see?

Program should be using less than 10MB

Discussion

This issue has already been fixed by @mknyszek in CL 610738. The change will be included in the future Go 1.24.

The current issue is about documenting the behavior of Go 1.23.0 and Go 1.23.1 when interning strings, and retaining more memory than expected.

The problem arises only with strings, not with other types comparable with ==. String is the only comparable type that supports slicing.

When interning a small substring y of a large string x, 2 strategies are possible in the implementation of unique.Make:

  1. Keep the original memory location of y, within x. x will not be garbage collected, because the object pool in unique is keeping a reference to a part of it. The user is responsible about the memory consequences of interning a substring. The user may decide to call strings.Clone beforehand themself.
  2. Clone y into a fresh string y2, and intern y2. x may then be garbage collected.

unique implements strategy (2).

Before the fix 610738, unique.Make did proactively clone input strings. However, an internal map was still referencing the original input strings, which was unintended.

image

(in this diagram, Bwords is a []unique.Handle[string])

After the fix 610738, unique.Make proactively clones input strings, and then keeps only references to the clones. The large original input strings are then properly garbage collected.

image

Metadata

Metadata

Assignees

Labels

NeedsFixThe path to resolution is known, but the work has not been done.

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions