Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sync: Pool example suggests incorrect usage #23199

Open
dsnet opened this issue Dec 20, 2017 · 36 comments
Open

sync: Pool example suggests incorrect usage #23199

dsnet opened this issue Dec 20, 2017 · 36 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Documentation Issues describing a change to documentation. help wanted NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@dsnet
Copy link
Member

dsnet commented Dec 20, 2017

The operation of sync.Pool assumes that the memory cost of each element is approximately the same in order to be efficient. This property can be seen by the fact that Pool.Get returns you a random element, and not the one that has "the greatest capacity" or what not. In other words, from the perspective of the Pool, all elements are more or less the same.

However, the Pool example stores bytes.Buffer objects, which have an underlying []byte of varying capacity depending on how much of the buffer is actually used.

Dynamically growing an unbounded buffers can cause a large amount of memory to be pinned and never be freed in a live-lock situation. Consider the following:

pool := sync.Pool{New: func() interface{} { return new(bytes.Buffer) }}

processRequest := func(size int) {
	b := pool.Get().(*bytes.Buffer)
	time.Sleep(500 * time.Millisecond) // Simulate processing time
	b.Grow(size)
	pool.Put(b)
	time.Sleep(1 * time.Millisecond) // Simulate idle time
}

// Simulate a set of initial large writes.
for i := 0; i < 10; i++ {
	go func() {
		processRequest(1 << 28) // 256MiB
	}()
}

time.Sleep(time.Second) // Let the initial set finish

// Simulate an un-ending series of small writes.
for i := 0; i < 10; i++ {
	go func() {
		for {
			processRequest(1 << 10) // 1KiB
		}
	}()
}

// Continually run a GC and track the allocated bytes.
var stats runtime.MemStats
for i := 0; ; i++ {
	runtime.ReadMemStats(&stats)
	fmt.Printf("Cycle %d: %dB\n", i, stats.Alloc)
	time.Sleep(time.Second)
	runtime.GC()
}

Depending on timing, the above snippet takes around 35 GC cycles for the initial set of large requests (2.5GiB) to finally be freed, even though each of the subsequent writes only use around 1KiB. This can happen in a real server handling lots of small requests, where large buffers allocated by some prior request end up being pinned for a long time since they are not in Pool long enough to be collected.

The example claims to be based on fmt usage, but I'm not convinced that fmt's usage is correct. It is susceptible to the live-lock problem described above. I suspect this hasn't been an issue in most real programs since fmt.PrintX is typically not used to write very large strings. However, other applications of sync.Pool may certainly have this issue.

I suggest we fix the example to store elements of fixed size and document this.

\cc @kevinburke @LMMilewski @bcmills

@dsnet
Copy link
Member Author

dsnet commented Dec 20, 2017

I should also note that if #22950 is done, then usages like this will cause large buffers to be pinned forever since this example has a steady state of Pool usage, so the GC would never clear the pool.

@dsnet
Copy link
Member Author

dsnet commented Dec 20, 2017

Here's an even worse situation than earlier (suggested by @bcmills):

pool := sync.Pool{New: func() interface{} { return new(bytes.Buffer) }}

processRequest := func(size int) {
	b := pool.Get().(*bytes.Buffer)
	time.Sleep(500 * time.Millisecond) // Simulate processing time
	b.Grow(size)
	pool.Put(b)
	time.Sleep(1 * time.Millisecond) // Simulate idle time
}

// Simulate a steady stream of infrequent large requests.
go func() {
	for {
		processRequest(1 << 28) // 256MiB
	}
}()

// Simulate a storm of small requests.
for i := 0; i < 1000; i++ {
	go func() {
		for {
			processRequest(1 << 10) // 1KiB
		}
	}()
}

// Continually run a GC and track the allocated bytes.
var stats runtime.MemStats
for i := 0; ; i++ {
	runtime.ReadMemStats(&stats)
	fmt.Printf("Cycle %d: %dB\n", i, stats.Alloc)
	time.Sleep(time.Second)
	runtime.GC()
}

Rather than a single one-off large request, let there be a steady stream of occasional large requests intermixed with a large number of small requests. As this snippet runs, the heap keeps growing over time. The large request is "poisoning" the pool such that most of the small requests eventually pin a large capacity buffer under the hood.

@dsnet dsnet added the Documentation Issues describing a change to documentation. label Dec 20, 2017
@kevinburke
Copy link
Contributor

Yikes. My goal in adding the example was to try to show the easiest-to-understand use case for a Pool. fmt was the best one I could find in the standard library.

@ulikunitz
Copy link
Contributor

The solution is of course to put only buffers with small byte slices back into the pool.

if b.Cap() <= 1<<12 {
         pool.put(b)
}

@dsnet
Copy link
Member Author

dsnet commented Dec 21, 2017

Alternatively, you could use an array of sync.Pools to bucketize the items by size: https://github.com/golang/go/blob/7e394a2/src/net/http/h2_bundle.go#L998-L1043

@bcmills
Copy link
Contributor

bcmills commented Dec 21, 2017

The solution is of course …

There are many possible solutions: the important thing is to apply one of them.

A related problem can arise with goroutine stacks in conjunction with “worker pools”, depending on when and how often the runtime reclaims large stacks. (IIRC that has changed several times over the lifetime of the Go runtime, so I'm not sure what the current behavior is.) If you have a pool of worker goroutines executing callbacks that can vary significantly in stack usage, you can end up with all of the workers consuming very large stacks even if the overall fraction of large tasks remains very low.

@kevinburke
Copy link
Contributor

kevinburke commented Dec 21, 2017

Do you have any suggestions for better use cases we could include in the example, that are reasonably compact?

Maybe the solution is not to recommend a sync.Pool at all anymore? This is my understanding from a comment I read about how GC makes this more or less useless

@jzelinskie
Copy link
Contributor

Would changing the example to use an array (fixed size) rather than a slice solve this problem?
In Chihaya, this is how we've used sync.Pool and our implementation before it was in the standard library.

Maybe the solution is not to recommend a sync.Pool at all anymore?

I legitimately don't think there ever was a time to generally recommend sync.Pool. I find it a pretty contentious add to the standard library because of how careful and knowledgable of the runtime you need to be in order to use it effectively. If you need optimization at this level, you probably know how to implement this best for your own use case.

Sorry to interject randomly, but I saw this thread on Twitter and have strong opinions on this feature.

@aclements
Copy link
Member

Maybe the solution is not to recommend a sync.Pool at all anymore? This is my understanding from a comment I read about how GC makes this more or less useless

We would certainly like to get to this point, and the GC has improved a lot, but for high-churn allocations with obvious lifetimes and no need for zeroing, sync.Pool can still be a significant optimization. As @RLH likes to say, every use of sync.Pool is a bug report on the GC. But we're still taking those bug reports. :)

I should also note that if #22950 is done, then usages like this will cause large buffers to be pinned forever since this example has a steady state of Pool usage, so the GC would never clear the pool.

That's clearly true, but even right now it's partly by chance that these examples are eventually dropping the large buffers. And in the more realistic stochastic mix example, it's not clear to me that #22950 would make it any better or worse.

I agree with @dsnet's original point that we should document that sync.Pool treats all objects interchangeably, so they should all have roughly the same "weight". And it would be good to provide some suggestions for what to do in situations where this isn't the case, and perhaps some concrete examples of poor sync.Pool usage.

@FMNSSun
Copy link

FMNSSun commented Jul 22, 2018

We've used sync.Pool for dealing with network packets and others do too (such as lucas-clemente/quic-go) because for those use cases you gain performance when using them. However, in those cases []byte is used instead of bytes.Buffer and they all have the same capacity and are re-sliced as needed. Before putting them back they are re-sliced to MaxPacketSize and before reading you Get a new []byte and re-slice it to the size of the network packet.

The same we did even for structs such as when parsing packets to a struct func ParsePacket([]byte data) *Packet (which overwrites all fields in a struct anyway) which means instead of allocating/freeing thousands of new structs each second they can be re-used using sync.Pool which makes things a little bit faster.

@FMNSSun
Copy link

FMNSSun commented Jul 23, 2018

I guess a minimal example of such a usage would be:

package main

import (
	"sync"
	"net"
)

const MaxPacketSize = 4096

var bufPool = sync.Pool {
	New: func() interface{} {
		return make([]byte, MaxPacketSize)
	},
}

func process(outChan chan []byte) {
	for data := range outChan {
		// process data

		// Re-slice to maximum capacity and return it
		// for re-use. This is important to guarantee that
		// all calls to Get() will return a buffer of
		// length MaxPacketSize.
		bufPool.Put(data[:MaxPacketSize])
	}
}

func reader(conn net.PacketConn, outChan chan []byte) {
	for {
		data := bufPool.Get().([]byte)
		
		n, _, err := conn.ReadFrom(data)
		
		if err != nil {
			break
		}
		
		outChan <- data[:n]
	}
	
	close(outChan)
}

func main() {
	N := 3
	var wg sync.WaitGroup
	
	outChan := make(chan []byte, N)
	
	wg.Add(N)
	
	for i := 0; i < N; i++ {
		go func() {
			process(outChan)
			wg.Done()
		}()
	}
	
	wg.Add(1)

	conn, err := net.ListenPacket("udp", "localhost:10001")

	if err != nil {
		panic(err.Error())
	}
	
	go func() {
		reader(conn, outChan)
		wg.Done()
	}()
	
	wg.Wait()
}

but of course... whether this is going to be faster than the GC depends on how many packets per second you have to deal with and what exactly you do with the data etc. In real world you'd benchmark with GC/sync.Pool and compare the two. At the time we wrote our code there was a significant time spent allocating new stuff and using a scheme as above we've managed to increase the throughput. Of course, one should re-benchmark this with every update to the GC.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/136035 mentions this issue: encoding/json: fix usage of sync.Pool

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/136115 mentions this issue: sync: clarify proper Pool usage for dynamically sized buffers

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/136116 mentions this issue: fmt: fix usage of sync.Pool

aktau added a commit to aktau/log4jscanner that referenced this issue Mar 16, 2022
This uses a sync.Pool wrapper (called pool.Dynamic) that prevents the
pool from holding on to very large buffers indefinitely, while still
amortizing the cost of allocation. The policy appears to give good
results both with the pre-existing tests and the specific tests added
for the pool.

This is useful because the library is also used from long-running server
contexts, where it would be unfortunate to pin very large buffers for
too long. See golang/go#23199.

Example algorithm run (from the test):

```
num  allocs   value      target            capacity
1    1        100000     100000.000000     100000
2    1        1          52048.000000      100000
3    1        1          28072.000000      100000
4    1        1          16084.000000      100000
5    1        1          10090.000000      100000
6    1        1          7093.000000       100000
7    2        10         5594.500000       4096
8    2        1          4845.250000       4096
9    2        1          4470.625000       4096
10   2        1          4283.312500       4096
11   2        1          4189.656250       4096
12   2        1          4142.828125       4096
13   2        1          4119.414062       4096
14   2        1          4107.707031       4096
15   2        12         4101.853516       4096
16   2        1          4098.926758       4096
17   2        1          4097.463379       4096
18   2        1          4096.731689       4096
19   2        1          4096.365845       4096
20   2        1          4096.182922       4096
21   2        1          4096.091461       4096
22   2        1          4096.045731       4096
23   2        1000       4096.022865       4096
24   2        100        4096.011433       4096
25   3        10000      10000.000000      10000
26   4        100000     100000.000000     100000
27   4        1          52048.000000      100000
28   4        100000     100000.000000     100000
29   4        1          52048.000000      100000
30   4        50000      51024.000000      100000
31   4        1          27560.000000      100000
32   4        1          15828.000000      100000
33   4        25000      25000.000000      100000
34   4        1          14548.000000      100000
35   4        1          9322.000000       100000
36   5        1          6709.000000       4096
37   6        100000     100000.000000     100000
38   6        1          52048.000000      100000
39   6        1          28072.000000      100000
40   6        1          16084.000000      100000
41   6        1          10090.000000      100000
42   6        1          7093.000000       100000
43   7        1          5594.500000       4096
44   7        1          4845.250000       4096
45   7        1          4470.625000       4096
46   7        1          4283.312500       4096
47   7        100        4189.656250       4096
48   7        100        4142.828125       4096
49   7        100        4119.414062       4096
50   7        1          4107.707031       4096
51   7        1          4101.853516       4096
52   7        1          4098.926758       4096
53   7        1          4097.463379       4096
54   7        1          4096.731689       4096
55   7        100        4096.365845       4096
56   7        200        4096.182922       4096
57   7        300        4096.091461       4096
58   7        100        4096.045731       4096
59   7        50         4096.022865       4096
60   7        50         4096.011433       4096
61   7        50         4096.005716       4096
62   7        50         4096.002858       4096
63   7        50         4096.001429       4096
64   7        1          4096.000715       4096
65   7        1          4096.000357       4096
66   7        1          4096.000179       4096
67   7        1          4096.000089       4096
68   8        100000000  100000000.000000  100000000
69   8        1000000    50500000.000000   100000000
70   8        100000     25300000.000000   100000000
71   8        10000      12655000.000000   100000000
72   8        1000       6329548.000000    100000000
73   9        100        3166822.000000    4096
74   9        10         1585459.000000    4096
75   9        1          794777.500000     4096
76   9        1          399436.750000     4096
77   9        500        201766.375000     4096
78   9        2020       102931.187500     4096
79   9        400        53513.593750      4096
80   9        3984       28804.796875      4096
81   9        5          16450.398438      4096
82   9        200        10273.199219      4096
83   9        500        7184.599609       4096
84   10       40000      40000.000000      40000
85   10       35000      37500.000000      40000
86   11       45000      45000.000000      45000
87   11       42000      43500.000000      45000
88   11       38000      40750.000000      45000
89   11       38000      39375.000000      45000
90   11       39000      39187.500000      45000
91   11       41000      41000.000000      45000
92   11       42000      42000.000000      45000
93   11       42000      42000.000000      45000
94   11       2000       23048.000000      45000
95   11       4000       13572.000000      45000
96   11       3949       8834.000000       45000
97   11       2011       6465.000000       45000
98   11       4096       5280.500000       45000
99   11       33         4688.250000       45000
100  11       0          4392.125000       45000
101  12       4938       4938.000000       4938
102  12       1          4517.000000       4938
103  12       1          4306.500000       4938
104  12       1200       4201.250000       4938
105  12       2400       4148.625000       4938
106  12       1200       4122.312500       4938
107  12       200        4109.156250       4938
108  12       400        4102.578125       4938
109  12       600        4099.289062       4938
110  12       700        4097.644531       4938
111  12       100        4096.822266       4938
112  12       400        4096.411133       4938
113  12       500        4096.205566       4938
114  12       700        4096.102783       4938
115  12       600        4096.051392       4938
116  12       900        4096.025696       4938
117  12       1000       4096.012848       4938
118  12       1100       4096.006424       4938
119  12       1200       4096.003212       4938
120  12       1000       4096.001606       4938
```

Benchmarks also show that the pool does retain the buffer, as
performance is not worsened over the previous commit:

```
$ git checkout main
TMPDIR="$HOME/tmp/tmpdir"
mkdir "$TMPDIR" || true
for file in jar/testdata/* ; do
  RTMPDIR="$TMPDIR/$(basename $file)"
  mkdir "$RTMPDIR" || true
  ln -fv "$PWD/$file" "$RTMPDIR"
done
for commit in $(git log --pretty=oneline | head -5 | awk '{print $1}' | tac) ; do
  git checkout $commit
  go build
  hyperfine --ignore-failure --warmup 1 "./log4jscanner $TMPDIR/400mb_jar_in_jar.jar"
  rm log4jscanner
done
HEAD is now at 48d70bf jar: add benchmarks with 400mb_jar_in_jar.jar
  Time (mean ± σ):      2.026 s ±  0.324 s    [User: 2.363 s, System: 1.269 s]
  Range (min … max):    1.651 s …  2.749 s    10 runs

HEAD is now at bf524fa jar: close the zip.File reader before recursing
  Time (mean ± σ):      1.908 s ±  0.297 s    [User: 2.084 s, System: 1.218 s]
  Range (min … max):    1.502 s …  2.567 s    10 runs

HEAD is now at 4b23cd3 jar: prefer io.ReadFull over io.ReadAll
  Time (mean ± σ):     445.9 ms ±  51.2 ms    [User: 401.7 ms, System: 79.9 ms]
  Range (min … max):   386.3 ms … 566.1 ms    10 runs

HEAD is now at 37376ef jar: reuse buffers for nested .jar's
  Time (mean ± σ):     464.5 ms ±  41.8 ms    [User: 420.5 ms, System: 93.7 ms]
  Range (min … max):   409.2 ms … 545.5 ms    10 runs

HEAD is now at c17a81b jar: do not keep large buffers unnecessarily
  Time (mean ± σ):     436.1 ms ±  26.2 ms    [User: 409.5 ms, System: 77.6 ms]
  Range (min … max):   390.2 ms … 472.7 ms    10 runs
```
ericchiang pushed a commit to google/log4jscanner that referenced this issue Mar 18, 2022
This uses a sync.Pool wrapper (called pool.Dynamic) that prevents the
pool from holding on to very large buffers indefinitely, while still
amortizing the cost of allocation. The policy appears to give good
results both with the pre-existing tests and the specific tests added
for the pool.

This is useful because the library is also used from long-running server
contexts, where it would be unfortunate to pin very large buffers for
too long. See golang/go#23199.

Example algorithm run (from the test):

```
num  allocs   value      target            capacity
1    1        100000     100000.000000     100000
2    1        1          52048.000000      100000
3    1        1          28072.000000      100000
4    1        1          16084.000000      100000
5    1        1          10090.000000      100000
6    1        1          7093.000000       100000
7    2        10         5594.500000       4096
8    2        1          4845.250000       4096
9    2        1          4470.625000       4096
10   2        1          4283.312500       4096
11   2        1          4189.656250       4096
12   2        1          4142.828125       4096
13   2        1          4119.414062       4096
14   2        1          4107.707031       4096
15   2        12         4101.853516       4096
16   2        1          4098.926758       4096
17   2        1          4097.463379       4096
18   2        1          4096.731689       4096
19   2        1          4096.365845       4096
20   2        1          4096.182922       4096
21   2        1          4096.091461       4096
22   2        1          4096.045731       4096
23   2        1000       4096.022865       4096
24   2        100        4096.011433       4096
25   3        10000      10000.000000      10000
26   4        100000     100000.000000     100000
27   4        1          52048.000000      100000
28   4        100000     100000.000000     100000
29   4        1          52048.000000      100000
30   4        50000      51024.000000      100000
31   4        1          27560.000000      100000
32   4        1          15828.000000      100000
33   4        25000      25000.000000      100000
34   4        1          14548.000000      100000
35   4        1          9322.000000       100000
36   5        1          6709.000000       4096
37   6        100000     100000.000000     100000
38   6        1          52048.000000      100000
39   6        1          28072.000000      100000
40   6        1          16084.000000      100000
41   6        1          10090.000000      100000
42   6        1          7093.000000       100000
43   7        1          5594.500000       4096
44   7        1          4845.250000       4096
45   7        1          4470.625000       4096
46   7        1          4283.312500       4096
47   7        100        4189.656250       4096
48   7        100        4142.828125       4096
49   7        100        4119.414062       4096
50   7        1          4107.707031       4096
51   7        1          4101.853516       4096
52   7        1          4098.926758       4096
53   7        1          4097.463379       4096
54   7        1          4096.731689       4096
55   7        100        4096.365845       4096
56   7        200        4096.182922       4096
57   7        300        4096.091461       4096
58   7        100        4096.045731       4096
59   7        50         4096.022865       4096
60   7        50         4096.011433       4096
61   7        50         4096.005716       4096
62   7        50         4096.002858       4096
63   7        50         4096.001429       4096
64   7        1          4096.000715       4096
65   7        1          4096.000357       4096
66   7        1          4096.000179       4096
67   7        1          4096.000089       4096
68   8        100000000  100000000.000000  100000000
69   8        1000000    50500000.000000   100000000
70   8        100000     25300000.000000   100000000
71   8        10000      12655000.000000   100000000
72   8        1000       6329548.000000    100000000
73   9        100        3166822.000000    4096
74   9        10         1585459.000000    4096
75   9        1          794777.500000     4096
76   9        1          399436.750000     4096
77   9        500        201766.375000     4096
78   9        2020       102931.187500     4096
79   9        400        53513.593750      4096
80   9        3984       28804.796875      4096
81   9        5          16450.398438      4096
82   9        200        10273.199219      4096
83   9        500        7184.599609       4096
84   10       40000      40000.000000      40000
85   10       35000      37500.000000      40000
86   11       45000      45000.000000      45000
87   11       42000      43500.000000      45000
88   11       38000      40750.000000      45000
89   11       38000      39375.000000      45000
90   11       39000      39187.500000      45000
91   11       41000      41000.000000      45000
92   11       42000      42000.000000      45000
93   11       42000      42000.000000      45000
94   11       2000       23048.000000      45000
95   11       4000       13572.000000      45000
96   11       3949       8834.000000       45000
97   11       2011       6465.000000       45000
98   11       4096       5280.500000       45000
99   11       33         4688.250000       45000
100  11       0          4392.125000       45000
101  12       4938       4938.000000       4938
102  12       1          4517.000000       4938
103  12       1          4306.500000       4938
104  12       1200       4201.250000       4938
105  12       2400       4148.625000       4938
106  12       1200       4122.312500       4938
107  12       200        4109.156250       4938
108  12       400        4102.578125       4938
109  12       600        4099.289062       4938
110  12       700        4097.644531       4938
111  12       100        4096.822266       4938
112  12       400        4096.411133       4938
113  12       500        4096.205566       4938
114  12       700        4096.102783       4938
115  12       600        4096.051392       4938
116  12       900        4096.025696       4938
117  12       1000       4096.012848       4938
118  12       1100       4096.006424       4938
119  12       1200       4096.003212       4938
120  12       1000       4096.001606       4938
```

Benchmarks also show that the pool does retain the buffer, as
performance is not worsened over the previous commit:

```
$ git checkout main
TMPDIR="$HOME/tmp/tmpdir"
mkdir "$TMPDIR" || true
for file in jar/testdata/* ; do
  RTMPDIR="$TMPDIR/$(basename $file)"
  mkdir "$RTMPDIR" || true
  ln -fv "$PWD/$file" "$RTMPDIR"
done
for commit in $(git log --pretty=oneline | head -5 | awk '{print $1}' | tac) ; do
  git checkout $commit
  go build
  hyperfine --ignore-failure --warmup 1 "./log4jscanner $TMPDIR/400mb_jar_in_jar.jar"
  rm log4jscanner
done
HEAD is now at 48d70bf jar: add benchmarks with 400mb_jar_in_jar.jar
  Time (mean ± σ):      2.026 s ±  0.324 s    [User: 2.363 s, System: 1.269 s]
  Range (min … max):    1.651 s …  2.749 s    10 runs

HEAD is now at bf524fa jar: close the zip.File reader before recursing
  Time (mean ± σ):      1.908 s ±  0.297 s    [User: 2.084 s, System: 1.218 s]
  Range (min … max):    1.502 s …  2.567 s    10 runs

HEAD is now at 4b23cd3 jar: prefer io.ReadFull over io.ReadAll
  Time (mean ± σ):     445.9 ms ±  51.2 ms    [User: 401.7 ms, System: 79.9 ms]
  Range (min … max):   386.3 ms … 566.1 ms    10 runs

HEAD is now at 37376ef jar: reuse buffers for nested .jar's
  Time (mean ± σ):     464.5 ms ±  41.8 ms    [User: 420.5 ms, System: 93.7 ms]
  Range (min … max):   409.2 ms … 545.5 ms    10 runs

HEAD is now at c17a81b jar: do not keep large buffers unnecessarily
  Time (mean ± σ):     436.1 ms ±  26.2 ms    [User: 409.5 ms, System: 77.6 ms]
  Range (min … max):   390.2 ms … 472.7 ms    10 runs
```
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 7, 2022
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/455236 mentions this issue: encoding/json: implement more intelligent encoder buffer pooling

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/464344 mentions this issue: log: reduce lock contention

gopherbot pushed a commit that referenced this issue Feb 3, 2023
Logger.Log methods are called in a highly concurrent manner in many servers.
Acquiring a lock in each method call results in high lock contention,
especially since each lock covers a non-trivial amount of work
(e.g., formatting the header and outputting to the writer).

Changes made:
* Modify the Logger to use atomics so that the header formatting
  can be moved out of the critical lock section.
  Acquiring the flags does not occur in the same critical section
  as outputting to the underlying writer, so this introduces
  a very slight consistency instability where concurrently calling
  multiple Logger.Output along with Logger.SetFlags may cause
  the older flags to be used by some ongoing Logger.Output calls
  even after Logger.SetFlags has returned.
* Use a sync.Pool to buffer the intermediate buffer.
  This approach is identical to how fmt does it,
  with the same max cap mitigation for #23199.
* Only protect outputting to the underlying writer with a lock
  to ensure serialized ordering of output.

Performance:
	name           old time/op  new time/op  delta
	Concurrent-24  19.9µs ± 2%   8.3µs ± 1%  -58.37%  (p=0.000 n=10+10)

Updates #19438

Change-Id: I091beb7431d8661976a6c01cdb0d145e37fe3d22
Reviewed-on: https://go-review.googlesource.com/c/go/+/464344
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Joseph Tsai <joetsai@digital-static.net>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
johanbrandhorst pushed a commit to Pryz/go that referenced this issue Feb 12, 2023
Logger.Log methods are called in a highly concurrent manner in many servers.
Acquiring a lock in each method call results in high lock contention,
especially since each lock covers a non-trivial amount of work
(e.g., formatting the header and outputting to the writer).

Changes made:
* Modify the Logger to use atomics so that the header formatting
  can be moved out of the critical lock section.
  Acquiring the flags does not occur in the same critical section
  as outputting to the underlying writer, so this introduces
  a very slight consistency instability where concurrently calling
  multiple Logger.Output along with Logger.SetFlags may cause
  the older flags to be used by some ongoing Logger.Output calls
  even after Logger.SetFlags has returned.
* Use a sync.Pool to buffer the intermediate buffer.
  This approach is identical to how fmt does it,
  with the same max cap mitigation for golang#23199.
* Only protect outputting to the underlying writer with a lock
  to ensure serialized ordering of output.

Performance:
	name           old time/op  new time/op  delta
	Concurrent-24  19.9µs ± 2%   8.3µs ± 1%  -58.37%  (p=0.000 n=10+10)

Updates golang#19438

Change-Id: I091beb7431d8661976a6c01cdb0d145e37fe3d22
Reviewed-on: https://go-review.googlesource.com/c/go/+/464344
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Joseph Tsai <joetsai@digital-static.net>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/471200 mentions this issue: encoding/json: improve Marshal memory utilization

eric pushed a commit to fancybits/go that referenced this issue Sep 7, 2023
Logger.Log methods are called in a highly concurrent manner in many servers.
Acquiring a lock in each method call results in high lock contention,
especially since each lock covers a non-trivial amount of work
(e.g., formatting the header and outputting to the writer).

Changes made:
* Modify the Logger to use atomics so that the header formatting
  can be moved out of the critical lock section.
  Acquiring the flags does not occur in the same critical section
  as outputting to the underlying writer, so this introduces
  a very slight consistency instability where concurrently calling
  multiple Logger.Output along with Logger.SetFlags may cause
  the older flags to be used by some ongoing Logger.Output calls
  even after Logger.SetFlags has returned.
* Use a sync.Pool to buffer the intermediate buffer.
  This approach is identical to how fmt does it,
  with the same max cap mitigation for golang#23199.
* Only protect outputting to the underlying writer with a lock
  to ensure serialized ordering of output.

Performance:
	name           old time/op  new time/op  delta
	Concurrent-24  19.9µs ± 2%   8.3µs ± 1%  -58.37%  (p=0.000 n=10+10)

Updates golang#19438

Change-Id: I091beb7431d8661976a6c01cdb0d145e37fe3d22
Reviewed-on: https://go-review.googlesource.com/c/go/+/464344
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Joseph Tsai <joetsai@digital-static.net>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
eric pushed a commit to fancybits/go that referenced this issue Sep 7, 2023
Logger.Log methods are called in a highly concurrent manner in many servers.
Acquiring a lock in each method call results in high lock contention,
especially since each lock covers a non-trivial amount of work
(e.g., formatting the header and outputting to the writer).

Changes made:
* Modify the Logger to use atomics so that the header formatting
  can be moved out of the critical lock section.
  Acquiring the flags does not occur in the same critical section
  as outputting to the underlying writer, so this introduces
  a very slight consistency instability where concurrently calling
  multiple Logger.Output along with Logger.SetFlags may cause
  the older flags to be used by some ongoing Logger.Output calls
  even after Logger.SetFlags has returned.
* Use a sync.Pool to buffer the intermediate buffer.
  This approach is identical to how fmt does it,
  with the same max cap mitigation for golang#23199.
* Only protect outputting to the underlying writer with a lock
  to ensure serialized ordering of output.

Performance:
	name           old time/op  new time/op  delta
	Concurrent-24  19.9µs ± 2%   8.3µs ± 1%  -58.37%  (p=0.000 n=10+10)

Updates golang#19438

Change-Id: I091beb7431d8661976a6c01cdb0d145e37fe3d22
Reviewed-on: https://go-review.googlesource.com/c/go/+/464344
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Joseph Tsai <joetsai@digital-static.net>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
@mitar
Copy link
Contributor

mitar commented Oct 21, 2023

I think one issue is that bytes.Buffer does not have Shrink() method to make the underlying buffer smaller. It could be used as a middle ground between retaining full buffer and dropping it completely. Then one could shrink the buffer to an acceptable size before returning it to the pool.

@puellanivis
Copy link

So I believe, capping the length or capacity of a slice does not actually reduce the size of the underlying backing array. So, this would not actually save or restore any more memory than stuffing the whole buffer anyways. The only way to truly shrink the memory usage of a slice would be to reallocate it at a smaller size, and then copy the data into… but at that point, in this particular use case, that means we would be better off just dropping the whole slice on the ground, and letting the GC clean it up.

@zigo101
Copy link

zigo101 commented Oct 22, 2023

For the buffer specific problem, it would be great to support a new Buffer type with multiple underlying (same-capacity) slices, so that we can instead put the underlying slices in Pool.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Documentation Issues describing a change to documentation. help wanted NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests