Optimize counter inc-by-one #109

lingyuan2014 · 2024-11-12T18:20:48Z

Conceivably increment-by-one is by far the most common operation on counters. This PR optimize such operation, making it alloc-free.

Benchmark with:

func BenchmarkInc(b *testing.B) {
	b.Run("Increment", func(b *testing.B) {
		w := writer.MemoryWriter{}
		id := NewId("foo", nil)
		c := NewCounter(id, &w)
		b.ReportAllocs()
		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			c.Increment()
		}
	})
}

before optimization:

goos: darwin
goarch: arm64
pkg: github.com/Netflix/spectator-go/v2/spectator/meter
cpu: Apple M3 Pro
BenchmarkInc/Increment-12               10298648               115.2ns/op           120 B/op          3 allocs/op
PASS
ok      github.com/Netflix/spectator-go/v2/spectator/meter      2.485s

after optimization:

goos: darwin
goarch: arm64
pkg: github.com/Netflix/spectator-go/v2/spectator/meter
cpu: Apple M3 Pro
BenchmarkInc/Increment-12               40374824                31.44ns/op           97 B/op          0 allocs/op
PASS
ok      github.com/Netflix/spectator-go/v2/spectator/meter      3.563s

Conceivably increment-by-one is by far the most common operation on counters. This PR optimize such operation, making it alloc-free. Benchmark with: ``` func BenchmarkInc(b *testing.B) { b.Run("Increment", func(b *testing.B) { w := writer.MemoryWriter{} id := NewId("foo", nil) c := NewCounter(id, &w) b.ReportAllocs() b.ResetTimer() for i := 0; i < b.N; i++ { c.Increment() } }) } ``` before optimization: ``` goos: darwin goarch: arm64 pkg: github.com/Netflix/spectator-go/v2/spectator/meter cpu: Apple M3 Pro BenchmarkInc/Increment-12 10298648 115.2 ns/op 120 B/op 3 allocs/op PASS ok github.com/Netflix/spectator-go/v2/spectator/meter 2.485s ``` after optimization: ``` goos: darwin goarch: arm64 pkg: github.com/Netflix/spectator-go/v2/spectator/meter cpu: Apple M3 Pro BenchmarkInc/Increment-12 40374824 31.44 ns/op 97 B/op 0 allocs/op PASS ok github.com/Netflix/spectator-go/v2/spectator/meter 3.563s ```

dmuino · 2024-11-12T19:23:42Z

I like the idea in principle but ideally we'd also like to understand the effect of the additional branching where a mixture of increment by one, and increment by other numbers might introduce some stalling.

Is this change motivated by a concrete problem or is it more abstract in nature? In particular making counters significantly different than the other meter types is adding some complexity to the code that needs to be weighed properly.

sgg

So TBH I don't think there's much value to this change given the way the spectator-go/v2 works given that updating the meter lead to I/O today.

Additionally I wouldn't want to expose more surface area to the API unless there was a good justification.

copperlight

Thank you for sharing your thoughts on optimizing performance for this library.

This change causes the design of Counters to drift from all of the meter implementations, introducing cognitive overhead.
The meter type symbols are important and tied to the operation of spectatord. I think they deserve to be enshrined within the struct and the constructors, because it defines what these values mean, and offers the ability to do introspection down the line, if needed.
Removing meterTypeSymbol in favor of incByOneLine obfuscates what is happening with the meter type.
I would rather see incByOneLine added to the struct as-is, so it can be optionally selected, when needed. This can then be used in the Increment method and the optional branch in the Add method. This makes it more obvious that this construct is a caching mechanism for a specific use case.
It would be interesting to see the performance benchmark compare the difference between Add(1) and Add(>1), so we can have a sense of the overhead introduced by branching. It may be minimal.

Were you able to identify a specific use case where the single increments of Counter meters are causing performance or allocation issues?

In the cases where we have seen spectator-go consume the highest amount of CPU in flamegraphs, I/O has been a large contributor, which can also cause allocations.

dmuino approved these changes Nov 12, 2024

View reviewed changes

copperlight requested review from copperlight and dmuino November 12, 2024 19:13

copperlight requested a review from sgg November 12, 2024 19:25

sgg reviewed Nov 12, 2024

View reviewed changes

copperlight reviewed Nov 12, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize counter inc-by-one #109

Optimize counter inc-by-one #109

lingyuan2014 commented Nov 12, 2024 •

edited

Loading

dmuino commented Nov 12, 2024

sgg left a comment

copperlight left a comment

Optimize counter inc-by-one #109

Are you sure you want to change the base?

Optimize counter inc-by-one #109

Conversation

lingyuan2014 commented Nov 12, 2024 • edited Loading

dmuino commented Nov 12, 2024

sgg left a comment

Choose a reason for hiding this comment

copperlight left a comment

Choose a reason for hiding this comment

lingyuan2014 commented Nov 12, 2024 •

edited

Loading