Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize counter inc-by-one #109

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

lingyuan2014
Copy link

@lingyuan2014 lingyuan2014 commented Nov 12, 2024

Conceivably increment-by-one is by far the most common operation on counters. This PR optimize such operation, making it alloc-free.

Benchmark with:

func BenchmarkInc(b *testing.B) {
	b.Run("Increment", func(b *testing.B) {
		w := writer.MemoryWriter{}
		id := NewId("foo", nil)
		c := NewCounter(id, &w)
		b.ReportAllocs()
		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			c.Increment()
		}
	})
}

before optimization:

goos: darwin
goarch: arm64
pkg: github.com/Netflix/spectator-go/v2/spectator/meter
cpu: Apple M3 Pro
BenchmarkInc/Increment-12               10298648               115.2ns/op           120 B/op          3 allocs/op
PASS
ok      github.com/Netflix/spectator-go/v2/spectator/meter      2.485s

after optimization:

goos: darwin
goarch: arm64
pkg: github.com/Netflix/spectator-go/v2/spectator/meter
cpu: Apple M3 Pro
BenchmarkInc/Increment-12               40374824                31.44ns/op           97 B/op          0 allocs/op
PASS
ok      github.com/Netflix/spectator-go/v2/spectator/meter      3.563s

Conceivably increment-by-one is by far the most common operation on counters.
This PR optimize such operation, making it alloc-free.

Benchmark with:
```
func BenchmarkInc(b *testing.B) {
	b.Run("Increment", func(b *testing.B) {
		w := writer.MemoryWriter{}
		id := NewId("foo", nil)
		c := NewCounter(id, &w)
		b.ReportAllocs()
		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			c.Increment()
		}
	})
}
```

before optimization:
```
goos: darwin
goarch: arm64
pkg: github.com/Netflix/spectator-go/v2/spectator/meter
cpu: Apple M3 Pro
BenchmarkInc/Increment-12               10298648               115.2
ns/op           120 B/op          3 allocs/op
PASS
ok      github.com/Netflix/spectator-go/v2/spectator/meter      2.485s
```

after optimization:
```
goos: darwin
goarch: arm64
pkg: github.com/Netflix/spectator-go/v2/spectator/meter
cpu: Apple M3 Pro
BenchmarkInc/Increment-12               40374824                31.44
ns/op           97 B/op          0 allocs/op
PASS
ok      github.com/Netflix/spectator-go/v2/spectator/meter      3.563s
```
@dmuino
Copy link
Contributor

dmuino commented Nov 12, 2024

I like the idea in principle but ideally we'd also like to understand the effect of the additional branching where a mixture of increment by one, and increment by other numbers might introduce some stalling.

Is this change motivated by a concrete problem or is it more abstract in nature? In particular making counters significantly different than the other meter types is adding some complexity to the code that needs to be weighed properly.

@copperlight copperlight requested a review from sgg November 12, 2024 19:25
Copy link
Contributor

@sgg sgg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So TBH I don't think there's much value to this change given the way the spectator-go/v2 works given that updating the meter lead to I/O today.

Additionally I wouldn't want to expose more surface area to the API unless there was a good justification.

Copy link
Collaborator

@copperlight copperlight left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for sharing your thoughts on optimizing performance for this library.

  • This change causes the design of Counters to drift from all of the meter implementations, introducing cognitive overhead.
  • The meter type symbols are important and tied to the operation of spectatord. I think they deserve to be enshrined within the struct and the constructors, because it defines what these values mean, and offers the ability to do introspection down the line, if needed.
  • Removing meterTypeSymbol in favor of incByOneLine obfuscates what is happening with the meter type.
  • I would rather see incByOneLine added to the struct as-is, so it can be optionally selected, when needed. This can then be used in the Increment method and the optional branch in the Add method. This makes it more obvious that this construct is a caching mechanism for a specific use case.
  • It would be interesting to see the performance benchmark compare the difference between Add(1) and Add(>1), so we can have a sense of the overhead introduced by branching. It may be minimal.

Were you able to identify a specific use case where the single increments of Counter meters are causing performance or allocation issues?

In the cases where we have seen spectator-go consume the highest amount of CPU in flamegraphs, I/O has been a large contributor, which can also cause allocations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants