WIP: Replace atomic StateLocker approach in MMSC with Mutex approach #600
Conversation
Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>
Force-pushed from b408e44 to c402556
I presume this probably applies to Histogram -- maybe even Sum under concurrency > 4 or so?
@evantorrie there are a couple of issues here:
As a suggestion, I think we can start by fixing the first issue and see where we are. I have a feeling that in the case of MinMaxSumCount a Mutex will be better, but in the case of Sum or Histogram (probably two atomic adds, one for the sum and one for the bucket count), where we only need atomic add operations and never need to read the result, the CPU/compiler will do a much better job.
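To make the distinction concrete, here is a minimal sketch (function names are illustrative, not from the PR) of why a Sum needs only an atomic add, while the min/max fields of a MinMaxSumCount aggregator need a compare-and-swap loop:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// addToSum: a Sum aggregator needs only a single atomic add; no
// read-modify-write loop is required.
func addToSum(sum *int64, v int64) {
	atomic.AddInt64(sum, v)
}

// updateMin: min (and max) need a CAS loop, because the new value must be
// compared against the current value, so a plain atomic add is not enough.
func updateMin(min *int64, v int64) {
	for {
		cur := atomic.LoadInt64(min)
		if v >= cur {
			return // current min is already as small or smaller
		}
		if atomic.CompareAndSwapInt64(min, cur, v) {
			return
		}
		// Another goroutine updated min concurrently; retry with the fresh value.
	}
}

func main() {
	var sum int64
	min := int64(100)
	addToSum(&sum, 5)
	updateMin(&min, 3)
	fmt.Println(sum, min) // prints "5 3"
}
```

The CAS loop is the part that can spin under contention, which is what motivates comparing it against a Mutex.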
I'm not sure I understand point (1) about core.Number forcing atomics. Point (3) is a good one. For point (2) I'm not sure -- we were following the Prometheus library in using this design. I wonder why they decided to use a lock-free histogram bucket if your argument is true. Then again, I remember situations where mutexes were problematic in the past. We use lock-free structures not because they have the best performance; we choose them because they have the most consistent performance. I'd like to contribute one more benchmark to this debate.
For Histogram you don't need CAS because you just do adds. Only MinMaxSumCount needs CAS; also see my last sentence, which confirms that for Sum and Histogram this approach is better.
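A sketch of what an adds-only histogram update path could look like (the type and field names are hypothetical, not the actual OTel-Go aggregator). Integer values keep the sum a plain atomic add; a float64 sum would itself require a CAS loop:

```go
package main

import (
	"fmt"
	"sort"
	"sync/atomic"
)

// histogram is a hypothetical aggregator whose update path uses only two
// atomic adds: one for the matching bucket count and one for the sum.
type histogram struct {
	bounds []float64 // upper bounds of the buckets, sorted ascending
	counts []uint64  // len(bounds)+1 counts; the last is the overflow bucket
	sum    int64
}

func (h *histogram) observe(v int64) {
	// Bucket lookup is read-only, so it needs no synchronization at all.
	i := sort.SearchFloat64s(h.bounds, float64(v))
	atomic.AddUint64(&h.counts[i], 1)
	atomic.AddInt64(&h.sum, v)
}

func main() {
	h := &histogram{bounds: []float64{10, 100}, counts: make([]uint64, 3)}
	h.observe(5)
	h.observe(50)
	h.observe(500)
	fmt.Println(h.counts, h.sum) // prints "[1 1 1] 555"
}
```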
Please, I would like to see this.
Prometheus uses the lock-free approach to maintain a consistent sum and counter per bucket without a mutex. Also note that the original MMSC implementation used no locking or synchronization, and some users raised flags about this (including @evantorrie). I'll write another benchmark, just to add to this discussion. I specifically remember a case where, due to large fan-out, a number of simultaneous RPC responses would be received at once; if they all have to grab a mutex to finish the RPC, and they all do it at the same moment, we end up with poor performance. This is the benchmark I will write.
There was some work that went into Go 1.14 to address issues with sync.Mutex under high concurrency (particularly on high-core-count machines). See golang/go#33747
I ran this benchmark. I suspect I don't have enough CPUs on my machine to test the regime where mutexes are expected to underperform. Does it mean we should remove StateLocker entirely? (Sorry @paivagustavo)
I've replicated this for Histogram and indeed it is an improvement. We can find the histogram bucket before locking the mutex, which reduces the blocking part to three simple number operations, just like MMSC.
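The bucket-before-lock idea can be sketched as follows (a minimal illustration with hypothetical names, not the actual aggregator code): the only work done under the lock is three simple number operations.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// mutexHistogram is a hypothetical Mutex-based histogram aggregator.
type mutexHistogram struct {
	lock   sync.Mutex
	bounds []float64 // sorted upper bounds
	counts []uint64  // len(bounds)+1 counts; the last is the overflow bucket
	sum    float64
	count  uint64
}

func (h *mutexHistogram) Update(v float64) {
	// The bucket lookup is read-only, so it can happen before the lock,
	// shrinking the critical section to three simple number operations.
	i := sort.SearchFloat64s(h.bounds, v)
	h.lock.Lock()
	h.counts[i]++
	h.sum += v
	h.count++
	h.lock.Unlock()
}

func main() {
	h := &mutexHistogram{bounds: []float64{1, 10}, counts: make([]uint64, 3)}
	h.Update(0.5)
	h.Update(5)
	fmt.Println(h.counts, h.sum, h.count) // prints "[1 1 0] 5.5 2"
}
```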
@jmacd No need to be sorry, it was a valid attempt and I've learned a lot from it; I probably should have benchmarked it sooner. After these benchmarks, I'm +1 on removing StateLocker.
We are going to accept this change, and we'll keep an eye on the performance of the MMSC and Histogram aggregators. We think there is a possibility that high-CPU environments will notice a degradation, but in the future new aggregators could be added specifically for those cases (e.g., AtomicMMSC, AtomicHistogram, ...).
Actually, this PR is just a demonstration. |
Closing this as it's not a complete change. This has been documented and is linked from #657. We discussed in the last OTel-Go SIG call that a Mutex is probably the best default and that should a need for lockless aggregators come along, we can add new implementations in that case. |
In my example I did not use core.Number because I don't need operations to be atomic under the lock.