Make EnableClient* methods concurrent safe #90
Conversation
In one of my apps, I'm hitting a data race error in `EnableClientHandlingTimeHistogram`. It looks like the problem is that the `EnableClient*` methods are not concurrent safe. Rather than adding a generic lock, I used `sync.Once`, since it looks like these methods are only intended to be called once. Additionally, I moved `m.client*HistogramEnabled = true` up inside the `if` block, since there's no point in setting it when it's already `true`, which is exactly what the `if` block checks.
```go
m.clientHandledHistogramEnabled = true
m.doOnceClientHandledHistogramEnable.Do(func() {
	for _, o := range opts {
		o(&m.clientHandledHistogramOpts)
```
I wasn't sure if the `opts` handling should be within this block, but I figured there's no point in processing them a second time since the subsequent code will bail and never read them.
```go
		m.clientHandledHistogramOpts,
		[]string{"grpc_type", "grpc_service", "grpc_method"},
	)
	m.clientHandledHistogramEnabled = true
```
I moved this up into the `if` block, since there's no point in overwriting `true` with `true`...
```go
clientHandledHistogramEnabled      bool
clientHandledHistogramOpts         prom.HistogramOpts
clientHandledHistogram             *prom.HistogramVec
doOnceClientHandledHistogramEnable sync.Once
```
I wasn't sure what a good name would be; this seems to work, but I'm open to alternatives.
Codecov Report
```diff
@@            Coverage Diff             @@
##           master      #90      +/-   ##
==========================================
- Coverage   72.45%   72.13%   -0.33%
==========================================
  Files          11       11
  Lines         363      366       +3
==========================================
+ Hits          263      264       +1
- Misses         89       91       +2
  Partials      11       11
```
Continue to review full report at Codecov.
nudge @brancz any chance you could look at this? Should be a relatively low-risk change.
Hi @jeffwidman, thanks for this, but I think it would be awesome if we could do this in the v2 we're planning... and the plan is big, as we're designing v2 of the prometheus middleware to go inside go-grpc-middleware v2. So I think your work would be a bit wasted if we merged this ): But! If you want, you can help us move the prometheus middleware to https://github.com/grpc-ecosystem/go-grpc-middleware/tree/v2 We need a brave volunteer! (: I just created an issue for this: #91
Hmm... the new v2 plan looks interesting, and I'll need to read more. However, this is a bugfix, and a non-breaking one, so why not merge it to v1? #91 mentions continuing to accept bugfixes on the v1 branch for a period of time during the migration. In fact, merging to v1 now would make sure it's included in both v1 and v2, so the fix wouldn't need to be backported later. Again, I think migrating this to be part of all the
nudge @bwplotka what do you think about my comment ☝️...
Not really, it won't be included in v2. We already split branches.
And we have a totally separate implementation, so it won't matter much. And yes, if it's a bug we can merge... but is it?
BTW I am almost sure you might be using this library wrongly.
It happens when running unit tests if two unit tests both spin up the server (which is a common thing). If you don't run
Hi 👋🏽 Sorry for the massive lag. We are consolidating projects and moving to a single repo where we have more control and awareness. We are moving this code base over to https://github.com/grpc-ecosystem/go-grpc-middleware/tree/v2 and we moved the existing state of the Prometheus middleware to https://github.com/grpc-ecosystem/go-grpc-middleware/tree/v2/providers/openmetrics This means that before we release Sorry for the confusion, but it's needed for the project's sustainability. Cheers! Regarding the change: I think I understand it, cool. Please check v2, because we changed the code to never use globals, so things should be concurrency friendly.
In one of my apps, I'm hitting a data race error in `EnableClientHandlingTimeHistogram`. It looks like the problem is that the `EnableClient*` methods are not concurrent safe. The problem is triggered when two grpc clients each call `grpc_prometheus.EnableClientHandlingTimeHistogram()`. They end up sharing the same `DefaultClientMetrics`, which results in two calls to `DefaultClientMetrics.EnableClientHandlingTimeHistogram()`. That causes the race detector to throw an error about that method not being concurrent safe. Rather than adding a generic lock, I used `sync.Once` since it looks like these are only intended to be called once.