Skip to content
This repository has been archived by the owner on Apr 18, 2023. It is now read-only.

Make EnableClient* methods concurrent safe #90

Closed
wants to merge 1 commit into from
Closed

Make EnableClient* methods concurrent safe #90

wants to merge 1 commit into from

Conversation

jeffwidman
Copy link
Contributor

@jeffwidman jeffwidman commented May 7, 2020

In one of my apps, I'm hitting a data race error in EnableClientHandlingTimeHistogram. It looks like the problem is that the EnableClient* methods are not concurrent safe.

The problem is triggered when two grpc clients each call grpc_prometheus.EnableClientHandlingTimeHistogram(). They end up sharing the same DefaultClientMetrics which results in two calls to DefaultClientMetrics.EnableClientHandlingTimeHistogram(). That causes the race detector to throw an error about that method not being concurrent safe.

Rather than adding a generic lock, I used sync.Once since it looks like these are only intended to be called once.

In one of my apps, I'm hitting a data race error in `EnableClientHandlingTimeHistogram`. It looks like the problem is that the `EnableClient*` methods are not concurrent safe.

Rather than adding a generic lock, I used `sync.Once` since it looks like these are only intended to be called once.

Additionally, I moved the `m.client*HistogramEnabled = true` up within the `if` block, since there's no point in setting it unless it's not `true`... which is what this `if` block checks...
m.clientHandledHistogramEnabled = true
m.doOnceClientHandledHistogramEnable.Do(func() {
for _, o := range opts {
o(&m.clientHandledHistogramOpts)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure if the opts handling should be within this block, but I figured there's no point in processing them a second time since the subsequent code will bail and never read them.

m.clientHandledHistogramOpts,
[]string{"grpc_type", "grpc_service", "grpc_method"},
)
m.clientHandledHistogramEnabled = true
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this up into the if block, since there's no point in overwriting true with true...

clientHandledHistogramEnabled bool
clientHandledHistogramOpts prom.HistogramOpts
clientHandledHistogram *prom.HistogramVec
doOnceClientHandledHistogramEnable sync.Once
Copy link
Contributor Author

@jeffwidman jeffwidman May 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't sure what a good name is, this seems to work but open to alternatives.

@codecov-io
Copy link

Codecov Report

Merging #90 into master will decrease coverage by 0.32%.
The diff coverage is 25.92%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #90      +/-   ##
==========================================
- Coverage   72.45%   72.13%   -0.33%     
==========================================
  Files          11       11              
  Lines         363      366       +3     
==========================================
+ Hits          263      264       +1     
- Misses         89       91       +2     
  Partials       11       11              
Impacted Files Coverage Δ
client_metrics.go 62.85% <25.92%> (-0.65%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9abf3eb...7961b86. Read the comment docs.

@jeffwidman
Copy link
Contributor Author

jeffwidman commented May 11, 2020

nudge @brancz any chance you could look at this? Should be a relatively low-risk change.

@bwplotka
Copy link
Collaborator

bwplotka commented May 12, 2020

Hi @jeffwidman,

Thanks for this, but I think it would be awesome if we can do this on v2 we plan... and the plan big as we design v2 of prometheus middleware to go inside go-grpc-middleware v2 So... I might think your work will be a bit wasted if we would merge this ):

But!

If you want, you can help us to move prometheus middleware to https://github.com/grpc-ecosystem/go-grpc-middleware/tree/v2

We need a brave volunteer! (: I just created an issue for this: #91

@jeffwidman
Copy link
Contributor Author

Hmm... The new v2 plan looks interesting, and I'll need to read more.

However, this is a bugfix, and a non-breaking one.

So why not merge it to v1? #91 mentions continuing to accept bugfixes to the v1 branch for a period of time during the migration period.

In fact, merging to v1 now would make sure it is included in both v1 and v2 stuff, so the fix wouldn't need to later be backported.

Again, I think migrating this to be part of all the go-grpc-middleware might be interesting (I need to read/think on it more), all I'm saying I don't see how that affects whether to merge this given that this will apply to both v1 and v2.

@jeffwidman
Copy link
Contributor Author

nudge @bwplotka what do you think about my comment ☝️...

In fact, merging to v1 now would make sure it is included in both v1 and v2 stuff, so the fix wouldn't need to later be backported.

@bwplotka
Copy link
Collaborator

In fact, merging to v1 now would make sure it is included in both v1 and v2 stuff, so the fix wouldn't need to later be backported.

Not really, it won't be included in v2. We already split branches.

@bwplotka
Copy link
Collaborator

And we have totally separate implementation so it won't matter much. And yes, if it's a bug we can merge.. But is it?

@bwplotka
Copy link
Collaborator

BTW I am almost sure you might be using this library wrongly. .EnableClientHandlingTimeHistogram() is literally config option. You should use it once in your application. Why you use it concurrently?

@jeffwidman
Copy link
Contributor Author

BTW I am almost sure you might be using this library wrongly. .EnableClientHandlingTimeHistogram() is literally config option. You should use it once in your application. Why you use it concurrently?

It happens when running unit tests if two unit tests both spin up the server (which is a common thing). If you don't run go test -p 1 ./... then go runs the unit tests within separate packages concurrently causing the conflict.

@bwplotka
Copy link
Collaborator

Hi 👋🏽

Sorry for the massive, lag. We consolidating projects and moving to single repo where we have more control and awareness. We are moving this code base longer to https://github.com/grpc-ecosystem/go-grpc-middleware/tree/v2 and we moved existing state of Prometheus middleware to https://github.com/grpc-ecosystem/go-grpc-middleware/tree/v2/providers/openmetrics

This means that before we release v2 this is the best opportunity to shape new https://github.com/grpc-ecosystem/go-grpc-middleware/tree/v2/providers/openmetrics with whatever we want to change in API 🤗 If you want to change something effectively long term, I would suggest switching your PR and rebasing it on https://github.com/grpc-ecosystem/go-grpc-middleware/tree/v2/providers/openmetrics instead (notice v2 branch, not master!).

Sorry for the confusion, but it's needed for the project sustainability. Cheers!

Regarding the change I think I understand it, cool. Please check v2, because we changed the code to never use globals, so things should be concurrency friendly

@jeffwidman jeffwidman closed this Oct 6, 2022
@jeffwidman jeffwidman deleted the patch-2 branch October 6, 2022 05:04
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants