Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix flaky gauge value issue #13679

Merged
merged 2 commits into from
Jul 23, 2024

Conversation

tibrewalpratik17
Copy link
Contributor

@tibrewalpratik17 tibrewalpratik17 commented Jul 22, 2024

Fix for #13629

Currently, we maintain a map _gaugeValues in the AbstractMetrics class for gauge values. When adding a new gauge, we lock the entire operation of adding the gauge to the _gaugeValues map and the metricsRegistry factory simultaneously (Ref). However, there was a bug where the removal was not done under a lock. This led to a race condition when removeGauge and setValueOfGauge were called almost simultaneously. This could result in a scenario where the gauge is removed from the _gaugeValues map, then setValueOfGauge takes a lock on the map, adds it back to _gaugeValues, and registers it with _metricsRegistry, but then removeFromMetricsRegistry from the removeGauge method kicks in, removing it from _metricsRegistry.

This was specifically caught in our scenario by the way we have implemented our _metricsRegistry. We have a thread where we loop through all the gauges every 10 seconds and emit the value. In our scenario, _metricsRegistry was missing the gauge altogether because of the above race condition.

@jackjlli
Copy link
Member

Thanks @tibrewalpratik17 for identifying the issue, really appreciate it! Could you help add a unit test in AbstractMetricsTest class that can help better test out the functionalities for that?

@tibrewalpratik17 tibrewalpratik17 force-pushed the fix_metric_missing_issue branch from 86db21c to 046a813 Compare July 22, 2024 22:43
Copy link
Member

@jackjlli jackjlli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor but LGTM. Thanks!

@tibrewalpratik17 tibrewalpratik17 force-pushed the fix_metric_missing_issue branch from 046a813 to b6f6db0 Compare July 22, 2024 23:06
@tibrewalpratik17 tibrewalpratik17 force-pushed the fix_metric_missing_issue branch from b6f6db0 to a8fd649 Compare July 22, 2024 23:06
@codecov-commenter
Copy link

codecov-commenter commented Jul 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 61.97%. Comparing base (59551e4) to head (a8fd649).
Report is 795 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13679      +/-   ##
============================================
+ Coverage     61.75%   61.97%   +0.21%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2554     +118     
  Lines        133233   140567    +7334     
  Branches      20636    21868    +1232     
============================================
+ Hits          82274    87111    +4837     
- Misses        44911    46820    +1909     
- Partials       6048     6636     +588     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 61.93% <100.00%> (+0.22%) ⬆️
java-21 61.82% <100.00%> (+0.20%) ⬆️
skip-bytebuffers-false 61.95% <100.00%> (+0.20%) ⬆️
skip-bytebuffers-true 61.80% <100.00%> (+34.07%) ⬆️
temurin 61.97% <100.00%> (+0.21%) ⬆️
unittests 61.96% <100.00%> (+0.21%) ⬆️
unittests1 46.45% <100.00%> (-0.44%) ⬇️
unittests2 27.73% <0.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@tibrewalpratik17 tibrewalpratik17 marked this pull request as ready for review July 22, 2024 23:48
@jackjlli jackjlli merged commit 870cf57 into apache:master Jul 23, 2024
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants