[fix][broker] Correctly set byte and message out totals per subscription #18451
Codecov Report
@@ Coverage Diff @@
## master #18451 +/- ##
=============================================
+ Coverage 31.39% 47.27% +15.88%
- Complexity 6651 10456 +3805
=============================================
Files 697 697
Lines 68015 68015
Branches 7285 7285
=============================================
+ Hits 21353 32157 +10804
+ Misses 43667 32267 -11400
- Partials 2995 3591 +596
LGTM
Nice work @pgier. The fix looks good to me. I think we might want to break this out into two PRs though because the PrometheusMetricStreams
might require larger changes. Let me know what you think. Thanks!
...-broker/src/main/java/org/apache/pulsar/broker/stats/prometheus/PrometheusMetricStreams.java
I second @michaeljmarshall's comments.
@michaeljmarshall @eolivelli I've removed the changes to the metric type and left in only the changes that fix the metric value. However, I rebased on the latest master, and a recent PR seems to have broken many of the tests around Prometheus metrics. There are some new metrics introduced that look like this:
This breaks the count checks in several unit tests. I'm not sure which change introduced the issue, but possibly this one: #17905
Fixes apache#15819

The existing code calculates the pulsar_out_bytes_total and pulsar_out_messages_total per subscription metrics by adding the values from the currently connected consumers. This produces incorrect values as soon as one or more of the consumers disconnects from the subscription.

This updates these two metrics to directly use the subscription stats for these values, and match the output of `pulsar-admin topic stats`. It also changes these metrics to be defined as `counter` type.

Signed-off-by: Paul Gier <paul.gier@datastax.com>
LGTM
@pgier - which test was failing? I can't seem to reproduce that metric output.
@michaeljmarshall You should be able to reproduce the issue by running the broker metrics tests in the pulsar-broker subdir.
They fail on current master but work fine if you revert 3715934.
Thanks @pgier. I realized that I wasn't able to reproduce it in my basic test because I wasn't enabling the transaction coordinator.
…ion (#18451)

Signed-off-by: Paul Gier <paul.gier@datastax.com>

Fixes #15819

The existing code calculates the pulsar_out_bytes_total and pulsar_out_messages_total per subscription metrics by adding the values from the currently connected consumers. This produces incorrect values as soon as one or more of the consumers disconnects from the subscription.

This changes these two metrics to directly use the subscription stats for these values, and match the output of `pulsar-admin topic stats`.

Signed-off-by: Paul Gier <paul.gier@datastax.com>

Fixes #15819

### Motivation

The prometheus metrics for pulsar_out_bytes_total and pulsar_out_messages_total should never decrease, and they should match the output seen when using pulsar-admin.

### Modifications

Changed the calculation of pulsar_out_bytes_total and pulsar_out_messages_total to directly use the subscription stats instead of calculating these values by summing the values of the currently connected consumers.

### Verifying this change

- [X] Make sure that the change passes the CI checks.

Added a unit test to cover this case.

### Does this pull request potentially affect one of the following parts:

*If the box was checked, please highlight the changes*

- [ ] Dependencies (add or upgrade a dependency)
- [ ] The public API
- [ ] The schema
- [ ] The default values of configurations
- [ ] The threading model
- [ ] The binary protocol
- [ ] The REST endpoints
- [ ] The admin CLI options
- [ ] Anything that affects deployment

### Documentation

<!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->

- [ ] `doc` <!-- Your PR contains doc changes. Please attach the local preview screenshots (run `sh start.sh` at `pulsar/site2/website`) to your PR description, or else your PR might not get merged. -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [X] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

### Matching PR in forked repository

PR in forked repository: pgier#2

(cherry picked from commit c03e33e)
Fixes #15819
The existing code calculates the pulsar_out_bytes_total and pulsar_out_messages_total per subscription metrics by adding the values from the currently connected consumers. This produces incorrect values as soon as one or more of the consumers disconnects from the subscription.
This changes these two metrics to directly use the subscription stats for these values, and match the output of `pulsar-admin topic stats`.
Signed-off-by: Paul Gier <paul.gier@datastax.com>
Fixes #15819
Motivation
The prometheus metrics for pulsar_out_bytes_total and pulsar_out_messages_total should never decrease,
and they should match the output seen when using pulsar-admin.
Modifications
Changed the calculation of pulsar_out_bytes_total and pulsar_out_messages_total to directly use the
subscription stats instead of calculating these values by summing the values of the currently connected
consumers.
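The difference between the two calculations can be illustrated with a minimal, self-contained Java sketch. It is not the broker's actual code: the `Subscription` and `Consumer` classes, their fields, and the method names below are hypothetical stand-ins chosen for illustration. The sketch only assumes that each consumer keeps its own out counters and that the subscription keeps running totals that outlive any single consumer.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; these classes are hypothetical, not Pulsar's real types.
class SubscriptionOutTotalsSketch {

    /** Stand-in for the stats of one connected consumer. */
    static final class Consumer {
        long msgOutCounter;
        long bytesOutCounter;
    }

    /** Stand-in for a subscription with its own running totals. */
    static final class Subscription {
        final List<Consumer> connected = new ArrayList<>();
        // Subscription-level totals: updated on every delivery, never reduced.
        long msgOutCounter;
        long bytesOutCounter;

        Consumer connect() {
            Consumer c = new Consumer();
            connected.add(c);
            return c;
        }

        void disconnect(Consumer c) {
            connected.remove(c);
        }

        void deliver(Consumer c, long bytes) {
            c.msgOutCounter++;
            c.bytesOutCounter += bytes;
            msgOutCounter++;
            bytesOutCounter += bytes;
        }

        /** Approach described as problematic: sum over currently connected consumers. */
        long msgOutFromConnectedConsumers() {
            return connected.stream().mapToLong(c -> c.msgOutCounter).sum();
        }

        /** Approach described in the fix: read the subscription-level total. */
        long msgOutFromSubscriptionStats() {
            return msgOutCounter;
        }
    }

    public static void main(String[] args) {
        Subscription sub = new Subscription();
        Consumer a = sub.connect();
        Consumer b = sub.connect();
        sub.deliver(a, 100);
        sub.deliver(a, 100);
        sub.deliver(b, 100);

        // Both approaches agree while every consumer is still connected.
        System.out.println(sub.msgOutFromConnectedConsumers()); // 3
        System.out.println(sub.msgOutFromSubscriptionStats());  // 3

        sub.disconnect(a);

        // The summed value drops once a consumer leaves; the subscription total does not.
        System.out.println(sub.msgOutFromConnectedConsumers()); // 1
        System.out.println(sub.msgOutFromSubscriptionStats());  // 3
    }
}
```

In this sketch the summed value falls from 3 to 1 after the disconnect, which is the kind of decrease a `_total` metric should never show, while the subscription-level total stays at 3.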
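Verifying this change
- [X] Make sure that the change passes the CI checks.
Added a unit test to cover this case.
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
- [ ] Dependencies (add or upgrade a dependency)
- [ ] The public API
- [ ] The schema
- [ ] The default values of configurations
- [ ] The threading model
- [ ] The binary protocol
- [ ] The REST endpoints
- [ ] The admin CLI options
- [ ] Anything that affects deployment
Documentation
- [ ] `doc`
- [ ] `doc-required`
- [X] `doc-not-needed`
- [ ] `doc-complete`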
Matching PR in forked repository
PR in forked repository: pgier#2