Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feat][misc] PIP-264: Implement topic lookup metrics using OpenTelemetry #22058

Merged
merged 45 commits into from
Mar 6, 2024

Conversation

dragosvictor
Copy link
Contributor

@dragosvictor dragosvictor commented Feb 15, 2024

PIP-264

Motivation

Part of PIP-264 related work to convert metrics to OpenTelemetry format. This PR handles topic lookup metrics, as currently defined in the documentation.

Note that metric pulsar_broker_load_manager_bundle_assignment fits better into the Load Balancing metrics section, so is not included in this work.

Modifications

Added metrics as described below:

Old metric name New metric details
pulsar_broker_lookup Name pulsar.broker.lookup.request.duration
Type Histogram
Description The duration of topic lookup requests (either binary or HTTP)
Source Internal OpenTelemetry implementation
Attribute Set pulsar.lookup.response=failure replaces pulsar_broker_lookup_failures
Attribute Set pulsar.lookup.response=broker replaces pulsar_broker_lookup_answers
Attribute Set pulsar.lookup.response=redirect replaces pulsar_broker_lookup_redirects
Cardinality 3 per broker (one per attribute set)
pulsar_broker_lookup_pending_requests Name pulsar.broker.topic.lookup.operation.pending.usage
Type ObservableLongUpDownCounter
Description The number of pending lookup operations in the broker. When it reaches threshold "maxConcurrentLookupRequest" defined in broker.conf, new requests are rejected.
Source Call to BrokerService#getPendingLookupRequest
Attributes (none)
Cardinality 1 per broker
(new metric) Name pulsar.broker.topic.lookup.operation.pending.limit
Type ObservableLongUpDownCounter
Description The maximum number of pending lookup operations in the broker. Equal to "maxConcurrentLookupRequest" defined in broker.conf.
Source Call to ServiceConfiguration#getMaxConcurrentLookupRequest
Attributes (none)
Cardinality 1 per broker
pulsar_broker_topic_load_pending_requests Name pulsar.broker.topic.load.operation.pending.usage
Type ObservableLongUpDownCounter
Description The number of pending topic load operations in the broker. When it reaches threshold "maxConcurrentTopicLoadRequest" defined in broker.conf, new requests are rejected.
Source Call to BrokerService#getPendingTopicLoadRequests
Attributes (none)
Cardinality 1 per broker
(new metric) Name pulsar.broker.topic.load.operation.pending.limit
Type ObservableLongUpDownCounter
Description The maximum number of pending topic load operations in the broker. Equal to "maxConcurrentTopicLoadRequest" defined in broker.conf.
Source Call to ServiceConfiguration#getMaxConcurrentTopicLoadRequest
Attributes (none)
Cardinality 1 per broker

Moved instantiation of PulsarBrokerOpenTelemetry object to PulsarService constructor, as it is needed by the NamespaceService and BrokerService. The OpenTelemetry standard recommends setting up the SDK as early as possible, so this suits us well.

Changed PulsarTestContext to allow OpenTelemetry metrics validation as part of the tests.

Verifying this change

  • Make sure that the change passes the CI checks.

This change added tests and can be verified as follows:

(example:)

  • Extended existing lookup tests BrokerServiceLookupTest#testMultipleBrokerLookup and BrokerServiceLookupTest#testLookupConnectionNotCloseIfGetUnloadingExOrMetadataEx to cover the metric validation

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: dragosvictor#11

Copy link

@dragosvictor Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

@github-actions github-actions bot added doc-label-missing doc-required Your PR changes impact docs and you will update later. and removed doc-label-missing labels Feb 15, 2024
@dragosvictor dragosvictor marked this pull request as ready for review February 16, 2024 00:46
@merlimat merlimat added this to the 3.3.0 milestone Feb 16, 2024
@codecov-commenter
Copy link

codecov-commenter commented Feb 16, 2024

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (beed0cf) 73.71% compared to head (1e9bc12) 73.59%.
Report is 2 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #22058      +/-   ##
============================================
- Coverage     73.71%   73.59%   -0.13%     
- Complexity    32116    32571     +455     
============================================
  Files          1874     1875       +1     
  Lines        139220   139276      +56     
  Branches      15260    15260              
============================================
- Hits         102628   102500     -128     
- Misses        28695    28858     +163     
- Partials       7897     7918      +21     
Flag Coverage Δ
inttests 24.68% <89.55%> (-0.16%) ⬇️
systests 24.29% <80.59%> (-0.22%) ⬇️
unittests 72.86% <92.53%> (-0.12%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...n/java/org/apache/pulsar/broker/PulsarService.java 83.13% <100.00%> (ø)
...ache/pulsar/broker/namespace/NamespaceService.java 73.03% <100.00%> (+0.79%) ⬆️
...pulsar/broker/stats/PulsarBrokerOpenTelemetry.java 100.00% <100.00%> (+9.09%) ⬆️
...che/pulsar/opentelemetry/OpenTelemetryService.java 92.00% <ø> (ø)
...va/org/apache/pulsar/common/stats/MetricsUtil.java 50.00% <50.00%> (ø)
...rg/apache/pulsar/broker/service/BrokerService.java 81.38% <88.57%> (+0.26%) ⬆️

... and 75 files with indirect coverage changes

Copy link
Contributor

@asafm asafm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I liked your idea you explain in the comment you made on
pulsar.broker.topic.lookup.operation.pending.[usage,limit] - having its name align since you said this code is also called outside of AdminAPI.
If this is the case, it makes total sense to align it based on your suggestion.

Copy link
Contributor

@asafm asafm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

@asafm
Copy link
Contributor

asafm commented Mar 5, 2024

@lhotari You're up next on the review wrap up :)

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Good work @dragosvictor

@merlimat merlimat merged commit 4ff8600 into apache:master Mar 6, 2024
52 of 54 checks passed
@dragosvictor dragosvictor deleted the pip-264-topic-lookup-metrics branch March 6, 2024 20:02
@dragosvictor dragosvictor restored the pip-264-topic-lookup-metrics branch March 6, 2024 20:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc-required Your PR changes impact docs and you will update later. ready-to-test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants