Add observability metrics for CommandPartitionedTopicMetadata requests #18243
Comments
Currently, we have metadata store metrics. If they could meet your needs, I'd like to handle the issue.
How are metadata store metrics used currently? I think it could be a breaking change if CommandPartitionedTopicMetadata requests are tracked as part of some other metric. I think it should be a new metric that is unique for CommandPartitionedTopicMetadata requests. @codelipenghui do you have a suggestion?
The metadata store metrics are at the metadata store level and can provide the metadata store operation latency. The REST API request metrics should be a separate part. The CommandPartitionedTopicMetadata request metrics will not correspond 100% to the metadata store operations, since the Jetty thread may be blocked somewhere else. I think maybe Jetty already provides the ability to expose metrics with a request path label?
@codelipenghui @lhotari There are 2 ways to get PartitionedTopicMetadata, one is
I've checked Jetty; it seems there is no such ability.
@lhotari @codelipenghui PTAL #18281
The PIP discuss thread: https://lists.apache.org/thread/sybl4nno4503w42hzt7b5lsyk6m2rbo6
The issue had no activity for 30 days, mark with Stale label.
Search before asking
Motivation
Currently, there's no way to track CommandPartitionedTopicMetadata requests. There are no metrics or logs that indicate that a broker is handling CommandPartitionedTopicMetadata requests.
Misconfigured clients might flood brokers with CommandPartitionedTopicMetadata requests and cause high CPU consumption.
One example of this is a misconfiguration of splunk-otel-collector's Pulsar exporter. The example config sets pulsar-client-go's PartitionsAutoDiscoveryInterval setting to 1 nanosecond. I have sent a PR to fix the example config: signalfx/splunk-otel-collector#2185. This example shows that it's easy to mix up the units and misconfigure a Pulsar client.
Solution
Add observability metrics for CommandPartitionedTopicMetadata requests, similar to what there is for lookup requests added by #8272.
Alternatives
No response
Anything else?
No response
Are you willing to submit a PR?