-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Karapace metrics #652
Karapace metrics #652
Conversation
@tvainika just fixed more issues and added as many as possible metrics similar to competitive products. Also now I utilize native kafka metrics classes. |
Continuing discussion from original PR. If possible, in the future I'd ask that you please avoid closing and re-opening separate PRs like this, as it makes it quite hard to follow progress and review. If you need to "start over", please feel free to force push to your feature branch instead.
This does not sound accurate to me. In my experience when using Prometheus, it's much more common to avoid calculations on the application side, and expose metrics in their raw forms. Averages, and other aggregations are then produced by posing PromQL queries to the Prometheus service. It seems reasonable to take that route here as well, and for instance rather expose |
My client requires support for metrics that are already supported in competitor projects. Additionally, they anticipate the option to integrate with Prometheus in the future. Consequently, I have implemented the required support accordingly. The test results using Graphite, which I used for testing purposes, show the expected data output. For Prometheus integration, we have two options: utilizing the statsd-prometheus module or developing our own converter to send aggregated data to Prometheus. I believe Prometheus has the capability to handle aggregated data, allowing us to aggregate the metrics as required. |
@libretto Aggregated metrics are out of scope for Karapace, we are not going to accept PRs that implement that since we do not want to carry that maintenance burden. If you need 1-to-1 metrics compatibility, I'd suggest implementing this in a separate application where you can implement the exact aggregations you need. That said, we would consider a PR that adds to Karapace a Prometheus endpoint for exposing metrics. That should probably be implemented in a separate PR though. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See prior comments.
@aiven-anton Are You in favor of allowing record calls to the KarapaceMetrics() class, but with the requirement of removing aggregation from the class? Does this mean You are not concerned about the impact on performance caused by the 10-11 stats request calls per karapace request to StatsD or Prometheus? |
What I am saying is that we should not calculate averages, maximums, and rates, and instead only expose raw metrics. This means I think the proposed metrics should be changed like this:
|
@aiven-anton Additionally, I'm planning to add support for Prometheus. The Prometheus client acts as an HTTP server that provides an endpoint for obtaining statistics periodically. Is this acceptable to You? Furthermore, I'm considering a better approach. One option is to inherit from the StatsClient or create an AbstractStatsClient class with interfaces. This way, both the StatsClient and PrometheusStatsClient can inherit from the same class, and my KarapaceMetrics class can utilize it as an interface. What are Your thoughts on this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check
@aiven-anton I kindly request that you review my previous comment and provide an answer regarding the proposed Prometheus solution. Specifically, I would like to know if I can utilize the pull model, where the server is inside karapace, or if I need to use the push model and have clients request the use of Pushgateway between karapace and Prometheus. |
Sorry for the delay, I appreciate the update here 👍
Yes, but I strongly encourage doing this in a follow-up PR to keep the scope manageable and reviewable. And you're exactly right, with the Prometheus model we would not use a client, but expose a special metrics endpoint (probably on a separate port). But definitely I think that this is right approach, the Prometheus implementation should be a server that Prometheus "scrapes" from rather than the Pushgateway you mention as alternative.
Yes, it sounds reasonable to have some common interface, and define separate pluggable backends for Statsd and later Prometheus. I'd also encourage deferring the introduction of this until you work on the Prometheus implementation, as doing it beforehand is likely to end up with an abstraction that doesn't meet all requirements. |
@aiven-anton can You check the current PR and approve it yet? |
@aiven-anton Hi, could You review this code and merge it with the main branch? |
could You review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments on the PR.
karapace/karapacemetrics.py
Outdated
return | ||
if not isinstance(self.stats_client, StatsClient): | ||
raise RuntimeError("no StatsClient available") | ||
self.stats_client.gauge("request-size", size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Counter, name as request_size_total
.
The use of Gauge tries to emulate the behavior of Prometheus counter? If StatsD is the backend I think the correct would be the counter, it does get reset to 0 on every flush. Prometheus counter does not, so there is definitely difference. If Prometheus support is added I think the self.stats_client
would be a different implementation suitable for Prometheus. This comment applies to other counter comments too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unclear about the necessity of using "request_size_total" when our aim is to create a graph depicting request size over time. As far as I can see, there isn't a metric that requires a counter. As I know both Prometheus and StatsD have a similar gauge metric type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@libretto I think gauge simply doesn't make sense here?
our aim is to create a graph depicting request size over time
The way to do that with prometheus is to graph request_size_total / request_count
, I'm unsure what makes sense with StatsD.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to calculate "request_size_total / request_count," we would need to store "request_count" and perform the calculation for "request_size_total" on the karapace side. However, I noticed here your comment where You mentioned that this approach might not be advisable. Consequently, I modified the code in this manner because both Statsd and Prometheus can visualize this data without the need for internal calculations. Perhaps this approach may be less efficient than mine, but at least it avoids any calculations within karapace.
karapace/karapacemetrics.py
Outdated
""" | ||
from __future__ import annotations | ||
|
||
from kafka.metrics import Metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove.
karapace/karapacemetrics.py
Outdated
schedule.every(10).seconds.do(self.connections) | ||
self.worker_thread.start() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the threading needed only for the reading connection count? Maybe push the logic out from this class and implement which would be called from the reader thread. Consider also thread safety.
def connections(self, connections: int) -> None:
...
self.stats_client.gauge("connections", connections)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thread job operates at a frequency of every 10 seconds, which is a relatively resource-efficient approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@libretto Could you shed some further light on this question?
Is the threading needed only for the reading connection count?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For instance, if Karapace is handling 100 requests per second, the function that retrieves the connection count would be called with every request. In my solution, we acquire this data every 10 seconds, which means significantly fewer calls (about 1000 times fewer). While this approach might not be as precise, it's effective enough for our purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aiven-anton @jjaakola-aiven could you check?
karapace/karapacemetrics.py
Outdated
@@ -0,0 +1,119 @@ | |||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: On the naming of this module, we're already in the karapace namespace, let's simply name the module metrics
? (Resulting name would be karapace.metrics
instead of repeating karapace.karapacemetrics
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
karapace/karapacemetrics.py
Outdated
schedule.every(10).seconds.do(self.connections) | ||
self.worker_thread.start() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@libretto Could you shed some further light on this question?
Is the threading needed only for the reading connection count?
karapace/karapacemetrics.py
Outdated
return | ||
if not isinstance(self.stats_client, StatsClient): | ||
raise RuntimeError("no StatsClient available") | ||
self.stats_client.gauge("request-size", size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@libretto I think gauge simply doesn't make sense here?
our aim is to create a graph depicting request size over time
The way to do that with prometheus is to graph request_size_total / request_count
, I'm unsure what makes sense with StatsD.
This is superseded by #711. |
This addon enhances the statsd metrics available for monitoring the karapace server. We have now included the following additional metrics:
The aiohttp library does not provide native support for metrics. Therefore, we will make an effort to gather the required data by collecting statistics from the request handler, aiohttp logs, and various other sections of the karapace code.
We have plans to enhance the metrics support in this module by adding additional metrics.
My objective is to incorporate metrics that are similar to what can be found in competitor products. To achieve this, I am utilizing the calculation methods from kafka.metrics.Metrics class.
Karapace competitors use sampled metrics and perform calculations for avg, min, max on the application side in order to facilitate Pull-based statistics services. This particular model is commonly utilized in platforms like Prometheus, and it is also the preferred choice for our client. However, at present, karapace only supports StatsD. Nevertheless, I anticipate that customers who actively rely on Pull-based statistics will likely require support for such services, and therefore, integrating them into our metrics class would be necessary to meet their needs.