Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Karapace metrics #652
Karapace metrics #652
Changes from 15 commits
e3cb524
32ce060
8dab84d
2898e31
c974579
7256f5d
ab6ae96
733d1f2
8751eea
53d3e4b
fedff8f
31d16d4
b70ae03
a0387a3
358facc
a064624
8533959
ac48829
90e221c
4c48576
f9cb6d8
765864b
073aa16
0c73a1a
c495c50
e3a89a1
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: On the naming of this module, we're already in the karapace namespace, let's simply name the module
metrics
? (Resulting name would bekarapace.metrics
instead of repeatingkarapace.karapacemetrics
).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the threading needed only for the reading connection count? Maybe push the logic out from this class and implement which would be called from the reader thread. Consider also thread safety.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thread job operates at a frequency of every 10 seconds, which is a relatively resource-efficient approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@libretto Could you shed some further light on this question?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For instance, if Karapace is handling 100 requests per second, the function that retrieves the connection count would be called with every request. In my solution, we acquire this data every 10 seconds, which means significantly fewer calls (about 1000 times fewer). While this approach might not be as precise, it's effective enough for our purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aiven-anton @jjaakola-aiven could you check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Counter, name as
request_size_total
.The use of Gauge tries to emulate the behavior of Prometheus counter? If StatsD is the backend I think the correct would be the counter, it does get reset to 0 on every flush. Prometheus counter does not, so there is definitely difference. If Prometheus support is added I think the
self.stats_client
would be a different implementation suitable for Prometheus. This comment applies to other counter comments too.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unclear about the necessity of using "request_size_total" when our aim is to create a graph depicting request size over time. As far as I can see, there isn't a metric that requires a counter. As I know both Prometheus and StatsD have a similar gauge metric type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@libretto I think gauge simply doesn't make sense here?
The way to do that with prometheus is to graph
request_size_total / request_count
, I'm unsure what makes sense with StatsD.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to calculate "request_size_total / request_count," we would need to store "request_count" and perform the calculation for "request_size_total" on the karapace side. However, I noticed here your comment where You mentioned that this approach might not be advisable. Consequently, I modified the code in this manner because both Statsd and Prometheus can visualize this data without the need for internal calculations. Perhaps this approach may be less efficient than mine, but at least it avoids any calculations within karapace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Counter, name as
response_size_total
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why "Total"? why counter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Counter, name as
error_total
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our goal is to create a graph showing the request error rate, which represents the number of error requests per second. However, considering your previous advice that we shouldn't perform calculations or aggregation on the karapace side, I'm currently uncertain about whether to use a gauge or counter to achieve the desired rate graph as the output.
If we use
self.stats_client.increase("error_total", 1)
Do we have the possibility to create an error-rate graph with this data on statsd?