-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stats: envoy prometheus endpoint fails promlint #2597 #2757
Conversation
test/server/http/admin_test.cc
Outdated
std::vector<Stats::Tag> tags1; | ||
Stats::Tag tag2 = {"another_tag_name", "another_tag-value"}; | ||
tags1.push_back(tag2); | ||
Stats::HeapRawStatDataAllocator alloc1_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pitiwari I think the issue is a little subtle, but I think this is because the allocators are being destructed before the stats themselves.
Fix: try declaring a single allocator as the first line in this test (to ensure it is the last thing to be destructed), and then use that allocator to allocate the data for all of the stats you create below.
Quick explanation of what's happening: the two lists counters
and gauges
are created before some of the allocators used to allocate the data for the CounterImpl
s and GuageImpl
s that they own. This will delay the destruction of these stats until after their allocators are destructed, which means that the CounterImpl
and GaugeImpl
destructors will try to call methods on an object that has already been destructed. The rule with these allocators is that they must outlive everything they allocate.
@mrice32 thanks. I had modified ut to create heap data allocator only once . UTS are passing now without needing any change in destructor, but still ref counting issue in HeapDataAllocator needs to be resolved since it is never incrementing or decrementing ref count. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome stuff. Thanks for your work on this! Mostly test and docs nits.
test/server/http/admin_test.cc
Outdated
@@ -293,5 +294,105 @@ TEST(PrometheusStatsFormatter, FormattedTags) { | |||
EXPECT_EQ(expected, actual); | |||
} | |||
|
|||
TEST(PrometheusStatsFormatter, statsAsPrometheus) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can you give these tests more descriptive names so we know what each is testing? For example, something like NameCollision
, etc?
test/server/http/admin_test.cc
Outdated
EXPECT_EQ(2UL, PrometheusStatsFormatter::statsAsPrometheus(counters, gauges, response)); | ||
} | ||
|
||
TEST(PrometheusStatsFormatter, statsAsPrometheus2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: statsAsPrometheus2
-> StatsAsPrometheus2
(same for the test above)
test/server/http/admin_test.cc
Outdated
std::vector<Stats::Tag> tags; | ||
Stats::Tag tag1 = {"a.tag-name", "a.tag-value"}; | ||
tags.push_back(tag1); | ||
Stats::HeapRawStatDataAllocator alloc_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: alloc_
-> alloc
test/server/http/admin_test.cc
Outdated
|
||
Stats::TagExtractorImpl tag_extractor("cluster_name", "^cluster\\.((.+?)\\.)"); | ||
std::string name = "cluster.test_cluster_1.upstream_cx_total"; | ||
std::vector<Stats::Tag> tags; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the var_name1
and var_name2
are a little confusing to read when they go all the way to 4. Also, they occasionally don't line up (tag2
is inserted into tags1
). It might make things easier and more readable to add {}
around each group to scope it so you can use the same variable names (without the numbers) without worrying about accessing the same data. The only items that need to be declared outside the scopes should be alloc
, counters
, and gauges
. The other option would be to make a function for it.
test/server/http/admin_test.cc
Outdated
// different tag names. In the end there only two metric names | ||
// should be inserted, one for gauge and one for counter | ||
|
||
Stats::TagExtractorImpl tag_extractor("cluster_name", "^cluster\\.((.+?)\\.)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: are you using tag_extractor
?
test/server/http/admin_test.cc
Outdated
@@ -293,5 +294,105 @@ TEST(PrometheusStatsFormatter, FormattedTags) { | |||
EXPECT_EQ(expected, actual); | |||
} | |||
|
|||
TEST(PrometheusStatsFormatter, statsAsPrometheus) { | |||
|
|||
// create two clusters and two gauges with metric name being same same and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I would suggest that you change the comment to the following (similar for the comment in the second test):
Create two counters and two gauges with each pair having the same name, but having different tag names and values. `statsAsPrometheus()` should return two implying it found two unique stat names.
static void statsAsPrometheus(const std::list<Stats::CounterSharedPtr>& counters, | ||
const std::list<Stats::GaugeSharedPtr>& gauges, | ||
Buffer::Instance& response); | ||
static uint64_t statsAsPrometheus(const std::list<Stats::CounterSharedPtr>& counters, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: please update the method documentation with @return uint64_t explanation of return value.
test/server/http/admin_test.cc
Outdated
// In the end there should be 4 metric names inserted 2 for gauge | ||
//`and 2 for counter metric names | ||
|
||
Stats::TagExtractorImpl tag_extractor("cluster_name", "^cluster\\.((.+?)\\.)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: same as above - is this variable being used anywhere?
@mrice32 thanks i had updated the code based on review comments |
@pitiwari Make sure you sign off on all your commits to allow the DCO to pass (see https://github.com/probot/dco#how-it-works). Also, make sure you run the formatter on your code. You can do this with docker by running |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! A few more nits, then we should be good to go.
source/server/http/admin.h
Outdated
@@ -248,11 +248,12 @@ class PrometheusStatsFormatter { | |||
public: | |||
/** | |||
* Extracts counters and gauges and relevant tags, appending them to | |||
* the response buffer after sanitizing the metric / label names. | |||
* the response buffer after sanitizing the metric / label names. Returns | |||
* total number of metric types inserted in response |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: please use @return
doxygen syntax:
/**
* Extracts counters and gauges and relevant tags, appending them to
* the response buffer after sanitizing the metric / label names.
* @return int64_t total number of metric types inserted in response.
*/
test/server/http/admin_test.cc
Outdated
|
||
TEST(PrometheusStatsFormatter, UniqueMetricName) { | ||
|
||
//Create two counters and two gauges with each pair having the different name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit - can we rephrase to: "Create two counters and two gauges, all with unique names. statsAsPrometheus() should return four implying it found four unique stat names."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for working through this PR. Awesome work! Also, please remember to add a signoff to all of your commits: https://github.com/probot/dco#how-it-works.
|
||
// Create two counters and two gauges with each pair having the same name, | ||
// but having different tag names and values. | ||
//`statsAsPrometheus()` should return two implying it found two unique stat names |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, one more nit that I didn't notice the first few times through: please remove the backticks from statsAsPrometheus()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. Just need to DCO-sign all the commits and I think it's ready.
looks like other than the DCO, @mrice32 is done with comments. @ggreenway or @htuch I think we can kick this up to you. |
rebased all the commits and added dco |
It looks like your DCO value doesn't match the author value in your commits, so the DCO-bot still isn't accepting it. Sorry for the pedantic-ness of the DCO-bot. Can you please make the two match? |
Description: The admin endpoint of envoy when queried as a prometheus endpoint fails promlint. Promlint is a sanity check tool offered by Prometheus to check the correctness of a prometheus endpoint. Signed-off-by: pitiwari <pitiwari@ebay.com>
seems like all tests passed, can we go ahead and commit ? |
source/server/http/admin.h
Outdated
@@ -249,10 +249,11 @@ class PrometheusStatsFormatter { | |||
/** | |||
* Extracts counters and gauges and relevant tags, appending them to | |||
* the response buffer after sanitizing the metric / label names. | |||
* @return int64_t total number of metric types inserted in response. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: s/int64_t/uint64_t/
test/server/http/admin_test.cc
Outdated
Stats::Tag tag = {"a.tag-name", "a.tag-value"}; | ||
cluster_tags.push_back(tag); | ||
Stats::RawStatData* data = alloc.alloc(name); | ||
Stats::CounterSharedPtr c( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c = std::make_shared<...
(here and below)
test/server/http/admin_test.cc
Outdated
Stats::RawStatData* data = alloc.alloc(name); | ||
Stats::CounterSharedPtr c( | ||
new Stats::CounterImpl(*data, alloc, std::string(name), std::move(cluster_tags))); | ||
counters.push_back(c); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This scenario seems contrived. How does it happen in practice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
counterImpl/GuageImpl are derived classes from counter/gauge class so i create two counters and two gauges and pass them with same or different metric names in two different tests and check if metric name was same then gauge/counter are inserted only once for each unique metric name . Promlint tool was failing due to same metric name printed multiple times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand how this happens in the test, I'm curious though how this happens in actual Envoy use. How did you discover this? What were the offending counters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The envoy_listener_http_downstream_rq_xx counter appears many times, each time with a different set of tags and metrics that come with that tag (dimension).
For each upstream server you get 5 envoy_listener_http_downstream_rq_xx counters: one for each http responsecode (1, 2, 3, 4, 5) as the value for the "envoy_response_code_class" tag.
made changes based on review, please let me know if anything more needs to be changed after that i can squash all commits into single commit and push again |
Signed-off-by: pitiwari <pitiwari@ebay.com>
@htuch can you please have a look. I had updated code based on your review comments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix, this is great.
Description: Fixes #2597"
The admin endpoint of envoy when queried as a prometheus endpoint fails promlint.
Promlint is a sanity check tool offered by Prometheus to check the correctness
of a prometheus endpoint.
following issues were fixed
-fixed issue with metric type being printed twice
-added unit test as well
Risk Level: Low