Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stats: envoy prometheus endpoint fails promlint #2597 #2757

Merged
merged 2 commits into from
Mar 16, 2018

Conversation

pitiwari
Copy link
Contributor

@pitiwari pitiwari commented Mar 8, 2018

Description: Fixes #2597"

The admin endpoint of envoy when queried as a prometheus endpoint fails promlint.
Promlint is a sanity check tool offered by Prometheus to check the correctness
of a prometheus endpoint.

following issues were fixed
-fixed issue with metric type being printed twice
-added unit test as well

Risk Level: Low

std::vector<Stats::Tag> tags1;
Stats::Tag tag2 = {"another_tag_name", "another_tag-value"};
tags1.push_back(tag2);
Stats::HeapRawStatDataAllocator alloc1_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pitiwari I think the issue is a little subtle, but I think this is because the allocators are being destructed before the stats themselves.

Fix: try declaring a single allocator as the first line in this test (to ensure it is the last thing to be destructed), and then use that allocator to allocate the data for all of the stats you create below.

Quick explanation of what's happening: the two lists counters and gauges are created before some of the allocators used to allocate the data for the CounterImpls and GuageImpls that they own. This will delay the destruction of these stats until after their allocators are destructed, which means that the CounterImpl and GaugeImpl destructors will try to call methods on an object that has already been destructed. The rule with these allocators is that they must outlive everything they allocate.

@pitiwari
Copy link
Contributor Author

pitiwari commented Mar 8, 2018

@mrice32 thanks. I had modified ut to create heap data allocator only once . UTS are passing now without needing any change in destructor, but still ref counting issue in HeapDataAllocator needs to be resolved since it is never incrementing or decrementing ref count.

@mrice32
Copy link
Member

mrice32 commented Mar 8, 2018

@pitiwari, agreed. IIUC, that's being tracked #2453. Are you interested in tackling that in #2752? If so, that'd be awesome!

@pitiwari
Copy link
Contributor Author

pitiwari commented Mar 8, 2018

@mrice32 i can start looking into #2752 (#2453) . I have some other tasks in a plate so might take a week or two to fix it. In the meantime please review this pr and update with your comments

Copy link
Member

@mrice32 mrice32 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome stuff. Thanks for your work on this! Mostly test and docs nits.

@@ -293,5 +294,105 @@ TEST(PrometheusStatsFormatter, FormattedTags) {
EXPECT_EQ(expected, actual);
}

TEST(PrometheusStatsFormatter, statsAsPrometheus) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can you give these tests more descriptive names so we know what each is testing? For example, something like NameCollision, etc?

EXPECT_EQ(2UL, PrometheusStatsFormatter::statsAsPrometheus(counters, gauges, response));
}

TEST(PrometheusStatsFormatter, statsAsPrometheus2) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: statsAsPrometheus2 -> StatsAsPrometheus2 (same for the test above)

std::vector<Stats::Tag> tags;
Stats::Tag tag1 = {"a.tag-name", "a.tag-value"};
tags.push_back(tag1);
Stats::HeapRawStatDataAllocator alloc_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: alloc_ -> alloc


Stats::TagExtractorImpl tag_extractor("cluster_name", "^cluster\\.((.+?)\\.)");
std::string name = "cluster.test_cluster_1.upstream_cx_total";
std::vector<Stats::Tag> tags;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the var_name1 and var_name2 are a little confusing to read when they go all the way to 4. Also, they occasionally don't line up (tag2 is inserted into tags1). It might make things easier and more readable to add {} around each group to scope it so you can use the same variable names (without the numbers) without worrying about accessing the same data. The only items that need to be declared outside the scopes should be alloc, counters, and gauges. The other option would be to make a function for it.

// different tag names. In the end there only two metric names
// should be inserted, one for gauge and one for counter

Stats::TagExtractorImpl tag_extractor("cluster_name", "^cluster\\.((.+?)\\.)");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: are you using tag_extractor?

@@ -293,5 +294,105 @@ TEST(PrometheusStatsFormatter, FormattedTags) {
EXPECT_EQ(expected, actual);
}

TEST(PrometheusStatsFormatter, statsAsPrometheus) {

// create two clusters and two gauges with metric name being same same and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would suggest that you change the comment to the following (similar for the comment in the second test):

Create two counters and two gauges with each pair having the same name, but having different tag names and values.  `statsAsPrometheus()` should return two implying it found two unique stat names.

static void statsAsPrometheus(const std::list<Stats::CounterSharedPtr>& counters,
const std::list<Stats::GaugeSharedPtr>& gauges,
Buffer::Instance& response);
static uint64_t statsAsPrometheus(const std::list<Stats::CounterSharedPtr>& counters,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: please update the method documentation with @return uint64_t explanation of return value.

// In the end there should be 4 metric names inserted 2 for gauge
//`and 2 for counter metric names

Stats::TagExtractorImpl tag_extractor("cluster_name", "^cluster\\.((.+?)\\.)");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: same as above - is this variable being used anywhere?

@mrice32
Copy link
Member

mrice32 commented Mar 9, 2018

@pitiwari side note: if you add "Fixes #2597" in the description of the PR, it will automatically close the issue when it gets merged.

@pitiwari
Copy link
Contributor Author

pitiwari commented Mar 9, 2018

@mrice32 thanks i had updated the code based on review comments

@mrice32
Copy link
Member

mrice32 commented Mar 9, 2018

@pitiwari Make sure you sign off on all your commits to allow the DCO to pass (see https://github.com/probot/dco#how-it-works). Also, make sure you run the formatter on your code. You can do this with docker by running ./ci/run_envoy_docker.sh './ci/do_ci.sh fix_format'

Copy link
Member

@mrice32 mrice32 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! A few more nits, then we should be good to go.

@@ -248,11 +248,12 @@ class PrometheusStatsFormatter {
public:
/**
* Extracts counters and gauges and relevant tags, appending them to
* the response buffer after sanitizing the metric / label names.
* the response buffer after sanitizing the metric / label names. Returns
* total number of metric types inserted in response
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: please use @return doxygen syntax:

/**
 * Extracts counters and gauges and relevant tags, appending them to
 * the response buffer after sanitizing the metric / label names. 
 * @return int64_t total number of metric types inserted in response.
 */


TEST(PrometheusStatsFormatter, UniqueMetricName) {

//Create two counters and two gauges with each pair having the different name,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - can we rephrase to: "Create two counters and two gauges, all with unique names. statsAsPrometheus() should return four implying it found four unique stat names."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done thanks

Copy link
Member

@mrice32 mrice32 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for working through this PR. Awesome work! Also, please remember to add a signoff to all of your commits: https://github.com/probot/dco#how-it-works.


// Create two counters and two gauges with each pair having the same name,
// but having different tag names and values.
//`statsAsPrometheus()` should return two implying it found two unique stat names
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, one more nit that I didn't notice the first few times through: please remove the backticks from statsAsPrometheus().

ggreenway
ggreenway previously approved these changes Mar 12, 2018
Copy link
Contributor

@ggreenway ggreenway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. Just need to DCO-sign all the commits and I think it's ready.

@junr03
Copy link
Member

junr03 commented Mar 12, 2018

looks like other than the DCO, @mrice32 is done with comments. @ggreenway or @htuch I think we can kick this up to you.

@ggreenway
Copy link
Contributor

@pitiwari
Copy link
Contributor Author

rebased all the commits and added dco

@ggreenway
Copy link
Contributor

It looks like your DCO value doesn't match the author value in your commits, so the DCO-bot still isn't accepting it. Sorry for the pedantic-ness of the DCO-bot. Can you please make the two match?

Description:

The admin endpoint of envoy when queried as a prometheus endpoint fails promlint.
Promlint is a sanity check tool offered by Prometheus to check the correctness
of a prometheus endpoint.

Signed-off-by: pitiwari <pitiwari@ebay.com>
@pitiwari
Copy link
Contributor Author

seems like all tests passed, can we go ahead and commit ?

ggreenway
ggreenway previously approved these changes Mar 12, 2018
@ggreenway
Copy link
Contributor

@pitiwari Thanks for making the DCO-bot happy. I'm going to let @htuch do a final review pass and merge.

@@ -249,10 +249,11 @@ class PrometheusStatsFormatter {
/**
* Extracts counters and gauges and relevant tags, appending them to
* the response buffer after sanitizing the metric / label names.
* @return int64_t total number of metric types inserted in response.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: s/int64_t/uint64_t/

Stats::Tag tag = {"a.tag-name", "a.tag-value"};
cluster_tags.push_back(tag);
Stats::RawStatData* data = alloc.alloc(name);
Stats::CounterSharedPtr c(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c = std::make_shared<... (here and below)

Stats::RawStatData* data = alloc.alloc(name);
Stats::CounterSharedPtr c(
new Stats::CounterImpl(*data, alloc, std::string(name), std::move(cluster_tags)));
counters.push_back(c);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This scenario seems contrived. How does it happen in practice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

counterImpl/GuageImpl are derived classes from counter/gauge class so i create two counters and two gauges and pass them with same or different metric names in two different tests and check if metric name was same then gauge/counter are inserted only once for each unique metric name . Promlint tool was failing due to same metric name printed multiple times.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand how this happens in the test, I'm curious though how this happens in actual Envoy use. How did you discover this? What were the offending counters?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The envoy_listener_http_downstream_rq_xx counter appears many times, each time with a different set of tags and metrics that come with that tag (dimension).

For each upstream server you get 5 envoy_listener_http_downstream_rq_xx counters: one for each http responsecode (1, 2, 3, 4, 5) as the value for the "envoy_response_code_class" tag.

@pitiwari
Copy link
Contributor Author

pitiwari commented Mar 13, 2018

made changes based on review, please let me know if anything more needs to be changed after that i can squash all commits into single commit and push again

Signed-off-by: pitiwari <pitiwari@ebay.com>
@pitiwari
Copy link
Contributor Author

@htuch can you please have a look. I had updated code based on your review comments

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix, this is great.

@htuch htuch merged commit 4d5c1a6 into envoyproxy:master Mar 16, 2018
Shikugawa pushed a commit to Shikugawa/envoy that referenced this pull request Mar 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

envoy prometheus endpoint fails promlint
6 participants