admin: Add support for prometheus summary metrics (envoyproxy#30479)
Commit Message: Add support for prometheus summary metrics on the admin endpoint
Additional Description: Adds support for emitting prometheus "summary" metrics for the internal histogram quantiles by supplying a query parameter. Multiple modes are supported, as in envoyproxy#25812, and can be either histogram, summary, or histogram,summary.
Risk Level: Low, no changes to existing default behavior
Testing: Added unit tests for histogram, summary, and summary+histogram emission
Docs Changes: Added documentation to the admin home page, and to the published admin docs around an optional query parameter.
Release Notes: Added a note in the small_feature section.

Fixes envoyproxy#30471

Signed-off-by: Andy Bradshaw <abradshaw@palantir.com>
andybradshaw authored and alyssawilk committed Apr 29, 2024
1 parent ba25ac6 commit d0309b8
Showing 17 changed files with 358 additions and 27 deletions.
4 changes: 4 additions & 0 deletions changelogs/current.yaml
@@ -428,6 +428,10 @@ new_features:
- area: aws
change: |
Update credential_provider utility to support EKS Pod Identity provided via token file.
- area: admin
change: |
The ``/stats/prometheus`` endpoint can now emit prometheus ``summary`` metric types by explicitly setting the
``histogram_buckets`` query parameter to ``summary``.
deprecated:
- area: listener
53 changes: 53 additions & 0 deletions docs/root/operations/admin.rst
@@ -770,6 +770,59 @@ modify different aspects of the server:
Text readout stats create a new label value every time the value
of the text readout stat changes, which could create an unbounded number of time series.

.. http:get:: /stats?format=prometheus&histogram_buckets=summary
Optional ``histogram_buckets`` query parameter is used to control how histogram metrics get reported.
If unset, histograms get reported as the "histogram" prometheus metric type, but can also be used to
emit prometheus "summary" metrics if set to ``summary``. Each emitted summary is over the interval
of the last :ref:`stats_flush_interval <envoy_v3_api_field_config.bootstrap.v3.Bootstrap.stats_flush_interval>`.

Example histogram output:

.. code-block:: text
# TYPE envoy_server_initialization_time_ms histogram
envoy_server_initialization_time_ms_bucket{le="0.5"} 0
envoy_server_initialization_time_ms_bucket{le="1"} 0
envoy_server_initialization_time_ms_bucket{le="5"} 0
envoy_server_initialization_time_ms_bucket{le="10"} 0
envoy_server_initialization_time_ms_bucket{le="25"} 0
envoy_server_initialization_time_ms_bucket{le="50"} 0
envoy_server_initialization_time_ms_bucket{le="100"} 0
envoy_server_initialization_time_ms_bucket{le="250"} 1
envoy_server_initialization_time_ms_bucket{le="500"} 1
envoy_server_initialization_time_ms_bucket{le="1000"} 1
envoy_server_initialization_time_ms_bucket{le="2500"} 1
envoy_server_initialization_time_ms_bucket{le="5000"} 1
envoy_server_initialization_time_ms_bucket{le="10000"} 1
envoy_server_initialization_time_ms_bucket{le="30000"} 1
envoy_server_initialization_time_ms_bucket{le="60000"} 1
envoy_server_initialization_time_ms_bucket{le="300000"} 1
envoy_server_initialization_time_ms_bucket{le="600000"} 1
envoy_server_initialization_time_ms_bucket{le="1800000"} 1
envoy_server_initialization_time_ms_bucket{le="3600000"} 1
envoy_server_initialization_time_ms_bucket{le="+Inf"} 1
envoy_server_initialization_time_ms_sum{} 115.000000000000014210854715202
envoy_server_initialization_time_ms_count{} 1
Example summary output:

.. code-block:: text
# TYPE envoy_server_initialization_time_ms summary
envoy_server_initialization_time_ms{quantile="0"} 110.00000000000001
envoy_server_initialization_time_ms{quantile="0.25"} 112.50000000000001
envoy_server_initialization_time_ms{quantile="0.5"} 115.00000000000001
envoy_server_initialization_time_ms{quantile="0.75"} 117.50000000000001
envoy_server_initialization_time_ms{quantile="0.9"} 119.00000000000001
envoy_server_initialization_time_ms{quantile="0.95"} 119.50000000000001
envoy_server_initialization_time_ms{quantile="0.99"} 119.90000000000002
envoy_server_initialization_time_ms{quantile="0.995"} 119.95000000000002
envoy_server_initialization_time_ms{quantile="0.999"} 119.99000000000001
envoy_server_initialization_time_ms{quantile="1"} 120.00000000000001
envoy_server_initialization_time_ms_sum{} 115.000000000000014210854715202
envoy_server_initialization_time_ms_count{} 1
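
As a hedged illustration of how the summary output shown above might be consumed, the following C++ sketch pulls the quantile label and the value out of a single exposition line. `QuantileSample` and `parseQuantileLine` are hypothetical names introduced here; they are not part of Envoy or of this commit. The sketch assumes that no other tag value contains the text `quantile="`.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <optional>
#include <string>

// Hypothetical holder for one parsed sample; not an Envoy or Prometheus type.
struct QuantileSample {
  std::string quantile; // label value kept as text, e.g. "0.99"
  double value;         // sample value; NaN when the interval had no samples
};

// Parses a line such as: name{quantile="0.5"} 115
// Returns std::nullopt for lines without a quantile label
// (the "# TYPE" line and the _sum / _count lines).
std::optional<QuantileSample> parseQuantileLine(const std::string& line) {
  const std::string key = "quantile=\"";
  const std::size_t start = line.find(key);
  if (start == std::string::npos) {
    return std::nullopt;
  }
  const std::size_t label_begin = start + key.size();
  const std::size_t label_end = line.find('"', label_begin);
  if (label_end == std::string::npos) {
    return std::nullopt;
  }
  const std::size_t brace = line.find('}', label_end);
  if (brace == std::string::npos) {
    return std::nullopt;
  }
  try {
    // std::stod skips the separating space and also accepts "nan".
    return QuantileSample{line.substr(label_begin, label_end - label_begin),
                          std::stod(line.substr(brace + 1))};
  } catch (const std::exception&) {
    return std::nullopt; // missing or malformed value
  }
}
```

Calling `parseQuantileLine` on each line of a scrape and keeping only the non-empty results yields the quantile series while skipping the `_sum` and `_count` lines.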
.. http:get:: /stats/recentlookups
This endpoint helps Envoy developers debug potential contention
6 changes: 5 additions & 1 deletion source/server/admin/admin.cc
@@ -221,7 +221,11 @@ AdminImpl::AdminImpl(const std::string& profile_path, Server::Instance& server,
"Render text_readouts as new gaugues with value 0 (increases Prometheus "
"data size)"},
{ParamDescriptor::Type::String, "filter",
"Regular expression (Google re2) for filtering stats"}}),
"Regular expression (Google re2) for filtering stats"},
{ParamDescriptor::Type::Enum,
"histogram_buckets",
"Histogram bucket display mode",
{"cumulative", "summary"}}}),
makeHandler("/stats/recentlookups", "Show recent stat-name lookups",
MAKE_ADMIN_HANDLER(stats_handler_.handlerStatsRecentLookups), false, false),
makeHandler("/stats/recentlookups/clear", "clear list of stat-name lookups and counter",
66 changes: 64 additions & 2 deletions source/server/admin/prometheus_stats.cc
@@ -1,5 +1,7 @@
#include "source/server/admin/prometheus_stats.h"

#include <cmath>

#include "source/common/common/empty_string.h"
#include "source/common/common/macros.h"
#include "source/common/common/regex.h"
@@ -274,6 +276,35 @@ uint64_t outputPrimitiveStatType(Buffer::Instance& response, const StatsParams&
return result;
}

/*
* Returns the prometheus output for a summary. The output is a multi-line string (with embedded
* newlines) that contains all the individual quantile values and sum/count for a single histogram
* (metric_name plus all tags).
*/
std::string generateSummaryOutput(const Stats::ParentHistogram& histogram,
const std::string& prefixed_tag_extracted_name) {
const std::string tags = PrometheusStatsFormatter::formattedTags(histogram.tags());
const std::string hist_tags = histogram.tags().empty() ? EMPTY_STRING : (tags + ",");

const Stats::HistogramStatistics& stats = histogram.intervalStatistics();
Stats::ConstSupportedBuckets& supported_quantiles = stats.supportedQuantiles();
const std::vector<double>& computed_quantiles = stats.computedQuantiles();
std::string output;
for (size_t i = 0; i < supported_quantiles.size(); ++i) {
double quantile = supported_quantiles[i];
double value = computed_quantiles[i];
output.append(fmt::format("{0}{{{1}quantile=\"{2}\"}} {3:.32g}\n", prefixed_tag_extracted_name,
hist_tags, quantile, value));
}

output.append(fmt::format("{0}_sum{{{1}}} {2:.32g}\n", prefixed_tag_extracted_name, tags,
stats.sampleSum()));
output.append(fmt::format("{0}_count{{{1}}} {2}\n", prefixed_tag_extracted_name, tags,
stats.sampleCount()));

return output;
};

} // namespace

std::string PrometheusStatsFormatter::formattedTags(const std::vector<Stats::Tag>& tags) {
@@ -285,6 +316,22 @@ std::string PrometheusStatsFormatter::formattedTags(const std::vector<Stats::Tag
return absl::StrJoin(buf, ",");
}

absl::Status PrometheusStatsFormatter::validateParams(const StatsParams& params) {
absl::Status result;
switch (params.histogram_buckets_mode_) {
case Utility::HistogramBucketsMode::Summary:
case Utility::HistogramBucketsMode::Unset:
case Utility::HistogramBucketsMode::Cumulative:
result = absl::OkStatus();
break;
case Utility::HistogramBucketsMode::Detailed:
case Utility::HistogramBucketsMode::Disjoint:
result = absl::InvalidArgumentError("unsupported prometheus histogram bucket mode");
break;
}
return result;
}

absl::optional<std::string>
PrometheusStatsFormatter::metricName(const std::string& extracted_name,
const Stats::CustomStatNamespaces& custom_namespaces) {
@@ -332,8 +379,23 @@ uint64_t PrometheusStatsFormatter::statsAsPrometheus(
metric_name_count += outputStatType<Stats::TextReadout>(
response, params, text_readouts, generateTextReadoutOutput, "gauge", custom_namespaces);

metric_name_count += outputStatType<Stats::ParentHistogram>(
response, params, histograms, generateHistogramOutput, "histogram", custom_namespaces);
// validation of bucket modes is handled separately
switch (params.histogram_buckets_mode_) {
case Utility::HistogramBucketsMode::Summary:
metric_name_count += outputStatType<Stats::ParentHistogram>(
response, params, histograms, generateSummaryOutput, "summary", custom_namespaces);
break;
case Utility::HistogramBucketsMode::Unset:
case Utility::HistogramBucketsMode::Cumulative:
metric_name_count += outputStatType<Stats::ParentHistogram>(
response, params, histograms, generateHistogramOutput, "histogram", custom_namespaces);
break;
// "Detailed" and "Disjoint" don't make sense for prometheus histogram semantics
case Utility::HistogramBucketsMode::Detailed:
case Utility::HistogramBucketsMode::Disjoint:
IS_ENVOY_BUG("unsupported prometheus histogram bucket mode");
break;
}

// Note: This assumes that there is no overlap in stat name between per-endpoint stats and all
// other stats. If this is not true, then the counters/gauges for per-endpoint need to be combined
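
The line layout produced by `generateSummaryOutput` in this file can be approximated outside Envoy. The sketch below is a standalone approximation using only the standard library, not the code from the commit. `formatValue` and `summaryLines` are hypothetical names. `snprintf` with `%.32g` stands in for the `{:.32g}` format spec, and stream insertion stands in for the default `{}` formatting of the quantile, which may print differently for quantiles with many digits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <sstream>
#include <string>
#include <vector>

// Formats a double with up to 32 significant digits (assumed stand-in for "{:.32g}").
std::string formatValue(double v) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%.32g", v);
  return buf;
}

// Builds the quantile lines followed by the _sum and _count lines for one metric.
// `tags` is an already formatted label list such as a="b" (empty if there are no tags).
std::string summaryLines(const std::string& name, const std::string& tags,
                         const std::vector<double>& quantiles,
                         const std::vector<double>& values, double sum,
                         unsigned long long count) {
  assert(quantiles.size() == values.size());
  // Existing tags need a trailing comma before the quantile label is appended.
  const std::string label_prefix = tags.empty() ? "" : tags + ",";
  std::ostringstream out;
  for (std::size_t i = 0; i < quantiles.size(); ++i) {
    out << name << "{" << label_prefix << "quantile=\"" << quantiles[i] << "\"} "
        << formatValue(values[i]) << "\n";
  }
  out << name << "_sum{" << tags << "} " << formatValue(sum) << "\n";
  out << name << "_count{" << tags << "} " << count << "\n";
  return out.str();
}
```

As in the diff, the `_sum` and `_count` lines carry the metric's own tags but no `quantile` label.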
5 changes: 5 additions & 0 deletions source/server/admin/prometheus_stats.h
@@ -37,6 +37,11 @@ class PrometheusStatsFormatter {
*/
static std::string formattedTags(const std::vector<Stats::Tag>& tags);

/**
* Validate the given params, returning an error on invalid arguments
*/
static absl::Status validateParams(const StatsParams& params);

/**
* Format the given metric name, and prefixed with "envoy_" if it does not have a custom
* stat namespace. If it has a custom stat namespace AND the name without the custom namespace
16 changes: 11 additions & 5 deletions source/server/admin/stats_handler.cc
@@ -88,7 +88,7 @@ Admin::RequestPtr StatsHandler::makeRequest(AdminStream& admin_stream) {
// Ideally we'd find a way to do this without slowing down
// the non-Prometheus implementations.
Buffer::OwnedImpl response;
prometheusFlushAndRender(params, response);
Http::Code code = prometheusFlushAndRender(params, response);
return Admin::makeStaticTextRequest(response, code);
}

@@ -131,16 +131,22 @@ Http::Code StatsHandler::prometheusStats(absl::string_view path_and_query,
server_.flushStats();
}

prometheusFlushAndRender(params, response);
return Http::Code::OK;
return prometheusFlushAndRender(params, response);
}

void StatsHandler::prometheusFlushAndRender(const StatsParams& params, Buffer::Instance& response) {
Http::Code StatsHandler::prometheusFlushAndRender(const StatsParams& params,
Buffer::Instance& response) {
absl::Status paramsStatus = PrometheusStatsFormatter::validateParams(params);
if (!paramsStatus.ok()) {
response.add(paramsStatus.message());
return Http::Code::BadRequest;
}
if (server_.statsConfig().flushOnAdmin()) {
server_.flushStats();
}
prometheusRender(server_.stats(), server_.api().customStatNamespaces(), server_.clusterManager(),
params, response);
return Http::Code::OK;
}
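
The validate-then-render flow that `prometheusFlushAndRender` now follows can be sketched in isolation. The code below is a simplified model under stated assumptions, not Envoy code. `BucketsMode`, `AdminResponse`, `validateForPrometheus`, and `renderPrometheus` are hypothetical stand-ins. A plain `int` replaces `Http::Code`, `std::optional<std::string>` replaces `absl::Status`, and the rendered body is reduced to the Prometheus metric type that would be emitted.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical mirror of the modes named in the diff.
enum class BucketsMode { Unset, Summary, Cumulative, Disjoint, Detailed };

// Hypothetical response pair standing in for Http::Code plus the response buffer.
struct AdminResponse {
  int http_status;
  std::string body;
};

// Returns an error message for modes the Prometheus format cannot express,
// or std::nullopt when the mode is acceptable.
std::optional<std::string> validateForPrometheus(BucketsMode mode) {
  switch (mode) {
  case BucketsMode::Unset:
  case BucketsMode::Summary:
  case BucketsMode::Cumulative:
    return std::nullopt;
  case BucketsMode::Disjoint:
  case BucketsMode::Detailed:
    return std::string("unsupported prometheus histogram bucket mode");
  }
  return std::nullopt; // not reached for valid enum values
}

// Validates first, so an unsupported mode yields 400 before any rendering happens.
AdminResponse renderPrometheus(BucketsMode mode) {
  if (const std::optional<std::string> error = validateForPrometheus(mode)) {
    return {400, *error};
  }
  // Unset and Cumulative both fall back to the "histogram" metric type.
  return {200, mode == BucketsMode::Summary ? "summary" : "histogram"};
}
```

Validating before rendering is what lets the later switch in the formatter treat the unsupported modes as a bug rather than as user input.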

void StatsHandler::prometheusRender(Stats::Store& stats,
@@ -180,7 +186,7 @@ Admin::UrlHandler StatsHandler::statsHandler(bool active_mode) {
Admin::ParamDescriptor histogram_buckets{Admin::ParamDescriptor::Type::Enum,
"histogram_buckets",
"Histogram bucket display mode",
{"cumulative", "disjoint", "detailed", "none"}};
{"cumulative", "disjoint", "detailed", "summary"}};
Admin::ParamDescriptor format{Admin::ParamDescriptor::Type::Enum,
"format",
"Format to use",
2 changes: 1 addition & 1 deletion source/server/admin/stats_handler.h
@@ -52,7 +52,7 @@ class StatsHandler : public HandlerContextBase {
* @params params the already-parsed parameters.
* @param response buffer into which to write response
*/
void prometheusFlushAndRender(const StatsParams& params, Buffer::Instance& response);
Http::Code prometheusFlushAndRender(const StatsParams& params, Buffer::Instance& response);

/**
* Renders the stats as prometheus. This is broken out as a separately
2 changes: 1 addition & 1 deletion source/server/admin/stats_params.h
@@ -68,7 +68,7 @@ struct StatsParams {
HiddenFlag hidden_{HiddenFlag::Exclude};
std::string filter_string_;
std::shared_ptr<re2::RE2> re2_filter_;
Utility::HistogramBucketsMode histogram_buckets_mode_{Utility::HistogramBucketsMode::NoBuckets};
Utility::HistogramBucketsMode histogram_buckets_mode_{Utility::HistogramBucketsMode::Unset};
Http::Utility::QueryParamsMulti query_;

/**
9 changes: 6 additions & 3 deletions source/server/admin/stats_render.cc
@@ -28,7 +28,8 @@ void StatsTextRender::generate(Buffer::Instance& response, const std::string& na
}

switch (histogram_buckets_mode_) {
case Utility::HistogramBucketsMode::NoBuckets:
case Utility::HistogramBucketsMode::Unset:
case Utility::HistogramBucketsMode::Summary:
response.addFragments({name, ": ", histogram.quantileSummary(), "\n"});
break;
case Utility::HistogramBucketsMode::Cumulative:
@@ -153,7 +154,8 @@ void StatsJsonRender::generate(Buffer::Instance& response, const std::string& na
}

switch (histogram_buckets_mode_) {
case Utility::HistogramBucketsMode::NoBuckets: {
case Utility::HistogramBucketsMode::Unset:
case Utility::HistogramBucketsMode::Summary: {
Json::Streamer::MapPtr map = json_->histogram_array_->addMap();
map->addEntries({{"name", name}});
map->addKey("values");
@@ -219,7 +221,8 @@ void StatsJsonRender::renderHistogramStart() {
json_->histogram_map2_->addKey("details");
json_->histogram_array_ = json_->histogram_map2_->addArray();
break;
case Utility::HistogramBucketsMode::NoBuckets:
case Utility::HistogramBucketsMode::Unset:
case Utility::HistogramBucketsMode::Summary:
json_->histogram_map2_ = json_->histogram_map1_->addMap();
json_->histogram_map2_->addKey("supported_quantiles");
{ populateSupportedPercentiles(*json_->histogram_map2_->addArray()); }
15 changes: 10 additions & 5 deletions source/server/admin/utils.cc
@@ -24,23 +24,28 @@ void populateFallbackResponseHeaders(Http::Code code, Http::ResponseHeaderMap& h
Http::Headers::get().XContentTypeOptionValues.Nosniff);
}

// Helper method to get the histogram_buckets parameter. Returns false if histogram_buckets query
// param is found and value is not "cumulative" or "disjoint", true otherwise.
// Helper method to get the histogram_buckets parameter. Returns an InvalidArgumentError
// if histogram_buckets query param is found and value is not "cumulative" or "disjoint",
// Ok otherwise.
absl::Status histogramBucketsParam(const Http::Utility::QueryParamsMulti& params,
HistogramBucketsMode& histogram_buckets_mode) {
absl::optional<std::string> histogram_buckets_query_param =
nonEmptyQueryParam(params, "histogram_buckets");
histogram_buckets_mode = HistogramBucketsMode::NoBuckets;
histogram_buckets_mode = HistogramBucketsMode::Unset;
if (histogram_buckets_query_param.has_value()) {
if (histogram_buckets_query_param.value() == "cumulative") {
histogram_buckets_mode = HistogramBucketsMode::Cumulative;
} else if (histogram_buckets_query_param.value() == "disjoint") {
histogram_buckets_mode = HistogramBucketsMode::Disjoint;
} else if (histogram_buckets_query_param.value() == "detailed") {
histogram_buckets_mode = HistogramBucketsMode::Detailed;
} else if (histogram_buckets_query_param.value() != "none") {
// "none" is a synonym for "summary", and exists to maintain backwards compatibility
} else if (histogram_buckets_query_param.value() == "summary" ||
histogram_buckets_query_param.value() == "none") {
histogram_buckets_mode = HistogramBucketsMode::Summary;
} else {
return absl::InvalidArgumentError(
"usage: /stats?histogram_buckets=(cumulative|disjoint|none)\n");
"usage: /stats?histogram_buckets=(cumulative|disjoint|detailed|summary)\n");
}
}
return absl::OkStatus();
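
The mapping that `histogramBucketsParam` applies to the query value, including the backwards-compatible `none` synonym, can be illustrated with a small standalone function. This is a sketch under stated assumptions and not the Envoy implementation. `BucketsMode` and `parseHistogramBuckets` are hypothetical names, `std::optional` replaces `absl::Status` for signalling an invalid value, and an empty string is assumed to be treated the same as an absent parameter.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical mirror of the modes named in the diff.
enum class BucketsMode { Unset, Summary, Cumulative, Disjoint, Detailed };

// Maps the raw histogram_buckets query value to a mode.
// Returns std::nullopt when the value is present but not recognized.
std::optional<BucketsMode> parseHistogramBuckets(const std::optional<std::string>& param) {
  if (!param.has_value() || param->empty()) {
    return BucketsMode::Unset; // each output format picks its own default
  }
  if (*param == "cumulative") {
    return BucketsMode::Cumulative;
  }
  if (*param == "disjoint") {
    return BucketsMode::Disjoint;
  }
  if (*param == "detailed") {
    return BucketsMode::Detailed;
  }
  // "none" is kept as a synonym for "summary" for backwards compatibility.
  if (*param == "summary" || *param == "none") {
    return BucketsMode::Summary;
  }
  return std::nullopt;
}
```

Keeping `Unset` distinct from `Summary` is what allows the text and JSON renderers to default to quantile summaries while the Prometheus formatter defaults to cumulative histograms.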
5 changes: 4 additions & 1 deletion source/server/admin/utils.h
@@ -13,7 +13,10 @@ namespace Envoy {
namespace Server {
namespace Utility {

enum class HistogramBucketsMode { NoBuckets, Cumulative, Disjoint, Detailed };
// HistogramBucketsMode determines how histogram statistics get reported. Not
// all modes are supported for all formats, with the "Unset" variant allowing
// different formats to have different default behavior.
enum class HistogramBucketsMode { Unset, Summary, Cumulative, Disjoint, Detailed };

void populateFallbackResponseHeaders(Http::Code code, Http::ResponseHeaderMap& header_map);

3 changes: 2 additions & 1 deletion test/server/admin/admin_test.cc
@@ -181,11 +181,12 @@ TEST_P(AdminInstanceTest, Help) {
filter: Regular expression (Google re2) for filtering stats
format: Format to use; One of (html, active-html, text, json)
type: Stat types to include.; One of (All, Counters, Histograms, Gauges, TextReadouts)
histogram_buckets: Histogram bucket display mode; One of (cumulative, disjoint, detailed, none)
histogram_buckets: Histogram bucket display mode; One of (cumulative, disjoint, detailed, summary)
/stats/prometheus: print server stats in prometheus format
usedonly: Only include stats that have been written by system since restart
text_readouts: Render text_readouts as new gaugues with value 0 (increases Prometheus data size)
filter: Regular expression (Google re2) for filtering stats
histogram_buckets: Histogram bucket display mode; One of (cumulative, summary)
/stats/recentlookups: Show recent stat-name lookups
/stats/recentlookups/clear (POST): clear list of stat-name lookups and counter
/stats/recentlookups/disable (POST): disable recording of reset stat-name lookup names
35 changes: 35 additions & 0 deletions test/server/admin/prometheus_stats_test.cc
@@ -284,6 +284,41 @@ envoy_histogram1_count{} 0
EXPECT_EQ(expected_output, response.toString());
}

TEST_F(PrometheusStatsFormatterTest, SummaryWithNoValuesAndNoTags) {
Stats::CustomStatNamespacesImpl custom_namespaces;
HistogramWrapper h1_interval;
Stats::HistogramStatisticsImpl h1_interval_statistics(h1_interval.getHistogram());

auto histogram = makeHistogram("histogram1", {});
ON_CALL(*histogram, intervalStatistics()).WillByDefault(ReturnRef(h1_interval_statistics));

addHistogram(histogram);
StatsParams params = StatsParams();
params.histogram_buckets_mode_ = Utility::HistogramBucketsMode::Summary;
Buffer::OwnedImpl response;
const uint64_t size = PrometheusStatsFormatter::statsAsPrometheus(
counters_, gauges_, histograms_, textReadouts_, endpoints_helper_->cm_, response, params,
custom_namespaces);
EXPECT_EQ(1UL, size);

const std::string expected_output = R"EOF(# TYPE envoy_histogram1 summary
envoy_histogram1{quantile="0"} nan
envoy_histogram1{quantile="0.25"} nan
envoy_histogram1{quantile="0.5"} nan
envoy_histogram1{quantile="0.75"} nan
envoy_histogram1{quantile="0.9"} nan
envoy_histogram1{quantile="0.95"} nan
envoy_histogram1{quantile="0.99"} nan
envoy_histogram1{quantile="0.995"} nan
envoy_histogram1{quantile="0.999"} nan
envoy_histogram1{quantile="1"} nan
envoy_histogram1_sum{} 0
envoy_histogram1_count{} 0
)EOF";

EXPECT_EQ(expected_output, response.toString());
}

// Replicate bug https://github.com/envoyproxy/envoy/issues/27173 which fails to
// coalesce stats in different scopes with the same tag-extracted-name.
TEST_F(PrometheusStatsFormatterTest, DifferentNamedScopeSameStat) {