Add functionality to flush metrics periodically and a few metrics (latency, total requests sent, ...) #344

qqustc · 2020-06-05T15:01:38Z

Currently Nighthawk does not flush statistics and metrics in real time during its run. We plan to add the functionality to flush Nighthawk metrics periodically (at a configured time interval) to Envoy stats Sinks following the example of what has been done in Envoy.

We also plan to add new counters and histograms to monitor
number of total requests sent (counter)
request latency (histogram)
request/response header/body size (histogram)
number of total requests blocked (counter)

/cc @mum4k

oschaaf · 2020-06-05T18:37:35Z

This sounds awesome. Some thoughts that are potentially relevant:

I found this counter intuitive, but the tls counters aren't actually thread local. So at this time we explicitly use a namespace to isolate them per worker. At the time I did have - from the top of my head - a derived Envoy counter implementation which would track both shared and local counters. This would avoid having to use the namespaces to isolate. This may (or may not) be worth revisiting.
Another thing is that there's also an open issue, where we concluded we'd like to switch from HdrHistogram to libcirclhist. Envoy uses libcirclhist these days, so actually we could just switch to Envoy's internal histogram implementation. Upon doing so, we'd have a similar issue with the per worker namespace we use, like above with the counters. I have not looked into what it takes to address that.
A cross cutting concern might be how we sync histograms. Do we want to shoot for a unified way of sync/merging histograms when we think about threads, processes, machines, etc? If so, the way Envoy facilitates merging of histograms to the master thread might seem low hanging fruit -- but it might not be ideal when also considering the other scenarios.

qqustc · 2020-06-05T19:50:30Z

Thanks Otto for the information!

We plan to define per-worker Envoy counters and histograms by using the scope/namespace (similar to http_2xx counter in benchmark_client_impl.h), so the issue about tls counters aren't actually thread local should not matter here.
Yes, our plan is to use Envoy's internal histogram implementation at per-worker level.
Haven't decided on how we sync histograms but maybe it is natural to follow the way Envoy sync histograms.

This sounds awesome. Some thoughts that are potentially relevant:

I found this counter intuitive, but the tls counters aren't actually thread local. So at this time we explicitly use a namespace to isolate them per worker. At the time I did have - from the top of my head - a derived Envoy counter implementation which would track both shared and local counters. This would avoid having to use the namespaces to isolate. This may (or may not) be worth revisiting.

Another thing is that there's also an open issue, where we concluded we'd like to switch from HdrHistogram to libcirclhist. Envoy uses libcirclhist these days, so actually we could just switch to Envoy's internal histogram implementation. Upon doing so, we'd have a similar issue with the per worker namespace we use, like above with the counters. I have not looked into what it takes to address that.

A cross cutting concern might be how we sync histograms. Do we want to shoot for a unified way of sync/merging histograms when we think about threads, processes, machines, etc? If so, the way Envoy facilitates merging of histograms to the master thread might seem low hanging fruit -- but it might not be ideal when also considering the other scenarios.

With this, --duration 0 will be treated as a request to infinitely execute. Other default predicates will still exist, to execution may stop when connection failures or error response codes are observed. This leaves some small tech debt: - We shouldn't need to create a foo root termination predicate - We need test coverage, but that's much to add once envoyproxy#280 or envoyproxy#344 goes in Fixes envoyproxy#333 Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>

Adds `--no-duration`, which will be treated as a request to infinitely continue load generation. Other default predicates will still exist, so\ execution may still stop when connection failures or error response codes are observed. This leaves some small tech debt: - We need test coverage, but that's much easier to add once #280 or #344 goes in Fixes #333 Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>

#391) Add new NH statistic class CircllhistStatistic and SinkableCircllhistStatistic. CircllhistStatistic uses Circllhist under the hood to compute statistics where Circllhist is used in the implementation of Envoy Histograms. Compared to HdrHistogram Circllhist trades precision for fast performance in merge and insertion according to #115. SinkableCircllhistStatistic wraps the Envoy::Stats::Histogram interface and is used to flush histogram value to downstream Envoy stats Sinks. Linked Issues: #344 Testing: unit tests

…figure stat sinks in NH (#418) Add CLI flags - `--stats-sinks` - Stats sinks (in json or compact yaml) where Nighthawk metrics will be flushed. This argument is intended to be specified multiple times. - `--stats-flush-interval` - Time interval (in seconds) between flushes to configured stats sinks. Default: 5 (seconds). The flags' values will be used to update [`stats_sinks `](https://github.com/envoyproxy/envoy/blob/54652f03ea0b3504bd4d64e8ae31985be2892a95/api/envoy/config/bootstrap/v3/bootstrap.proto#L129) and [`stats_flush_interval `](https://github.com/envoyproxy/envoy/blob/54652f03ea0b3504bd4d64e8ae31985be2892a95/api/envoy/config/bootstrap/v3/bootstrap.proto#L139) in `bootstrap.proto`. Linked Issues: #344 Testing: unit tests Docs Changes: README.md

Add the following latency histogram in benchmark_client_impl : - benchmark_http_client.latency_1xx | SinkableHdrStatistic | Latency (in Nanosecond) histogram of request with code 1xx - benchmark_http_client.latency_2xx | SinkableHdrStatistic | Latency (in Nanosecond) histogram of request with code 2xx - benchmark_http_client.latency_3xx | SinkableHdrStatistic | Latency (in Nanosecond) histogram of request with code 3xx - benchmark_http_client.latency_4xx | SinkableHdrStatistic | Latency (in Nanosecond) histogram of request with code 4xx - benchmark_http_client.latency_5xx | SinkableHdrStatistic | Latency (in Nanosecond) histogram of request with code 5xx - benchmark_http_client.latency_xxx | SinkableHdrStatistic | Latency (in Nanosecond) histogram of request with code <100 or >=600 Linked Issues: #344 Testing: unit tests Docs Changes: Add docs/root/statistics.md Signed-off-by: Qin Qin <qqin@google.com>

Add a flush worker to periodically flush stats to sinks. Add the functionality to configure sinks in process_impl.cc Start the flush worker in process_impl.cc when there is stats sink configured. Change the SinkableStatistic's tagExtractedName() to return `worker_id + "." + Nighthawk::Statistic::id()` which can be used in customized sinks. Linked Issues: #344 Signed-off-by: Qin Qin <qqin@google.com>

qqustc · 2020-09-23T19:42:31Z

Closed this now since the metric flush functionality has been added.

qqustc changed the title ~~Add functionality to flush metrics periodically and also add a few metrics (latency, total requests sent, ...)~~ Add functionality to flush metrics periodically and a few metrics (latency, total requests sent, ...) Jun 5, 2020

oschaaf mentioned this issue Jun 16, 2020

Allow --duration=0 or infinite duration #333

Closed

oschaaf mentioned this issue Jun 16, 2020

Add "no-duration" to facilitate infinite load generation #366

Merged

This was referenced Jul 1, 2020

Add new NH statistic class CircllhistStatistic and SinkableCircllhistStatistic qqustc/nighthawk#3

Merged

Add new NH statistic class CircllhistStatistic and SinkableCircllhist… #391

Merged

qqustc mentioned this issue Jul 15, 2020

Add stats to record latency metrics for each response code in NH #381

Merged

qqustc mentioned this issue Jul 22, 2020

Add CLI flags "stats-sinks" and "stats-flush-interval" to support configure stat sinks in NH #418

Merged

qqustc mentioned this issue Aug 4, 2020

Add the functionality to flush stats to sinks periodically in NH #437

Merged

qqustc closed this as completed Sep 23, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add functionality to flush metrics periodically and a few metrics (latency, total requests sent, ...) #344

Add functionality to flush metrics periodically and a few metrics (latency, total requests sent, ...) #344

qqustc commented Jun 5, 2020 •

edited

Loading

oschaaf commented Jun 5, 2020

qqustc commented Jun 5, 2020

qqustc commented Sep 23, 2020

Add functionality to flush metrics periodically and a few metrics (latency, total requests sent, ...) #344

Add functionality to flush metrics periodically and a few metrics (latency, total requests sent, ...) #344

Comments

qqustc commented Jun 5, 2020 • edited Loading

oschaaf commented Jun 5, 2020

qqustc commented Jun 5, 2020

qqustc commented Sep 23, 2020

qqustc commented Jun 5, 2020 •

edited

Loading