Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add functionality to flush metrics periodically and a few metrics (latency, total requests sent, ...) #344

Closed
qqustc opened this issue Jun 5, 2020 · 3 comments

Comments

@qqustc
Copy link
Contributor

qqustc commented Jun 5, 2020

Currently Nighthawk does not flush statistics and metrics in real time during its run. We plan to add the functionality to flush Nighthawk metrics periodically (at a configured time interval) to Envoy stats Sinks following the example of what has been done in Envoy.

We also plan to add new counters and histograms to monitor
number of total requests sent (counter)
request latency (histogram)
request/response header/body size (histogram)
number of total requests blocked (counter)

/cc @mum4k

@qqustc qqustc changed the title Add functionality to flush metrics periodically and also add a few metrics (latency, total requests sent, ...) Add functionality to flush metrics periodically and a few metrics (latency, total requests sent, ...) Jun 5, 2020
@oschaaf
Copy link
Member

oschaaf commented Jun 5, 2020

This sounds awesome. Some thoughts that are potentially relevant:

  • I found this counter intuitive, but the tls counters aren't actually thread local. So at this time we explicitly use a namespace to isolate them per worker. At the time I did have - from the top of my head - a derived Envoy counter implementation which would track both shared and local counters. This would avoid having to use the namespaces to isolate. This may (or may not) be worth revisiting.
  • Another thing is that there's also an open issue, where we concluded we'd like to switch from HdrHistogram to libcirclhist. Envoy uses libcirclhist these days, so actually we could just switch to Envoy's internal histogram implementation. Upon doing so, we'd have a similar issue with the per worker namespace we use, like above with the counters. I have not looked into what it takes to address that.
  • A cross cutting concern might be how we sync histograms. Do we want to shoot for a unified way of sync/merging histograms when we think about threads, processes, machines, etc? If so, the way Envoy facilitates merging of histograms to the master thread might seem low hanging fruit -- but it might not be ideal when also considering the other scenarios.

@qqustc
Copy link
Contributor Author

qqustc commented Jun 5, 2020

Thanks Otto for the information!

  • We plan to define per-worker Envoy counters and histograms by using the scope/namespace (similar to http_2xx counter in benchmark_client_impl.h), so the issue about tls counters aren't actually thread local should not matter here.

  • Yes, our plan is to use Envoy's internal histogram implementation at per-worker level.

  • Haven't decided on how we sync histograms but maybe it is natural to follow the way Envoy sync histograms.

This sounds awesome. Some thoughts that are potentially relevant:

  • I found this counter intuitive, but the tls counters aren't actually thread local. So at this time we explicitly use a namespace to isolate them per worker. At the time I did have - from the top of my head - a derived Envoy counter implementation which would track both shared and local counters. This would avoid having to use the namespaces to isolate. This may (or may not) be worth revisiting.
  • Another thing is that there's also an open issue, where we concluded we'd like to switch from HdrHistogram to libcirclhist. Envoy uses libcirclhist these days, so actually we could just switch to Envoy's internal histogram implementation. Upon doing so, we'd have a similar issue with the per worker namespace we use, like above with the counters. I have not looked into what it takes to address that.
  • A cross cutting concern might be how we sync histograms. Do we want to shoot for a unified way of sync/merging histograms when we think about threads, processes, machines, etc? If so, the way Envoy facilitates merging of histograms to the master thread might seem low hanging fruit -- but it might not be ideal when also considering the other scenarios.

oschaaf added a commit to oschaaf/nighthawk that referenced this issue Jun 16, 2020
With this, --duration 0 will be treated as a request to infinitely
execute. Other default predicates will still exist, to execution
may stop when connection failures or error response codes are
observed.

This leaves some small tech debt:

- We shouldn't need to create a foo root termination predicate
- We need test coverage, but that's much to add once
  envoyproxy#280 or envoyproxy#344 goes in

Fixes envoyproxy#333

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
mum4k pushed a commit that referenced this issue Jun 18, 2020
Adds `--no-duration`, which will be treated as a request to infinitely
continue load generation.  Other default predicates will still exist, so\
execution may still stop when connection failures or error response codes
are observed.

This leaves some small tech debt:

- We need test coverage, but that's much easier to add once
  #280 or #344 goes in

Fixes #333

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
mum4k pushed a commit that referenced this issue Jul 14, 2020
#391)

Add new NH statistic class CircllhistStatistic and SinkableCircllhistStatistic.

CircllhistStatistic uses Circllhist under the hood to compute statistics where Circllhist is used in the implementation of Envoy Histograms. Compared to HdrHistogram Circllhist trades precision for fast performance in merge and insertion according to #115.

SinkableCircllhistStatistic wraps the Envoy::Stats::Histogram interface and is used to flush histogram value to downstream Envoy stats Sinks.

Linked Issues: #344

Testing: unit tests
mum4k pushed a commit that referenced this issue Jul 23, 2020
…figure stat sinks in NH (#418)

Add CLI flags 
- `--stats-sinks` - Stats sinks (in json or compact yaml) where Nighthawk metrics will be flushed. This argument is intended to be specified multiple times.
- `--stats-flush-interval` - Time interval (in seconds) between flushes to configured stats sinks. Default: 5 (seconds).

The flags' values will be used to update [`stats_sinks `](https://github.com/envoyproxy/envoy/blob/54652f03ea0b3504bd4d64e8ae31985be2892a95/api/envoy/config/bootstrap/v3/bootstrap.proto#L129) and [`stats_flush_interval `](https://github.com/envoyproxy/envoy/blob/54652f03ea0b3504bd4d64e8ae31985be2892a95/api/envoy/config/bootstrap/v3/bootstrap.proto#L139) in `bootstrap.proto`.

Linked Issues: #344
Testing: unit tests
Docs Changes: README.md
mum4k pushed a commit that referenced this issue Jul 23, 2020
Add the following latency histogram in benchmark_client_impl :

- benchmark_http_client.latency_1xx | SinkableHdrStatistic | Latency (in Nanosecond) histogram of request with code 1xx	
- benchmark_http_client.latency_2xx | SinkableHdrStatistic | Latency (in Nanosecond) histogram of request with code 2xx
- benchmark_http_client.latency_3xx | SinkableHdrStatistic | Latency (in Nanosecond) histogram of request with code 3xx	
- benchmark_http_client.latency_4xx | SinkableHdrStatistic | Latency (in Nanosecond) histogram of request with code 4xx
- benchmark_http_client.latency_5xx | SinkableHdrStatistic | Latency (in Nanosecond) histogram of request with code 5xx	
- benchmark_http_client.latency_xxx | SinkableHdrStatistic | Latency (in Nanosecond) histogram of request with code <100 or >=600

Linked Issues: #344


Testing: unit tests
Docs Changes: Add docs/root/statistics.md

Signed-off-by: Qin Qin <qqin@google.com>
mum4k pushed a commit that referenced this issue Aug 13, 2020
Add a flush worker to periodically flush stats to sinks.

Add the functionality to configure sinks in process_impl.cc

Start the flush worker in process_impl.cc when there is stats sink configured.

Change the SinkableStatistic's tagExtractedName() to return `worker_id + "." + Nighthawk::Statistic::id()` which can be used in customized sinks.

Linked Issues: #344

Signed-off-by: Qin Qin <qqin@google.com>
@qqustc
Copy link
Contributor Author

qqustc commented Sep 23, 2020

Closed this now since the metric flush functionality has been added.

@qqustc qqustc closed this as completed Sep 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants