stats: refactoring MetricImpl without strings #4190

ambuc · 2018-08-16T22:03:31Z

Signed-off-by: James Buckland jbuckland@google.com

Description: This reworks MetricImpl to get rid of the std::string it keeps as a private member variable. Since MetricImpl never gets instantiated as-is, I let each of its extending classes implement name() for themselves, often without the need to copy strings. (Part of #3585)

In my estimation, running

bazel build //source/exe:envoy-static --copt='-ltcmalloc'
timeout 90 bazel-bin/source/exe/envoy-static --disable-hot-restart --config-path <a_config_with_10k_clusters>
pprof -text bazel-bin/source/exe/envoy-static main_common_base*

on upstream as-is vs. this branch saves ~20% memory consumption on the heap.

Risk Level: Medium -- anything that changes stats is tricky. But this doesn't change RawStatData, just MetricImpl.
Testing: This required editing the MockCounter/MockGauge classes, but otherwise it passes all tests.
Docs Changes: N/A
Release Notes: N/A

Signed-off-by: James Buckland <jbuckland@google.com>

jmarantz · 2018-08-16T22:06:00Z

source/common/stats/metric_impl.h


-  const std::string& name() const override { return name_; }
+  const std::string name() const override { NOT_IMPLEMENTED_GCOVR_EXCL_LINE; }


just delete this declaration; then classes derived from MetricImpl will be forced to override the PURE method from Metric.

Sounds good.

jmarantz · 2018-08-16T22:06:31Z

include/envoy/stats/stats.h

@@ -23,7 +23,7 @@ class Metric {
  /**
   * Returns the full name of the Metric.
   */
-  virtual const std::string& name() const PURE;
+  virtual const std::string name() const PURE;


do you need to make this change in this PR? I understand why you'd need to make it in the SymbolTable PR.

Ditto for the other changes where you were returning a reference and now you are returning a copy.

I think it has to be this way because we are relying on the private member variable name_ inside RawStatData. name_ is a char[], so we could return a string_view to it or a string of it, but not a string ref. If we returned a string_view I think we would have to rework lots of other function interfaces where we expect a string. (lookups in maps, etc.)

Right, makes sense. In the context of this PR, string_view is the right option. You'll have to return string-by-value if/when we move to SymbolTable.

The advantage of your approach here (returning string-by-value) is that it will work for SymbolTable. Switching callers to take string_view is optimal in this PR, but would actually would be the wrong direction to go if we are heading to SymbolTable. I'm hopeful the compiler will tell us if we write:

absl::string_view_name = stat->name();

if name() returns a string-by-value. If not, in the SymbolTable PR, you can just rename the method (even temporarily) to force you to examine all methods via compiler errors.

So I think my feeling is you should do the right thing or this PR and return a string_view.

I want to keep this as-is, since we might have to convert this back into a string at one of the aforementioned interfaces/callsites, and since an incumbent symbol_table PR will play more cleanly with a string interface. Converting this to a string_view everywhere means the reworking of maps and structures which store strings now to store container<string_view, _, StringViewHash_, StringViewCompare_>, and they'll just end up churning again into container<StatNamePtr, _, StatNamePtrHash_, StatNamePtrCompare_>.

jmarantz · 2018-08-16T22:17:31Z

source/common/stats/metric_impl.h

@@ -30,7 +32,6 @@ class MetricImpl : public virtual Metric {
  };

 private:
-  const std::string name_;
  const std::string tag_extracted_name_;


It looks like we could get rid of this too; we'd have to change consumers of tagExtractedName() and tags() to compute them on demand from TagProducerImpl::produceTags(). That could be a separate PR.

I don't know what the CPU cost of that would be; if that happens during operation, or would be only while setting up a stats sinks. @mrice32 , WDYT?

If not, the tag_extracted_name_ string would be ripe for symbolizing later.

I'm OK with spinning this off into a separate PR, but yeah, if we do this for one variable we might well do it for several.

Signed-off-by: James Buckland <jbuckland@google.com>

ambuc · 2018-08-17T14:31:27Z

OK, here are some more performance testing results:

(all run under --disable-hot-restart)

10^	Clusters	Run for	Upstream	This Branch	Reduction
1	10	30 sec	2534.52kB	2480.34kB	2.1%
2	100	60 sec	8530.54kB	7990.80kB	6.3%
3	1000	90 sec	68589.73kB	63132.50kB	8.0%
4	10000	90 sec	4366736.40kB	3566095.10 kB	18.3%

jmarantz · 2018-08-17T14:35:13Z

cool, although I think the reduction would be 100-97.9%, 100-93.7%, etc. Still awesome, as this reduces by 18.3% the scaled case.

ambuc · 2018-08-17T14:36:27Z

@jmarantz Oops, you're right. I edited the table to reflect that.

jmarantz

nice!

jmarantz · 2018-08-17T14:42:25Z

one more thing while we wait for a maintainer to review: you measured memory high-water here, right? Do you want to leave some notes somewhere (in the PR and maybe also in some developer-oriented doc) about how you did that so they can be repro'd by others?

Also, curious what impact this has on CPU and elapsed time for these tests.

ambuc · 2018-08-17T14:46:27Z

Sounds good to me -- I will do that in a separate PR, so that we can figure out where in the repo to put it (at our leisure).

dnoe · 2018-08-17T15:01:35Z

The PR description says that MetricImpl is never instantiated directly. But I don't see any comments indicating that it is never intended to be instantiated directly. I think the way things are structured right now it has the pure virtual name() so at least it shouldn't compile rather than failing mysteriously at runtime. Do you want to add a comment to the MetricImpl header file too?

ambuc · 2018-08-17T16:11:46Z

@dnoe Sounds good, I'll put a comment to that effect -- this might happen to other string private member variables, but it does make sense to have a non-virtual MetricImpl for encapsulating the Flags logic, at least.

Signed-off-by: James Buckland <jbuckland@google.com>

mattklein123

Cool, LGTM!

mattklein123 · 2018-08-17T16:49:33Z

test/mocks/stats/mocks.h

@@ -27,10 +27,13 @@ class MockCounter : public Counter {
  MockCounter();
  ~MockCounter();

+  // Note: cannot be mocked because it is accessed as a Property in a gmock EXPECT_CALL. This


FWIW I still don't understand what this means and why it happens (I realize this was copied).

I don't really understand it either. I think it was introduced in #1803. @mrice32, can you supply context here?

PiotrSikora · 2018-08-18T13:07:26Z

@ambuc I'm a bit confused by the numbers presented in #4190 (comment), how come memory usage doesn't grow proportionally to number of clusters?

For the record, I see decent improvements with this change (~8%), but in my tests the allocated memory is ~72kB per static cluster, and I'm getting ~73MB for 1000 clusters and ~717MB for 10000 clusters...

How come you're getting 5x more for 10000 clusters? Where are you getting the memory usage from?

ambuc · 2018-08-20T14:41:27Z

@PiotrSikora I don't know why the nonlinearity, but here's an experiment I just ran:

I have two cluster configs with 10k clusters. In the smaller, each cluster name is ~10 characters long; in the larger, each cluster name is ~50 characters long.

clusters	10k	10k
average cluster name length	~10	~50
allocated	3.56 GB	5.51 GB

This difference is too large -- maybe this explains the 5x increase we saw above? There's something about string allocations I don't understand which may be driving this nonlinearity.

Signed-off-by: James Buckland jbuckland@google.com Description: Follow-up to #4190, explaining how pprof/tcmalloc can be used for memory consumption performance testing. Risk Level: None Testing: None Docs Changes: Added a developer-facing documentation under bazel/PPROF.md. Release Notes: N/A

* origin/master: (34 commits) fuzz: fixes oss-fuzz: 9895 (envoyproxy#4189) docs: added developer docs for pprof/tcmalloc testing (envoyproxy#4194) fix sometimes returns response body to HEAD requests (envoyproxy#3985) admin: fix config dump order (envoyproxy#4197) thrift: allow translation between downstream and upstream protocols (envoyproxy#4136) Use local_info.node() instead of bootstrap.node() whenever possible (envoyproxy#4120) upstream: allow extension protocol options to be used with http filters (envoyproxy#4165) [thrift_proxy] Replace local HeaderMap with Http::HeaderMap[Impl] (envoyproxy#4169) docs: improve gRPC-JSON filter doc (envoyproxy#4184) stats: refactoring MetricImpl without strings (envoyproxy#4190) fuzz: coverage profile generation instructions. (envoyproxy#4193) upstream: rebuild cluster when health check config is changed (envoyproxy#4075) build: use clang-6.0. (envoyproxy#4168) thrift_proxy: introduce header transport (envoyproxy#4082) tcp: allow connection pool callers to store protocol state (envoyproxy#4131) configs: match latest API changes (envoyproxy#4185) Fix wrong mock function name. (envoyproxy#4187) Bump yaml-cpp so it builds with Visual Studio 15.8 (envoyproxy#4182) Converting envoy configs to V2 (envoyproxy#2957) Add timestamp to HealthCheckEvent definition (envoyproxy#4119) ... Signed-off-by: Snow Pettersen <snowp@squareup.com>

PiotrSikora · 2018-08-20T23:02:56Z

@ambuc in my tests the cluster name is ~18 characters long and the memory usage is ~717MB (~72kB per cluster), so that's not it.

There is also per thread/worker overhead of about 1850 bytes, but for the math to work for that, you'd need to be running your tests with --concurrency 160 or similar.

…#1938) This is far from finished, but it reduces memory usage by ~10%. Pulling the following changes from github.com/envoyproxy/envoy: c1cc68d stats: refactoring MetricImpl without strings (envoyproxy#4190) 36809d8 fuzz: coverage profile generation instructions. (envoyproxy#4193) ba40cc9 upstream: rebuild cluster when health check config is changed (envoyproxy#4075) 05c0d52 build: use clang-6.0. (envoyproxy#4168) 01f403e thrift_proxy: introduce header transport (envoyproxy#4082) 564d256 tcp: allow connection pool callers to store protocol state (envoyproxy#4131) 3e1d643 configs: match latest API changes (envoyproxy#4185) bc6a10c Fix wrong mock function name. (envoyproxy#4187) e994c1c Bump yaml-cpp so it builds with Visual Studio 15.8 (envoyproxy#4182) 3d1325e Converting envoy configs to V2 (envoyproxy#2957) 8d0680f Add timestamp to HealthCheckEvent definition (envoyproxy#4119) 497efb9 server: handle non-EnvoyExceptions safely if thrown in constructor. (envoyproxy#4173) 6d6fafd config: strengthen validation for gRPC config sources. (envoyproxy#4171) 132302c fuzz: reduce log level when running under fuzz engine. (envoyproxy#4161) 7c04ac2 test: fix V6EmptyOptions in coverage with IPv6 enable (envoyproxy#4155) 1b2219b ci: remove deprecated bazel --batch option (envoyproxy#4166) 2db6a4c ci: update clang to version 6.0 in the Ubuntu build image. (envoyproxy#4157) 71152b7 ratelimit: Add ratelimit custom response headers (envoyproxy#4015) 3062874 ssl: make Ssl::Connection const everywhere (envoyproxy#4179) 706e262 Handle validation of non expiring tokens in jwt_authn filter (envoyproxy#4007) f06e958 fuzz: tag trivial fuzzers with no_fuzz. (envoyproxy#4156) 27fb1d3 thrift_proxy: add service name matching to router implementation (envoyproxy#4130) 8c189a5 Make over provisioning factor configurable (envoyproxy#4003) 6c08fb4 Making test less flaky (envoyproxy#4149) 592775b fuzz: bare bones HCM fuzzer. (envoyproxy#4118) For istio/istio#7912. Signed-off-by: Piotr Sikora <piotrsikora@google.com>

MetricImpl without std::string

2fb3aae

Signed-off-by: James Buckland <jbuckland@google.com>

jmarantz reviewed Aug 16, 2018

View reviewed changes

ambuc added 2 commits August 17, 2018 09:30

Merge remote-tracking branch 'upstream/master' into metric-impl-dedup

5b149f5

Signed-off-by: James Buckland <jbuckland@google.com>

Remove MetricImpl::name(), to be overridden everywhere

d40d1cd

Signed-off-by: James Buckland <jbuckland@google.com>

dnoe self-assigned this Aug 17, 2018

jmarantz previously approved these changes Aug 17, 2018

View reviewed changes

Better documentation for MetricImpl

506006a

Signed-off-by: James Buckland <jbuckland@google.com>

ambuc dismissed jmarantz’s stale review via 506006a August 17, 2018 16:13

ambuc mentioned this pull request Aug 17, 2018

docs: added developer docs for pprof/tcmalloc testing #4194

Merged

mattklein123 approved these changes Aug 17, 2018

View reviewed changes

mattklein123 merged commit c1cc68d into envoyproxy:master Aug 17, 2018

ambuc deleted the metric-impl-dedup branch August 17, 2018 17:40

PiotrSikora mentioned this pull request Aug 18, 2018

Per listener and per cluster memory overhead is too high #4196

Closed

PiotrSikora mentioned this pull request Sep 5, 2018

stats: symbolize strings in HeapStatData and ThreadLocalStore #4281

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

stats: refactoring MetricImpl without strings #4190

stats: refactoring MetricImpl without strings #4190

ambuc commented Aug 16, 2018 •

edited

Loading

jmarantz Aug 16, 2018

ambuc Aug 17, 2018

jmarantz Aug 16, 2018

ambuc Aug 17, 2018 •

edited

Loading

jmarantz Aug 17, 2018

ambuc Aug 17, 2018

jmarantz Aug 16, 2018

ambuc Aug 17, 2018

ambuc commented Aug 17, 2018 •

edited

Loading

jmarantz commented Aug 17, 2018

ambuc commented Aug 17, 2018

jmarantz left a comment

jmarantz commented Aug 17, 2018

ambuc commented Aug 17, 2018

dnoe commented Aug 17, 2018

ambuc commented Aug 17, 2018

mattklein123 left a comment

mattklein123 Aug 17, 2018

ambuc Aug 17, 2018

PiotrSikora commented Aug 18, 2018

ambuc commented Aug 20, 2018

PiotrSikora commented Aug 20, 2018


		const std::string& name() const override { return name_; }
		const std::string name() const override { NOT_IMPLEMENTED_GCOVR_EXCL_LINE; }

stats: refactoring MetricImpl without strings #4190

stats: refactoring MetricImpl without strings #4190

Conversation

ambuc commented Aug 16, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ambuc Aug 17, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ambuc commented Aug 17, 2018 • edited Loading

jmarantz commented Aug 17, 2018

ambuc commented Aug 17, 2018

jmarantz left a comment

Choose a reason for hiding this comment

jmarantz commented Aug 17, 2018

ambuc commented Aug 17, 2018

dnoe commented Aug 17, 2018

ambuc commented Aug 17, 2018

mattklein123 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

PiotrSikora commented Aug 18, 2018

ambuc commented Aug 20, 2018

PiotrSikora commented Aug 20, 2018

ambuc commented Aug 16, 2018 •

edited

Loading

ambuc Aug 17, 2018 •

edited

Loading

ambuc commented Aug 17, 2018 •

edited

Loading