Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stats: refactoring MetricImpl without strings #4190

Merged
merged 4 commits into from
Aug 17, 2018

Conversation

ambuc
Copy link
Contributor

@ambuc ambuc commented Aug 16, 2018

Signed-off-by: James Buckland jbuckland@google.com

Description: This reworks MetricImpl to get rid of the std::string it keeps as a private member variable. Since MetricImpl never gets instantiated as-is, I let each of its extending classes implement name() for themselves, often without the need to copy strings. (Part of #3585)

In my estimation, running

bazel build //source/exe:envoy-static --copt='-ltcmalloc'
timeout 90 bazel-bin/source/exe/envoy-static --disable-hot-restart --config-path <a_config_with_10k_clusters>
pprof -text bazel-bin/source/exe/envoy-static main_common_base*

on upstream as-is vs. this branch saves ~20% memory consumption on the heap.

Risk Level: Medium -- anything that changes stats is tricky. But this doesn't change RawStatData, just MetricImpl.
Testing: This required editing the MockCounter/MockGauge classes, but otherwise it passes all tests.
Docs Changes: N/A
Release Notes: N/A

Signed-off-by: James Buckland <jbuckland@google.com>

const std::string& name() const override { return name_; }
const std::string name() const override { NOT_IMPLEMENTED_GCOVR_EXCL_LINE; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just delete this declaration; then classes derived from MetricImpl will be forced to override the PURE method from Metric.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

@@ -23,7 +23,7 @@ class Metric {
/**
* Returns the full name of the Metric.
*/
virtual const std::string& name() const PURE;
virtual const std::string name() const PURE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need to make this change in this PR? I understand why you'd need to make it in the SymbolTable PR.

Ditto for the other changes where you were returning a reference and now you are returning a copy.

Copy link
Contributor Author

@ambuc ambuc Aug 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it has to be this way because we are relying on the private member variable name_ inside RawStatData. name_ is a char[], so we could return a string_view to it or a string of it, but not a string ref. If we returned a string_view I think we would have to rework lots of other function interfaces where we expect a string. (lookups in maps, etc.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, makes sense. In the context of this PR, string_view is the right option. You'll have to return string-by-value if/when we move to SymbolTable.

The advantage of your approach here (returning string-by-value) is that it will work for SymbolTable. Switching callers to take string_view is optimal in this PR, but would actually would be the wrong direction to go if we are heading to SymbolTable. I'm hopeful the compiler will tell us if we write:

  absl::string_view_name = stat->name();

if name() returns a string-by-value. If not, in the SymbolTable PR, you can just rename the method (even temporarily) to force you to examine all methods via compiler errors.

So I think my feeling is you should do the right thing or this PR and return a string_view.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to keep this as-is, since we might have to convert this back into a string at one of the aforementioned interfaces/callsites, and since an incumbent symbol_table PR will play more cleanly with a string interface. Converting this to a string_view everywhere means the reworking of maps and structures which store strings now to store container<string_view, _, StringViewHash_, StringViewCompare_>, and they'll just end up churning again into container<StatNamePtr, _, StatNamePtrHash_, StatNamePtrCompare_>.

@@ -30,7 +32,6 @@ class MetricImpl : public virtual Metric {
};

private:
const std::string name_;
const std::string tag_extracted_name_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we could get rid of this too; we'd have to change consumers of tagExtractedName() and tags() to compute them on demand from TagProducerImpl::produceTags(). That could be a separate PR.

I don't know what the CPU cost of that would be; if that happens during operation, or would be only while setting up a stats sinks. @mrice32 , WDYT?

If not, the tag_extracted_name_ string would be ripe for symbolizing later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with spinning this off into a separate PR, but yeah, if we do this for one variable we might well do it for several.

ambuc added 2 commits August 17, 2018 09:30
Signed-off-by: James Buckland <jbuckland@google.com>
Signed-off-by: James Buckland <jbuckland@google.com>
@dnoe dnoe self-assigned this Aug 17, 2018
@ambuc
Copy link
Contributor Author

ambuc commented Aug 17, 2018

OK, here are some more performance testing results:

(all run under --disable-hot-restart)

10^ Clusters Run for Upstream This Branch Reduction
1 10 30 sec 2534.52kB 2480.34kB 2.1%
2 100 60 sec 8530.54kB 7990.80kB 6.3%
3 1000 90 sec 68589.73kB 63132.50kB 8.0%
4 10000 90 sec 4366736.40kB 3566095.10 kB 18.3%

@jmarantz
Copy link
Contributor

cool, although I think the reduction would be 100-97.9%, 100-93.7%, etc. Still awesome, as this reduces by 18.3% the scaled case.

@ambuc
Copy link
Contributor Author

ambuc commented Aug 17, 2018

@jmarantz Oops, you're right. I edited the table to reflect that.

jmarantz
jmarantz previously approved these changes Aug 17, 2018
Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

@jmarantz
Copy link
Contributor

one more thing while we wait for a maintainer to review: you measured memory high-water here, right? Do you want to leave some notes somewhere (in the PR and maybe also in some developer-oriented doc) about how you did that so they can be repro'd by others?

Also, curious what impact this has on CPU and elapsed time for these tests.

@ambuc
Copy link
Contributor Author

ambuc commented Aug 17, 2018

Sounds good to me -- I will do that in a separate PR, so that we can figure out where in the repo to put it (at our leisure).

@dnoe
Copy link
Contributor

dnoe commented Aug 17, 2018

The PR description says that MetricImpl is never instantiated directly. But I don't see any comments indicating that it is never intended to be instantiated directly. I think the way things are structured right now it has the pure virtual name() so at least it shouldn't compile rather than failing mysteriously at runtime. Do you want to add a comment to the MetricImpl header file too?

@ambuc
Copy link
Contributor Author

ambuc commented Aug 17, 2018

@dnoe Sounds good, I'll put a comment to that effect -- this might happen to other string private member variables, but it does make sense to have a non-virtual MetricImpl for encapsulating the Flags logic, at least.

Signed-off-by: James Buckland <jbuckland@google.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, LGTM!

@@ -27,10 +27,13 @@ class MockCounter : public Counter {
MockCounter();
~MockCounter();

// Note: cannot be mocked because it is accessed as a Property in a gmock EXPECT_CALL. This
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I still don't understand what this means and why it happens (I realize this was copied).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand it either. I think it was introduced in #1803. @mrice32, can you supply context here?

@mattklein123 mattklein123 merged commit c1cc68d into envoyproxy:master Aug 17, 2018
@ambuc ambuc deleted the metric-impl-dedup branch August 17, 2018 17:40
@PiotrSikora
Copy link
Contributor

@ambuc I'm a bit confused by the numbers presented in #4190 (comment), how come memory usage doesn't grow proportionally to number of clusters?

For the record, I see decent improvements with this change (~8%), but in my tests the allocated memory is ~72kB per static cluster, and I'm getting ~73MB for 1000 clusters and ~717MB for 10000 clusters...

How come you're getting 5x more for 10000 clusters? Where are you getting the memory usage from?

@ambuc
Copy link
Contributor Author

ambuc commented Aug 20, 2018

@PiotrSikora I don't know why the nonlinearity, but here's an experiment I just ran:

I have two cluster configs with 10k clusters. In the smaller, each cluster name is ~10 characters long; in the larger, each cluster name is ~50 characters long.

clusters 10k 10k
average cluster name length ~10 ~50
allocated 3.56 GB 5.51 GB

This difference is too large -- maybe this explains the 5x increase we saw above? There's something about string allocations I don't understand which may be driving this nonlinearity.

dnoe pushed a commit that referenced this pull request Aug 20, 2018
Signed-off-by: James Buckland jbuckland@google.com

Description: Follow-up to #4190, explaining how pprof/tcmalloc can be used for memory consumption performance testing.
Risk Level: None
Testing: None
Docs Changes: Added a developer-facing documentation under bazel/PPROF.md.
Release Notes: N/A
snowp added a commit to snowp/envoy that referenced this pull request Aug 20, 2018
* origin/master: (34 commits)
  fuzz: fixes oss-fuzz: 9895 (envoyproxy#4189)
  docs: added developer docs for pprof/tcmalloc testing (envoyproxy#4194)
  fix sometimes returns response body to HEAD requests (envoyproxy#3985)
  admin: fix config dump order (envoyproxy#4197)
  thrift: allow translation between downstream and upstream protocols (envoyproxy#4136)
  Use local_info.node() instead of bootstrap.node() whenever possible (envoyproxy#4120)
  upstream: allow extension protocol options to be used with http filters (envoyproxy#4165)
  [thrift_proxy] Replace local HeaderMap with Http::HeaderMap[Impl] (envoyproxy#4169)
  docs: improve gRPC-JSON filter doc (envoyproxy#4184)
  stats: refactoring MetricImpl without strings (envoyproxy#4190)
  fuzz: coverage profile generation instructions. (envoyproxy#4193)
  upstream: rebuild cluster when health check config is changed (envoyproxy#4075)
  build: use clang-6.0. (envoyproxy#4168)
  thrift_proxy: introduce header transport (envoyproxy#4082)
  tcp: allow connection pool callers to store protocol state (envoyproxy#4131)
  configs: match latest API changes (envoyproxy#4185)
  Fix wrong mock function name. (envoyproxy#4187)
  Bump yaml-cpp so it builds with Visual Studio 15.8 (envoyproxy#4182)
  Converting envoy configs to V2 (envoyproxy#2957)
  Add timestamp to HealthCheckEvent definition (envoyproxy#4119)
  ...

Signed-off-by: Snow Pettersen <snowp@squareup.com>
@PiotrSikora
Copy link
Contributor

@ambuc in my tests the cluster name is ~18 characters long and the memory usage is ~717MB (~72kB per cluster), so that's not it.

There is also per thread/worker overhead of about 1850 bytes, but for the math to work for that, you'd need to be running your tests with --concurrency 160 or similar.

rshriram pushed a commit to rshriram/envoy that referenced this pull request Oct 30, 2018
…#1938)

This is far from finished, but it reduces memory usage by ~10%.

Pulling the following changes from github.com/envoyproxy/envoy:

c1cc68d stats: refactoring MetricImpl without strings (envoyproxy#4190)
36809d8 fuzz: coverage profile generation instructions. (envoyproxy#4193)
ba40cc9 upstream: rebuild cluster when health check config is changed (envoyproxy#4075)
05c0d52 build: use clang-6.0. (envoyproxy#4168)
01f403e thrift_proxy: introduce header transport (envoyproxy#4082)
564d256 tcp: allow connection pool callers to store protocol state (envoyproxy#4131)
3e1d643 configs: match latest API changes (envoyproxy#4185)
bc6a10c Fix wrong mock function name. (envoyproxy#4187)
e994c1c Bump yaml-cpp so it builds with Visual Studio 15.8 (envoyproxy#4182)
3d1325e Converting envoy configs to V2 (envoyproxy#2957)
8d0680f Add timestamp to HealthCheckEvent definition (envoyproxy#4119)
497efb9 server: handle non-EnvoyExceptions safely if thrown in constructor. (envoyproxy#4173)
6d6fafd config: strengthen validation for gRPC config sources. (envoyproxy#4171)
132302c fuzz: reduce log level when running under fuzz engine. (envoyproxy#4161)
7c04ac2 test: fix V6EmptyOptions in coverage with IPv6 enable (envoyproxy#4155)
1b2219b ci: remove deprecated bazel --batch option (envoyproxy#4166)
2db6a4c ci: update clang to version 6.0 in the Ubuntu build image. (envoyproxy#4157)
71152b7 ratelimit: Add ratelimit custom response headers (envoyproxy#4015)
3062874 ssl: make Ssl::Connection const everywhere (envoyproxy#4179)
706e262 Handle validation of non expiring tokens in jwt_authn filter (envoyproxy#4007)
f06e958 fuzz: tag trivial fuzzers with no_fuzz. (envoyproxy#4156)
27fb1d3 thrift_proxy: add service name matching to router implementation (envoyproxy#4130)
8c189a5 Make over provisioning factor configurable (envoyproxy#4003)
6c08fb4 Making test less flaky (envoyproxy#4149)
592775b fuzz: bare bones HCM fuzzer. (envoyproxy#4118)

For istio/istio#7912.

Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants