Per listener and per cluster memory overhead is too high #4196
@PiotrSikora thanks for the writeup and data, this is a good start to understanding the problem. I'd address the solution differently for clusters and listeners.

For clusters, we already have a plan-of-record to do (3) in the long term; this has been formalized in the incremental xDS API additions in envoy/api/envoy/api/v2/discovery.proto (line 97 as of e341004). I think (3) is where we want to shoot for in a lasting solution that will scale to very high numbers of clusters (e.g. 100k); (2) would only be a stop gap and would not support some applications, like serverless, that inherently need on-demand xDS. We have internal demand from teams that want this. I realize that the current goal is to reduce overheads when we are talking about 1-10k clusters. In that case, I think we should just fix stats for now and no other solution is needed. We also have the problem that the LB data structures (depending on LB type) are O(num clusters * num workers * num endpoints) in memory use, which also needs some work.

For listeners, how many listeners that actually bind to ports do we require? I'd like to understand how the listener count is climbing > O(10) in the application: do we really have O(1k) listeners? Again, stats improvement should be helpful here, but not a silver bullet; where is the rest of the memory coming from?
Agreed, except that (3) is probably months away, and (1) and possibly (2) could be done in a few days.
Istio is using iptables interception and `use_original_dst`, so we have 1 virtual listener for each TCP destination, but all HTTP destinations are sent to a wildcard listener, where they are routed based on the Host header, so we have only 1 of those for each destination port. Because of that, the listeners are not a huge problem for us (at least not yet), but the per cluster overhead is blocking deployment for our customers, and disabling per cluster stats (which we don't use anyway) lowers the overhead from ~72kB to ~14kB (so by ~80%), which is a huge win, so I'd like to pursue this option, since it's the quickest thing to implement.
When you say "just fix stats", do you have anything particular in mind? Even if we optimize strings (each fully evaluated stat name resides 3x in memory) with something like #3927, the maximum savings for the `service_google_xxxx` cluster name used in my experiments are around ~12kB (~17%), compared to ~58kB (~80%) with disabled stats, so the difference is quite significant, and I think that disabling stats would be the preferable solution for Istio until we have incremental xDS. Alternatively, we could delay `generateStats()` so that cluster stats are only allocated on first use.
The way I think of #3927 is that it should be very impactful to stat name memory once we knock off a bunch of other copies of the stat names first. That 17% looks disappointing now, but the absolute byte-count saved on James' benchmark will be a much higher percentage when the other copies have been removed.

However, there's still a non-zero per-stat overhead that is unavoidable, even if the names are free. And depending on the deployment, aggregating stats across clusters may be desirable anyway. In another product I know, sharing stats is the default:
https://www.modpagespeed.com/doc/admin#virtual-hosts-and-stats
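For a sense of the kind of saving #3927 is after, here is an illustrative string-interning sketch (a simplification, not the actual symbol-table design in that PR): each distinct stat name is stored once and shared, instead of every stat and every map key holding its own copy.

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// Illustrative interning table: each distinct stat name is stored exactly once
// and every stat that uses it holds a cheap shared reference instead of a copy.
// (The map key here still holds one copy; a real design would avoid even that.)
class StatNameTable {
public:
  std::shared_ptr<const std::string> intern(const std::string& name) {
    auto it = table_.find(name);
    if (it != table_.end()) {
      return it->second;
    }
    auto shared = std::make_shared<const std::string>(name);
    table_.emplace(name, shared);
    return shared;
  }

private:
  std::unordered_map<std::string, std::shared_ptr<const std::string>> table_;
};
```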
@PiotrSikora (3) should not take months to implement on the Envoy side; the implementation is O(1-2 weeks) IMHO. Support on the management side is probably where the bottleneck is. I think we're in agreement that the biggest benefit will come from reducing cluster stats. I still think that fundamentally, given what we're tracking in both key/value terms, the essential memory overhead is more like O(1kB), so we should be able to fix it.
@htuch it seems that the string optimizations didn't help as much as we hoped, and even with #4281, we're still looking at ~69kB overhead for each static cluster, which is way too high. Could you please reconsider disabling stats (since we don't have any use for them anyway)? This could be achieved by adding either no-op or lazily-allocated cluster stats. cc @ambuc
@PiotrSikora lazy loading
I think disabling stats via a flag / build parameter / etc. is the most expedient solution to Istio's problem, and is much easier than incremental xDS / lazy-loaded stats / whatever. There are probably lots of users who don't care about stats, and until we have the right solution, this would work best.
@PiotrSikora doesn't Istio make use of cluster stats? It's basically impossible to diagnose what goes wrong on Envoy if you don't have cluster-level stats. I think almost every user cares about these stats, even if they don't know it.
I thought a fair number of Envoy users use out-of-the-box binaries. Is it hard to do via a flag? Also, is what we want to disable stats entirely? Or just to make them non-cluster-specific? Or both?
I think a flag to disable all stats (or just cluster stats) and a targeted lazy load (lazy loading as a rule for all stats would take a decent amount of work and might add some real overhead to the request path) would both be relatively easy. I think there's already a flag to disable/enable some subset of the stats today, but I can't remember which component.

@htuch I could see a use case where some users might care enough about memory consumption to only turn on stats on a small subset of Envoy instances, to get just enough info to create an approximate sample of fleet-wide stats, but I'm not very aware of the typical Envoy deployment today.
I am not fully understanding the lazy-loading solution; that may be a great way to go. My first thought on how I would disable stats is that I'd make a `NullStatAllocator` which would provide methods to generate `NullCounter`, `NullGauge`, and `NullHistogram`, each of which would have all the methods, do none of the work, and take minimal space, for some definition of 'minimal'. Maybe truncating all stat names to zero characters would make the maps small and fold lots of stats together. The advantage here is that you could implement this with one new class and a few injection points during startup. There would be zero extra runtime overhead.
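To make that concrete, here is a minimal sketch of the no-op idea, assuming a simplified counter interface; `Counter`, `NullCounter`, and `nullCounter()` are illustrative names, not Envoy's actual classes:

```cpp
#include <cstdint>
#include <string>

// Simplified stand-in for a counter interface; the real interface has more
// methods, but the pattern is the same.
class Counter {
public:
  virtual ~Counter() = default;
  virtual void add(uint64_t amount) = 0;
  virtual void inc() = 0;
  virtual uint64_t value() const = 0;
  virtual std::string name() const = 0;
};

// Accepts writes and drops them: no name storage, no per-stat allocation.
class NullCounter : public Counter {
public:
  void add(uint64_t) override {}
  void inc() override {}
  uint64_t value() const override { return 0; }
  std::string name() const override { return ""; }
};

// A single shared instance can be handed out for every requested stat, so
// per-cluster stat memory collapses to effectively zero.
Counter& nullCounter() {
  static NullCounter instance;
  return instance;
}
```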
After some offline chat with @PiotrSikora, it seems that Istio has in practice not had to rely heavily on /stats due to the availability of a number of other diagnosis mechanisms (e.g. Mixer, access logs). A CLI flag to disable (default enable) seems a reasonable option for now. It doesn't fix the underlying problem (users shouldn't have to choose between stats and scalability), but it unblocks Istio and other folks who need scalability right now.
@jmarantz That's basically my thought as well. I think we could just create a single NullCounter, NullGauge, etc. Then, the disabled store could give back that one instance to all calls to `counter()` and friends.

I think the lazy loading for clusters could be done as @PiotrSikora suggests above: we could just change when generateStats() gets called (and load the result into a unique_ptr). There is one issue with lazy loading, however: we'll need to put a lock around the initialization/check to make it thread-safe, which would add a short-duration lock to the request path.
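A rough sketch of that lazy-initialization idea, with hypothetical `ClusterStats` and `makeClusterStats`-style names standing in for whatever `generateStats()` would actually produce; the lock taken on every access is exactly the request-path concern mentioned above:

```cpp
#include <cstdint>
#include <memory>
#include <mutex>

// Hypothetical stand-in for the per-cluster stats block.
struct ClusterStats {
  uint64_t upstream_rq_total = 0;
  uint64_t upstream_cx_total = 0;
  // ... the real block would hold the full set of counters/gauges/histograms.
};

class LazyClusterStats {
public:
  // Allocates the stats block on first access instead of at cluster creation.
  ClusterStats& get() {
    std::lock_guard<std::mutex> guard(lock_); // taken on every call, including the request path
    if (stats_ == nullptr) {
      stats_ = std::make_unique<ClusterStats>();
    }
    return *stats_;
  }

private:
  std::mutex lock_;
  std::unique_ptr<ClusterStats> stats_;
};
```

An atomic "already initialized" check before taking the lock (double-checked locking) would keep the steady-state cost close to zero.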
@jmarantz wrt lazy loading (i.e. allocating on first use): Istio currently loads routes to all the services in the cluster, which means that even though a particular service might be talking to only a service or two, it has routes and accompanying configuration for all the services, whose stats are allocated but never used. So the savings would come from the fact that Envoy is configured with a bunch of clusters that it never uses.
FWIW, we have debated this internally as well, and we have a similar situation to Istio where our Envoys have all the clusters irrespective of whether a service uses a particular cluster or not. One of the options we discussed was to disable the mgmt-update-related metrics, which we really do not care about. That might give us at least a 30% reduction while still keeping the latency and other cluster-related metrics for those who want them, instead of completely disabling all cluster metrics. Something to consider...
Possible solution: what if, in the config, you set the `alt_stat_name` to the same string for every cluster? Wouldn't that result in all clusters sharing a common set of stats?
Catching up here. In general I think allowing people to disable stats they don't want, to save memory, is acceptable. I would prefer a solution though that is a bit more flexible (though it might be slower): something along the lines of allowing people to supply a list of stats they either want or don't want, possibly via regex. People that don't want any stats can just exclude them all. This would incur a penalty at stat creation much like tagging, but I think it could be done pretty efficiently otherwise, modulo the fact that there will still need to be maps for keeping track of stat names that point to some NullStat interface that goes to /dev/null. Thoughts?
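A sketch of that idea, with illustrative names (`StatNameFilter`, `excluded()`) rather than any actual interface; the exclusion check runs once at stat creation, and excluded names can then be routed to a shared no-op stat:

```cpp
#include <regex>
#include <string>
#include <vector>

// Illustrative only: decide at stat-creation time whether a name should get a
// real allocation or be routed to a shared no-op instance.
class StatNameFilter {
public:
  explicit StatNameFilter(const std::vector<std::string>& exclusion_patterns) {
    for (const auto& pattern : exclusion_patterns) {
      exclusions_.emplace_back(pattern);
    }
  }

  // Returns true if the stat should be dropped (never allocated).
  bool excluded(const std::string& stat_name) const {
    for (const auto& re : exclusions_) {
      if (std::regex_search(stat_name, re)) {
        return true;
      }
    }
    return false;
  }

private:
  std::vector<std::regex> exclusions_;
};

// Usage sketch: drop all per-cluster stats, or pass ".*" to drop everything.
// StatNameFilter filter({"^cluster\\."});
```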
All of that sounds fine to me. Here's a spectrum of options:
@PiotrSikora is there an easy way to repro the results you got with a static command line?
@ all, there is a WIP PR at #4469 which implements a regex-based exclusion list. This feature would be opt-in, so I feel better about adding a potential performance hit. This PR is not in lieu of a disable-all-stats feature.
FWIW, I think this is intellectually interesting and if people want to work on it, I'm clearly not going to object, but I think effort is better spent on the other efforts, as well as (IMO) on better educating people on how to use Envoy properly (i.e., let's not just send 150K clusters to an Envoy...).
@mattklein123 that makes sense... especially as there are relatively straightforward approaches that will yield immediate gains. I just did a quick survey, and I think most or all of the maps I was worried about are in admin.cc with brief lifetimes, although they still might affect things in practice; and actually fixing that is a very small, local PR. The question of whether an Envoy should ever scale to 100k clusters is an interesting one, and worth getting a definitive answer on before embarking on scaling to that. But it seems that even 10k clusters presents scaling challenges.
IMO even 10k clusters is on the outer edge of what we would expect, at least without moving to incremental xDS with cluster TTLs, but I would be curious what people think about this. In any case, I think we are all in agreement that starting by allowing stat pruning, to get rid of stats people don't care about, is probably the way to go? I'm guessing this will save a huge amount of memory, though I do worry about the perf impact; but that is easy to measure.
Generate test files:
Start Envoy and collect memory stats:
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.
/wait
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.
/wait hopefully not too much longer :)
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted". Thank you for your contributions.
I'm going to close this as we have made a lot of progress here thanks to @jmarantz. Let's open new issues on specific enhancements.
As a result of istio/istio#7912, I've been running a bunch of experiments, and it looks like the per listener and per cluster memory overhead is a bit too high...
(numbers are with `--concurrency 1 --disable-hot-restart`, and include improvements from #4190)

There is ~18kB of memory allocated for each TCP listener (no TLS), ~6kB (~35%) of which is allocated for 5 listener stats (assuming the same `stat_prefix` is used for all TCP stats). There is also ~1450 bytes of overhead per thread/worker for each TCP listener, ~900 bytes of which come from stats.

There is ~63kB of memory allocated for each HTTP listener (1 route, no TLS), ~18kB (~30%) of which is allocated for 13 listener stats (assuming the same `stat_prefix` is used for all HTTP stats). There is also ~1350 bytes of overhead per thread/worker for each HTTP listener, ~900 bytes of which come from stats.

There is ~72kB of memory allocated for each static cluster (no TLS, no health-check), ~58kB (~80%) of which is allocated for 72 stats. There is also ~1850 bytes of overhead per thread/worker for each static cluster.

That means that for each listener and cluster pair, there is ~90kB (TCP) or ~135kB (HTTP) of memory allocated, the majority of which is allocated for stats (roughly ~800 bytes per counter/gauge and ~3000 bytes per histogram). Those numbers get even worse when we add TLS to the mix, since it adds 8 more stats for each.
I've noticed that @jmarantz and @ambuc are already working on optimizing stats memory usage in #3585, so I won't be duplicating their work, which leaves a few more drastic options that could yield much bigger savings, e.g. disabling stats altogether (behind an `#ifdef`); this is pretty trivial and a huge win, assuming those stats are not consumed.

Thoughts?
cc @htuch @louiscryan @costinm @lizan