
Implement optional in-memory proxy cache #3320

Merged
klizhentas merged 1 commit into master from sasha/in-memory on Feb 6, 2020

Conversation

klizhentas
Contributor

This commit resolves #3227

In IOT mode, 10K nodes are connecting back to the proxies, putting
a lot of pressure on the proxy cache.

Before this commit, the Proxy's only cache option was a persistent
sqlite-backed cache. The advantage of that cache is that Proxies
can continue working after reboots even when Auth Servers are unavailable.

The disadvantage is that the sqlite backend breaks down under many
concurrent reads due to performance issues.

This commit introduces a new cache configuration option, 'in-memory':

```yaml
teleport:
  cache:
    # default value is sqlite;
    # the only supported values are sqlite or in-memory
    type: in-memory
```

This cache mode allows two m4.4xlarge proxies to handle 10K connected
IoT-mode nodes with no issues.

The second part of the commit disables the timer-based cache reload, which
caused inconsistent view results when 10K nodes were displayed, with servers
disappearing from the view.
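
As a rough sketch of the idea (not the actual Teleport code), the cache loop stops doing periodic full reloads and only applies incremental events as they arrive from the Auth Server; every name below is hypothetical:

```go
package main

import (
	"fmt"
	"time"
)

// event stands in for an incremental cache update received from the watcher.
type event struct{ name string }

// watchLoop applies incremental updates as they arrive. The commented-out
// ticker shows the removed behavior: periodically throwing away and reloading
// the whole cache, which made nodes briefly disappear from the view.
func watchLoop(events <-chan event, done <-chan struct{}) {
	// reloadTicker := time.NewTicker(reloadPeriod) // removed: timer-based full reload
	for {
		select {
		case ev := <-events:
			fmt.Println("applying incremental update:", ev.name)
		case <-done:
			return
		}
	}
}

func main() {
	events := make(chan event, 1)
	done := make(chan struct{})
	go watchLoop(events, done)
	events <- event{name: "node heartbeat"}
	time.Sleep(10 * time.Millisecond)
	close(done)
}
```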

The third part of the commit increases the buffer size of the channels
carrying discovery requests by 10x. With 10K nodes the channels were
overflowing and nodes were being disconnected. The logic no longer treats
channel overflow as a reason to close the connection. This is possible due to
changes in the discovery protocol that allow target nodes to handle missing
entries, duplicate entries, or conflicting values.
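
For illustration, here is a minimal Go sketch of the non-blocking send described above; the names `newProxiesC` and `discoveryRequest` and the buffer size are assumptions for the example, not the actual Teleport identifiers:

```go
package main

import "log"

// discoveryRequest stands in for the real discovery message type.
type discoveryRequest struct {
	Proxies []string
}

// sendDiscovery queues a discovery request without blocking. If the buffered
// channel is full, the request is dropped with a warning instead of closing
// the connection, since the discovery protocol now tolerates missing, stale,
// or conflicting updates on the receiving node.
func sendDiscovery(newProxiesC chan discoveryRequest, req discoveryRequest) {
	select {
	case newProxiesC <- req:
	default:
		log.Printf("WARN: discovery channel overflow at %v", len(newProxiesC))
	}
}

func main() {
	// Buffer increased 10x per the commit description; the exact size here
	// is an assumption for the example.
	newProxiesC := make(chan discoveryRequest, 100)
	sendDiscovery(newProxiesC, discoveryRequest{Proxies: []string{"proxy-1"}})
}
```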

@benarent
Contributor

benarent commented Feb 2, 2020

In HA mode, is this cache set for the Auth node, the Proxy node, or both? Depending on what it controls, we might want to put the setting under the specific yaml section.

Should we provide guidelines on how the in-memory cache works in AWS? If a customer is using DynamoDB, do they also need to set the cache for Teleport to scale?

Lastly, should we output diagnostic information to /metrics?

Also, while clearing out my e-mail, I wonder if this will also help issue #2870 (comment).

@webvictim changed the title from "Sasha/in memory" to "Implement optional in-memory proxy cache" on Feb 3, 2020
Contributor

@russjones left a comment


@benarent We should document that in this mode proxies will initialize their cache on boot. This means you trade availability (if proxies are rebooted during an outage of Auth Servers, they won't be able to start) for performance (can scale to a larger number of nodes).

```diff
-	return trace.ConnectionProblem(nil, "discovery channel overflow at %v", len(c.newProxiesC))
+	// Missing proxies update is no longer critical with more permissive
+	// discovery protocol that tolerates conflicting, stale or missing updates
+	c.log.Warnf("discovery channel overflow at %v", len(c.newProxiesC))
```
Contributor


Capitalization and punctuation.

@klizhentas
Contributor Author

@fspmarshall ping

@klizhentas
Contributor Author

@benarent

Auth Servers always use an in-memory cache; they do not persist the CA's private key material to disk.

In HA mode this affects Proxies and Nodes. With this cache, as @russjones noted, Proxies will not be able to tolerate an Auth Server outage after a Proxy restart, because the cache data will be lost. Right now, by default, Proxy servers tolerate an Auth Server outage even if the Proxies reboot.

Contributor

@fspmarshall left a comment


Looks good to me. Note that this PR includes an older version of the changes from #3305, which could create a minor merge conflict. It might be best to either remove those changes here, or port the new state of #3305 and close that PR.

@klizhentas force-pushed the sasha/in-memory branch 2 times, most recently from 41413c8 to 6be9331 on February 5, 2020 16:28
@klizhentas
Contributor Author

retest this please

@klizhentas
Contributor Author

retest this please

14 similar comments
@klizhentas merged commit a22f7be into master on Feb 6, 2020
@klizhentas deleted the sasha/in-memory branch on March 15, 2021 16:50
Successfully merging this pull request may close these issues:

Ability to Scale a Teleport Cluster to support 10k IoT nodes.