Implement optional in-memory proxy cache #3320
Conversation
In HA mode, is this cache set for the Auth node, the Proxy node, or both? Depending on what it controls, we might want to put the setting under the specific yaml section. Should we provide guidelines on how the in-memory cache works in AWS? If a customer is using DynamoDB, do they also need to set the cache for Teleport to scale? Lastly, should we output diagnostic information to ... Also, while clearing out my e-mail, I wonder if this will also help issue #2870 (comment).
@benarent We should document that in this mode proxies will initialize their cache on boot. This means you trade availability (if proxies are rebooted during an outage of Auth Servers, they won't be able to start) for performance (can scale to a larger number of nodes).
lib/reversetunnel/conn.go
return trace.ConnectionProblem(nil, "discovery channel overflow at %v", len(c.newProxiesC))
// Missing proxies update is no longer critical with more permissive
// discovery protocol that tolerates conflicting, stale or missing updates
c.log.Warnf("discovery channel overflow at %v", len(c.newProxiesC))
Capitalization and punctuation.
@fspmarshall ping
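For readers following the diff, here is a minimal, self-contained sketch of the overflow-tolerant send pattern the change moves to. Only the `newProxiesC` field name is taken from the snippet above; the `discoveryRequest` type, the `conn` struct, and the surrounding code are hypothetical illustrations, not the actual Teleport implementation.

```go
// Sketch: a discovery update is offered to a buffered channel; when the
// buffer is full the update is dropped with a warning instead of failing
// the connection, because the discovery protocol tolerates missing, stale,
// or conflicting updates.
package main

import "log"

// discoveryRequest is a hypothetical stand-in for the real discovery payload.
type discoveryRequest struct {
	Proxies []string
}

type conn struct {
	newProxiesC chan discoveryRequest
}

func (c *conn) sendDiscovery(req discoveryRequest) {
	select {
	case c.newProxiesC <- req:
		// Delivered; a reader goroutine processes it asynchronously.
	default:
		// Buffer is full: warn and drop rather than closing the connection.
		log.Printf("Discovery channel overflow at %v.", len(c.newProxiesC))
	}
}

func main() {
	c := &conn{newProxiesC: make(chan discoveryRequest, 10)}
	c.sendDiscovery(discoveryRequest{Proxies: []string{"proxy-1"}})
}
```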
Auth servers always use an in-memory cache; they do not persist the CA private key material to disk. In HA mode this setting affects proxies and nodes. With this cache, as @russjones noted, proxies will not be able to tolerate an auth server outage after a restart, because the cache data will be lost. Right now, by default, proxy servers tolerate an auth server outage even if the proxies reboot.
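To make the tradeoff concrete, this is the cache setting introduced by the PR (it also appears in the commit message below), annotated with the availability tradeoff described in this thread; the annotations are a summary of the discussion, not official documentation:

```yaml
teleport:
  cache:
    # sqlite (default): persistent cache; a proxy restarted during an
    # auth server outage can still start from its on-disk cache.
    # in-memory: scales to many concurrent reads, but the cache is rebuilt
    # on boot, so a proxy rebooted during an auth server outage cannot start.
    type: in-memory
```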
Force-pushed from 41413c8 to 6be9331.
retest this please
This commit resolves #3227

In IoT mode, 10K nodes connect back to the proxies, putting a lot of pressure on the proxy cache.

Before this commit, the proxy's only cache option was a persistent sqlite-backed cache. The advantage of that cache is that proxies could continue working after a reboot even with the auth servers unavailable. The disadvantage is that the sqlite backend breaks down under many concurrent reads due to performance issues.

This commit introduces a new cache configuration option, 'in-memory':

```yaml
teleport:
  cache:
    # default value sqlite,
    # the only supported values are sqlite or in-memory
    type: in-memory
```

This cache mode allows two m4.4xlarge proxies to handle 10K IoT-mode connected nodes with no issues.

The second part of the commit disables the cache reload on timer that caused inconsistent view results for 10K displayed nodes, with servers disappearing from the view.

The third part of the commit increases the channels buffering discovery requests 10x. The channels were overfilling with 10K nodes and nodes were being disconnected. The logic no longer treats channel overflow as a reason to close the connection. This is possible due to changes in the discovery protocol that allow target nodes to handle missing entries, duplicate entries, or conflicting values.
Force-pushed from 6be9331 to cf83fb3.
retest this please
14 similar comments