[RFC] SPIRE Agent Cache Redesign Proposal #2940
Background
The RFC provides the background and discusses the scaling problem caused by the current SPIRE Agent cache design, which stores X.509-SVIDs. This proposal builds on the approaches discussed in the RFC and provides a more detailed redesign proposal.
Existing Behavior
All SPIRE Agent RPCs and the SPIRE Agent-Server sync implementation assume that all authorized entries and their corresponding SVIDs are cached locally. The existing cache implementation is as follows:
SPIRE Agent-Server sync
During the periodic sync, SPIRE Agent fetches all authorized entries and bundles from the SPIRE Server. It calculates the missing/deleted entries and expiring SVIDs and updates them in the cache. The subscribers of these affected SVIDs are notified accordingly.
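The diff computed during a sync can be sketched roughly as follows. This is a minimal illustration, not the Agent's actual code: `syncDiff`, `computeSyncDiff`, the Unix-seconds representation, and the `renewWindowSec` threshold are all hypothetical names and assumptions introduced here.

```go
package main

import (
	"fmt"
	"sort"
)

// syncDiff lists the entry IDs a sync pass would act on (hypothetical type).
type syncDiff struct {
	missing  []string // authorized on the server, not cached yet
	deleted  []string // cached, but no longer authorized
	expiring []string // still authorized, SVID close to expiry
}

// computeSyncDiff compares the local cache (entry ID -> SVID expiry in Unix
// seconds) with the entry IDs fetched from the server. renewWindowSec is an
// assumed threshold: an SVID counts as expiring when it has that many
// seconds of validity left, or fewer.
func computeSyncDiff(cached map[string]int64, fetched []string, nowUnix, renewWindowSec int64) syncDiff {
	var d syncDiff
	authorized := make(map[string]struct{}, len(fetched))
	for _, id := range fetched {
		authorized[id] = struct{}{}
		if _, ok := cached[id]; !ok {
			d.missing = append(d.missing, id)
		}
	}
	for id, notAfter := range cached {
		if _, ok := authorized[id]; !ok {
			d.deleted = append(d.deleted, id)
		} else if notAfter-nowUnix <= renewWindowSec {
			d.expiring = append(d.expiring, id)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(d.missing)
	sort.Strings(d.deleted)
	sort.Strings(d.expiring)
	return d
}

func main() {
	cached := map[string]int64{"a": 4600, "b": 1060, "c": 4600}
	d := computeSyncDiff(cached, []string{"a", "b", "d"}, 1000, 300)
	fmt.Println("missing:", d.missing, "deleted:", d.deleted, "expiring:", d.expiring)
}
```

In this example, entry `d` is new, entry `c` is no longer authorized, and the SVID for `b` falls inside the renewal window.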
SPIRE Agent RPCs
SPIRE Agent supports 2 types of RPCs:
In a nutshell, SVIDs are added to, updated in, or removed from the cache during the SPIRE Agent-Server sync. The RPCs that need SVIDs listen on gRPC streams for SVID updates. Some RPCs (like FetchJWTSVID) need details about the SPIFFE IDs and authorized registration entries the Agent is responsible for in order to validate request parameters.
Proposal
The existing cache model stores registration entries together with X509-SVIDs. If registrationEntryCache is separated from x509SVIDCache, the Agent can store all authorized registration entries and a limited number of X509-SVIDs in memory. X509SVIDCacheSize will have a high default value, which can be overridden via a new experimental configuration field of SPIRE Agent.
FetchJWTSVID
Since all the authorized registration entries will be cached, FetchJWTSVID implementation will mostly remain the same.
FetchX509SVID
RegistrationEntryAndBundleSync
SVIDCacheSync
4.1. The most-recently-used statistic is not necessarily an indicator that an identity will be needed again soon. However, some workloads may run on dedicated hardware and are more likely to be scheduled on the same host. This heuristic is better than randomly adding entries to fill the remainder of the cache.
4.2. If the remaining cache size is negative after all SVID records with active subscribers are accounted for, inactive SVID records with the lowest “last subscription timestamp” will be removed to bring the cache size down to the configured limit. This timestamp will be 0 for SVID records that have never had subscribers since joining the cache.
4.3. If the remaining cache size is positive, then we request SVID signings for registration entries not represented in the SVID cache, up to the cache size limit.
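Steps 4.2 and 4.3 can be sketched as a planning function. This is a rough illustration under stated assumptions, not the proposed implementation: `svidRecord`, `planSVIDCacheSync`, and the field names are hypothetical, the remaining size is taken to be the limit minus the number of cached records, and records with active subscribers are assumed never to be evicted (so the cache may stay above the limit when every record is in use).

```go
package main

import (
	"fmt"
	"sort"
)

// svidRecord is a hypothetical, simplified stand-in for a cached X509-SVID.
type svidRecord struct {
	entryID              string
	activeSubscribers    int
	lastSubscriptionUnix int64 // 0 if the record never had a subscriber
}

// planSVIDCacheSync returns the entry IDs to evict (step 4.2) or the entry
// IDs to request signing for (step 4.3), given the configured cache limit.
func planSVIDCacheSync(cache map[string]svidRecord, authorizedEntryIDs []string, limit int) (evict, sign []string) {
	if excess := len(cache) - limit; excess > 0 {
		// 4.2: over the limit. Only inactive records are eviction candidates.
		var inactive []svidRecord
		for _, r := range cache {
			if r.activeSubscribers == 0 {
				inactive = append(inactive, r)
			}
		}
		// Lowest "last subscription timestamp" first; ties broken by entry ID
		// so the result is deterministic.
		sort.Slice(inactive, func(i, j int) bool {
			if inactive[i].lastSubscriptionUnix != inactive[j].lastSubscriptionUnix {
				return inactive[i].lastSubscriptionUnix < inactive[j].lastSubscriptionUnix
			}
			return inactive[i].entryID < inactive[j].entryID
		})
		for i := 0; i < len(inactive) && i < excess; i++ {
			evict = append(evict, inactive[i].entryID)
		}
		return evict, nil
	}

	// 4.3: room left. Request signing for entries without a cached SVID,
	// stopping once the cache would reach the limit.
	remaining := limit - len(cache)
	for _, id := range authorizedEntryIDs {
		if remaining == 0 {
			break
		}
		if _, ok := cache[id]; !ok {
			sign = append(sign, id)
			remaining--
		}
	}
	return nil, sign
}

func main() {
	cache := map[string]svidRecord{
		"a": {entryID: "a", activeSubscribers: 1, lastSubscriptionUnix: 100},
		"b": {entryID: "b", lastSubscriptionUnix: 0},
		"c": {entryID: "c", lastSubscriptionUnix: 50},
		"d": {entryID: "d", lastSubscriptionUnix: 200},
	}
	evict, _ := planSVIDCacheSync(cache, []string{"a", "b", "c", "d"}, 2)
	fmt.Println("evict:", evict)

	small := map[string]svidRecord{"a": {entryID: "a", activeSubscribers: 1}}
	_, sign := planSVIDCacheSync(small, []string{"a", "x", "y", "z"}, 3)
	fmt.Println("sign:", sign)
}
```

In the first call, the record that never had a subscriber (`b`) is evicted before the one with an older subscription (`c`), while the actively subscribed record `a` is left alone.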
7.1. Avoids a potential DDoS of SPIRE Server in the scenario where a large number of workloads across the infrastructure are launched around the same time and don't have their identities cached in the local agent. If the Agent handlers made synchronous SVID signing calls to the Server in this case, the number of signing requests to SPIRE Server could be unbounded.
7.2. Server requests happen in a different context than the Agent Workload API handler contexts, which eliminates some potential retry complexity in the client code for the case when the Server APIs return an error.
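One way to picture the decoupling described in 7.1 and 7.2 is a de-duplicating set of pending signing requests: handlers only record what they need, and the periodic sync drains a bounded batch and talks to the Server. The sketch below is an assumption-laden illustration, not the proposed design; `pendingSignings`, `request`, and `drain` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// pendingSignings collects the entry IDs that handlers are waiting on.
type pendingSignings struct {
	mu  sync.Mutex
	ids map[string]struct{}
}

func newPendingSignings() *pendingSignings {
	return &pendingSignings{ids: make(map[string]struct{})}
}

// request is what a Workload API handler would call. It only records the
// need for an SVID; it never calls the Server itself.
func (p *pendingSignings) request(entryID string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.ids[entryID] = struct{}{}
}

// drain is what the periodic sync would call. It removes and returns at most
// limit entry IDs, so the number of signing requests per sync is bounded no
// matter how many handlers are waiting.
func (p *pendingSignings) drain(limit int) []string {
	p.mu.Lock()
	defer p.mu.Unlock()
	if limit < 0 {
		limit = 0
	}
	all := make([]string, 0, len(p.ids))
	for id := range p.ids {
		all = append(all, id)
	}
	sort.Strings(all) // deterministic order for the example
	if len(all) > limit {
		all = all[:limit]
	}
	for _, id := range all {
		delete(p.ids, id)
	}
	return all
}

func main() {
	p := newPendingSignings()
	// 100 workloads asking for the same identity collapse into one request.
	for i := 0; i < 100; i++ {
		p.request("entry-a")
	}
	p.request("entry-b")
	p.request("entry-c")

	fmt.Println(p.drain(2)) // first sync: bounded batch
	fmt.Println(p.drain(2)) // next sync: the rest
}
```

Because the Server call happens in the sync loop, retries after a Server error can also be handled there instead of in each handler.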
New Cache Models
If authorizedEntries < X509SVIDCacheSize, then there is no functional change to any RPC behavior compared to the present. There may be a short wait for the first SVID fetch due to the new delay introduced between the entry sync and the SVID sync.
Please find the following high level approach of how we can split the current Cache struct.
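As a rough illustration of such a split, the sketch below separates an unbounded registration entry cache from a size-limited X509-SVID cache. The type and field names are hypothetical and are not the structs from the proposal; only the names registrationEntryCache, x509SVIDCache, and the X509SVIDCacheSize limit come from the text above.

```go
package main

import "fmt"

// registrationEntry and x509SVID are simplified, hypothetical stand-ins.
type registrationEntry struct {
	entryID  string
	spiffeID string
}

type x509SVID struct {
	entryID string
	// certificate chain and private key omitted in this sketch
}

// registrationEntryCache holds every authorized entry, with no size limit.
type registrationEntryCache struct {
	entries map[string]registrationEntry
}

// x509SVIDCache holds SVIDs for a subset of entries, capped at maxSize
// (the X509SVIDCacheSize setting).
type x509SVIDCache struct {
	maxSize int
	svids   map[string]x509SVID
}

// agentCache combines both caches (hypothetical wrapper).
type agentCache struct {
	entries registrationEntryCache
	svids   x509SVIDCache
}

// allSVIDsFit reports whether authorizedEntries < X509SVIDCacheSize, the
// case in which RPC behavior is expected to stay the same as today.
func (c *agentCache) allSVIDsFit() bool {
	return len(c.entries.entries) < c.svids.maxSize
}

func main() {
	c := &agentCache{
		entries: registrationEntryCache{entries: map[string]registrationEntry{
			"e1": {entryID: "e1", spiffeID: "spiffe://example.org/web"},
			"e2": {entryID: "e2", spiffeID: "spiffe://example.org/db"},
		}},
		svids: x509SVIDCache{maxSize: 1000, svids: map[string]x509SVID{}},
	}
	fmt.Println("all SVIDs fit:", c.allSVIDsFit())
}
```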
Request For Comments
SVIDCacheSync frequency?