Feature Request: JWT SVID support for SVIDStore #2752

Closed
hellerda opened this issue Feb 8, 2022 · 3 comments
Labels: priority/backlog (Issue is approved and in the backlog), stale, unscoped (The issue needs more design or understanding in order for the work to progress)

Comments

@hellerda

hellerda commented Feb 8, 2022

Hi Folks. My organization is looking to apply SPIFFE and SPIRE for Serverless-style authentication, but we use primarily the JWT token form of credential for service-to-service authentication, with OIDC as the primary interface for federating with other trust domains and ID providers.

As full Push-style Serverless is not yet available in SPIRE, we are considering the SVID Store approach. But SVID Store currently supports x509 SVID only; we would like to see it support JWT SVID. From looking at the SVIDStore code (and thanks for the awesome contribution, @amartinezfayo!) it seems like it could be done like this.

  1. Attach a JWTSVIDCache to the "svid_store" cache, in the same way one is attached to the "workload" cache.
  2. Add a way to "pre-fetch" a JWT SVID for a given "audience" list when the workload registration entry is created, in the same way an x509 SVID is pre-fetched. I envision this being done through an "-audiences" option to the "entry create" command, with the audience list stored with the "registered_entries" record in the DB, or in a separate DB table. To do this, the "svid_store" cache manager would call Manager's FetchJWTSVID() function immediately after that workload's x509 SVID is fetched.
  3. Surface the cached JWT SVID(s) to the "svidstore" plugins, for example by attaching them to the record returned by the "svid_store" cache ReadyToStore() function.

Does this seem doable, and can you elaborate on any concerns you may have?

Thanks!

@dennisgove
Contributor

[I'm very sorry for the brain-dump that this comment has turned into]

tl;dr: The ability for SPIRE to pre-fetch X509-SVIDs is a fantastic feature and allows for significant scaling of agents and workloads while still providing fast and reliable X509-SVID acquisitions. I think SPIRE would benefit from supporting pre-fetching for all current and future SVID types.

I've been exploring this path as well and also have a desire to pre-fetch JWT-SVIDs. Afaik, there are two main blockers for such support.

  1. JWT-SVIDs had (until v1.5.0) a hard-coded and unchangeable TTL of 5 minutes. This restriction led to a fairly reasonable statement that because JWT-SVIDs need to be re-generated so frequently it just doesn't make sense to pre-fetch them. The ROI of that pre-fetching just wasn't significant enough to justify the complexity of it. However, PR #3445 (Issue #2700: Adds support for X509 and JWT specific SVID TTLs) has added support for configurable JWT-SVID TTLs, thus allowing them to have lifetimes far larger than the previous 5 minutes. This makes pre-fetching more practical and (I think) removes the main difference between X509 and JWT SVIDs wrt pre-fetching support. Because they both have equally configurable lifetimes, I think they're both on equal footing regarding the ROI for pre-fetching.

  2. When requesting a JWT-SVID, the client is able to provide an arbitrary aud (audience) claim value which will be included in the generated token. This request-time value makes it impossible to pre-generate (and thus pre-fetch) JWT-SVIDs. @hellerda is correct to suggest that if pre-fetching of JWT-SVIDs is to be supported then there must be a way for us to indicate for each entry registration a list of known aud values so that a JWT-SVID could be generated for each one.

I'm considering blocker 1 solved and a non-issue. Going forward, I'll focus on blocker 2.

This particular issue is related to JWT-SVIDs and as such the discussion and examples will focus on JWTs, but I believe the concepts may also apply to X509-SVIDs.

The basic problem comes about because of how different SVIDs may be used by a Workload and their generally accepted industry best-practices. While not required, it's generally standard practice for JWTs to include an aud claim whose value is "the intended receiver of this JWT". For example, in the following diagram, the client application sends requests to two different services. The JWT-SVID sent to each includes an aud value of the FQDN of the receiving service.

[Diagram: jwt-basic-aud]

This practice stretches SPIFFE's definition that "an SVID is the document with which a workload proves its identity to a resource or caller", because a JWT-SVID no longer identifies just the Workload; it also identifies the context in which that identity may be used.

Other claims within a JWT-SVID may also be context-specific. For example, even requests to the same service but for different purposes may each require their own scoped SVIDs with differing TTLs.

[Diagram: jwt-basic-scope]

The point is, while the SPIFFE ID is static, the context in which the SVID may be used requires distinct documents with varying claims. I'm not familiar enough with X509s to say for certain that the same situation exists for those SVID types, but I imagine that there exists at least one Certificate Extension which contextualizes/limits the use of the certificate. As such, I'm approaching this issue with the idea that it needs to be solved for all SVID types, not just JWT-SVIDs.

SPIRE's Workload Registration process allows us to create multiple registration entries pointing to the same SPIFFE ID. The purpose is to support situations where different attestable values (selectors) result in the same identity. That is, a client-app instance running in a K8s pod can have the same identity as an instance running as a Windows process - each may be attested with different selectors, but in the end they are the same identity. This mechanism allows us to pre-register acceptable Workload run contexts.

However, there is no equivalent mechanism to pre-register acceptable Workload usage contexts. Currently, the only mechanism allowing usage context to come into play is the FetchJWTSVID API (and accompanying CLI commands) in which one can specify the aud value(s) they'd like included in the generated JWT-SVID. However, this is a runtime action: a new JWT-SVID must be generated when requested, with all of the associated costs of that generation. (NOTE: there is some level of caching after a JWT-SVID has been generated, but that benefit ends when the JWT expires.)

This missing mechanism prevents the pre-generation (and thus pre-fetching) of any usage context specific SVIDs.

Issue #3253 discusses the introduction of a mechanism to customize SVID fields. Multiple approaches are discussed, including changes to the registration entry and a new plugin type. @azdagron's comment here explains that project Maintainers would like to avoid expanding the data stored in the registration entry because doing so requires a significant amount of effort. The result is that a new CredentialComposer plugin was added to allow customization of SVID fields.

The plugin allows for conditional and programmatic inclusion of additional (or exclusion of existing) field values in multiple SVID types. During SVID generation, the plugin is sent the SPIFFE ID and current set of attributes that will be put into the SVID, and is able to return an alternative set of attributes to include in the SVID.

[Diagram: jwt-credcomp]

However, the CredentialComposer plugin, while fantastic by all accounts, doesn't resolve the above issue. The plugin allows for changing the attributes that will be placed into a single SVID, but the SVID must already be going through the generation process for the plugin to have any effect. As such, it can't be a mechanism to pre-generate JWT-SVIDs with context-specific claims.

This inability to store desired SVID attributes with an entry registration record will continue to hinder SPIRE's ability to manage context specific SVIDs and, I believe, will lead to necessary misuse of the selector concept. I believe the only solution is to support SVID attributes as a property of entry registration records.

There are numerous ways such support could be implemented, each with their own pros and cons. I'd like to list a few, but these are by no means the only ones.

Individual Attribute Columns in registered_entries Table

Every attribute that might be included in an SVID would be given its own column.

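| id  | spiffe_id      | ... | audience                  | scope        |
|-----|----------------|-----|---------------------------|--------------|
| 123 | "spiffe://..." | ... | "https://foo.example.com" | "read-data"  |
| 124 | "spiffe://..." | ... | "https://foo.example.com" | "write-data" |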

PROS:

  • Doesn't require any fundamental datastore changes, interaction will remain the same
  • Is very clear what data will appear in the resulting SVID

CONS:

  • Every new attribute requires a very expensive database change
  • Number of columns in registered_entries table grows indefinitely
  • Not all users of SPIRE will want to support all supported attribute values, but will still pay the cost of each
  • Multi-value attributes will be tricky to support
  • Requires a unique registration entry for each possible SVID
  • Additional database read/write load
  • No distinction between SVID type specific attributes (X509, JWT, other)

Single Attribute Column in registered_entries Table

A single new column svid_attributes would be added to the registered_entries table, supporting a dynamic key-value structure.

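| id  | spiffe_id      | ... | svid_attributes                                               |
|-----|----------------|-----|---------------------------------------------------------------|
| 123 | "spiffe://..." | ... | { "aud": "https://foo.example.com", "scope": ["read-data"] }  |
| 124 | "spiffe://..." | ... | { "aud": "https://foo.example.com", "scope": ["write-data"] } |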

PROS:

  • Doesn't require any fundamental datastore changes
  • Is very clear what data will appear in the resulting SVID
  • Arbitrary attributes are easily supported without added cost

CONS:

  • Requires a unique registration entry for each possible SVID
  • Requires parsing of svid_attributes column
  • Allows for accidental duplicate records
  • No distinction between SVID type specific attributes (X509, JWT, other)
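
As a rough idea of what "parsing of svid_attributes column" could involve, here is a small Go sketch that decodes the JSON shown in the table into a uniform shape. The function name `ParseSVIDAttributes` and the choice to normalize every attribute to a list of strings are my own assumptions, not an existing SPIRE API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ParseSVIDAttributes decodes the raw JSON held in the proposed
// svid_attributes column. Single strings and lists of strings are both
// normalized to []string so that callers handle one shape only.
// The function and its normalization rule are hypothetical.
func ParseSVIDAttributes(raw string) (map[string][]string, error) {
	var decoded map[string]any
	if err := json.Unmarshal([]byte(raw), &decoded); err != nil {
		return nil, fmt.Errorf("parse svid_attributes: %w", err)
	}

	out := make(map[string][]string, len(decoded))
	for name, v := range decoded {
		switch val := v.(type) {
		case string:
			out[name] = []string{val}
		case []any:
			vals := make([]string, 0, len(val))
			for _, item := range val {
				s, ok := item.(string)
				if !ok {
					return nil, fmt.Errorf("attribute %q: list items must be strings", name)
				}
				vals = append(vals, s)
			}
			out[name] = vals
		default:
			return nil, fmt.Errorf("attribute %q: unsupported value type %T", name, v)
		}
	}
	return out, nil
}
```

Rejecting anything other than strings and string lists keeps the column predictable, at the price of ruling out nested claims; a real design might need to be more permissive.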

Add New svid_attributes Table

Similar to the selectors table, this new svid_attributes table would have one row per entry/attribute pair.

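| registered_entry_id | name    | value                     |
|---------------------|---------|---------------------------|
| 123                 | "aud"   | "https://foo.example.com" |
| 123                 | "scope" | "read-data"               |
| 124                 | "aud"   | "https://foo.example.com" |
| 124                 | "scope" | "write-data"              |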

PROS:

  • Conceptually similar to how selectors are handled
  • Does not modify registered_entries table
  • Is very clear what data will appear in the resulting SVID
  • Arbitrary attributes are easily supported without added cost

CONS:

  • Suffers same read/write performance costs of selectors table
  • Increases use of a star-schema style structure
  • May be hard to reason about
  • Drastically increases the total # of rows necessary to read when getting registration entries
  • Requires app-level consolidation of SVID attributes from multiple row reads
  • Multi-value attributes will be tricky to support
  • No distinction between SVID type specific attributes (X509, JWT, other)
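
The "app-level consolidation" mentioned in the cons might look roughly like the Go sketch below. `AttributeRow` and `ConsolidateAttributes` are hypothetical names; the sketch also assumes that a multi-value attribute is represented as several rows sharing the same entry ID and name, which is only one possible answer to the multi-value problem.

```go
package main

// AttributeRow mirrors one row of the proposed svid_attributes table.
// The type is hypothetical and exists only for this illustration.
type AttributeRow struct {
	RegisteredEntryID int64
	Name              string
	Value             string
}

// ConsolidateAttributes groups rows by registration entry and then by
// attribute name. Repeated (entry, name) pairs accumulate into a list,
// preserving the order in which the rows were read.
func ConsolidateAttributes(rows []AttributeRow) map[int64]map[string][]string {
	out := make(map[int64]map[string][]string)
	for _, r := range rows {
		attrs, ok := out[r.RegisteredEntryID]
		if !ok {
			attrs = make(map[string][]string)
			out[r.RegisteredEntryID] = attrs
		}
		attrs[r.Name] = append(attrs[r.Name], r.Value)
	}
	return out
}
```

Every registration entry read now costs one extra pass over its attribute rows, which is the same kind of overhead the selectors table already carries.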

Full Restructuring of Entry Registration

This is more of a wholesale change to the structure of Workload registration. The central idea is that there are fundamental differences between Entries, Entry Instances, and Entry Instance SVIDs which SPIRE could expose more concretely.

Table entries

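| id | type       | spiffe_id                           | ... |
|----|------------|-------------------------------------|-----|
| 1  | "agent"    | "spiffe://example.com/agent/abcdef" | ... |
| 2  | "workload" | "spiffe://example.com/client-app"   | ... |
| 3  | "workload" | "spiffe://example.com/foo-service"  | ... |
| 4  | "workload" | "spiffe://example.com/baz-service"  | ... |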

Table entry_instances

| id | entry_id | selectors | ... |
|----|----------|-----------|-----|
| 1  | 1        | { "k8s_sat:cluster": "MyCluster", "k8s_sat:agent_ns": "production", "k8s_sat:agent_sa": "spire-agent" } | ... |
| 2  | 2        | { "k8s:ns": "example-namespace", "k8s:sa": "client-app", "k8s:container-image": "..../client-app:1.2.3" } | ... |
| 3  | 2        | { "windows:user_name": "client-app", "windows:path": "C:\....\client-app.tsk", "windows:sha256": "abc...." } | ... |

Table entry_svids

| id | entry_instance_id | type | ttl | attributes | ... |
|----|-------------------|------|-----|------------|-----|
| 1  | 1                 | x509 | 24h | { "country": "US", "organization": "Example", "common_name": "....my-awesome-agent" } | ... |
| 2  | 2                 | x509 | 1h  | { "country": "US", "organization": "Example", "common_name": "...client-app" } | ... |
| 3  | 2                 | jwt  | 1h  | { "aud": "https://foo.example.com", "scope": ["read-data"] } | ... |
| 4  | 2                 | jwt  | 1m  | { "aud": "https://foo.example.com", "scope": ["write-data"] } | ... |

PROS:

  • Concretely separates the meaning of an Entry, an Entry Instance, and Entry SVIDs
  • Easily supports new SVID types with no necessary database changes
  • Supports explicit declaration of what appears in each generated SVID
  • Allows for pre-generation of all SVID types
  • Is very clear what data will appear in each resulting SVID
  • Arbitrary attributes are easily supported without added cost
  • Adds a level of future-proofing

CONS:

  • It is a massive change to SPIRE's datastore and subsequent handling
  • The migration path requires data duplication for a period of time

There are probably other approaches that I haven't listed.

The ability for SPIRE to pre-fetch X509-SVIDs is a fantastic feature and allows for significant scaling of agents and workloads while still providing fast and reliable X509-SVID acquisitions. I think SPIRE would benefit from supporting pre-fetching for all current and future SVID types.

evan2645 added the priority/backlog and unscoped labels Nov 15, 2022

This issue is stale because it has been open for 365 days with no activity.

github-actions bot added the stale label Nov 15, 2023

This issue was closed because it has been inactive for 30 days since being marked as stale.

github-actions bot closed this as not planned Dec 15, 2023