Description
Status: Open for comments
Need
This proposal is a replacement for #3870, which will be closed. There is a lot of context and discussion in that issue, which serves as background to this. Rather than polluting and extending all the discussion in there, let's start with a clean slate - but we suggest reading it.
The short summary is:
- There is an authentication system that is responsible for establishing the identity of the current user in Backstage.
- There is a catalog that may or may not contain organizational data: users and groups, and their relationships.
- There are entities in the catalog that have an `owner` field that is filled in either by users or by auxiliary systems such as codeowners.
Given the above, devise a solution that
- In a sensible way lets the logged in user see the entities that they own, for a definition of the word "own" that makes sense to the user
- Supports hand-written `spec.owner` entity fields primarily
- Does not require that organizational data exists or is complete in the catalog
As a lower priority goal, the solution might
- Support ownership of multiple distinct "domains" of entities, for example if there are entities that are sourced from multiple different companies (perhaps into separate namespaces), or from open-source VCS as well as internal VCS, where the user has different identities in each.
- Play well with codeowners
Current State
The current identity API presents the following pieces of information, that the auth backend supplies after signing in:
- A user ID, which is an opaque string that somehow represents the user's identity
- An ID token, issued by Backstage, that can be used to verify the user
- Some profile information such as the full name, email, and picture.
Different auth providers in the backend may provide this info in different ways. They are all hard coded and not pluggable - except if you replace or add an entire provider yourself.
When the catalog frontend plugin wants to get the current user entity, it does so by looking for a User kind entity in the default namespace with a metadata.name that matches what came out of the identity API.
When the catalog wants to deduce whether you are an owner of an entity, it does so by requiring the above User entity and comparing it with the target entity via relations. If you do not have a User entity that matches, you are out of luck.
Proposal
Let the identity of the current user, as returned from the auth backend plugin and exposed to the frontend via the identityApiRef, be extended as follows:
- It contains a new field `claims`, which is an array of entity references. Each is a claim about an identity or a membership that is relevant to the user, for example `User:default/freben` and `Group:default/my-team-name`. There is no requirement that these correspond to actual existing catalog entities.
- The existing ID token is extended to contain these claims as well.
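To make the extension concrete, here is a minimal sketch of what the extended identity could look like. Only `claims` comes from this proposal; the interface name and the other field names are assumptions made for illustration and do not describe the actual `identityApiRef` surface.

```typescript
// Hypothetical shape for illustration; only `claims` is proposed by this RFC.
interface ProfileInfoSketch {
  displayName?: string;
  email?: string;
  picture?: string;
}

interface IdentitySketch {
  userId: string; // opaque user ID, as today
  idToken: string; // Backstage-issued token, which would also carry the claims
  profile: ProfileInfoSketch;
  claims: string[]; // entity references; need not exist in the catalog
}

const identity: IdentitySketch = {
  userId: 'freben',
  idToken: '<opaque-token>',
  profile: { displayName: 'Example User' },
  claims: ['User:default/freben', 'Group:default/my-team-name'],
};
```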
Let the definition of the ownership of an entity E, for a user U, be as follows:
- Get all the `ownedBy` relations of E, and call them O
- Get all the claims of the user U and call them C
- If any C matches any O, return `true`
- Get all `Group` entities that U is a member of, using the regular `memberOf`/`hasMember` relation mechanism, and call them G
- If any G matches any O, return `true`
- Otherwise, return `false`
Of course, this is a formalization and the actual process can be heavily cached with a single lookup at login of all your claims and extending them with the corresponding memberships etc.
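The ownership definition above can be sketched as a small pure function. This is only an illustration: entity references are plain strings here, the function and parameter names are made up, and the case-insensitive comparison is an assumption rather than something the proposal specifies.

```typescript
type EntityRef = string; // "kind:namespace/name"

// Assumption: references are compared case-insensitively.
function normalizeRef(ref: EntityRef): string {
  return ref.toLowerCase();
}

// ownedBy: targets of E's ownedBy relations (O)
// claims: the claims of user U (C)
// memberOfGroups: groups U belongs to via memberOf/hasMember (G)
function isOwner(
  ownedBy: EntityRef[],
  claims: EntityRef[],
  memberOfGroups: EntityRef[],
): boolean {
  const owners = new Set(ownedBy.map(normalizeRef));
  if (claims.some(c => owners.has(normalizeRef(c)))) {
    return true; // some C matches some O
  }
  if (memberOfGroups.some(g => owners.has(normalizeRef(g)))) {
    return true; // some G matches some O
  }
  return false;
}
```

In a cached variant, `claims` and `memberOfGroups` would be merged once at login, leaving a single set lookup per entity.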
The auth providers of the auth backend will be extended with a new plugin interface, effectively:

    // Takes basically the passport response, and somehow produces claims
    type AuthClaimBuilder = (authResponse: any) => Promise<EntityName[]>;

The default implementation of these, when not overridden by the Backstage integrator, will be similar to the user ID logic of today, but in User entity form. So if today's `identityApiRef` would have a `getUserId()` returning `'freben'`, then there would be a single corresponding claim `User:default/freben`.
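A default builder along those lines might look like the sketch below. The local `EntityName` type, the `userId` property on the auth response, and the `stringifyClaim` helper are all assumptions made so the example is self-contained; real providers expose the user ID in different ways.

```typescript
// Local stand-in for the catalog's entity name type (assumption).
interface EntityName {
  kind: string;
  namespace: string;
  name: string;
}

type AuthClaimBuilder = (authResponse: any) => Promise<EntityName[]>;

// Assumes the provider has already resolved a user ID onto `authResponse.userId`.
const defaultClaimBuilder: AuthClaimBuilder = async authResponse => [
  { kind: 'User', namespace: 'default', name: String(authResponse.userId) },
];

// Hypothetical helper that renders a claim as an entity reference string.
function stringifyClaim(claim: EntityName): string {
  return `${claim.kind}:${claim.namespace}/${claim.name}`;
}
```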
Users may replace this logic in any way that they see fit. An example flow could be:
- Authentication happens with a Google sign-in, resulting in a `john.knowles@example.com` email as the established identity in the passport response.
- The custom `AuthClaimBuilder` makes a company-specific lookup in the local LDAP system by email, to find that the LDAP identity of the user is `john`, member of the group `infra-ninjas`. Via custom attributes it is found that his public GitHub user is `johnHaxxesBezt`, and in GHE he is known to be `john` because the company by convention has GHE usernames equal to LDAP usernames.
- The same function makes a request to GitHub's and GHE's APIs to look up the domain-specific group memberships of the above users.
- The final list of claims is `user:default/john`, `group:default/infra-ninjas`, `user:github/johnHaxxesBezt`, and a number of groups that were returned by the GitHub/GHE APIs.
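The flow above could be sketched roughly as follows. Both lookup functions are stubs with canned data standing in for the company LDAP and the GitHub API; their names, the `LdapRecord` shape, the `email` property on the auth response, and the `snoo-maintainers` team membership are placeholders, not real integrations. A GHE lookup would follow the same pattern and is left out for brevity.

```typescript
interface EntityName {
  kind: string;
  namespace: string;
  name: string;
}

type AuthClaimBuilder = (authResponse: any) => Promise<EntityName[]>;

interface LdapRecord {
  uid: string;
  groups: string[];
  githubLogin?: string;
}

// Stub for a company-specific LDAP lookup by email (hypothetical).
async function lookupLdapByEmail(email: string): Promise<LdapRecord | undefined> {
  const directory: Record<string, LdapRecord> = {
    'john.knowles@example.com': {
      uid: 'john',
      groups: ['infra-ninjas'],
      githubLogin: 'johnHaxxesBezt',
    },
  };
  return directory[email];
}

// Stub for a GitHub API call returning team names (hypothetical).
async function lookupGithubTeams(login: string): Promise<string[]> {
  return login === 'johnHaxxesBezt' ? ['snoo-maintainers'] : [];
}

const customClaimBuilder: AuthClaimBuilder = async authResponse => {
  const record = await lookupLdapByEmail(String(authResponse.email));
  if (!record) {
    return []; // unknown user: no claims
  }
  const claims: EntityName[] = [
    { kind: 'user', namespace: 'default', name: record.uid },
    ...record.groups.map(name => ({ kind: 'group', namespace: 'default', name })),
  ];
  if (record.githubLogin) {
    // The `github` namespace is the integrator's own choice.
    claims.push({ kind: 'user', namespace: 'github', name: record.githubLogin });
    const teams = await lookupGithubTeams(record.githubLogin);
    claims.push(...teams.map(name => ({ kind: 'group', namespace: 'github', name })));
  }
  return claims;
};
```

Note that nothing in this sketch talks to the catalog; the builder only produces entity references.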
An important point here is that the auth backend does not need to have any connection to the catalog. The only link to the catalog is that the claims are in the form of complete entity references instead of just standalone IDs. Whether you import your org data or not, the Backstage integrator can choose the usage of ID spaces and namespaces freely here, and make those correctly aligned with other plugins' configuration.
Further Work (Optional)
There is one missing piece to sew this up properly with e.g. codeowners in multiple version control systems. Note how this is not explicitly in scope for this RFC, as mentioned at the top.
Notice how the johnHaxxesBezt user was placed in a namespace they chose to call github. Entity definition files placed on public GitHub can't reasonably be expected to have their spec.owner set to internal, secret LDAP group names; you'd put the GitHub team names in there. Same thing goes if you are using codeowners; those would use the public GitHub user or team identities as well.
This means that as we ingest these yaml files out of GitHub into the catalog, the end result entity as stored in the catalog should have ownedBy relations using the namespace github - because that's what the Backstage integrator chose to use here. For example, User:github/johnHaxxesBezt or Group:github/snoo-maintainers. Remember that this github string is not set in stone nor is it a convention; it could be anything that the integrator chose, on a per-version-control-system basis.
There are a few ways of achieving this.
One is to tell users to write their yaml files on public GitHub with an explicit namespace in the owner field. They will then have to remember to do so, because if they don't, the generated relation will instead point to the default namespace where it likely will not match an actual group - or, perhaps worse, match the wrong group that happens to be similarly named. It also doesn't make a lot of sense as seen from "the outside"; the string github was chosen entirely by the internal integrator and therefore this is a very leaky abstraction. Nonetheless, it is a solution that will work on a technical level, without any additional code changes. If it is a rare occurrence, it can be the initial approach to pick.
    spec:
      owner: github/snoo-maintainers

The alternative approach is to make processors namespace aware, on a per-integration basis. For example, we could conceivably add a `defaultNamespace` parameter to the integrations config. The codeowners processor could take this into account, and fill in the owner field with that namespace. We could even make a general pre processor step that injects these namespaces, only when missing, in all such fields for all entities, based on their origin location.
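The "inject only when missing" behavior could be sketched as below. The function name is made up, and the accepted owner forms (`name`, `namespace/name`, `kind:name`, `kind:namespace/name`) are an assumption about how such a field would be written; this is not an existing processor.

```typescript
// Injects `defaultNamespace` into an owner reference only if none is present.
function withDefaultNamespace(owner: string, defaultNamespace: string): string {
  const colon = owner.indexOf(':');
  const kindPrefix = colon >= 0 ? owner.slice(0, colon + 1) : '';
  const rest = colon >= 0 ? owner.slice(colon + 1) : owner;
  if (rest.includes('/')) {
    return owner; // namespace already explicit, leave untouched
  }
  return `${kindPrefix}${defaultNamespace}/${rest}`;
}
```

The namespace itself would come from per-integration config such as the following.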
    integrations:
      github:
        - host: github.com
          defaultNamespace: github
          token:
            $env: GITHUB_TOKEN

Alternatives
See #3870 for discussions.
Specifically, this RFC does NOT address the purely token-based ideas mentioned therein. We welcome discussions under this new RFC, to see how it may align with a possible token-based approach.
Risks
See #3870.