Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

generate a dedicated uuid for federated users instead of using the one from the remote instance #9927

Closed
Tracked by #9735
butonic opened this issue Aug 27, 2024 · 10 comments
Assignees

Comments

@butonic
Copy link
Member

butonic commented Aug 27, 2024

we are only comaring the opaque id of users when checking permissions. Anywhere we are using utils.UserIDEqual().

We want to assign our own uuid for users anyway and store the identity used to authenticate a user as an additional property (list).

For normal users the proxy should exchange the sub@iss claims for our internal uuid and then only compare that uuid.

For federated users we need to create a new uuid when accepting an invite so the user is guaranteed to have a non colliding user id. If there are cases where we should return the original remote userid, e.g. in the graph user listing we can use the opaque id for now.

duh the userid has no opaque id ... only the user struct.

@kulmann
Copy link
Member

kulmann commented Aug 28, 2024

For federated users we need to create a new uuid when accepting an invite so the user is guaranteed to have a non colliding user id.

I don't understand that part. I thought we're using uuids so that we have globally unique ids and don't need to add some issuer or instance id to it.

@butonic
Copy link
Member Author

butonic commented Aug 28, 2024

we just had a lengthy discussion about this ...

The graph api models users with their own id and with a list of identities whereas the CS3 API models user identifiers differently.

On the CS3 API a user has a userid property:

Property Type Description
idp string The identity provider for the user.
opaque_id string the unique identifier for the user in the scope of the identity provider.
type UserType The type of user.

Graph user (only relevant properties)

Property Type Description
id String The unique identifier for the user. Should be treated as an opaque identifier. Inherited from directoryObject. Key. Not nullable. Read-only.
Returned by default. Supports $filter (eq, ne, not, in).
identities objectIdentity collection Represents the identities that can be used to sign in to this user account. Microsoft (also known as a local account), organizations, or social identity providers such as Facebook, Google, and Microsoft can provide identity and tie it to a user account. It might contain multiple items with the same signInType value.
Returned only on $select.
Supports $filter (eq) with limitations.
userType String A string value that can be used to classify user types in your directory. The possible values are Member and Guest. Returned only on $select. Supports $filter (eq, ne, not, in, and eq on null values).
NOTE: For more information about the permissions for members and guests, see What are the default user permissions in Microsoft Entra ID?
libreGraph: For libreGraph we added the possible value Federated that corresponds to the CS3 userid type.

The objectIdentity:

Property Type Description
signInType String Specifies the user sign-in types in your directory, such as emailAddress, userName, federated, or userPrincipalName. federated represents a unique identifier for a user from an issuer that can be in any format chosen by the issuer. Setting or updating a userPrincipalName identity updates the value of the userPrincipalName property on the user object. The validations performed on the userPrincipalName property on the user object, for example, verified domains and acceptable characters, are performed when setting or updating a userPrincipalName identity. Extra validation is enforced on issuerAssignedId when the sign-in type is set to emailAddress or userName. This property can also be set to any custom string. For more information about filtering behavior for this property, see Filtering on the identities property of a user.
issuer String Specifies the issuer of the identity, for example facebook.com. 512 character limit. For local accounts (where signInType isn't federated), this property is the local default domain name for the tenant, for example contoso.com. For guests from other Microsoft Entra organizations, this is the domain of the federated organization, for example contoso.com. For more information about filtering behavior for this property, see Filtering on the identities property of a user.
issuerAssignedId String Specifies the unique identifier assigned to the user by the issuer. 64 character limit. The combination of issuer and issuerAssignedId must be unique within the organization. Represents the sign-in name for the user, when signInType is set to emailAddress or userName (also known as local accounts).When signInType is set to:
  • emailAddress (or a custom string that starts with emailAddress like emailAddress1), issuerAssignedId must be a valid email address
  • userName, issuerAssignedId must begin with an alphabetical character or number, and can only contain alphanumeric characters and the following symbols: - or _
For more information about filtering behavior for this property, see Filtering on the identities property of a user.

Unfortunately, these specifications don't match. The CS3 cannot deal with a user moving from one research institute to another (sub and issuer will change) or with the change of the identity provider product (sub will change because it cannot be imported) because the userid would change. We currently address the mismatch by allowing to configure the oidc claim to use
as the internal userid ... and by ignoring the idp of the CS3 userid when assembling permissions. The proxy makes a user lookup and will use whatever CS3 userid was returned.

While we could just ignore the CS3 userid IdP property internally we still need to be compatible with other CS3 gateways that expect a userid with type, idp and opaque_id.

Internally, it might be easier to conceptualize this using a list of users that have in int id
Instance A has users:

id username password
1 admin secret

Instance B happens to have the same users

id username password
1 admin othersecret

This is basically whet the oc10 user table looks like. It stores credentials. For LDAP users we had a dedicated table that would store the LDAP dn and a fallback uuid to identify which ldap user was trying to log in. And then we had openid connect auth ... which only had configuration on which claim to use as the username, displayname and email.

Ultimately, in oc10 all user metadata was tied together with the accounts table. I am not advocating for a database, just trying to point out that every instance maintains its own set of user metadata. How users are authenticated (basic auth against the users table, ldap auth that uses the dn from the ldap table or oidc which would use a configurable claim to find users in the accounts table) does not affect the internal user id.

Now in oc10 we had to synchronize all users because otherwise it was not passible to share with users that had not signed in. I would argue that we now have an invite service that can be extended to persist share invites for users that have not yet signed in. Searching for users can be done against a central directory or IMO it would be more efficient to use email adresses when inviting someone to a share. That can be fully async and using the invite mechanism even works for users that are not part of the system, yet. It would also unify the invite process for guest accounts. The search could return email addresses from a central LDAP as well as from a user personal list of contacts or recent share recipients. The invite service is responsible for checking if a user is a member of the organization, or if he should provision a guest account and send an email.

I don't think we need to synchronize all users beforehand, but I do think that we need to assign our own internal id to every user and have a mapping between the sign in credentials and the user. The graph api already models that properly, even allowing multiple identities for the same user. Great.

So, should we expose our internal user id on the graph api? Currently, we are doing that. And I think the graph user id works similar to the oidc sub&iss pair. The OIDC sub corresponds to the graph user id and the OIDC issuer (eg https://idp.ocis.test/realms/foo) corresponts to the graph provider (eg http://cloud.ocis.test/graph/v1.0). In theory we could model users to have an internal numeric id und use that when persisting grants in addition to a graphuuid that we expose on the graph as the user id.

Note that two ocis instances will have a different user uuid for the same identifier: I have an IdP at https://idp.mine.tld ind I log in at https://cloud.foo.tld and https://cloud.bar.tld. Both instances will have to exchange whatever sub&iss my idp gives them (and it might give them a PPID so they CANNOT correlate my id by design). If IdPs are meant to give us no user information we will need to ask them to enter missing information, or ask the admin to send at least a displayname claim. user notifications can only be sent when the user han an email. Invites to guest accounts will give us an email address that we can then also use as the displayname in our internal user metadata.

Now I only need to find a way how to fin in the CS3 api ... 🤔

@butonic
Copy link
Member Author

butonic commented Aug 28, 2024

AFAICT even if the CS3 api is intended to be consumed directly by clients it still only represents a single instance. If instances talk to each other they do that using the OCM apis. The gateway has dedicated endpoints to forward and accept invites. as a consequence of inviting a user two federated users are created: one in each instance. So an instance implementing the CS3 api still nedds to keep track of all users. If an implementation would use integers for its users it would have to store the id the remote system provided for the user in addition to the internal id. In our case we roll our own uuid instead of assigning an int.

I think consequently we would have to replace the remote issuer with ourself ... or ... since we are already ignoring it ... drop the issuer.

In any case, to be able to list the identity in the graph api we could keep the original issuer ... but I'd prefer to make identity a dedicated property in the cs3 user. Or we add a graphid property to the user object and treat the userid property as a single identity? urgt hat sucks.

Hm, the cleanest way to represent users with the CS3 user and userid is to use the userid to store our uuid and the type of user. Instead of filling the userid IdP property I'd prefer to store the identities in the opaque blob as json.

@butonic
Copy link
Member Author

butonic commented Aug 28, 2024

After revisiting the code ... and how ocm and federated sharing works I can say that the invite process is only used to create local representations of remote users in both instances. They are completely independent of any actual users and are only relevant when searching for recipients.

They are not used to check access to the ocm share. When sharing a token is generated that is used in bearer auth when accessing the public link share.

So, instead of extending the ocm implementation with sth like this:

type RemoteUser struct {
	*userpb.User
	LocalID string `json:"local_id"`
}

And then having to implement a migration I think we can base64(userid.opaqueid@userid.idp) and use that as the id of the federated user.

The sql implementation for ocm shares uses a ocm_remote_users table:

CREATE TABLE IF NOT EXISTS ocm_remote_users (
    initiator VARCHAR(255) NOT NULL,
    opaque_user_id VARCHAR(255) NOT NULL,
    idp VARCHAR(255) NOT NULL,
    email VARCHAR(255) NOT NULL,
    display_name VARCHAR(255) NOT NULL,
    PRIMARY KEY (initiator, opaque_user_id, idp)
)

we could add a migration to add a colum if someone is really using that ... same goes for the json implementation...

@butonic
Copy link
Member Author

butonic commented Aug 28, 2024

hrhr so we are storing a grant in decomposedfs, but only to be able to list it in the propfind... the actual access is only granted on based on the token. The ocmshare scope only jails the requests in the subtree, same as public link shares / scopes.

🤪

@butonic
Copy link
Member Author

butonic commented Aug 28, 2024

so whd do we still have a user id collision? because we mix the federated users into the normal users without replacing their userid with a local id in the graph api. we could in the graph endpoint base64 encode userid@idp and use that as the userid for federated users only.

That leaves the permission checks in decomposedfs ... they still might collide:

normal grants are stored and actually grant permission.
ocm grants are stored as well, but they are only used to efficiently list the remote ocm users that a resource has been shared with.

I need to double check if we take into account the user type

👀

@butonic
Copy link
Member Author

butonic commented Aug 28, 2024

ok, when persisting a grant for a federated user we store a different string than for other users:

func UserAce(id *userpb.UserId) string {
	if id.GetType() == userpb.UserType_USER_TYPE_FEDERATED {
		return "u:" + id.OpaqueId + "@" + id.Idp
	}

	return "u:" + id.OpaqueId
}

UserACE is used as the principal.

Now, when parsing it back we never try to split this again. Turning an ACE inte a grant blindly uses the complete principal:

// Grant returns a CS3 grant
func (e *ACE) Grant() *provider.Grant {
	// if type equals "D" we have a full denial which means an empty permission set
	permissions := &provider.ResourcePermissions{}
	if e._type == "A" {
		permissions = e.grantPermissionSet()
	}
	g := &provider.Grant{
		Grantee: &provider.Grantee{
			Type: e.granteeType(),
		},
		Permissions: permissions,
		Creator:     userIDFromString(e.creator),
	}
	id := e.principal[2:]
	if e.granteeType() == provider.GranteeType_GRANTEE_TYPE_GROUP {
		g.Grantee.Id = &provider.Grantee_GroupId{GroupId: &grouppb.GroupId{OpaqueId: id}}
	} else if e.granteeType() == provider.GranteeType_GRANTEE_TYPE_USER {
		g.Grantee.Id = &provider.Grantee_UserId{UserId: &userpb.UserId{OpaqueId: id}}
	}

	if e.expires != 0 {
		g.Expiration = &typesv1beta1.Timestamp{
			Seconds: uint64(e.expires / int64(time.Second)),
			Nanos:   uint32(e.expires % int64(time.Second)),
		}
	}

	return g
}

@butonic
Copy link
Member Author

butonic commented Sep 2, 2024

I need a review of cs3org/reva#4833

@butonic
Copy link
Member Author

butonic commented Sep 4, 2024

cs3org/reva#4833 got merged, needs reva bump #9981

@butonic
Copy link
Member Author

butonic commented Sep 6, 2024

all merged

@butonic butonic closed this as completed Sep 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants