[DRAFT] Add support for Silo groups #1358

jmpesp · 2022-07-06T03:12:52Z

Add Silo groups, and the ability to grant them roles.

Add the necessary logic to:

- provision an "admin" silo group when a silo is created, which is granted the silo admin role.
- after successful authentication, create groups during silo user provisioning if the Silo's provision type is JIT.
- add a group's roles to a user's role set if they're part of that group.

Silos now have an optional admin_group_name that is configured at silo
provision time. If this is left out, users will currently have no way to
be granted roles when they first log in. In the future, this may be
selected and groups would be created another way.

SAML identity providers now have an optional group_attribute_name that
configures what attribute represents a group name.
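A minimal in-memory sketch of the three pieces of logic above. It uses hypothetical types (`Silo`, `SiloRole`) rather than the real nexus datastore. It also trims whitespace and skips empty group names from the IdP's group attribute, following the later commit "trim whitespace and skip empty groups". How the real code parses the SAML attribute is not shown here.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical silo roles; the real role model lives in nexus's authz code.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum SiloRole {
    Admin,
    Collaborator,
    Viewer,
}

/// Hypothetical stand-in for a silo's groups and their role assignments.
#[derive(Default)]
pub struct Silo {
    /// Group external_id -> roles granted to that group.
    pub group_roles: BTreeMap<String, BTreeSet<SiloRole>>,
}

impl Silo {
    /// Creates a silo, provisioning the optional admin group with the admin role.
    pub fn create(admin_group_name: Option<&str>) -> Silo {
        let mut silo = Silo::default();
        if let Some(name) = admin_group_name {
            silo.group_roles
                .entry(name.to_string())
                .or_default()
                .insert(SiloRole::Admin);
        }
        silo
    }

    /// JIT provisioning: ensure each asserted group exists (trimmed, empty
    /// names skipped) and return the groups the user is a member of.
    pub fn jit_provision_groups(&mut self, asserted: &[&str]) -> BTreeSet<String> {
        let mut memberships = BTreeSet::new();
        for raw in asserted {
            let name = raw.trim();
            if name.is_empty() {
                continue;
            }
            // Creating a group grants it no roles by default.
            self.group_roles.entry(name.to_string()).or_default();
            memberships.insert(name.to_string());
        }
        memberships
    }

    /// Effective roles: the user's direct roles plus the roles of every group
    /// the user belongs to.
    pub fn effective_roles(
        &self,
        direct: &BTreeSet<SiloRole>,
        groups: &BTreeSet<String>,
    ) -> BTreeSet<SiloRole> {
        let mut roles = direct.clone();
        for group in groups {
            if let Some(granted) = self.group_roles.get(group) {
                roles.extend(granted.iter().copied());
            }
        }
        roles
    }
}
```

Without an `admin_group_name`, no group starts with any role, which matches the caveat above that users then have no way to be granted roles on first login.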

jmpesp · 2022-07-08T18:48:01Z

This is ready for review now :)

davepacheco

Nice. I gather we're also going to need an API to list groups in the Silo, groups that you're a member of, and maybe members of a group that you're in. I'm not saying that has to be in this PR.

I gather that like users, we're going to identify groups in the API by their uuid and treat the external id like the user's display name -- that's what we show everywhere, but not what we use as the id.

common/src/sql/dbinit.sql

davepacheco · 2022-07-13T15:39:39Z

common/src/sql/dbinit.sql

+    silo_user_id
+);
+
+CREATE INDEX ON omicron.public.silo_group_membership (


I think you need (silo_user_id, silo_group_id) here. It's a little weird to have two indexes whose columns are the same except for the order, but I don't see a way around it. We want to paginate the users in a group as well as the groups that a user is in and I think we need two indexes to do that.
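To illustrate why both column orders are needed, here is a minimal in-memory sketch. It is not the real schema or Diesel code. Two `BTreeSet`s stand in for the two indexes, and `u32` ids stand in for UUIDs. Keyset pagination only becomes a contiguous range scan when the filtered column leads the index.

```rust
use std::collections::BTreeSet;
use std::ops::Bound;

/// Hypothetical model of silo_group_membership with both index orders.
#[derive(Default)]
pub struct Memberships {
    by_group: BTreeSet<(u32, u32)>, // (silo_group_id, silo_user_id)
    by_user: BTreeSet<(u32, u32)>,  // (silo_user_id, silo_group_id)
}

impl Memberships {
    pub fn insert(&mut self, group_id: u32, user_id: u32) {
        self.by_group.insert((group_id, user_id));
        self.by_user.insert((user_id, group_id));
    }

    /// One page of users in `group_id` strictly after `marker`.
    pub fn users_in_group(&self, group_id: u32, marker: Option<u32>, limit: usize) -> Vec<u32> {
        let start = match marker {
            Some(m) => Bound::Excluded((group_id, m)),
            None => Bound::Included((group_id, 0)),
        };
        self.by_group
            .range((start, Bound::Included((group_id, u32::MAX))))
            .take(limit)
            .map(|&(_, user_id)| user_id)
            .collect()
    }

    /// One page of groups containing `user_id` strictly after `marker`.
    pub fn groups_for_user(&self, user_id: u32, marker: Option<u32>, limit: usize) -> Vec<u32> {
        let start = match marker {
            Some(m) => Bound::Excluded((user_id, m)),
            None => Bound::Included((user_id, 0)),
        };
        self.by_user
            .range((start, Bound::Included((user_id, u32::MAX))))
            .take(limit)
            .map(|&(_, group_id)| group_id)
            .collect()
    }
}
```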

davepacheco · 2022-07-13T16:52:12Z

nexus/src/db/datastore.rs

+    ) -> ListResultVec<SiloGroupMembership> {
+        opctx.authorize(authz::Action::ListChildren, authz_silo).await?;
+
+        use db::schema::silo_group_membership::dsl;


We're going to want to paginate this.

If we make the other change I suggested earlier, you won't need this function. I think it would be really useful to have APIs for a user to list their own groups and to list the users in a group. Those should be paginated.

nexus/src/db/datastore.rs

nexus/src/db/model/silo_group.rs

nexus/src/authz/roles.rs

davepacheco

Still working through more details here but I wanted to leave this while I have it.

davepacheco · 2022-07-26T15:50:01Z

common/src/sql/dbinit.sql

 );

-CREATE UNIQUE INDEX ON omicron.public.silo_group (
-    name,
+CREATE INDEX ON omicron.public.silo_group (


What is this index for? (I'm generally suspicious of non-unique indexes because they can't be used for lookup and they can't be used for pagination when the page size is smaller than the number of duplicate rows in the index. I see that we're doing this a lot though and I filed #1497.)

If I remove it, I see a full table scan error for

DELETE FROM "silo_group_membership" WHERE ("silo_group_membership"."silo_group_id" = ANY(SELECT "silo_group"."id" FROM "silo_group" WHERE (("silo_group"."silo_id" = $1) AND ("silo_group"."time_deleted" IS NOT NULL))))

meaning it's coming from the silo delete code:

let updated_rows = diesel::delete(silo_group_membership::dsl::silo_group_membership)
    .filter(
        silo_group_membership::dsl::silo_group_id.eq_any(
            silo_group::dsl::silo_group
                .filter(silo_group::dsl::silo_id.eq(id))
                .filter(silo_group::dsl::time_deleted.is_not_null())
                .select(silo_group::dsl::id),
        ),
    )
    .execute_async(self.pool_authorized(opctx).await?)
    .await
    .map_err(|e| {
        public_error_from_diesel_pool(
            e,
            ErrorHandler::NotFoundByResource(authz_silo),
        )
    })?;

but there's also the silo group delete code:

diesel::update(dsl::silo_group)
    .filter(dsl::id.eq(group_id))
    .filter(dsl::time_deleted.is_null())
    .set(dsl::time_deleted.eq(Utc::now()))
    .execute(conn)?;

which also would use this index.

Makes sense. When those queries become paginated in the future, they're going to need the id column. I'd suggest making this a UNIQUE index on (silo_id, id). I don't think it makes any difference right now. But we've done that elsewhere for pagination and it'll avoid having to do a schema migration later.
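Here is a small sketch of the pagination argument above. It assumes keyset pagination, where the next page starts strictly after the last row's marker. Rows are hypothetical `(silo_id, id)` pairs with `u32` stand-ins for UUIDs. If the marker is `silo_id` alone, rows that share the last page's `silo_id` but didn't fit on the page are skipped. Using `>=` instead of `>` would return the same page forever. With a `(silo_id, id)` marker, every row is visited.

```rust
/// A hypothetical silo_group row, as (silo_id, id).
pub type Row = (u32, u32);

/// Pages using only silo_id as the marker (all a non-unique silo_id index gives).
pub fn page_by_silo_only(rows: &[Row], marker: Option<u32>, limit: usize) -> Vec<Row> {
    let mut sorted = rows.to_vec();
    sorted.sort();
    sorted
        .into_iter()
        .filter(|&(silo_id, _)| marker.map_or(true, |m| silo_id > m))
        .take(limit)
        .collect()
}

/// Pages using the unique (silo_id, id) pair as the marker.
pub fn page_by_silo_and_id(rows: &[Row], marker: Option<Row>, limit: usize) -> Vec<Row> {
    let mut sorted = rows.to_vec();
    sorted.sort();
    sorted
        .into_iter()
        .filter(|&row| marker.map_or(true, |m| row > m))
        .take(limit)
        .collect()
}

/// Walks all pages with the silo_id-only marker; may silently drop rows.
pub fn walk_silo_only(rows: &[Row], limit: usize) -> Vec<Row> {
    let mut seen = Vec::new();
    let mut marker = None;
    loop {
        let page = page_by_silo_only(rows, marker, limit);
        match page.last() {
            Some(&(silo_id, _)) => marker = Some(silo_id),
            None => break,
        }
        seen.extend(page);
    }
    seen
}

/// Walks all pages with the (silo_id, id) marker; visits every row once.
pub fn walk_silo_and_id(rows: &[Row], limit: usize) -> Vec<Row> {
    let mut seen = Vec::new();
    let mut marker = None;
    loop {
        let page = page_by_silo_and_id(rows, marker, limit);
        match page.last() {
            Some(&last) => marker = Some(last),
            None => break,
        }
        seen.extend(page);
    }
    seen
}
```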

Right now, the indexes on silo_group are:

CREATE INDEX ON omicron.public.silo_group (
    silo_id
) WHERE time_deleted IS NOT NULL;

CREATE UNIQUE INDEX ON omicron.public.silo_group (
    silo_id,
    external_id
) WHERE time_deleted IS NULL;

With the column name change, I'm a little confused - are you suggesting changing the first one to UNIQUE on (silo_id, id), or (silo_id, external_id)?

Sorry I think I missed a few things here, especially that the time_deleted condition is inverted between these two indexes. This index here is on time_deleted IS NOT NULL, so it includes only deleted silo groups.

I see the usage you mentioned in DataStore::silo_delete(). That query indeed specifies time_deleted IS NOT NULL, so it matches this index. But it looks to me like both might be wrong? At that point in silo_delete(), we haven't deleted the silo groups yet, right? That looks like the next thing we do. So it seems like the "delete all silo group memberships" section will only delete memberships from silo groups that were already deleted before the Silo was deleted. (Sorry if I'm misreading the code again!)

If the right thing is to flip the time_deleted condition in the query, then you'd want to flip the condition in this index too, and then it really should be redundant with the other index. So you should be able to just remove this one. (If I'm misreading the code and we do want to filter only on the deleted ones, then I'd suggest instead dropping the time_deleted IS NULL condition on the other index and removing this one.)

The second usage you mentioned (silo group delete code) is specifying time_deleted IS NULL, so it cannot use this index (but should use the other index).

Yep, that looks like a bug. We delete silo group memberships before silo groups, meaning time_deleted is null. I have no idea how this passes the restriction on full table scans...?

The only theory that makes sense to me is there wasn't a test for deleting a silo that contained a user, group, and group memberships, meaning the number of rows was zero. When I added one, the test failed because group memberships remained, not because of the full table scan restriction.

Commit 6e79986 changes the index, the filter in silo_delete, and adds the test I'm talking about.
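Here is a simplified model of the ordering bug described above. It assumes that memberships are deleted before the silo's groups are soft-deleted, as in `silo_delete()`. The struct, function, and flag names are hypothetical, and `deleted` stands in for `time_deleted IS NOT NULL`. With the buggy filter, step 1 matches no live groups, so the membership survives.

```rust
/// Hypothetical, simplified silo_group row.
pub struct Group {
    pub id: u32,
    pub silo_id: u32,
    pub deleted: bool, // stands in for time_deleted IS NOT NULL
}

/// Simplified silo delete: remove memberships, then soft-delete groups.
/// `filter_deleted_groups = true` reproduces the `time_deleted.is_not_null()` bug.
pub fn delete_silo(
    groups: &mut [Group],
    memberships: &mut Vec<(u32, u32)>, // (silo_group_id, silo_user_id)
    silo_id: u32,
    filter_deleted_groups: bool,
) {
    // Step 1: the subquery, i.e. this silo's group ids matching the time_deleted filter.
    let target_ids: Vec<u32> = groups
        .iter()
        .filter(|g| g.silo_id == silo_id && g.deleted == filter_deleted_groups)
        .map(|g| g.id)
        .collect();
    // Delete memberships belonging to those groups.
    memberships.retain(|&(group_id, _)| !target_ids.contains(&group_id));
    // Step 2: soft-delete the silo's live groups.
    for g in groups.iter_mut().filter(|g| g.silo_id == silo_id && !g.deleted) {
        g.deleted = true;
    }
}
```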

We discussed the indexes again offline and I'll summarize here. I'd suggested just one unique, partial index here. This triggered an unexpected error on the INSERT ... ON CONFLICT. I filed #1545 to describe this and our subsequent research. In conclusion I think there are two okay options here:

Keep the single unique partial index and use on_conflict_do_nothing(). (@jmpesp I only discovered this option after we talked about this.) We may want to add a test for trying to ensure the same group twice?

Keep the single unique partial index and drop the on_conflict part of the insert. The problem with this is that it means that if two users log in at about the same time, both in some IdP group that we've never seen before, one of them will fail spuriously. But if that user retries, it will work. That's a lot of conditions, the impact is small, and there's a trivial workaround (retry). It'd be nice to fix this but it's hard for me to justify asking to spend much time on this case right now.

Either is okay with me -- but we should do one of these. As it is right now, since we have a unique non-partial index, it would incorrectly prevent us from reusing the external_id of a deleted silo group.
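Here is a single-threaded sketch of the "ensure" semantics from the first option. It assumes a unique partial index on `(silo_id, external_id) WHERE time_deleted IS NULL`. Inserting a live duplicate is a no-op, and a soft-deleted group does not block reuse of its `external_id`. The sketch only models the conflict rule, not concurrent transactions or the actual Diesel query, and all names are hypothetical.

```rust
/// Hypothetical, simplified silo_group row.
pub struct SiloGroup {
    pub id: u32,
    pub silo_id: u32,
    pub external_id: String,
    pub deleted: bool, // stands in for time_deleted IS NOT NULL
}

#[derive(Default)]
pub struct SiloGroupTable {
    rows: Vec<SiloGroup>,
    next_id: u32,
}

impl SiloGroupTable {
    fn live_match(row: &SiloGroup, silo_id: u32, external_id: &str) -> bool {
        !row.deleted && row.silo_id == silo_id && row.external_id == external_id
    }

    /// Roughly INSERT ... ON CONFLICT (silo_id, external_id)
    /// WHERE time_deleted IS NULL DO NOTHING, then a lookup of the live row.
    pub fn ensure(&mut self, silo_id: u32, external_id: &str) -> u32 {
        let conflict = self.rows.iter().any(|r| Self::live_match(r, silo_id, external_id));
        if !conflict {
            self.next_id += 1;
            self.rows.push(SiloGroup {
                id: self.next_id,
                silo_id,
                external_id: external_id.to_string(),
                deleted: false,
            });
        }
        self.rows
            .iter()
            .find(|r| Self::live_match(r, silo_id, external_id))
            .map(|r| r.id)
            .expect("a live row exists after ensure")
    }

    /// Soft delete: the row stays, but it no longer participates in the index.
    pub fn soft_delete(&mut self, id: u32) {
        for row in self.rows.iter_mut().filter(|r| r.id == id) {
            row.deleted = true;
        }
    }
}
```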

This is my last worry about this change -- otherwise I think it's good to go!

@smklein found another possible option

Updated in df37612, thanks for debugging with me :)

common/src/sql/dbinit.sql

nexus/src/authn/silos.rs

common/src/sql/dbinit.sql

nexus/src/app/silo.rs

davepacheco · 2022-07-26T17:30:57Z

nexus/src/db/datastore/role.rs

+                    let mut group_role_assignments = dsl::role_assignment
+                        .filter(dsl::identity_type.eq(IdentityType::SiloGroup))
+                        .filter(dsl::identity_id.eq_any(
+                            silo_group_membership::dsl::silo_group_membership


I think this will work okay. It's a little different than what I'd suggested, which was:

SELECT r.*
FROM role_assignment r, silo_group_membership m
WHERE m.silo_user_id = $1
  AND r.actor_type = 'silo_group'
  AND r.actor_id = m.silo_group_id
  AND r.resource_type = $2
  AND r.resource_id = $3

This looks more like:

SELECT r.*
FROM role_assignment r
WHERE r.identity_type = 'silo_group'
  AND r.identity_id IN (SELECT silo_group_id FROM silo_group_membership WHERE silo_user_id = $1)
  AND r.resource_type = $2
  AND r.resource_id = $3

This uses a subquery instead of a join. The main downside is that the database will materialize the full list of a user's groups and check that set against all the matching role assignments. The join might also have to do that but I think the database would have more freedom to choose the best approach (e.g., if there were a large number of role assignments and few groups, it could choose instead to enumerate the groups and check each one for a matching role assignment). I assume group membership will be small for the foreseeable future and this won't be a problem.
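Both query shapes return the same rows and differ only in how the database may evaluate them. Below is a rough in-memory analogy, not a model of the query planner. It uses hypothetical types and omits `identity_type`/`resource_type`, so all assignments here are assumed to be group assignments. The subquery form materializes the user's group set first, while the join form can be driven from the membership side.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified role_assignment row (group assignments only).
#[derive(Debug, Clone, PartialEq)]
pub struct RoleAssignment {
    pub identity_id: u32,
    pub resource_id: u32,
    pub role: &'static str,
}

/// Subquery shape: build the full set of the user's groups, then filter assignments.
/// `memberships` holds (silo_user_id, silo_group_id) pairs.
pub fn via_subquery(
    assignments: &[RoleAssignment],
    memberships: &[(u32, u32)],
    user: u32,
    resource: u32,
) -> Vec<RoleAssignment> {
    let groups: HashSet<u32> = memberships
        .iter()
        .filter(|&&(u, _)| u == user)
        .map(|&(_, g)| g)
        .collect();
    assignments
        .iter()
        .filter(|a| a.resource_id == resource && groups.contains(&a.identity_id))
        .cloned()
        .collect()
}

/// Join shape driven from memberships: for each of the user's groups,
/// look up matching assignments (a nested-loop join).
pub fn via_join(
    assignments: &[RoleAssignment],
    memberships: &[(u32, u32)],
    user: u32,
    resource: u32,
) -> Vec<RoleAssignment> {
    let mut out = Vec::new();
    for &(_, group_id) in memberships.iter().filter(|&&(u, _)| u == user) {
        out.extend(
            assignments
                .iter()
                .filter(|a| a.resource_id == resource && a.identity_id == group_id)
                .cloned(),
        );
    }
    out
}
```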

nexus/src/db/datastore/silo.rs

nexus/src/db/datastore/silo_group.rs

davepacheco

Great! I really like the new transaction for silo create. Thanks for doing that. There's just 1-2 things here that I think are worth doing (the index, the "limit 1", and maybe the "on conflict" change? but that case is pretty obscure).

davepacheco · 2022-07-29T16:18:12Z

common/src/sql/dbinit.sql

 );

-CREATE UNIQUE INDEX ON omicron.public.silo_group (
-    name,
+CREATE INDEX ON omicron.public.silo_group (


Makes sense. When those queries become paginated in the future, they're going to need the id column. I'd suggest making this a UNIQUE index on (silo_id, id). I don't think it makes any difference right now. But we've done that elsewhere for pagination and it'll avoid having to do a schema migration later.

nexus/src/authn/silos.rs

nexus/src/app/silo.rs

nexus/types/src/external_api/params.rs

davepacheco · 2022-07-29T16:37:06Z

nexus/src/authn/mod.rs

+    }
+}
+
+impl From<&Actor> for db::model::IdentityType {


Just out of curiosity, why the change from actor_type() to a From impl? It seems to have the same signature (&Actor -> db::model::IdentityType) and implementation. (I know I'd suggested that we use a separate return type for actor_type() that would be an enum of just UserBuiltin and SiloUser (but not SiloGroup). This change doesn't change that, which is fine. I was wondering if there was some other reason you preferred From to a function.)

In situations where a section of code has an Actor, I think this reduces the possibility that code can ask the wrong question with match, like if the Actor is somehow a SiloGroup. Previously, the code matched on the result of a function call:

match actor.actor_type() {
    SiloGroup => { ... }
}

Here it seems natural to ask whether an actor is a group, but that question doesn't make sense because the Actor enum doesn't contain that variant.

With the From, the user would have to type something like

let actor_type: db::model::IdentityType = actor.into();
match actor_type {
    SiloGroup => { ... }
}

or

match Into::<db::model::IdentityType>::into(actor) {
    SiloGroup => { ... }
}

Both explicitly ask about the DB identity type rather than about what the actor is; it's much more natural to write match actor { instead.
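Here is a minimal sketch of the pattern under discussion. It uses simplified, hypothetical versions of `Actor` and `IdentityType`, with `u32` in place of `Uuid`, and the real variants' fields may differ. The `From` impl handles the DB-facing conversion. Code that cares about what the actor is matches on `Actor` directly, where a `SiloGroup` arm cannot even be written.

```rust
/// Hypothetical DB-facing identity type (includes groups).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IdentityType {
    UserBuiltin,
    SiloUser,
    SiloGroup,
}

/// Hypothetical authenticated actor (never a group).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Actor {
    UserBuiltin { user_builtin_id: u32 },
    SiloUser { silo_user_id: u32, silo_id: u32 },
}

impl From<&Actor> for IdentityType {
    fn from(actor: &Actor) -> Self {
        match actor {
            Actor::UserBuiltin { .. } => IdentityType::UserBuiltin,
            Actor::SiloUser { .. } => IdentityType::SiloUser,
        }
    }
}

impl Actor {
    /// Mirrors the silo_user_id helper added alongside the From impl.
    pub fn silo_user_id(&self) -> Option<u32> {
        match self {
            Actor::SiloUser { silo_user_id, .. } => Some(*silo_user_id),
            Actor::UserBuiltin { .. } => None,
        }
    }
}

/// Asking "what is this actor?" matches on Actor itself; no SiloGroup arm exists.
pub fn describe(actor: &Actor) -> &'static str {
    match actor {
        Actor::UserBuiltin { .. } => "built-in user",
        Actor::SiloUser { .. } => "silo user",
    }
}
```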

nexus/src/db/datastore/silo_group.rs

davepacheco · 2022-08-04T20:00:01Z

nexus/src/db/datastore/silo_group.rs

+        // are logging in at the same time, and both are part of IdP groups that
+        // do not yet exist in our database.
+        //
+        // Currently there is a unique partial index on silo_group, which


I thought the Diesel we're using did support this now. Is that not right?

That part of the comment refers to "diesel does not support creating queries that contain ON CONFLICT (...) WHERE ... DO NOTHING".

I looked at #1545 and saw that

.on_conflict((dsl::silo_id, dsl::external_id))
.filter_target(dsl::time_deleted.is_null())
.do_nothing()

was suggested, and it works! 643aadb changes this and removes the comment.

jmpesp requested a review from davepacheco July 6, 2022 03:12

jmpesp added 7 commits July 6, 2022 09:12

Merge remote-tracking branch 'upstream/main' into silo_groups

b963c5d

add IdentityType::SiloGroup to match patterns

4a51058

add optional fetch and lookup

5c85f1b

WIP add broken silo create saga yay

9d2b3ff

silo creation is now a saga

8f8729c

delete silo groups and group memberships

d300cff

external authenticator permissions on silo groups

2456ed1

davepacheco mentioned this pull request Jul 8, 2022

tracking issue for MVP IAM work #849

Closed

69 tasks

davepacheco reviewed Jul 13, 2022

View reviewed changes

jmpesp added 10 commits July 18, 2022 16:51

Merge remote-tracking branch 'upstream/main' into silo_groups

34dffd8

do not use Resource::name for silo group names

2544362

correct index specification

b11a146

- primary key instead of unique index - reverse silo group unique index order (for query and pagination) - remove silo id index - add missing filter on time_deleted being null

update user group memberships as one transaction

5eba13a

trim whitespace and skip empty groups

29950ba

add back missing delete code

04b0af5

get role assignments from group membership during transaction

d82c9f6

delete silo_group_membership_for_user_no_authz

0067fb0

Don't delete groups that still have memberships

0714fa3

fmt

1d6ef52

davepacheco reviewed Jul 26, 2022

View reviewed changes

jmpesp added 6 commits July 27, 2022 14:34

silo creation is now a transaction

0aa7669

drop silo_delete_by_id

9f8a3ec

silo and silo group don't need to serialize

e1ef703

fmt

566878d

fmt

99b752b

only one authz check for user instead of a for loop for group

c031eca

jmpesp added 8 commits July 27, 2022 15:29

use ErrorHandler::Server, do not error if nothing is deleted

bb9f567

only one info log message in during silo delete

6e91105

remove duplicate index

96fc1f3

fmt

615b56e

use public_error_from_diesel_pool

34f49bc

remove actor_type, replace with impl From and silo_user_id fn

b841c85

do not store admin_group_name

54e246e

Merge remote-tracking branch 'upstream/main' into silo_groups

f3cccc3

davepacheco reviewed Jul 29, 2022

View reviewed changes

jmpesp added 6 commits August 2, 2022 16:11

remove holdover from when this was a saga

35962ce

expand public docs

7837edc

only checking if any group membership exists

a2507a6

ensure (not exclusively create) silo groups

9ee416f

so multiple logins do not cause one another to fail

change silo_group index to "time_deleted IS NULL", add test for deleted groups

6e79986

use one unique partial index for silo_group

df37612

davepacheco approved these changes Aug 4, 2022

View reviewed changes

diesel does support ON CONFLICT ( ... ) WHERE ...

643aadb

jmpesp marked this pull request as ready for review August 4, 2022 21:22

jmpesp merged commit c08906c into oxidecomputer:main Aug 4, 2022

jmpesp deleted the silo_groups branch August 4, 2022 21:24

david-crespo mentioned this pull request Mar 19, 2025

minor: refactor session create methods #7827

Merged
