-
Notifications
You must be signed in to change notification settings - Fork 44
[DRAFT] Add support for Silo groups #1358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Add Silo groups, and the ability to grant them roles. Add the necessary logic to: - provision an "admin" silo group when a silo is created, which is granted silo admin role. - after successful authentication, create groups during silo user provision if the Silo's provision type is JIT. - add a group's roles to a user's role set if they're part of that group. Silos now have an optional admin_group_name that is configured at silo provision time. If this is left out, users will currently have no way to be granted roles when they first log in. In the future, this may be selected and groups would be created another way. SAML identity providers now have an optional group_attribute_name that configures what attribute represents a group name.
This is ready for review now :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. I gather we're also going to need an API to list groups in the Silo, groups that you're a member of, and maybe members of a group that you're in. I'm not saying that has to be in this PR.
I gather that like users, we're going to identify groups in the API by their uuid and treat the external id like the user's display name -- that's what we show everywhere, but not what we use as the id.
silo_user_id | ||
); | ||
|
||
CREATE INDEX ON omicron.public.silo_group_membership ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need (silo_user_id, silo_group_id)
here. It's a little weird to have two indexes whose columns are the same except for the order, but I don't see a way around it. We want to paginate the users in a group as well as the groups that a user is in and I think we need two indexes to do that.
nexus/src/db/datastore.rs
Outdated
) -> ListResultVec<SiloGroupMembership> { | ||
opctx.authorize(authz::Action::ListChildren, authz_silo).await?; | ||
|
||
use db::schema::silo_group_membership::dsl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're going to want to paginate this.
If we make the other change I suggested earlier, you won't need this function. I think it would be really useful to have APIs for a user to list their own groups and to list the users in a group. Those should be paginated.
- primary key instead of unique index - reverse silo group unique index order (for query and pagination) - remove silo id index - add missing filter on time_deleted being null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still working through more details here but I wanted to leave this while I have it.
common/src/sql/dbinit.sql
Outdated
); | ||
|
||
CREATE UNIQUE INDEX ON omicron.public.silo_group ( | ||
name, | ||
CREATE INDEX ON omicron.public.silo_group ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this index for? (I'm generally suspicious of non-unique indexes because they can't be used for lookup and they can't be used for pagination when the page size is smaller than the number of duplicate rows in the index. I see that we're doing this a lot though and I filed #1497.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I remove it, I see a full table scan error for
DELETE FROM "silo_group_membership" WHERE ("silo_group_membership"."silo_group_id" = ANY(SELECT "silo_group"."id" FROM "silo_group" WHERE (("silo_group"."silo_id" = $1) AND ("silo_group"."time_deleted" IS NOT NULL))))
meaning it's coming from the silo delete code:
let updated_rows =
diesel::delete(silo_group_membership::dsl::silo_group_membership)
.filter(
silo_group_membership::dsl::silo_group_id.eq_any(
silo_group::dsl::silo_group
.filter(silo_group::dsl::silo_id.eq(id))
.filter(silo_group::dsl::time_deleted.is_not_null())
.select(silo_group::dsl::id),
),
)
.execute_async(self.pool_authorized(opctx).await?)
.await
.map_err(|e| {
public_error_from_diesel_pool(
e,
ErrorHandler::NotFoundByResource(authz_silo),
)
})?;
but there's also the silo group delete code:
diesel::update(dsl::silo_group)
.filter(dsl::id.eq(group_id))
.filter(dsl::time_deleted.is_null())
.set(dsl::time_deleted.eq(Utc::now()))
.execute(conn)?;
which also would use this index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. When those queries become paginated in the future, they're going to need the id
column. I'd suggest making this a UNIQUE
index on (silo_id, id)
. I don't think it makes any difference right now. But we've done that elsewhere for pagination and it'll avoid having to do a schema migration later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now, the indexes on silo_group are:
CREATE INDEX ON omicron.public.silo_group (
silo_id
) WHERE
time_deleted IS NOT NULL;
CREATE UNIQUE INDEX ON omicron.public.silo_group (
silo_id,
external_id
) WHERE
time_deleted IS NULL;
With the column name change, I'm a little confused - are you suggesting changing the first one to UNIQUE
on (silo_id, id)
, or (silo_id, external_id)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I think I missed a few things here, especially that the time_deleted
condition is inverted between these two indexes. This index here is on time_deleted IS NOT NULL
, so it includes only deleted silo groups.
I see the usage you mentioned in DataStore::silo_delete()
. That query indeed specifies time_deleted IS NOT NULL
, so it matches this index. But it looks to me like both might be wrong? At that point in silo_delete()
, we haven't deleted the silo groups yet, right? That looks like the next thing we do. So it seems like the "delete all silo group memberships" section will only delete memberships from silo groups that were already deleted before the Silo was deleted. (Sorry if I'm misreading the code again!)
If the right thing is to flip the time_deleted
condition in the query, then you'd want to flip the condition in this index too, and then it really should be redundant with the other index. So you should be able to just remove this one. (If I'm misreading the code and we do want to filter only on the deleted ones, then I'd suggest instead dropping the time_deleted IS NULL
condition on the other index and removing this one.)
The second usage you mentioned (silo group delete code) is specifying time_deleted IS NULL
, so it cannot use this index (but should use the other index).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that looks like a bug. We delete silo group memberships before silo groups, meaning time_deleted
is null. I have no idea how this passes the restriction on full table scans...?
The only theory that makes sense to me is there wasn't a test for deleting a silo that contained a user, group, and group memberships, meaning the number of rows was zero. When I added one, the test failed because group memberships remained, not because of the full table scan restriction.
Commit 6e79986 changes the index, the filter in silo_delete
, and adds the test I'm talking about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed the indexes again offline and I'll summarize here. I'd suggested just one unique, partial index here. This triggered an unexpected error on the INSERT ... ON CONFLICT
. I filed #1545 to describe this and our subsequent research. In conclusion I think there are two okay options here:
- Keep the single unique partial index and use
on_conflict_do_nothing()
. (@jmpesp I only discovered this option after we talked about this.) We may want to add a test for trying to ensure the same group twice? - Keep the single unique partial index and drop the
on_conflict
part of the insert. The problem with this is that it means that if two users log in at about the same time, both in some IdP group that we've never seen before, one of them will fail spuriously. But if that user retries, it will work. That's a lot of conditions, the impact is small, and there's a trivial workaround (retry). It'd be nice to fix this but it's hard for me to justify asking to spend much time on this case right now.
Either is okay with me -- but we should do one of these. As it is right now, since we have a unique non-partial index, it would incorrectly prevent us from reusing the external_id of a deleted silo group.
This is my last worry about this change -- otherwise I think it's good to go!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in df37612, thanks for debugging with me :)
let mut group_role_assignments = dsl::role_assignment | ||
.filter(dsl::identity_type.eq(IdentityType::SiloGroup)) | ||
.filter(dsl::identity_id.eq_any( | ||
silo_group_membership::dsl::silo_group_membership |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will work okay. It's a little different than what I'd suggested, which was:
SELECT r.* from role_assignment r, silo_group_membership m WHERE
m.silo_user_id == $1 AND
r.actor_type == 'silo_group' AND
r.actor_id == m.silo_group_id AND
r.resource_type == $2 AND
r.resource_id = $3
This looks more like:
SELECT * from role_assignment WHERE
r.identity_type == 'silo_group' AND
r.identity_id IN (SELECT silo_group_id FROM silo_group_membership WHERE silo_user_id = $1) AND
r.resource_type == $2 AND
r.resource_id = $3
This uses a subquery instead of a join. The main downside is that the database will materialize the full list of a user's groups and check that set against all the matching role assignments. The join might also have to do that but I think the database would have more freedom to choose the best approach (e.g., if there were a large number of role assignments and few groups, it could choose instead to enumerate the groups and check each one for a matching role assignment). I assume group membership will be small for the foreseeable future and this won't be a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! I really like the new transaction for silo create. Thanks for doing that. There's just 1-2 things here that I think are worth doing (the index, the "limit 1", and maybe the "on conflict" change? but that case is pretty obscure).
common/src/sql/dbinit.sql
Outdated
); | ||
|
||
CREATE UNIQUE INDEX ON omicron.public.silo_group ( | ||
name, | ||
CREATE INDEX ON omicron.public.silo_group ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. When those queries become paginated in the future, they're going to need the id
column. I'd suggest making this a UNIQUE
index on (silo_id, id)
. I don't think it makes any difference right now. But we've done that elsewhere for pagination and it'll avoid having to do a schema migration later.
} | ||
} | ||
|
||
impl From<&Actor> for db::model::IdentityType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just out of curiosity, why the change from actor_type()
to a From
impl? It seems the same signature (&Actor -> db::model::IdentityType
) and implementation. (I know I'd suggested that we use a separate return type for actor_type()
that would be an enum of just UserBuiltin
and SiloUser
(but not SiloGroup
). This change doesn't change that, which is fine. I was wondering if there was some other reason you preferred From
to a function.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In situations where a section of code has an Actor
, I think this reduces the possibility that code can ask the wrong question with match
, like if the Actor is somehow a SiloGroup. Previously, the code matched on the result of a function call:
match actor.actor_type() {
SiloGroup => { ... }
}
Here it seems like a natural question to see if an actor is a group, which doesn't make sense because the Actor enum doesn't contain that variant.
With the From, the user would have to type something like
let actor_type: db::model::IdentityType = actor.into();
match actor_type {
SiloGroup => { ... }
}
or
match actor.into::<db::model::IdentityType>() {
SiloGroup => { ... }
}
Which is explicitly asking questions about the db identity type, not what actor is - it's much more natural to type match actor {
instead.
so multiple logins do not cause one another to fail
nexus/src/db/datastore/silo_group.rs
Outdated
// are logging in at the same time, and both are part of IdP groups that | ||
// do not yet exist in our database. | ||
// | ||
// Currently there is a unique partial index on silo_group, which |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the Diesel we're using did support this now. Is that not right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That part of the comment refers to "diesel does not support creating queries that contain ON CONFLICT (...) WHERE ... DO NOTHING
.
I looked at #1545 and saw that
.on_conflict((dsl::silo_id, dsl::external_id))
.filter_target(dsl::time_deleted.is_null())
.do_nothing()
was suggested, and it works! 643aadb changes this and removes the comment.
Add Silo groups, and the ability to grant them roles.
Add the necessary logic to:
granted silo admin role.
provision if the Silo's provision type is JIT.
group.
Silos now have an optional admin_group_name that is configured at silo
provision time. If this is left out, users will currently have no way to
be granted roles when they first log in. In the future, this may be
selected and groups would be created another way.
SAML identity providers now have an optional group_attribute_name that
configures what attribute represents a group name.