Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor device auth schema #1344

Merged
merged 10 commits into from
Jul 7, 2022
Merged

Refactor device auth schema #1344

merged 10 commits into from
Jul 7, 2022

Conversation

plotnick
Copy link
Contributor

@plotnick plotnick commented Jul 1, 2022

Fixes #1255 and fixes #1285.

Records in the device_auth_request table are now short-lived so that we can use user_code as a primary key.

Datastore authz is implemented except for lookups in the device_access_token table, since we need lookup by both the token (primary key) and the unique combination of client_id and device_code. The current lookup machinery does not appear to make this easy; suggestions for workarounds would be welcome.

Requests are now short-lived so that we can use `user_code`
as a primary key.

Authz is implemented and working for all but access token lookup,
which needs to be looked up both by `token` (primary key)
and the unique combination of `client_id` and `device_code`.
@plotnick plotnick requested a review from davepacheco July 1, 2022 20:27
access_token: DeviceAccessToken,
) -> CreateResult<DeviceAccessToken> {
assert_eq!(authz_user.id(), access_token.silo_user_id);
opctx.authorize(authz::Action::CreateChild, authz_user).await?;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check feels a little weird, since DeviceAccessToken isn't declared by lookup_resource! as a child of SiloUser. Suggestions for a better check would be welcome.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could create another synthetic resource analogous to ConsoleSessionList (DeviceAccessTokenList) but I admit the boilerplate-to-value ratio on those is getting worse...

To be clear, would we expect this function to be invoked by the user themselves or by Nexus's external-authenticator? I assumed the latter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this function is called as part of the confirmation in the browser, so it's made by an authenticated user.

@plotnick plotnick marked this pull request as ready for review July 5, 2022 18:02
Copy link
Collaborator

@davepacheco davepacheco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Despite some questions below I really like the changes here!

);

-- Matches the primary key on device authorization records.
-- The UNIQUE constraint is critical for ensuring that at most
-- This UNIQUE constraint is critical for ensuring that at most
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is no longer true (that is, removal from the device_auth_request guarantees that now).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it is still true with respect to this table considered alone. There shouldn't be any code paths now for which this invariant is broken, but it still seems like a good idea to leave it in place for future us (me).

Copy link
Collaborator

@davepacheco davepacheco Jul 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I wasn't suggesting removing the constraint, just potentially updating the comment, but it's fine as is too)

common/src/sql/dbinit.sql Show resolved Hide resolved
nexus/src/app/device_auth.rs Outdated Show resolved Hide resolved
nexus/src/app/device_auth.rs Outdated Show resolved Hide resolved
nexus/src/db/datastore.rs Outdated Show resolved Hide resolved
nexus/src/db/datastore.rs Outdated Show resolved Hide resolved
access_token: DeviceAccessToken,
) -> CreateResult<DeviceAccessToken> {
assert_eq!(authz_user.id(), access_token.silo_user_id);
opctx.authorize(authz::Action::CreateChild, authz_user).await?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could create another synthetic resource analogous to ConsoleSessionList (DeviceAccessTokenList) but I admit the boilerplate-to-value ratio on those is getting worse...

To be clear, would we expect this function to be invoked by the user themselves or by Nexus's external-authenticator? I assumed the latter.

// TODO-security: authz
pub async fn device_auth_get_token(
/// Look up a granted device access token.
/// Note: since this lookup is not by a primary key or name,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the client allowed to fetch the token more than once?

Wondering out loud if there should be another table of short-lived records here whose primary key is (client_id, device_id), which just has a foreign key into the token table. If they're only ever allowed to fetch the token once, that might also be useful to enforce that. If this doesn't seem appealing, ignore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A good question. I don't think there's an explicit restriction on fetching the token more than once, though I do think it would be odd. I'm not very keen to add a new table, though.

@plotnick plotnick requested a review from davepacheco July 6, 2022 16:04
nexus/src/db/datastore.rs Outdated Show resolved Hide resolved
self.pool_authorized(opctx)
.await?
.transaction(move |conn| {
delete_request.execute(conn)?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there's a window here where this could happen:

  1. request A comes in to verify the request and create a new token
  2. request A fetches the device auth request
  3. request B comes in to verify the request and create a new token, fetches the device auth request row, and executes this transaction
  4. request A executes this transaction

I think the result would be two tokens created from the same device auth request.

I think the easiest way to fix that might be to bail out at L4329 if execute() reports having deleted any number of rows other than 1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, fixed in 152fd2a. Please LMK if I got the error handling wrong.

nexus/src/db/model/device_auth.rs Outdated Show resolved Hide resolved
Ok(insert_token.get_result(conn)?)
} else {
Err(TxnError::CustomError(
TokenGrantError::ConcurrentRequests,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is okay. I would probably have this say:

0 rows back => 404 -- there's no reason to expose this as "concurrent update" -- it just doesn't exist now
more than 1 row back => 500

If you want to do that, vpc_update_firewall_rules() might be an example to follow? You basically need to return something from the transaction that's specific enough to match on it in the map_err() below to decide whether to use Error::internal_error or authz_resource.not_found().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, sorry. Trying again in bf53d38.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, this looks better. The case I'm still not sure about is when we find more than one item. I'd expect this to be a 500 (with a message like "unexpectedly found multiple device auth requests for the same user code") because I don't see how it could ever actually happen. The current code produces a 400 that says something about concurrent requests. (When we were talking about "concurrent requests" earlier, I think I meant concurrent verify requests, which is what could land you in the other case (which is now a 404, which is good). I'm not sure how concurrent requests could land you in this case...unless by "concurrent requests" you mean "we managed to have multiple [concurrent] device auth requests with the same user code", which is true, but that still seems like a server error.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct on all accounts, thank you. Hopefully last try: 8ce7e42.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@plotnick plotnick enabled auto-merge (squash) July 7, 2022 17:35
@plotnick plotnick disabled auto-merge July 7, 2022 17:37
@plotnick plotnick merged commit 787778b into main Jul 7, 2022
@plotnick plotnick deleted the device-auth-schema branch July 7, 2022 19:17
leftwo pushed a commit that referenced this pull request Jun 26, 2024
Added a new package, crucible-dtrace that pulls from buildomat a package
that contains a set of DTrace scripts.  These scripts are extracted into
the global zone at /opt/oxide/crucible_dtrace/

Update Crucible to latest includes these updates:
Clean up dependency checking, fixing space leak (#1372)
Make a DTrace package (#1367)
Use a single context in all messages (#1363)
Remove `DownstairsWork`, because it's redundant (#1371)
Remove `WorkState`, because it's implicit (#1370)
Do work immediately upon receipt of a job, if possible (#1366)
Move 'do work for one job' into a helper function (#1365)
Remove `DownstairsWork` from map when handling it (#1361)
Using `block_in_place` for IO operations (#1357)
update omicron deps; use re-exported dropshot types in oximeter-producer configuration (#1369)
Parameterize more tests (#1364)
Misc cleanup, remove sqlite references. (#1360)
Fix `Extent::close` docstring (#1359)
Make many `Region` functions synchronous (#1356)
Remove `Workstate::Done` (unused) (#1355)
Return a sorted `VecDeque` directly (#1354)
Combine `proc_frame` and `do_work_for` (#1351)
Move `do_work_for` and `do_work` into `ActiveConnection` (#1350)
Support arbitrary Volumes during replace compare (#1349)
Remove the SQLite backend (#1352)
Add a custom timeout for buildomat tests (#1344)
Move `proc_frame` into `ActiveConnection` (#1348)
Remove `UpstairsConnection` from `DownstairsWork` (#1341)
Move Work into ConnectionState (#1340)
Make `ConnectionState` an enum type (#1339)
Parameterize `test_repair.sh` directories (#1345)
Remove `Arc<Mutex<Downstairs>>` (#1338)
Send message to Downstairs directly (#1336)
Consolidate `on_disconnected` and `remove_connection` (#1333)
Move disconnect logic to the Downstairs (#1332)
Remove invalid DTrace probes. (#1335)
Fix outdated comments (#1331)
Use message passing when a new connection starts (#1330)
Move cancellation into Downstairs, using a token to kill IO tasks (#1329)
Make the Downstairs own per-connection state (#1328)
Move remaining local state into a `struct ConnectionState` (#1327)
Consolidate negotiation + IO operations into one loop (#1322)
Allow replacement of a target in a read_only_parent (#1281)
Do all IO through IO tasks (#1321)
Make `reqwest_client` only present if it's used (#1326)
Move negotiation into Downstairs as well (#1320)
Update Rust crate clap to v4.5.4 (#1301)
Reuse a reqwest client when creating Nexus clients (#1317)
Reuse a reqwest client when creating repair client (#1324)
Add % to keep buildomat happy (#1323)
Downstairs task cleanup (#1313)
Update crutest replace test, and mismatch printing. (#1314)
Added more DTrace scripts. (#1309)
Update Rust crate async-trait to 0.1.80 (#1298)
leftwo added a commit that referenced this pull request Jun 26, 2024
Update Crucible and Propolis to the latest

Added a new package, crucible-dtrace that pulls from buildomat a package
that contains a set of DTrace scripts. These scripts are extracted into the 
global zone at /opt/oxide/crucible_dtrace/

Crucible latest includes these updates:
Clean up dependency checking, fixing space leak (#1372) Make a DTrace
package (#1367)
Use a single context in all messages (#1363)
Remove `DownstairsWork`, because it's redundant (#1371) Remove
`WorkState`, because it's implicit (#1370)
Do work immediately upon receipt of a job, if possible (#1366) Move 'do
work for one job' into a helper function (#1365) Remove `DownstairsWork`
from map when handling it (#1361) Using `block_in_place` for IO
operations (#1357)
update omicron deps; use re-exported dropshot types in oximeter-producer
configuration (#1369) Parameterize more tests (#1364)
Misc cleanup, remove sqlite references. (#1360)
Fix `Extent::close` docstring (#1359)
Make many `Region` functions synchronous (#1356)
Remove `Workstate::Done` (unused) (#1355)
Return a sorted `VecDeque` directly (#1354)
Combine `proc_frame` and `do_work_for` (#1351)
Move `do_work_for` and `do_work` into `ActiveConnection` (#1350) Support
arbitrary Volumes during replace compare (#1349) Remove the SQLite
backend (#1352)
Add a custom timeout for buildomat tests (#1344)
Move `proc_frame` into `ActiveConnection` (#1348)
Remove `UpstairsConnection` from `DownstairsWork` (#1341) Move Work into
ConnectionState (#1340)
Make `ConnectionState` an enum type (#1339)
Parameterize `test_repair.sh` directories (#1345)
Remove `Arc<Mutex<Downstairs>>` (#1338)
Send message to Downstairs directly (#1336)
Consolidate `on_disconnected` and `remove_connection` (#1333) Move
disconnect logic to the Downstairs (#1332)
Remove invalid DTrace probes. (#1335)
Fix outdated comments (#1331)
Use message passing when a new connection starts (#1330) Move
cancellation into Downstairs, using a token to kill IO tasks (#1329)
Make the Downstairs own per-connection state (#1328) Move remaining
local state into a `struct ConnectionState` (#1327) Consolidate
negotiation + IO operations into one loop (#1322) Allow replacement of a
target in a read_only_parent (#1281) Do all IO through IO tasks (#1321)
Make `reqwest_client` only present if it's used (#1326) Move negotiation
into Downstairs as well (#1320)
Update Rust crate clap to v4.5.4 (#1301)
Reuse a reqwest client when creating Nexus clients (#1317) Reuse a
reqwest client when creating repair client (#1324) Add % to keep
buildomat happy (#1323)
Downstairs task cleanup (#1313)
Update crutest replace test, and mismatch printing. (#1314) Added more
DTrace scripts. (#1309)
Update Rust crate async-trait to 0.1.80 (#1298)

Propolis just has this one update:
Allow boot order config in propolis-standalone
---------

Co-authored-by: Alan Hanson <alan@oxide.computer>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Revamp device auth schema DB authz for client authentication & token records
2 participants