Design: OIDC lifecycle for Warehouse #10619

Closed
woodruffw opened this issue Jan 19, 2022 · 9 comments

Comments

@woodruffw
Member

woodruffw commented Jan 19, 2022

This is an overarching design tracking issue for how I plan to go about supporting OIDC JWT handling in Warehouse.

Setup flow

  1. A PyPI user logs into PyPI
  2. They navigate to a project and click Manage > Settings
  3. They enter a GitHub repository URL ($REPO_URL) and workflow name ($WORKFLOW_NAME) under an appropriate heading on the page

Authentication flow

  1. A user triggers $WORKFLOW_NAME on $REPO_URL.
  2. $WORKFLOW_NAME uses GitHub's JWT minting endpoint for OIDC to create a new JWT
  3. $WORKFLOW_NAME uses an HTTP POST to send the JWT to PyPI (to some authorization endpoint)
  4. PyPI verifies the JWT's signature against the JWKS for the issuer
  5. PyPI verifies the consistency of the JWT's claims. Specifically, the sub claim contains a formatted representation of $REPO_URL and $WORKFLOW_NAME, and other custom claims also contain them.
  6. If the claims are valid (i.e., they match a PyPI project's registered $REPO_URL and $WORKFLOW_NAME), then PyPI mints an ephemeral access token.
  7. PyPI ships the access token back to the triggering workflow via OIDC's id_token response type.
  8. Off to the races: twine or another tool uses the access token to publish the project's distribution(s).
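
Steps 5 and 6 could be sketched roughly as below. This is only an illustration, not Warehouse code: the function name is hypothetical, the claim names `repository` and `workflow` are assumptions about the issuer's custom claims, the registered repository is assumed to be stored as `owner/name` rather than a full URL, and the subject is assumed to begin with `repo:owner/name:`. Signature verification (step 4) is assumed to have already happened.

```python
def claims_match_registration(
    claims: dict, registered_repo: str, registered_workflow: str
) -> bool:
    """Return True only if every relevant claim agrees with the registration.

    `claims` is assumed to be the payload of a JWT whose signature was
    already verified against the issuer's JWKS.
    """
    # Assumed claim names; confirm them against the issuer's documentation.
    repository = claims.get("repository")
    workflow = claims.get("workflow")
    subject = claims.get("sub")

    # The custom claims must match what the project owner registered.
    if repository != registered_repo or workflow != registered_workflow:
        return False

    # The subject must refer to the same repository as the custom claims
    # (assumed prefix format: "repo:owner/name:").
    return isinstance(subject, str) and subject.startswith(
        f"repo:{registered_repo}:"
    )
```

Only when this returns True would PyPI go on to mint the ephemeral access token.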
@woodruffw
Member Author

cc @di: Let me know how the above flows sound to you (they're similar to the ones I wrote up in our chat, but more fleshed out).

@di
Member

di commented Jan 21, 2022

> and workflow name ($WORKFLOW_NAME) under an appropriate heading on the page

Just thinking out loud, is the workflow name enough to distinguish it here, or should it be a workflow path? E.g., can a repo have multiple workflows with the same name in different locations? Also, is this a filename or some name in the configuration of the workflow?

Also, is this the best way to uniquely identify a repo? What if the repo is deleted and then squatted by an attacker, would they be able to authenticate with this flow?

> $WORKFLOW_NAME uses an HTTP POST to send the JWT to PyPI (to some authorization endpoint)

Let's stick to the convention here, /authorization, /userinfo and such.

> then PyPI mints an ephemeral access token.

Just confirming we're planning on minting an expiring macaroon for the underlying token.

> Off to the races: twine or another tool uses the access token to publish the project's distribution(s).

Just stating a logical conclusion here: If the same repo is used for multiple projects, the token will have permissions to publish for any of those projects.

@woodruffw
Member Author

> Just thinking out loud, is the workflow name enough to distinguish it here, or should it be a workflow path? E.g., can a repo have multiple workflows with the same name in different locations? Also, is this a filename or some name in the configuration of the workflow?

I believe the "workflow name" in GitHub's docs refers to the top-level name: key in the workflow YAML file, which doesn't have to correspond to the YAML's filename.

Looking at GitHub's docs, it looks like it's actually the environment name and not the workflow name embedded in the OIDC subject:

repo:orgName/repoName:environment:environmentName

...where environments are defined separately from workflow names: https://docs.github.com/en/actions/deployment/targeting-different-environments/using-environments-for-deployment

So everything above should be s/$WORKFLOW_NAME/$ENVIRONMENT_NAME/.
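
For illustration, a subject in the format quoted above could be split apart like this. A minimal sketch: the `Subject` type and `parse_subject` name are hypothetical, and it assumes the subject always has exactly the `repo:...:environment:...` shape shown above (other subject shapes are rejected).

```python
from typing import NamedTuple


class Subject(NamedTuple):
    owner: str
    repo: str
    environment: str


def parse_subject(sub: str) -> Subject:
    """Parse 'repo:orgName/repoName:environment:environmentName'."""
    # Split at most three times so the environment name keeps any
    # colons it might contain.
    parts = sub.split(":", 3)
    if len(parts) != 4 or parts[0] != "repo" or parts[2] != "environment":
        raise ValueError(f"unexpected subject format: {sub!r}")

    owner, sep, repo = parts[1].partition("/")
    if not sep or not owner or not repo or not parts[3]:
        raise ValueError(f"unexpected subject format: {sub!r}")

    return Subject(owner=owner, repo=repo, environment=parts[3])
```

Calling `parse_subject("repo:orgName/repoName:environment:environmentName")` yields the owner, repository, and environment as separate fields, which can then be compared against what the project registered.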

> Also, is this the best way to uniquely identify a repo? What if the repo is deleted and then squatted by an attacker, would they be able to authenticate with this flow?

Yeah, we need to determine our trust model/security boundary here. The OIDC JWT offers a repository_owner claim that we can cross-check against, but that requires us to either (1) TOFU (trust on first use), or (2) have the user put the owner's username in at the same time that they configure the repository URL and environment name.

OTOH, takeover might be entirely outside of our scope: I don't believe other package management ecosystems that root themselves in GitHub (like Go's) handle this. That's not a great answer, though, and I'd prefer to do better than the status quo 🙂

> Just confirming we're planning on minting an expiring macaroon for the underlying token.

Yep, that's what I had in mind. I figured we could even go further than this by additionally restricting the number of authentication actions the temporary macaroon can perform, although that needs additional thought (since twine, AFAIK, does one HTTP request per distribution file.)

> Just stating a logical conclusion here: If the same repo is used for multiple projects, the token will have permissions to publish for any of those projects.

Assuming that each of those projects has the repo registered with it on PyPI, yep. I don't think we want it to be "open" by default, since that would be surprising (even if not necessarily compromisable) behavior.
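
To make the "not open by default" behavior concrete, here is a rough sketch. The in-memory mapping and the `authorized_projects` name are hypothetical stand-ins for whatever the database representation ends up being; the point is only that a token's scope is derived from explicit per-project registrations.

```python
def authorized_projects(
    registrations: dict[str, set[tuple[str, str]]],
    repo: str,
    environment: str,
) -> set[str]:
    """Return the projects that explicitly registered (repo, environment).

    `registrations` maps a PyPI project name to the set of
    (repository, environment) pairs its owners configured.
    """
    return {
        project
        for project, publishers in registrations.items()
        if (repo, environment) in publishers
    }
```

A repository registered with two projects yields both; a project that never registered the repository is never included, even if it is built from the same source tree.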

@woodruffw
Member Author

Thought about it some more: TOFU doesn't save us in the event of squatting, since the GitHub username wouldn't change.

The stronger thing might be saving the repository owner's GitHub ID, which (presumably) corresponds to the actual database row for the user. But that's an implementation detail that we'd be placing a lot of trust in.

@di
Member

di commented Jan 21, 2022

> I figured we could even go further than this by additionally restricting the number of authentication actions the temporary macaroon can perform, although that needs additional thought (since twine, AFAIK, does one HTTP request per distribution file.)

That's correct, so I don't think limiting by anything other than time will work here.
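
A time-only restriction might look something like the sketch below. Everything here is illustrative: a random string stands in for the macaroon, the names are hypothetical, and the 15-minute default lifetime is an arbitrary placeholder rather than a proposed value.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EphemeralToken:
    value: str
    expires_at: float  # seconds since the epoch


def mint_token(
    ttl_seconds: float = 900.0, now: Optional[float] = None
) -> EphemeralToken:
    """Mint a token that is only limited by its expiry time."""
    issued_at = time.time() if now is None else now
    return EphemeralToken(
        value=secrets.token_urlsafe(32),
        expires_at=issued_at + ttl_seconds,
    )


def is_valid(token: EphemeralToken, now: Optional[float] = None) -> bool:
    """A token is usable any number of times until it expires."""
    current = time.time() if now is None else now
    return current < token.expires_at
```

Because validity depends only on the clock, twine can make one upload request per distribution file with the same token, as long as all of them land before the expiry.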

> The stronger thing might be saving the repository owner's GitHub ID, which (presumably) corresponds to the actual database row for the user. But that's an implementation detail that we'd be placing a lot of trust in.

Also not included in the OIDC token AFAICT, so a second request for us.

@woodruffw
Member Author

woodruffw commented Jan 21, 2022

> Also not included in the OIDC token AFAICT, so a second request for us.

Yeah, I think it's something we'd want to do at configuration time -- we'd ask the user for their repository in user/repo format, then use the user component to do an ID lookup and establish the TOFU relationship on that ID.

The relevant endpoint is https://api.github.com/users/USERNAME, and the response looks like:

{
  "login": "woodruffw",
  "id": 3059210,
  "node_id": "MDQ6VXNlcjMwNTkyMTA=",
  "avatar_url": "https://avatars.githubusercontent.com/u/3059210?v=4",
  "gravatar_id": "",
  "url": "https://api.github.com/users/woodruffw",
  "html_url": "https://github.com/woodruffw",
  "followers_url": "https://api.github.com/users/woodruffw/followers",
  "following_url": "https://api.github.com/users/woodruffw/following{/other_user}",
  "gists_url": "https://api.github.com/users/woodruffw/gists{/gist_id}",
  "starred_url": "https://api.github.com/users/woodruffw/starred{/owner}{/repo}",
  "subscriptions_url": "https://api.github.com/users/woodruffw/subscriptions",
  "organizations_url": "https://api.github.com/users/woodruffw/orgs",
  "repos_url": "https://api.github.com/users/woodruffw/repos",
  "events_url": "https://api.github.com/users/woodruffw/events{/privacy}",
  "received_events_url": "https://api.github.com/users/woodruffw/received_events",
  "type": "User",
  "site_admin": false,
  "name": "William Woodruff",
  "company": "@trailofbits ",
  "blog": "https://yossarian.net",
  "location": "New York, NY",
  "email": null,
  "hireable": null,
  "bio": "Research @trailofbits, maintainer @Homebrew\r\n\r\n",
  "twitter_username": "8x5clPW2",
  "public_repos": 75,
  "public_gists": 17,
  "followers": 395,
  "following": 26,
  "created_at": "2012-12-17T01:59:44Z",
  "updated_at": "2021-10-10T04:03:45Z"
}

That endpoint also works for organization accounts, so it should be fine. For example, here's GitHub's:

{
  "login": "github",
  "id": 9919,
  "node_id": "MDEyOk9yZ2FuaXphdGlvbjk5MTk=",
  "avatar_url": "https://avatars.githubusercontent.com/u/9919?v=4",
  "gravatar_id": "",
  "url": "https://api.github.com/users/github",
  "html_url": "https://github.com/github",
  "followers_url": "https://api.github.com/users/github/followers",
  "following_url": "https://api.github.com/users/github/following{/other_user}",
  "gists_url": "https://api.github.com/users/github/gists{/gist_id}",
  "starred_url": "https://api.github.com/users/github/starred{/owner}{/repo}",
  "subscriptions_url": "https://api.github.com/users/github/subscriptions",
  "organizations_url": "https://api.github.com/users/github/orgs",
  "repos_url": "https://api.github.com/users/github/repos",
  "events_url": "https://api.github.com/users/github/events{/privacy}",
  "received_events_url": "https://api.github.com/users/github/received_events",
  "type": "Organization",
  "site_admin": false,
  "name": "GitHub",
  "company": null,
  "blog": "https://github.com/about",
  "location": "San Francisco, CA",
  "email": null,
  "hireable": null,
  "bio": "How people build software.",
  "twitter_username": null,
  "public_repos": 404,
  "public_gists": 0,
  "followers": 0,
  "following": 0,
  "created_at": "2008-05-11T04:37:31Z",
  "updated_at": "2020-09-28T06:15:10Z"
}
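
Pulling the ID out of either response is the same operation, since both account types carry a top-level integer "id". A minimal sketch using only the standard library follows; the function names are hypothetical, and the request is unauthenticated with no rate-limit or error handling, which a real implementation would need.

```python
import json
import urllib.parse
import urllib.request

# Endpoint from the comment above, with USERNAME as a placeholder.
USERS_ENDPOINT = "https://api.github.com/users/{username}"


def owner_id_from_response(body: str) -> int:
    """Extract the numeric account ID from a users-endpoint response body."""
    data = json.loads(body)
    owner_id = data.get("id") if isinstance(data, dict) else None
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(owner_id, int) or isinstance(owner_id, bool):
        raise ValueError("response does not contain an integer 'id'")
    return owner_id


def fetch_owner_id(username: str) -> int:
    """Look up the ID for a user or organization (performs a network call)."""
    url = USERS_ENDPOINT.format(username=urllib.parse.quote(username, safe=""))
    with urllib.request.urlopen(url, timeout=10) as response:
        return owner_id_from_response(response.read().decode("utf-8"))
```

At configuration time, the owner component of user/repo would be passed to `fetch_owner_id` and the result stored alongside the repository.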

@woodruffw woodruffw reopened this Jan 21, 2022
@di
Member

di commented Jan 21, 2022

> Yeah, I think it's something we'd want to do at configuration time

But we'd also need to make the same request every time we verify the token to ensure that it hasn't changed since the configuration was created.

@woodruffw
Member Author

> But we'd also need to make the same request every time we verify the token to ensure that it hasn't changed since the configuration was created.

Yeah, good point. That also complicates the claim representation, since it's not part of the claims themselves but is necessary for ultimately verifying the JWT. So, if we go down this route, whatever DB representation we pick in #10617 will also need to accommodate keeping the ID somewhere.
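
As a rough picture of what that could mean, the stored configuration might carry the owner ID next to the claim-derived fields, with the re-check done on every verification. This is a hypothetical shape, not the representation from #10617; the lookup is passed in as a callable so the sketch does not depend on how the GitHub request is made.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PublisherConfig:
    repository: str   # "owner/name", as entered by the user
    environment: str
    owner_id: int     # recorded once, at configuration time


def owner_unchanged(
    config: PublisherConfig, lookup_owner_id: Callable[[str], int]
) -> bool:
    """Re-resolve the owner name and compare with the stored ID.

    `lookup_owner_id` is expected to perform the same request that was
    made at configuration time and return the current account ID.
    """
    owner = config.repository.split("/", 1)[0]
    return lookup_owner_id(owner) == config.owner_id
```

If the account behind the owner name has been deleted and re-registered by someone else, the current ID would differ from the stored one and the check would fail, even though every claim in the JWT still matches.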

@woodruffw
Member Author

I believe the actual design task here is done, so this is safe to close.
