Add support for API keys #994
Comments
This is another thing I've been wanting to do, but is likely a post-launch task. I'm a bit on the fence of how to exactly handle them, but one option I've been thinking about is instead of API keys, using client certificates for TLS which would give built in support for a signing based approach, high entropy, allow it to be used for all uploads (could store it password protected typically, and just offer a password-less option for automation), and expiration of the token. One problem with this is that it would mean we can't route uploads through our CDN, however uploads don't really gain anything by going through the CDN (and in fact, it's a bit harmful since uploads need a longer timeout than normal requests, we're forced to have high, 20+ second timeouts on upload). I've also considered something like OAuth here instead of just an API Key which would solve a lot of these problems as well, in addition to making it possible to securely grant other projects the ability to modify just one package (or one scope inside of that package). There's also the likely future signing tool, TUF, where we could just enforce that all uploads must be signed by a valid key for that author, and use that key as the authentication. A lot of different options here, which is another reason why it's likely a post-launch task :) |
I really really want to get my PyPI information out of CI. At the risk of responding to a years-old thread...I want to volunteer to do this work (as well as #996). :-) At this point, Warehouse is launched (albeit in beta), and the legacy upload endpoint is deprecated. I assert it would be reasonable to add this, although others who have actually been thinking about this for more than a few days might know better than I (so feel free to chime in and tell me!). Before reading @dstufft's comments above, my thought was, "Implement API keys", but I have nothing against the idea of certificates. Here is what I think is needed (sub out "key" below with "certificate" if we go that route):
Would anyone have any objection to my taking some time to scope this out further, with an eye to getting the work in soon-ish? (Since I am new around here, it would probably require some review cycles from @ewdurbin, @dstufft, etc.) |
Thanks for your note, @lukesneeringer, and sorry for the slow response! Thank you for volunteering to do this work! As I think you know, but just for context for future folks finding this discussion, the folks working on Warehouse have gotten funding to concentrate on improving and deploying Warehouse, and have kicked off work towards our development roadmap -- the most urgent task is to improve Warehouse to the point where we can redirect pypi.python.org to pypi.org so the site is more sustainable and reliable. So that's what Ernest, Dustin, and Nicole have been concentrating on and will be concentrating on for the next few months. But I'm putting your suggestion on the agenda for a sync-up meeting we'll have tomorrow, and we'll have more thoughts for you then. Also, Ernest wants to help folks get started as new Warehouse contributors, and has 30-minute 1:1 slots available each week, in case that's something you, or someone you know, is interested in doing. Thanks again! Talk with you soon. |
Sounds good. My guess is that this is probably work that can be done in parallel to the Warehouse improvements. The trick would be that the keys would not work on legacy PyPI, and therefore anyone using the legacy URL would not be able to use them. (However, I suppose it might be the case that review cycles or whatnot would not be available.)
Yep -- we already did that. :-) |
@lukesneeringer Oh great, glad you and Ernest have already started working together! In our meeting today we said "yay" about you working on this! Please go ahead and start scoping it out and let us know your thought process as you work. I could imagine you finding the SimplySecure resources useful on a UX level. We also decided that, as a new feature, this belongs in a future milestone. But we will do our level best to review your work as you have it! Could I please ask you to also comment at #996 to mention there that you're working on it? |
That sounds good. |
I'm going to be at all four days, and I think a number of other Python packaging/distribution developers will too. I think it'll likely be a good time to hash out architectural stuff and do some pair programming and in-person reviews. So if you could be there two or 2.5 days that would probably be of benefit. |
@lukesneeringer how is this going? Do you have any plans or code that you'd like us to look at? |
@brainwane Hi there; I have been on vacation. I will have a plan (and some code) for you to look at on Friday. :-) |
@brainwane @ewdurbin et al. I have started doing research and have a minimal amount of code to paper, but want to bring in other voices and such at this point.

The API keys themselves
I assert that a new database model should be added to

Key Contents
As far as the contents of the keys, I am leaning toward using RSA keys, and having the interface essentially allow you to upload the public keys (meaning that initially the user will be responsible for creating said keys). The request would include a signature (signed with the private key) which is validated against the expected signature using the public key. There are a few downsides to this approach. The big one is that it puts the burden on the package maintainer to generate the key. We could potentially later do what some other sites do where they provide generation, store the public key in the database, and give a forced one-time download of the private key. I think we should start with user-generated keys, however, because it allows users to generate encrypted keys (and store the encryption key in CI).

Data Structure
I propose the following data model:
class AccessKey(db.Model):
    '''Access keys for project access separate from passwords.'''

    __tablename__ = "access_keys"

    # We store a public key, and the client is responsible for signing
    # requests using the corresponding private key.
    public_key = Column(Text)

    # An access key must be attached to either a user or a project,
    # and may be attached to both.
    #
    # Attaching to a user limits the scope of the key to projects which
    # that user can access (at the time access is attempted, not when the
    # key is made). It is possible for this set of projects to be zero.
    #
    # Attaching to a project limits the scope of the key to that project.
    #
    # relationship() does not accept ``nullable``, so optionality is
    # declared on the foreign-key columns instead. This assumes that
    # ``User.id`` and ``Project.id`` are the primary keys and that
    # ``ForeignKey`` is imported from sqlalchemy.
    user_id = Column(ForeignKey(User.id), nullable=True)
    user = orm.relationship(
        User,
        backref="access_keys",
        lazy=False,
    )
    project_id = Column(ForeignKey(Project.id), nullable=True)
    project = orm.relationship(
        Project,
        backref="access_keys",
        lazy=False,
    )

    expiry = Column(
        DateTime(timezone=False),
        nullable=True,
    )
    created = Column(
        DateTime(timezone=False),
        nullable=False,
        server_default=sql.func.now(),
    )

What is important here are the relationships with

Implementation
I assert that this would require an additional
Then, logic needs to be added to
Finally, this would entail a change in

Restrictions
The biggest restriction on this is that the API keys would only initially be usable for the upload functionality. (Presumably

Concerns
My biggest concern about this is the keys. Using RSA keys provides several useful benefits (passphrases, high entropy, etc.), but it also feels a good bit more complicated than what (for example) npm does. Other package managers just use direct API keys (which seems awfully insecure) or some less secure form of key-secret combo. One concern here is that if this is deemed too difficult to set up, people may choose not to use it.

Another concern is "key collision". The idea here is to be able to have single-package tokens, but most people work on lots of Python packages. Similarly, one might want a passphrase-based key to go in CI and a passphrase-less key to go on local disk. I think this sort of thing is solvable by being smart about naming and ordering. A potentially attractive idea is to look for project-specific keys in a subdirectory of the user's home directory before looking for a project-specific key in the project folder, then look for user-wide keys in the reverse order.

Conclusion
This is a writeup for the moment. I have the model written (a trivial task) and am going to start with the various pieces of plumbing written above. Feedback is definitely desired before I get too far into it. |
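To make the "naming and ordering" idea in the Concerns section concrete, here is a rough sketch of that lookup order as I read it. Everything specific in it is an assumption for illustration only: the `.pypi-keys` directory name, the `.pem` file names, and the helper names do not come from the proposal.

```python
from pathlib import Path
from typing import List, Optional


def candidate_key_paths(home: Path, project_dir: Path, project: str) -> List[Path]:
    """Return key file locations in the order they would be tried."""
    home_keys = home / ".pypi-keys"            # hypothetical directory name
    project_keys = project_dir / ".pypi-keys"  # hypothetical directory name
    return [
        # Project-specific keys: home directory first, then the project folder.
        home_keys / f"{project}.pem",
        project_keys / f"{project}.pem",
        # User-wide keys: the reverse order, project folder first.
        project_keys / "user.pem",
        home_keys / "user.pem",
    ]


def find_key(home: Path, project_dir: Path, project: str) -> Optional[Path]:
    """Return the first candidate that exists on disk, or None."""
    for path in candidate_key_paths(home, project_dir, project):
        if path.is_file():
            return path
    return None
```

A fixed, documented order like this is what would let a passphrase-protected CI key and a passphrase-less local key coexist without ambiguity.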
Looks awesome! One question though -- |
I'm still digesting this, but I wanted to jot down my initial thoughts:
|
Sorry, I misspoke. I meant that they should upload a public key.
Here is my rationale: often times, organizations want API keys that are independent of individual users. Essentially, a company does not want all of their keys to break because an individual user leaves, and most people do not make separate accounts on PyPI for work vs. personal. A higher weight way to solve this problem would be to have explicit organizations to which credentials could be attached. Lower-weight version: Encourage the use of organization-level "users" -- but that has the downside of things like a single password that everyone shares, etc.
Ironically enough, I actually went for something heavier weight because of your thinking in the previous comment. :-) Given that we have been storing passwords in plaintext since time immemorial, and given that most other package managers go for simpler solutions, there is a good chance that I am overthinking here. I think there are two primary concerns: (1) mistakenly leaked or otherwise woefully unsecured keys, and (2) sniffing. The proposal I am putting forward does basically nothing for (1) and effectively guards against (2).
I would not have any issue with this approach; it would still improve on the status quo. (This is, of course, an inferior approach for the sniffing concern, but it has been fine for most other package managers to the best of my knowledge.) |
I've been thinking about this a lot and I think I've come up with the start of a proposal for how to handle this.

To start off with, I think that a public key crypto scheme is generally overkill for this application. We don't have N untrusted parties that need to be able to verify the authenticity of a request, just a single trusted party (PyPI), which means there is little need to have a scheme that hides the credential from PyPI itself. A public key crypto scheme would prevent people who can intercept traffic from getting the upload credentials; however, PyPI also mandates TLS, which provides the same properties (and if you can break our TLS, you can also just upload your own key to the account of the user you wish to upload as).

I do think that some sort of request signing scheme could be useful, in that in the case of a TLS break it limits the need to re-roll credentials across the board. I think that would be a better fit for an "upload 2.0" API that would eventually sunset the current API rather than extending the current API. Utilizing a bearer token authentication scheme today would mean that twine, etc. just work immediately, and we can constrain the effort of needing to get agreement between multiple parties to a point in the future when we actually want to design a new upload API.

So given that, I think the best path forward is to use some sort of bearer token authentication scheme. The simplest of these would just be a simple API key where Warehouse generates a high entropy secret and shares that with the user. However that has a number of drawbacks, such as:
After thinking about this for a few days and talking it over with some folks who are much smarter than me, I think that the best path forward here is to use Macaroons. Macaroons are a form of bearer token, where Warehouse would mint a macaroon and pass it on to the user. In this regard they are similar to the simple API key design. Where macaroons get more powerful is that instead of baking things like "here is a list of projects that this macaroon is able to access" into the database, it is stored as part of the macaroon itself AND that given a macaroon, you can add additional "caveats" (like which projects it can access) and mint a new macaroon without ever talking to Warehouse. This would allow a workflow like:
I think this ends up making a really nice system that can be retrofit to the current upload API, but that allows a lot of new capabilities for delegation and restricting delegation. We would need to figure out which caveats we want to support in Warehouse (even though anyone can add caveats, the caveats have to be supported by Warehouse so you can't add arbitrary ones). Off the top of my head I can think of the following (naming can be adjusted):
One caveat to the above is that we likely should require both a
Internal to Warehouse, we'd need a table that would store all of the initial root keys (
There is still the question of whether it makes sense to have these Macaroons able to be specified without a user or not. For right now, I think we should make them owned by a specific user, HOWEVER I think that we should always add an additional caveat that describes which user the macaroon belongs to. Essentially scoping the macaroon to just that user (not the same as scoping it to a specific user in the
What do you think? |
One nice thing about the above idea, is that end users can completely ignore the extra power granted by Macaroons if they want. They can simply treat it as an API key, generate a Macaroon via Warehouse, pass it to Twine as the password for their account and go on with life without giving it a second thought. Of course someone who wants to unlock additional power for delegation or resource constraining can opt into that and can utilize the additional benefits provided by Macaroons. The design of Macaroons is fairly brilliant in this aspect where it's very much a low cost abstraction for the basic use case, but enables you to delve deeper into it to unlock a lot more power. In the hypothetical Travis example, the end user might not even be aware that Travis is adding additional caveats to their Macaroon (or that it's even possible to do that). It could easily be presented to them as nothing more than adding the API key to Travis, with Travis doing the extra expiration stuff completely behind the scenes for the benefit of the user. I could even potentially see something like Twine just always mint a new macaroon scoped very specifically to what it is about to upload, with a very short expiration. Since that doesn't require talking to Warehouse to do, it would be very fast and very secure, allowing Twine to limit the capabilities of the token they actually send on the wire. While we generally trust TLS to protect these credentials, automatic scope limitation like that is basically zero cost and provides defense in depth so that in the case someone is able to look into the TLS stream (for example, a company MITM proxy) the credentials they get are practically useless once they've been used once. |
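For anyone new to macaroons, the "add caveats without talking to Warehouse" step can be sketched with nothing but chained HMACs. This is a simplified illustration of the general construction, not Warehouse's implementation and not any library's wire format; the dict layout, the identifier, and the caveat string `project = example-project` are made up for the example.

```python
import hashlib
import hmac
import secrets


def _chain(key: bytes, message: str) -> bytes:
    """One HMAC-SHA256 link of the signature chain."""
    return hmac.new(key, message.encode(), hashlib.sha256).digest()


def mint(root_key: bytes, identifier: str) -> dict:
    """Server side: create a macaroon with no caveats."""
    return {
        "identifier": identifier,
        "caveats": [],
        "signature": _chain(root_key, identifier),
    }


def attenuate(macaroon: dict, caveat: str) -> dict:
    """Holder side: add a caveat using only the current signature.

    No root key and no round trip to the server are needed.
    """
    return {
        "identifier": macaroon["identifier"],
        "caveats": macaroon["caveats"] + [caveat],
        "signature": _chain(macaroon["signature"], caveat),
    }


def signature_is_valid(root_key: bytes, macaroon: dict) -> bool:
    """Server side: recompute the chain from the root key and compare."""
    signature = _chain(root_key, macaroon["identifier"])
    for caveat in macaroon["caveats"]:
        signature = _chain(signature, caveat)
    return hmac.compare_digest(signature, macaroon["signature"])


root_key = secrets.token_bytes(32)      # known only to the server
token = mint(root_key, "token-id-1")    # what the user is handed
scoped = attenuate(token, "project = example-project")
tampered = dict(scoped, caveats=[])     # attempt to strip the caveat
```

Here `signature_is_valid` accepts `token` and `scoped` but rejects `tampered`, because removing a caveat would require reversing an HMAC. A real verifier would also have to check that every caveat actually holds for the request; this sketch only covers the signature chain.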
To extend what @dstufft said -- we can have a constraint be "upload file with hash " which means replay attacks (short of pre-image attacks) are useless. If we further have twine auto-attenuate with this plus a short time frame, future pre-image attacks are useless too: an attacker would need a pre-image attack ready right now. This means that while, of course, we all love TLS a lot, with this in place, TLS would not be needed for security -- even complete breakage of TLS would not allow someone to upload a package with malicious code. |
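A rough sketch of how the server side might evaluate those two constraints, a file hash plus a short validity window. The caveat syntax (`sha256 = ...`, `not-after = ...`) and the function names are invented for illustration; the thread does not specify a format, and signature verification is assumed to happen separately.

```python
import hashlib
from datetime import datetime, timedelta, timezone
from typing import Iterable


def caveat_holds(caveat: str, *, file_bytes: bytes, now: datetime) -> bool:
    """Check one caveat against the upload being attempted."""
    key, _, value = caveat.partition(" = ")
    if key == "sha256":
        # The token is only good for a file with exactly this digest.
        return hashlib.sha256(file_bytes).hexdigest() == value
    if key == "not-after":
        # The token is only good until this timestamp.
        return now <= datetime.fromisoformat(value)
    return False  # unknown caveats fail closed


def upload_allowed(caveats: Iterable[str], *, file_bytes: bytes, now: datetime) -> bool:
    """Every caveat must hold for the upload to be accepted."""
    return all(caveat_holds(c, file_bytes=file_bytes, now=now) for c in caveats)


now = datetime(2019, 7, 1, 12, 0, tzinfo=timezone.utc)
wheel = b"example wheel contents"
caveats = [
    "sha256 = " + hashlib.sha256(wheel).hexdigest(),
    "not-after = " + (now + timedelta(minutes=5)).isoformat(),
]
```

With these caveats, an intercepted token cannot be replayed with different file contents or after the window closes. Rejecting unknown caveats matters: a verifier that ignored caveats it did not understand would silently widen the token's scope.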
I like this idea. One thing we could do is allow either the API key to be sent directly (meaning that we get the constraint of effort you mention) or a signing algorithm, which then tools could opt in to.
I like this idea too. +1.
I am okay with this provisionally but I think it is an important limitation. I do think that group permissions will be necessary. I do think it is reasonable to add groups first and then group level permissions, rather than the converse order that I originally proposed. I am a little confused about the user caveat. I would like to understand its purpose. Would we allow Macaroons to be moved? This seems implausible. Additionally, I do think we eventually need to end up with user-independent tokens. The point needs to be that the credentials continue to work after a user is no longer part of that group.
This is definitely something that would be easy and valuable to do. It gives you the value of request signing, effectively. I am sold on this. I will get an implementation of this in soon. Also, thanks @dstufft for teaching me about Macaroons. That is really valuable. |
@lukesneeringer I should mention that after talking through this more with people, I think that the right implementation would look something like:
Some replies to your comments:
We need a number of the values out of the macaroons in order to construct what the signing key should be in this hypothetical signing algorithm. It'd be possible we could do something where we rip enough stuff out of the macaroon format so that caveats are still sent along with the request, but not the actual HMAC signatures, so that the server could still construct what the expected signing key is. I'm not sure that would be worth the effort.
Basically the user caveat is how you say "This macaroon (and by nature, all macaroons created from this macaroon) is scoped to only resources that X user has access to and acts as if it were X user." The reason this is a caveat instead of just a column in the database (although it likely should be one of those too) is to keep our options open in the future, so that we can potentially start creating macaroons without that caveat (perhaps with a
So to start out with, we'd always include that |
Oh, and the opaque, unique value would also give us something we can enumerate to display a list of macaroons owned by a user (to allow them to delete/revoke unused ones) and allows us to do things like record in the database whenever they are used, so we can display the last time each one was used too. |
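As a sketch of what that bookkeeping could look like, here is an in-memory stand-in for the table, keyed by the opaque identifier. The class and field names are hypothetical and a real implementation would be a database model; the point is only that the identifier is enough to list, revoke, and timestamp tokens.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class MacaroonRecord:
    user: str
    description: str
    root_key: bytes
    last_used: Optional[datetime] = None


class MacaroonStore:
    """In-memory stand-in for a table keyed by the opaque identifier."""

    def __init__(self) -> None:
        self._records: Dict[str, MacaroonRecord] = {}

    def create(self, user: str, description: str) -> str:
        identifier = secrets.token_urlsafe(16)  # opaque, unique value
        self._records[identifier] = MacaroonRecord(
            user, description, secrets.token_bytes(32)
        )
        return identifier

    def list_for_user(self, user: str) -> List[str]:
        """Identifiers to show on the user's token management page."""
        return [i for i, r in self._records.items() if r.user == user]

    def record_use(self, identifier: str) -> Optional[MacaroonRecord]:
        """Look up a token at verification time and stamp its last use."""
        record = self._records.get(identifier)
        if record is not None:
            record.last_used = datetime.now(timezone.utc)
        return record

    def revoke(self, identifier: str) -> None:
        # Deleting the root key invalidates the macaroon and every
        # attenuated macaroon derived from it.
        self._records.pop(identifier, None)
```

Revocation falls out for free: once the root key is gone, no signature chain starting from that identifier can be verified again.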
All this sounds good. Also, apologies, I was replying inline before I read the entire post, so my first quote above is less relevant than I thought. If twine ever makes the addition you recommend to add the date range, it accomplishes the same thing as signing would. |
@brainwane filed a bug: #6262 |
Upload-only API tokens (both user-scoped and project-scoped) are now in beta on PyPI and Test PyPI! Our update on Discourse is at https://discuss.python.org/t/pypi-security-work-multifactor-auth-progress-help-needed/1042/31 . Uploading with an API token is currently optional but encouraged; in the future, PyPI will set and enforce a policy requiring users with two-factor authentication enabled to use API tokens to upload (rather than just their password sans second factor). Once the beta period for API tokens is complete, we will make a launch announcement on the pypi-announce mailing list, and start to notify project maintainers and owners of the upcoming policy change. Then, after a suitable waiting period, we will begin to enforce this restriction, and include a notice in the error message returned to clients. |
Is there any chance of adding 2FA support for uploads, as opposed to only accepting tokens? Seems like 2FA should be supported and preferred for dev machines. Storing API tokens locally doesn't seem any more secure than storing the username and password locally in that case. |
"more secure" depends, of course, on your threat model. One common problem with storing secrets locally is that they are available to any future application that runs as the user (any current application that runs as the user is a threat for 2FA-based systems, since it can directly hijack the session). However, this threat can be mitigated by only storing short-lived tokens. The Macaroon system we are implementing allows adding such validity caveats to tokens before storing them. For example, you could create a token valid for 5 minutes before each upload. In addition, it is also expected and straightforward to invalidate API tokens through the UI. Can you indicate what threat model you think 2FA for uploads solves that short-lived tokens do not? |
Are there any discussions as to where and how the project-name:api-token mapping should be stored? Or do you just expect devs to use a token with the same scope as the user? |
So you're saying the workflow to upload a package from a dev machine would be to log in to pypi with 2FA, get a short lived token, and tell twine about it? Why not just cut out the middle step and tell twine about 2FA? |
That sounds like a terrible idea to me - I really do not want to have to involve a browser (or enter my password in a CLI which would be worse since I generally do not know my passwords but generate them from a master password or randomly (and then store them in a password manager)) to publish packages. I think there are two usecases here:
When publishing to npm right now, everything is straightforward: I run
So ideally, I'd like to have the same behavior with pypi/twine. I wouldn't mind if it was internally using a long-lived token that requires 2FA in addition and created a short-lived token to do the actual publish. This would actually be convenient for cases where you publish multiple packages in one go so you don't need to enter multiple tokens (and even allow reuse of a TOTP which is probably a bad idea). |
This could be implemented by adding a 2fa caveat. In the UI one would create a token with a 2fa requirement (maybe with a simple checkbox). Then twine would see that token (the macaroons can easily be inspected), ask the user for the 2fa code, add it to the existing token and send that to pypi. The new token would expire automatically when the TOTP is used, and the token from the UI would be useless without the 2fa code. I think this sounds sensible, useful and relatively straightforward to implement. Unless I overlooked something fundamental. |
One thing I would like to throw into the mix here with regards to 2FA for uploads is that #726 proposes a "two phase" upload, where packages are uploaded from the command line and (optionally) go into a "staging area" before they become available to the public and immutable. I will find that enormously useful for other reasons, but it also adds another workflow where the final upload is necessarily gated by 2FA - uploads into the staging area would be possible using the upload API key but none of that would actually be published without logging in with the 2FA key. Obviously that doesn't help anyone who deliberately wants to avoid using the browser as any part of their upload workflow, but it may be more convenient than the "get a short-term key from the browser and paste it into twine" workflow. |
Just 0.02c from the implementation side: yes, I think the right way to do this would be with an additional 2FA caveat as @fschulze proposed. That's out of scope for the current work (and would involve changes to twine and other uploaders), but wouldn't be too difficult to implement. OTOH, single-use and/or time-scoped tokens (as proposed by @moshez) that require a second factor for minting would provide similar security properties and potentially be less invasive for automatic deployments. |
@Carreau does it really make sense to have multiple tokens on a dev machine? If yes, you could have multiple "repository" entries, one for each token/project. If no (better DX) -- just use a user-wide token. There's been some discussion @ https://twitter.com/Ewjoachim/status/1154479563419869184 |
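To illustrate the "multiple repository entries" idea, here is a sketch that parses a `.pypirc` with one section per project token. The section names and token values are placeholders, and this layout is one possible convention rather than an agreed standard; `__token__` is the username PyPI documents for API tokens today.

```python
import configparser
from typing import Dict

# Hypothetical .pypirc contents: one "repository" entry per project token.
PYPIRC = """
[distutils]
index-servers =
    project-a
    project-b

[project-a]
repository = https://upload.pypi.org/legacy/
username = __token__
password = pypi-TOKEN-FOR-PROJECT-A

[project-b]
repository = https://upload.pypi.org/legacy/
username = __token__
password = pypi-TOKEN-FOR-PROJECT-B
"""


def load_repositories(text: str) -> Dict[str, Dict[str, str]]:
    """Map each declared index server name to its settings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    names = parser.get("distutils", "index-servers").split()
    return {name: dict(parser[name]) for name in names}


repositories = load_repositories(PYPIRC)
```

With entries like these, `twine upload --repository project-a dist/*` would pick the token for that project, at the cost of repeating the upload URL in every section.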
For me yes it does. I only want my work machine to be able to publish some packages; and vice versa for my home machine. Also tokens are "upload only" (or are going to be); so I can keep my password safer. For now I'm good with a custom solution but would love for an agreed upon way of doing it before various incompatible solutions emerge. |
Heads-up for people trying the beta of uploading with API tokens:
|
That's ok, as long as you don't change existing tokens |
@graingert I'm sorry, but yes, we will probably be making it so that tokens you have already created do not work. As the manager on this project I'm comfortable making that choice during this beta, since we have warned people that there was a chance this would need to happen during the beta. To quote @ewdurbin in #6287 (comment),
|
We have updated the token username and prefix in #6342. username: These changes should alleviate the need for escaping heroics. The previous format will continue to work for now, but users will be notified to update their configurations to match the new syntax before the beta period is over. |
Sorry for moving your comment here, @takluyver, but I want to keep this issue focused on API keys and that issue on the rollout! We don't have a specific plan for that API feature yet, no. I filed your request as #6396. |
We've rolled out scoped API tokens for package upload on PyPI. It is in beta, and #5661 is a meta-issue where we are tracking its rollout and getting the last few items fixed before ending the beta, and the policy changes (requiring API token usage for some users) we'll make after that. We've now implemented all the items in this API token checklist. Some features are out of scope for our current funding:
So, per agreement with other maintainers in that meeting, I'm closing this issue. Please enjoy upload API tokens on Warehouse, and file new issues to request new API key-related features. Thank you all! |
A scary number of people embed their PyPI username and password in their Travis config (using Travis encrypted variables), to enable automatic releases for certain branches (Travis even has a guide for it).
In addition, the packaging docs example encourages users to save their password in plaintext on disk in their .pypirc (they can of course use twine's password prompting, but I wonder how many read that far, rather than just copy the example verbatim?)
Whilst in an ideal world credentials of any form wouldn't be saved unencrypted to disk (or given to a third-party such as Travis) and instead users prompted every time - I don't think this is realistic in practice.
API keys would offer the following advantages:
password field in .pypirc, leaving a much safer choice between password prompting every time, or creating an API key that could be saved to disk.
Many thanks :-)
(I've filed this against warehouse since I'm presuming this is beyond the scope of maintenance-only changes being made to the old PyPI codebase)