Feature request: Automatically uninstall malicious packages taken down from PyPI #5777
My immediate thought is that I'd prefer not to silently uninstall anything, but rather to let the user know what's happened and ask permission to uninstall the malicious software. Also, how would we confirm that package FOO on the user's PC is actually the malicious FOO from PyPI and not (say) some entirely local package that they developed themselves and isn't on PyPI?
Agreed, this should definitely make it known to the user that they had malicious software installed, in case they need to take further steps to mitigate the problem.
I think this is unlikely (most malicious packages are typosquats of real packages) but possible, and it would need to be addressed in some way. If the uninstall isn't happening automatically, then the prompt could be a one-time thing: if the user decides to leave the package installed,
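The ask-once idea could look roughly like the sketch below. It is a hypothetical illustration, not pip code: the function name `review_flagged`, the JSON "kept" state file, and the prompt wording are all assumptions. The user is asked per flagged project; a "no" is remembered so the same project is not raised again.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def review_flagged(
    flagged: Iterable[str],
    state_file: Path,
    ask: Callable[[str], str] = input,
) -> list[str]:
    """Hypothetical helper: return flagged projects the user agreed to uninstall.

    Projects the user chose to keep are stored in ``state_file`` (an assumed
    JSON list) so the prompt is shown only once per project.
    """
    kept = set(json.loads(state_file.read_text())) if state_file.exists() else set()
    to_uninstall = []
    for name in flagged:
        if name in kept:
            continue  # already asked earlier; the user chose to keep it
        answer = ask(f"{name} was removed from the index as malicious. Uninstall it? [y/N] ")
        if answer.strip().lower() in {"y", "yes"}:
            to_uninstall.append(name)
        else:
            kept.add(name)  # remember the decision so we don't ask again
    state_file.write_text(json.dumps(sorted(kept)))
    return to_uninstall
```

Injecting `ask` keeps the prompt testable and lets a non-interactive run supply a default answer instead of blocking on `input`.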
@dstufft Thoughts?
Some hypothetical questions:
Answers:
Yes.
Nope, it is permanently unavailable. We don't release typosquats to the "proper" owners. It would be a pain to manage one real package plus 5-10 typos of the name simultaneously. It's easier if the typo just never works.
They won't be marked as "safe".
See above.
@pradyunsg, how about having pip check print them? By the way, how big is the list currently? I'm wondering if it would be reasonable to just download it compressed.
@RonnyPfannschmidt About 200 project names. (To be clear, I'm suggesting that the hypothetical API would be a single endpoint that returns all "bad" project names.)
@di I believe it's reasonable to provide a .json.bz2 with all of those and to download it in a cacheable manner.
I don't really think it even needs to be compressed. It doesn't change very often, as long as
@di I'm simply going to assume it's going to grow to thousands of entries in the years to come ^^, but a transfer encoding may be enough to save those bytes.
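A minimal sketch of the single-endpoint, cacheable download discussed above, assuming the list is a JSON array of project names, optionally bz2-compressed, and that the server sends an ETag. The URL below is a placeholder: no such endpoint exists on PyPI, and `parse_blocklist`/`fetch_blocklist` are hypothetical names.

```python
import bz2
import json
import urllib.error
import urllib.request
from typing import Optional

# Placeholder URL: PyPI does not currently publish such a list.
BLOCKLIST_URL = "https://index.example/malicious-projects.json.bz2"


def parse_blocklist(payload: bytes) -> set[str]:
    """Decode a JSON array of project names, bz2-compressed or plain."""
    if payload.startswith(b"BZh"):  # bz2 streams start with these magic bytes
        payload = bz2.decompress(payload)
    return set(json.loads(payload.decode("utf-8")))


def fetch_blocklist(
    cached_body: Optional[bytes],
    cached_etag: Optional[str],
    url: str = BLOCKLIST_URL,
) -> tuple[set[str], bytes, Optional[str]]:
    """Return (names, body, etag), re-downloading only if the list changed."""
    request = urllib.request.Request(url)
    if cached_etag:
        request.add_header("If-None-Match", cached_etag)  # conditional GET
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            body = response.read()
            etag = response.headers.get("ETag")
    except urllib.error.HTTPError as exc:
        # urllib surfaces "304 Not Modified" as an HTTPError.
        if exc.code == 304 and cached_body is not None:
            body, etag = cached_body, cached_etag
        else:
            raise
    return parse_blocklist(body), body, etag
```

If the list were served as plain JSON with HTTP compression instead, note that urllib neither requests nor decodes compressed encodings automatically, so the client would have to handle that itself.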
I'm thinking that actively malicious packages are a special case of the general case of "packages with security issues". After all, there is not a lot of difference between a good package that accidentally allows something malicious to happen and a bad package that purposely allows that same thing to happen; in both cases the bad thing happens. So with that in mind, I think a far better framework is something like what npm has implemented in … The generic thing is a bit more work, but I think it is far, far more useful than a one-off feature.
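To illustrate the "malicious is a special case of insecure" framing, here is a hypothetical advisory model. The `Advisory` record and `audit` function are assumptions, not an existing pip or PyPI format. A malicious project is represented as an advisory that affects every version, while an ordinary vulnerability lists specific versions. Version matching is kept to exact strings for brevity, and names are assumed to be already normalized.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Advisory:
    """Hypothetical advisory record an index might serve."""
    project: str
    # None means every version is affected (e.g. a malicious project).
    affected_versions: Optional[frozenset[str]]
    reason: str


def audit(installed: dict[str, str], advisories: list[Advisory]) -> list[tuple[str, str, Advisory]]:
    """Report installed (name, version) pairs matched by an advisory.

    This only reports; deciding what to remove is left to the user.
    """
    findings = []
    for adv in advisories:
        version = installed.get(adv.project)
        if version is None:
            continue  # project not installed
        if adv.affected_versions is None or version in adv.affected_versions:
            findings.append((adv.project, version, adv))
    return findings
```

A real implementation would need proper version-range matching rather than exact version strings.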
There's a need for care here on the server side (I say "server" rather than "PyPI"; see below for why). Once we start extending the reasons why we'd blacklist packages, we risk becoming curators, and PyPI as a curated system is a whole different thing. Having said that, (a) that's a problem for PyPI to wrestle with, not for pip, and (b) I'm not trying to suggest that "having a security flaw" is something we need to debate.
I agree that an audit/fix solution is better than automatic removal. Sure, there's a risk that someone doesn't audit their system, but the consenting adults principle applies here. I do not want pip to try to make it so users don't have to think about issues like this; we should give them the tools and the information, but their choices are their own to make.
The question still remains (I asked it above, and @di noted it but said he thought it would be rare): for this to work, we need to track where packages come from. We can't really have PyPI be the authority that says "this name is forbidden". Consider, as an example, a package that has a security vulnerability and gets blacklisted. A company needs the functionality in that package and creates a fixed version, which it hosts on its local package index. PyPI blacklisting the package can't be a reason for blacklisting the local version. So we need to be able to ask, "Did this installed package come from the index that is reporting it as blacklisted?"
Also, should we allow other indexes to publish blacklists? If not, why not? At a minimum we'd have to allow TestPyPI to do so, so people can test things (which reminds me: who maintains the blacklist, and will we have some "fake" blacklist items set up for testing?). And why not test things locally? But if we do, people will try to use it for broader reasons than revoking malicious packages.
Having it not affect packages sourced from anywhere other than the index the blacklist came from reduces the scope creep here dramatically. Consider someone trying to publish a local blacklist of GPL packages because their corporate licensing doesn't let them use GPL code: if we don't let a local blacklist stop PyPI packages from being used, we can avoid thinking about the implications of scenarios like that, because it simply won't work. Limiting the blacklist feature to "only PyPI" avoids a lot of this complexity, of course (but introduces the "how do we test the feature?" question...)
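The index-scoping rule above can be sketched as a filter that only flags a distribution when it was installed from the same index that publishes the blacklist. This assumes a provenance field that pip does not record today: `InstalledDist.index_url` and `flag_from_index` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class InstalledDist:
    name: str
    version: str
    # Hypothetical provenance field: pip does not currently record
    # which index a distribution was downloaded from.
    index_url: str


def flag_from_index(
    installed: Iterable[InstalledDist],
    blocklist: set[str],
    blocklist_index: str,
) -> list[InstalledDist]:
    """Flag only distributions that came from the index publishing the blocklist."""
    origin = blocklist_index.rstrip("/")  # treat ".../simple" and ".../simple/" alike
    return [
        dist
        for dist in installed
        if dist.name in blocklist and dist.index_url.rstrip("/") == origin
    ]
```

With this rule, a company's patched copy served from its own index is never matched by PyPI's list, and a local list cannot affect packages that came from PyPI.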
Given there's a server side and a client side here, and IMO we should have this standardized, we probably should at a minimum discuss this on distutils-sig, if not produce a PEP for it.
We could even automatically run a
Yeah, for this to work we will need to start tracking the provenance of packages, which I don't think is really a big deal; it'd just be more metadata in the installation DB to specify the repository it came from. Alternatively, we could track a unique hash of the sdist or something, and tag vulnerability reports to specific hashes. There are a few ways we could take it, and that will largely depend on the design of the server-side API, but I 100% agree that the feature needs to be scoped more specifically than "anything named foo is bad".
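The hash-based alternative would match reports against the exact artifact rather than the name. Below is a sketch assuming the client has the downloaded archive, or a hash recorded at install time, which pip does not currently store for this purpose. `sha256_of` and `is_reported` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_reported(artifact: Path, reported_hashes: set[str]) -> bool:
    """True if this exact artifact appears in a (hypothetical) report."""
    return sha256_of(artifact) in reported_hashes
```

Because a locally built package with the same name has a different hash, it would never match, which sidesteps the name-collision concern raised earlier.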
Yes. PyPI should not be special other than the fact that it's the default.
I think PEP 592 may be relevant to this.
We now have some more features on the Warehouse side that are relevant here: we're about to get yanking, we have the start of some malware detection, and there's an event log as the foundation of notifications. So this may be more feasible soon, once the information is available to read in the Warehouse API.
Now that PEP 592 is accepted and implemented (pypi/warehouse#5837), if you are interested in working on this feature, take a look at the yanking feature and the "yanked" field in PyPI's JSON API.
To be clear: yanked releases should not be considered malicious releases and should not be automatically uninstalled. If PyPI starts exposing packages removed for being malicious or typosquats, it'd be via an entirely new API, not the existing project/release JSON API (since the projects/releases won't exist anymore once they're removed).
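Since yanked is a separate signal from "removed as malicious", a client could at most warn about it. Here is a sketch of reading it from PyPI's per-release JSON endpoint, whose "urls" entries carry "yanked" and "yanked_reason" fields. The helper names are assumptions, and treating a release as yanked only when all of its files are yanked is a simplifying choice.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def fetch_release_json(project: str, version: str) -> dict:
    """Fetch PyPI's JSON metadata for one release (needs network access)."""
    url = (
        "https://pypi.org/pypi/"
        f"{urllib.parse.quote(project)}/{urllib.parse.quote(version)}/json"
    )
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def yanked_reason(release_json: dict) -> Optional[str]:
    """Return the yank reason ("" if none given), or None if not yanked."""
    files = release_json.get("urls", [])
    if files and all(f.get("yanked") for f in files):
        return files[0].get("yanked_reason") or ""
    return None
```

Per the comment above, this is for an informational warning only, not a trigger for uninstalling anything.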
Whoops. Sorry for the error, and thanks for the correction.
Has there been no progress on this interesting feature since 2020?
What's the problem this feature will solve?
PyPI occasionally gets malicious packages uploaded to it. PyPI administrators remove the packages as quickly as possible, but sometimes users still install these packages before they are taken down, and the packages remain in the user's environment.
Describe the solution you'd like
At runtime, pip queries PyPI for a list of malicious packages that have been taken down from PyPI:
Additional context
The necessary API doesn't currently exist on PyPI, but if this feature is accepted, it would be trivial to implement.
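The proposed runtime check could compare the downloaded list against installed distributions, using PEP 503 name normalization so that "Evil_Pkg" and "evil-pkg" match. This is a sketch under stated assumptions: the list format is hypothetical (the API doesn't exist yet), and `find_taken_down` only reports matches, leaving any uninstall to the user.

```python
import re
from importlib import metadata
from typing import Iterable, Optional


def normalize(name: str) -> str:
    """PEP 503 project-name normalization."""
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_names() -> list[str]:
    """Names of all distributions visible in the current environment."""
    names = (dist.metadata.get("Name") for dist in metadata.distributions())
    return [name for name in names if name]


def find_taken_down(
    blocklist: Iterable[str],
    installed: Optional[Iterable[str]] = None,
) -> list[str]:
    """Return installed project names that appear on the (hypothetical) list."""
    blocked = {normalize(name) for name in blocklist}
    names = installed_names() if installed is None else installed
    return sorted({name for name in names if normalize(name) in blocked})
```

A name match alone can't tell a taken-down PyPI project from a same-named local package, so the result is best presented as a warning for the user to review.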