Feature request: Automatically uninstall malicious packages taken down from PyPI #5777

di · 2018-09-12T15:13:18Z

What's the problem this feature will solve?
PyPI occasionally gets malicious packages uploaded to it. PyPI administrators remove the packages as quickly as possible, but sometimes users still install these packages before they are taken down, and the packages remain in the user's environment.

Describe the solution you'd like
At runtime, pip queries PyPI for a list of malicious packages that have been taken down from PyPI:

if it doesn't find any of them in the local environment, it does nothing;
if it finds a malicious package has been installed, it uninstalls it automatically.
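
The check described above can be sketched roughly as follows. This is an illustration only, not pip code: the list of taken-down names is passed in as plain data because the PyPI endpoint does not exist yet, and `find_flagged` and `installed_project_names` are hypothetical helpers. Names are compared after PEP 503 normalization so that `Foo_Bar` and `foo-bar` are treated as the same project.

```python
import re
from importlib import metadata


def normalize(name):
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_flagged(installed_names, bad_names):
    """Return installed names that appear in the taken-down list (hypothetical helper)."""
    bad = {normalize(n) for n in bad_names}
    return sorted({normalize(n) for n in installed_names} & bad)


def installed_project_names():
    """Collect the names of distributions visible in the current environment."""
    names = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata has no name
            names.append(name)
    return names


if __name__ == "__main__":
    # Placeholder data: a real list would come from the proposed PyPI endpoint.
    taken_down = ["Example_Typo.Squat"]
    for name in find_flagged(installed_project_names(), taken_down):
        print(f"{name} was removed from PyPI as malicious")
```

A name match alone says nothing about where the installed package came from, so a real implementation would need more than this comparison.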

Additional context
The necessary API doesn't currently exist on PyPI, but if this feature is accepted, it would be trivial to implement.

pfmoore · 2018-09-12T16:02:24Z

My immediate thought is that I'd prefer not to silently uninstall anything, but rather to let the user know what's happened and ask permission to uninstall the malicious software.

Also, how would we confirm that package FOO on the user's PC is actually the malicious FOO from PyPI and not (say) some entirely local package that they developed themselves and isn't on PyPI?

di · 2018-09-12T16:47:04Z

My immediate thought is that I'd prefer not to silently uninstall anything, but rather to let the user know what's happened and ask permission to uninstall the malicious software.

Agreed, this should definitely make it known to the user that they had malicious software installed, in case they need to take further steps to mitigate the problem.

Also, how would we confirm that package FOO on the user's PC is actually the malicious FOO from PyPI and not (say) some entirely local package that they developed themselves and isn't on PyPI?

I think this is unlikely (most malicious packages are typo squats on real packages) but possible and would need to be addressed in some way.

If the uninstall isn't happening automatically, then the prompt could be a one-time thing: if the user decides to leave the package installed, pip won't warn about it again.
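
One way the one-time prompt could work is to remember which flagged names the user has already declined to uninstall. The sketch below assumes a small JSON file as the store. The file location and the helper names are made up for illustration and are not part of pip.

```python
import json
from pathlib import Path


def load_acknowledged(path):
    """Read the set of names the user already chose to keep."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))


def names_to_prompt(flagged, path):
    """Return flagged names the user has not been asked about yet."""
    return sorted(set(flagged) - load_acknowledged(path))


def acknowledge(name, path):
    """Remember that the user declined to uninstall `name`."""
    seen = load_acknowledged(path)
    seen.add(name)
    path.write_text(json.dumps(sorted(seen)), encoding="utf-8")
```

After `acknowledge("foo", path)`, later calls to `names_to_prompt` no longer include `foo`, so the warning is shown only once per package.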

pradyunsg · 2018-09-18T12:56:06Z

@dstufft Thoughts?

hugovk · 2018-09-18T21:06:49Z

Some hypothetical questions:

Once a malicious package has been removed from PyPI, is that name forever flagged as bad?
Or can it be at some point flagged as safe? A use case could be a typo-squatted name is given to the rightful owner.
If it is marked as safe, would pip then stop asking to uninstall?
And how about if the user had installed the malicious version, but now the name is marked good, would pip uninstall the bad one?

di · 2018-09-18T21:16:47Z

Answers:

Once a malicious package has been removed from PyPI, is that name forever flagged as bad?

Yes.

Or can it be at some point flagged as safe? A use case could be a typo-squatted name is given to the rightful owner.

Nope, it is permanently unavailable. We don't release typo squats to the "proper" owners. It would be a pain to manage 1 real package plus 5-10 typos of the name simultaneously. It's easier if the typo just never works.

If it is marked as safe, would pip then stop asking to uninstall?

They won't be marked as "safe".

And how about if the user had installed the malicious version, but now the name is marked good, would pip uninstall the bad one?

See above.

RonnyPfannschmidt · 2018-09-19T05:03:46Z

@pradyunsg how about having pip check print them?

Btw, how big is the list currently? I'm wondering if it would be reasonable to just download it compressed.

di · 2018-09-19T05:17:52Z

@RonnyPfannschmidt About 200 project names.

(To be clear, I'm suggesting that the hypothetical API would be a single endpoint that returns all "bad" project names)

RonnyPfannschmidt · 2018-09-19T05:26:01Z

@di I believe it's reasonable to provide a .json.bz2 with all of those and to download it in a cacheable manner.

di · 2018-09-19T14:42:51Z

I don't really think it even needs to be compressed. It doesn't change very often, so as long as pip can conditionally GET it, it should be fine as just JSON.
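
A conditional GET here means sending the validator from the previous response (for example an ETag) and reusing the cached copy when the server answers 304 Not Modified. A minimal sketch with the standard library follows. It assumes the hypothetical endpoint would return an `ETag` header, and the function names are invented for illustration.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, etag=None):
    """Build a GET request that asks for the body only if it changed."""
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    return request


def fetch_if_changed(url, cached_body, etag=None, opener=urllib.request.urlopen):
    """Return (body, etag), reusing cached_body on 304 Not Modified."""
    try:
        with opener(build_conditional_request(url, etag)) as response:
            return response.read(), response.headers.get("ETag")
    except urllib.error.HTTPError as error:
        # urllib reports 304 as an HTTPError; it means the cache is still valid.
        if error.code == 304:
            return cached_body, etag
        raise
```

The `opener` parameter exists only so the function can be exercised without network access.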

RonnyPfannschmidt · 2018-09-19T15:02:22Z

@di I'm simply going to assume it's going to grow to thousands of text entries in the years to come ^^ - but a transfer-encoding may be enough to save those bytes.

dstufft · 2018-09-19T16:57:21Z

I'm thinking that actively malicious packages are a special case of the general case of "packages with security issues". After all, there is not a lot of difference between a good package that accidentally allows something malicious to happen and a bad package that purposely allows that same thing to happen: in both cases the bad thing happens.

So with that in mind, I think a far better framework is something like what npm has implemented in npm audit, which is effectively a generic listing of software versions that have security issues, which people can run against their code base to get a report. It also has npm audit fix, which will attempt any automatic remediation that can occur (in this case, uninstalling the malicious package).

The generic thing is a bit more work, but I think it is far, far more useful than a one-off feature.
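
To make the audit idea concrete, here is a rough sketch of matching installed versions against a generic advisory list. The `Advisory` shape is an assumption for illustration and is not an existing PyPI or pip format. Affected versions are listed as exact strings to keep the example free of third-party dependencies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    """One entry in a hypothetical advisory feed."""
    project: str
    versions: frozenset  # exact affected version strings
    summary: str
    malicious: bool = False


def audit(installed, advisories):
    """Return advisories that match an installed {project: version} mapping."""
    return [a for a in advisories if installed.get(a.project) in a.versions]


def fixable(findings):
    """Pick the projects an automatic fix could handle by uninstalling them."""
    return [a.project for a in findings if a.malicious]
```

In this framing a malicious package is just an advisory whose remediation is removal, while other findings are only reported.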

pfmoore · 2018-09-19T17:39:32Z

There's a need for care here on the server side (I say "server" rather than "PyPI" - see below for why). Once we start extending the reasons why we'd blacklist packages, we risk getting into a position of becoming curators, and PyPI as a curated system is a whole different thing. Having said that, (a) that's a problem for PyPI to wrestle with, not for pip, and (b) I'm not trying to suggest that "having a security flaw" is something we need to debate over.

I agree that having an audit/fix solution rather than an automatic removal is better. Sure, there's a risk that someone doesn't audit their system, but the consenting adults principle applies here. I do not want pip to try to make it so users don't have to think about issues like this. We should give them the tools and the information, but their choices are their own to make.

The question still remains (I asked it above and @di noted it but said he thought it would be rare) which is that for this to work, we need to track where packages come from. We can't really have PyPI being the authority that says "this name is forbidden". Consider as an example a package that has a security vulnerability and gets blacklisted. A company needs the functionality in that package, and creates a fixed version which they host on their local package index. Just because PyPI says that package is blacklisted can't be a reason for blacklisting the local version. So we need to be able to say "did this installed package come from the index that is reporting it as blacklisted?"

Also, should we allow other indexes to publish blacklists? If not, why not? At a minimum we'd have to allow testpypi to do so, so people can test things (which reminds me, who maintains the blacklist - will we have some "fake" blacklist items set up for testing?). And why not test things locally? But if we do, people will try to use it for broader reasons than revoking malicious packages. Having it not affect packages sourced from anywhere other than the index the blacklist came from reduces the scope creep here dramatically (consider someone trying to publish a local blacklist of GPL packages, because their corporate licensing doesn't let them use GPL code - if we don't let a local blacklist stop PyPI packages being used, we can avoid having to think about the implications of scenarios like that because it simply won't work). Limiting the blacklist feature to "only PyPI" avoids a lot of this complexity of course (but introduces the "how do we test the feature" question...)

dstufft · 2018-09-19T18:01:26Z

There's a need for care here on the server side (I say "server" rather than "PyPI" - see below for why). Once we start extending the reasons why we'd blacklist packages, we risk getting into a position of becoming curators, and PyPI as a curated system is a whole different thing. Having said that, (a) that's a problem for PyPI to wrestle with, not for pip, and (b) I'm not trying to suggest that "having a security flaw" is something we need to debate over.

Given there's a server side and a client side here, and IMO we should have this standardized, we probably should at a minimum discuss this on distutils-sig, if not produce a PEP for it.

Sure, there's a risk that someone doesn't audit their system, but the consenting adults principle applies here.

We could even automatically run a pip audit on install, and print out a message like "hey, we detected 7 security issues, run pip audit for more information". That doesn't automatically do anything, but it does provide information as part of the install that will hopefully lead people to investigate more.

The question still remains (I asked it above and @di noted it but said he thought it would be rare) which is that for this to work, we need to track where packages come from. We can't really have PyPI being the authority that says "this name is forbidden". Consider as an example a package that has a security vulnerability and gets blacklisted. A company needs the functionality in that package, and creates a fixed version which they host on their local package index. Just because PyPI says that package is blacklisted can't be a reason for blacklisting the local version. So we need to be able to say "did this installed package come from the index that is reporting it as blacklisted?"

Yeah, for this to work we will need to start tracking the provenance of packages, which I don't think is really a big deal; it'd just be more metadata in the installation DB to specify the repository that it came from. Alternatively we could track a unique hash of the sdist or something, and tag vulnerability reports to specific hashes. There are a few ways we could take it, and that will largely depend on the design of the server side API, but I 100% agree that the feature needs to be scoped more specifically than "anything named foo is bad".
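
Both options mentioned above (recording the source repository, or tagging reports to artifact hashes) could look roughly like this. `InstallRecord` and `Report` are hypothetical structures invented for this sketch and do not describe metadata that pip stores today.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InstallRecord:
    """Hypothetical provenance metadata stored at install time."""
    name: str
    index_url: str
    sha256: str  # digest of the artifact that was installed


@dataclass(frozen=True)
class Report:
    """Hypothetical entry published by one index."""
    name: str
    index_url: str
    sha256: Optional[str] = None  # None means "any artifact from this index"


def is_affected(record, report):
    """Match only when the package came from the reporting index."""
    if record.name != report.name or record.index_url != report.index_url:
        return False
    return report.sha256 is None or report.sha256 == record.sha256
```

With this scoping, a fixed `foo` installed from a company's local index is not flagged by a PyPI report about `foo`, and a local index's reports cannot affect packages installed from PyPI.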

Also, should we allow other indexes to publish blacklists?

Yes. PyPI should not be special other than the fact it's the default.

brainwane · 2019-06-22T22:11:02Z

I think PEP 592 may be relevant to this.

brainwane · 2020-04-03T22:13:34Z

We now have some more features on the Warehouse side that are relevant here - we're about to get yanking, we have the start of some malware detection, and there's an event log as the foundation of notifications. So this may be more possible soon, once information's available in the Warehouse API to read.

brainwane · 2020-04-23T01:21:08Z

Now that PEP 592 is accepted and implemented (pypi/warehouse#5837), if you are interested in working on this feature, take a look at the yanking feature and the "yanked" field in PyPI's JSON API.

di · 2020-04-23T01:58:37Z

To be clear: yanked releases should not be considered malicious releases and should not be automatically uninstalled. If PyPI starts exposing packages removed for being malicious/typosquats, it'd be via an entirely new API, not the existing project/release JSON API (since the project/releases won't exist anymore once they're removed).

brainwane · 2020-04-23T01:59:54Z

Whoops. Sorry for the error and thanks for the correction.

AkechiShiro · 2023-11-29T22:21:16Z

Has there been any new progress on this interesting feature since 2020?

di mentioned this issue Sep 12, 2018

Publish a list of malicious packages that have been taken down pypi/warehouse#4703

Open

pradyunsg added the "type: security" and "type: feature request" labels Sep 18, 2018

chrahunt mentioned this issue Sep 1, 2019

When installing a new package, print CVEs associated with this package and the dependencies that were installed #6087

Closed

di mentioned this issue May 1, 2020

redirects for Databased Backed Blacklists pypi/warehouse#7840

Closed

di mentioned this issue Dec 19, 2023

Add Principles for Package Repository Security ossf/wg-securing-software-repos#37

Merged

ichard26 mentioned this issue May 11, 2024

A bug database and a bug signalling method in pip or withdrawing buggy versions from repos #8315

Closed
