Malware detection and reporting infrastructure to support 3rd party reports #12612
Comments
I'll say from experience that reporting malware to PyPI has been a pleasant experience with a relatively fast response time when compared to other ecosystems. I think adding the proposed API/queue would be incredibly useful. A few considerations for an API that I think would be nice:
And of course, continuing to notify of report resolution would be really nice. |
Are you trying to automate the email-reading part of the human, or the decision-making part of the human? I'd be happy with something akin to a webhook with just a URL (and proof of reporter, whether that's an API token or a signature) for a first draft. I don't usually know the answers to the questions Louis proposes when making the initial report, and providing them after the fact almost feels like making a bug tracker, which is orthogonal to the goal of making it easier for the admins to remove the obviously bad projects we are inundated with today. One thing I'd be willing to provide (to PyPI, or the trusted reporter community) is a realtime-ish feed of detections, if you wanted to combine several of those from different people to decide whether automatic action could be taken. |
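As a rough illustration of the "URL plus proof of reporter" idea, here is a minimal sketch that signs a reported URL with HMAC-SHA256 and verifies it on the receiving side. The shared secret, the URL, and the function names `sign_report` and `verify_report` are assumptions made up for this example; nothing here reflects an actual PyPI API.

```python
import hashlib
import hmac


def sign_report(secret: bytes, inspector_url: str) -> str:
    """Return a hex HMAC-SHA256 signature over the reported URL."""
    return hmac.new(secret, inspector_url.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_report(secret: bytes, inspector_url: str, signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_report(secret, inspector_url)
    return hmac.compare_digest(expected, signature)


# Hypothetical shared secret and placeholder URL, for illustration only.
secret = b"reporter-shared-secret"
url = "https://inspector.pypi.io/project/example-package/"
signature = sign_report(secret, url)
```

The receiver would accept the report only if `verify_report(secret, url, signature)` returns `True`. An API token sent over HTTPS would serve the same purpose with less machinery.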
This is mostly about making it easier for the report-sending human and the report-reading human. The decision will still be entirely manual, but taking this out of email and onto PyPI will (hopefully) make it easier and faster for both. This will probably work using PyPI's existing user accounts and API tokens, for ease of implementation.
I think 95% of the time, all the context we (admins) need is the inspector link, but occasionally supporting evidence is helpful as well. Maybe this can just be an optional free-form text field for now?
One of the primary goals is for PyPI to be able to combine multiple reports about the same project into a single aggregated report -- right now, there's a lot of duplication. But I'd also be interested in providing these aggregated reports to other trusted reporters as well! Less sure about this ultimately resulting in automation though, at least until we have a less destructive way to delete things (ref #6091). |
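The aggregation goal described above could look roughly like the following sketch, which groups individual reports by normalized project name. The `Report` and `AggregatedReport` shapes are hypothetical and only carry the fields mentioned in this thread (an inspector link and an optional free-form note); the name normalization follows PEP 503.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Report:
    project: str
    reporter: str
    inspector_url: str
    notes: str = ""  # optional free-form supporting evidence


@dataclass
class AggregatedReport:
    project: str
    reporters: set[str] = field(default_factory=set)
    inspector_urls: set[str] = field(default_factory=set)
    notes: list[str] = field(default_factory=list)


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def aggregate(reports: list[Report]) -> dict[str, AggregatedReport]:
    """Fold duplicate reports about the same project into one entry."""
    grouped: dict[str, AggregatedReport] = {}
    for report in reports:
        key = normalize(report.project)
        entry = grouped.setdefault(key, AggregatedReport(project=key))
        entry.reporters.add(report.reporter)
        entry.inspector_urls.add(report.inspector_url)
        if report.notes:  # skip empty free-form fields
            entry.notes.append(report.notes)
    return grouped
```

With this shape, two reporters flagging `Example_Pkg` and `example-pkg` end up in a single queue entry, which is the de-duplication the admins are asking for.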
Hi! Some thoughts about this issue.
|
I think it's likely we'll do this.
Classification doesn't matter too much to us (it's either a takedown or it isn't) but it would be good to have metrics on classes of malware for future reporting. What other use cases do you have in mind for classification?
I don't anticipate us changing our existing policy here. |
Some additional questions for everyone:
|
I have 2 options:
I have a few ideas -
It depends on how you send notification about the result. The easiest way would be to stay on the same platform. As an additional point, if the researchers send a clear report, it will be much easier to add relevant information about the removed package to the website/email blast. |
IMHO transparency is key. Regarding notifications, I suggest implementing them similarly to GitHub's issue notifications: whenever there is a status change, a new comment, or someone tags you, you get an email. I agree with @rakovskij-stanislav's suggestion of using an external jury system. A similar approach works well for StackOverflow, where it helps moderators vet answer edits made by community members. For instance, let's assume we agree that it's enough to have at least 3 authorized community members manually inspect a package and confirm that it is indeed malicious. If so, you can set the final package verdict to malicious and initiate an automated cure process (removal, classification as "0.0.0-security" as @TalFo suggested, etc.) without the PyPI security team needing to do it manually. I agree with @louislang's suggestion regarding specifying malicious classifications.
|
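The quorum rule from the comment above (at least 3 authorized reviewers agreeing) can be sketched as follows. The function name, the `"malicious"`/`"pending"` verdict strings, and the reviewer names are hypothetical; only the threshold of 3 comes from the discussion.

```python
QUORUM = 3  # threshold taken from the example in the comment above


def verdict(votes: dict[str, bool], authorized: set[str], quorum: int = QUORUM) -> str:
    """Return "malicious" once enough authorized reviewers agree.

    `votes` maps a reviewer name to True if that reviewer judged the
    package malicious. Votes from unauthorized reviewers are ignored,
    and each reviewer counts at most once because dict keys are unique.
    """
    confirmations = sum(
        1
        for reviewer, is_malicious in votes.items()
        if is_malicious and reviewer in authorized
    )
    return "malicious" if confirmations >= quorum else "pending"
```

A `"pending"` result would leave the package in the queue for the admins, so the manual path still exists when the jury does not reach a quorum.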
Email works really well imo.
If a jury system was in place, email notifications to review a package would be nice. Otherwise, just reviewing a queue would also likely suffice.
Email works for this use case too.
I really like the jury system idea. If an authorized jury system was used, how would packages that were incorrectly removed contest that decision? |
A user whose package is removed by the jury system would receive an automated email describing what happened and how to appeal the decision. I think this is one of the most appropriate solutions here. An appeal request would be sent both to the jurors who flagged the package and to the PyPI admins, which would also help improve detection methods. A release cleared this way would get a "clean" badge to tell other researchers that it is good; the badge would be tied to the release's creation date so that it does not carry over if the release is recreated. |
I wanted to revive this issue/discussion with a note that the PSF is in the final stages of securing funding to implement this. |
Update! A draft of the payload for malware reports has been published at #14503 for discussion/feedback. |
Update! A preview API has been merged in #15228. We'll be reaching out directly to previously engaged parties to onboard them and run a trial so we can learn more. At a future point, we may open the application to more folks. |
What's the problem this feature will solve?
Currently, malware reporting on PyPI is performed by sending an email to the PyPI maintainers (ref). This scales poorly: the report itself is free-form, requires interpretation by administrators, results in duplicate reports that are not easily de-duplicated, and does not collect relevant metadata (why the report was made and by whom) for future reference or use. Additionally, varieties of reports are poorly distinguished (e.g. spam vs. malware vs. compromise), which could lead the maintainers to take incorrect actions.
Describe the solution you'd like
A standardized API for generating a security report, limited to trusted reporters, that results in a non-email based queue of pending reports, grouped by the project in question, which administrators can easily process, which also stores metadata about the report itself.
This would make it easier to submit malware reports and let administrators respond to them faster.
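To make the "standardized report with metadata" idea concrete, here is a minimal sketch of parsing a structured report. The field names and the JSON shape are assumptions for illustration and are not the payload that was later drafted; the three report kinds are the ones named in the problem statement above.

```python
import json
from dataclasses import dataclass
from enum import Enum


class ReportKind(Enum):
    SPAM = "spam"
    MALWARE = "malware"
    COMPROMISE = "compromise"


@dataclass(frozen=True)
class SecurityReport:
    project: str
    kind: ReportKind
    reporter: str  # who made the report
    reason: str    # why the report was made


REQUIRED = {"project", "kind", "reporter", "reason"}


def parse_report(raw: str) -> SecurityReport:
    """Parse a JSON report, rejecting missing fields and unknown kinds."""
    data = json.loads(raw)
    missing = REQUIRED - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return SecurityReport(
        project=data["project"],
        kind=ReportKind(data["kind"]),  # raises ValueError on unknown kinds
        reporter=data["reporter"],
        reason=data["reason"],
    )
```

Rejecting unknown kinds at parse time keeps spam, malware, and compromise reports distinguishable before an administrator ever looks at the queue.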
Additional context
Somewhat related: #3896 (essentially this, but for all PyPI users).