Suggestion: definition of "at risk" packages using heuristics #97

Open
aoberoi opened this issue Dec 15, 2018 · 7 comments
Labels
stale? This issue is dusty, please take a look and consider closing

Comments

@aoberoi
Contributor

aoberoi commented Dec 15, 2018

A recurring topic I'm seeing in the open issues right now is how we want to make an impact across the ecosystem by identifying packages that are themselves in bad shape or that depend (transitively) on packages that are in bad shape.

I believe there is an opportunity for this group to work on a set of heuristics to (either manually or programmatically) identify what it means for a project to be "at risk".

Here's a rather simplistic example of what such a set of heuristics might look like:

A package is classified as at-risk when at least one of the following is true:

  • The package has open GitHub issues or PRs that are more than X months old and have no interaction from an owner/collaborator.
    • The threshold is X/2 when the package has more than Y downloads per week, to correct for the much larger impact extremely popular packages might have.
  • The package generates npm deprecate warnings from one of its dependencies.
  • The package uses some other explicit signal that the owner is no longer interested in maintaining the project moving forward.
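
As a rough illustration only, the bullets above could be sketched as a small function. Everything here is an assumption made for the example: the `PackageSnapshot` fields are a hypothetical, pre-collected data shape (not an npm or GitHub API), the defaults for X and Y are arbitrary placeholders, and a month is approximated as 30 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class PackageSnapshot:
    """Hypothetical, pre-collected facts about one package."""
    name: str
    weekly_downloads: int
    # Opening dates of open issues/PRs with no owner/collaborator interaction.
    unanswered_opened_on: List[date]
    has_deprecated_dependency: bool  # a dependency emits an npm deprecate warning
    explicitly_unmaintained: bool    # some explicit "no longer maintained" signal


def at_risk_reasons(pkg, today, x_months=12, y_downloads=1_000_000):
    """Return the names of the heuristics that fired (empty list: none did)."""
    # Halve the age threshold for very popular packages.
    months = x_months / 2 if pkg.weekly_downloads > y_downloads else x_months
    cutoff = today - timedelta(days=round(months * 30))  # 30-day months (assumption)

    reasons = []
    if any(opened < cutoff for opened in pkg.unanswered_opened_on):
        reasons.append("stale-unanswered-issue")
    if pkg.has_deprecated_dependency:
        reasons.append("deprecated-dependency")
    if pkg.explicitly_unmaintained:
        reasons.append("explicit-owner-signal")
    return reasons


today = date(2018, 12, 15)
small = PackageSnapshot("small-lib", 5_000, [date(2018, 3, 1)], False, False)
popular = PackageSnapshot("popular-lib", 2_000_000, [date(2018, 3, 1)], False, False)
print(at_risk_reasons(small, today))    # []
print(at_risk_reasons(popular, today))  # ['stale-unanswered-issue']
```

Returning the list of reasons rather than a single boolean keeps each signal visible, which makes it easier to tune X and Y or to drop a heuristic that turns out to be unfair.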

Once it can be "tuned" and we feel confident in it, we can begin surfacing the results - projects which are classified as at-risk - in many places. We may be able to work with npm, Inc. to utilize this in the CLI or on the website. We might publish our own website. We might publish guidelines or a tool that application and/or package authors can use to analyze their own dependency tree. We could supply a README badge service. The possibilities are endless, but I think it starts with creating a common definition of what we think at-risk looks like.

Refs:

@aoberoi aoberoi changed the title Suggestion: heuristics and algorithm for determining when a package is "at risk" Suggestion: definition of "at risk" packages using heuristics Dec 15, 2018
@demiacle

I like some of these ideas! My personal heuristic is last-updated date and number of downloads. It's a poor standard, but it answers two questions: is it being maintained, and do other people trust it?

That being said, I think there are two issues here: can we depend on this package for some length of time, and can we be sure the package is not introducing vulnerabilities?

My initial thought about your first bullet point is that it doesn't actually address either issue, and it may even have the side effect of hurting the credibility of popular packages by evaluating them based on the number of issues being identified. Having stale issues is definitely a concern, but there is currently no way to describe the importance of an issue, so a lot of low-priority stale issues could skew the results.

I like bullets 2 & 3 though.

I think most problems arise because time is limited and some issues are just deemed low impact. What we really need to know when deciding to depend on a package is can we trust a package to be maintained and for how long, and when it stops being maintained, then how do we go about the changing of the guard.

@dominykas
Member

tl;dr: 1) we need to be careful to avoid negative labeling 2) not following what we deem "best practices" does not mean a package is "unmaintained"

This suggestion is making me think... Is this group really in a position to decide to label some package as "at risk" or "bad shape"? Adding such a label, even if it is in an automated list of thousands of packages, and publicising that on some website is going to hurt feelings. Not only will it hurt feelings, but it will do so incorrectly, unless the "detection tuning" is very, very careful.

It is very hard to come up with some metrics which can't be interpreted in multiple ways:

  • lots of issues - possibly support requests or feature requests or just the approach the maintainer is taking (some people auto-close old issues they won't fix with a bot, others keep everything open, just for reference - does not mean the package is unmaintained/abandoned)
  • deprecate warnings from deps could very well be a deliberate choice, due to lack of time or otherwise; while this group may offer the time, should it really evolve towards being able to say "we're from the government and we're here to help"?

That said, before talking about heuristics, do we need to define what "bad shape" or "at risk" even means? What are these risks (that are in the scope of this group) that we're trying to minimize?

@mcollina
Member

We need a better term that has no negative connotation. How about “highly depended-upon packages” that might need some help?

@dominykas
Member

dominykas commented Dec 21, 2018

Best term I can come up with on the spot is "unclear status", but things like that, once you start applying them in a specific context, start to grow their own meaning. So in a couple of years people may just start reading "unclear status" as "crap", and we'd still be applying a negative label, even if inadvertent and with good initial intentions. We shouldn't do that.

I'd still like to question the need for this heuristic or labeling or even the category (as a single dimension) itself. What do we want to achieve/prevent?

One of the issues did mention that some packages may break in newer nodes - that's a very clear and unambiguous indicator. It can be coupled with "breakage date detected" and a link to an open/resolved/ignored+closed issue. Sure, one can work around it by having a true as npm test, but that's beside the point.

Then there's the security aspect, which is also pretty unambiguous - a package either has or hasn't unresolved security issues.

Anything more than that ("this package might have a security/upgrade problem and that problem might not get resolved if and when it occurs") is unfair and likely offensive to a maintainer no matter how you phrase it?

@wesleytodd
Member

wesleytodd commented Dec 21, 2018

> We shouldn't do that.

Agreed. I think any labels we create will inevitably have issues, and I second the "we shouldn't do that" sentiment.

> One of the issues did mention that some packages may break in newer nodes

While this is an issue I am not sure doing anything other than providing CITGM for module usage is a good idea. Again, it is a huge ask to try to solve the problem, but it is more reasonable to provide tooling for users to solve the issues on their own.

> Then there's the security aspect, which is also pretty unambiguous - a package either has or hasn't unresolved security issues.

This is not "unambiguous". There are many reports which are either false positives or just not applicable. For example, the slug package had an issue filed against it for a ReDoS vuln. If the end user is using it in a way where untrusted input is passed to it on a web server, it can cause a perf issue. But the tooling also reported that migrate was vulnerable because it uses slug. Migrate is a CLI tool for managing database migrations, so if you are giving it untrusted user input you have a whole host of other problems that have nothing to do with a ReDoS report. This is anything but unambiguous. tj/node-migrate#77 (comment)
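
To make the slug/migrate point concrete, here is a minimal sketch of how a naive audit spreads one advisory to every transitive dependent, and how a context annotation could narrow the result. The graph, the `web-app` package, and the `handles_untrusted_input` annotation are all hypothetical and only for illustration. This is not how npm's tooling is implemented.

```python
from collections import deque

# Hypothetical dependency graph: package -> its direct dependencies.
DEPENDS_ON = {
    "migrate": ["slug"],   # CLI tool, as in the example above
    "web-app": ["slug"],   # made-up server that passes user input to slug
    "slug": [],
}


def flagged_by_naive_audit(depends_on, advisories):
    """Flag every package that reaches an advisory, directly or transitively."""
    # Invert the graph so we can walk from a dependency up to its dependents.
    dependents = {}
    for pkg, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(pkg)

    flagged = set(advisories)
    queue = deque(advisories)
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in flagged:
                flagged.add(parent)
                queue.append(parent)
    return flagged


def applicable(flagged, handles_untrusted_input):
    """Keep only packages not explicitly marked as never seeing untrusted input."""
    # Unknown packages default to True, i.e. they stay flagged (conservative).
    return {pkg for pkg in flagged if handles_untrusted_input.get(pkg, True)}


flagged = flagged_by_naive_audit(DEPENDS_ON, {"slug"})
print(sorted(flagged))                                  # ['migrate', 'slug', 'web-app']
print(sorted(applicable(flagged, {"migrate": False})))  # ['slug', 'web-app']
```

The second step depends on a human judgment about how each package is used, which is exactly the information an automated report does not have.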

My point with all of this is that we should focus on real and attainable goals before we attempt to "label" or "categorize" packages. It is fun when a solution for people problems is also a good recommendation for a software problem :)

@jonchurch jonchurch added the stale? This issue is dusty, please take a look and consider closing label Jul 29, 2021
@jonchurch
Contributor

Is this something we are still interested in defining? I don't think I've seen discussion around this topic lately, so perhaps we have moved past identifying at-risk packages and are working on an "I know it when I see it" basis.

I've labeled this stale? and am inviting the group to revisit.

@thescientist13
Contributor

Although it is more of a push model than a pull model, the group here did some interviews and does invite maintainers and collaborators to reach out to help elevate the visibility of self-identified projects in this space, mainly to help connect packages in need of some maintenance with those who may have time. I think, given the proximity this group has to some high-profile members of the NodeJS / npm / MS ecosystem, organic outreach to this team that can then be amplified through social channels would allow this group to help advocate without having to formally label anything?

Maybe it's just a matter of evangelizing / sharing more of what this group can do via social networks / channels?
https://github.com/nodejs/package-maintenance#for-maintainers

We've definitely done this kind of outreach triage before, and I think it is a good use of our collective network to help amplify such requests if we can (but not really doing more than that).


7 participants