Proposal for default crate recommendation ranking #1824
1. nom
2. combine
3. and 4. peg and lalrpop, in some order
5. peresil
Markdown is renumbering this one.
UGH MARKDOWN NO.
This seems like an obvious way to encourage careful documentation. It would be nice if it were a strict requirement for crates to have complete documentation on every public item in order to achieve some rank.
I disagree with the "Maintenance" section.
I'm with @tomaka on the Maintenance part. Especially for small, self-contained crates this would be a huge disadvantage. Apart from bug fixes some simply might not need releases that often, while still providing value.
I love that this is even an RFC - it's really cool that we're able to have a very public, community-driven conversation about something like how crates.io ranks crates. I'm not sure displaying a numeric score is a great idea; I do worry that it encourages gamification (of course, people will optimize for the measurement by virtue of it being a measurement, no matter how it's displayed). I guess I'm concerned people will compare close percentages as if that's authoritative, when if two crates are ranked very close together it's probably a wash. Maybe some sort of more vague statement where the 2 or 3 highest ranking categories receive superlative statements.
When it comes to documentation, if you take these two examples:
The first one is clearly much better documented, yet they would both get the same score. But I agree that it's not easy to come up with a way to measure that objectively. |
@withoutboats thanks! ❤️ I'm glad this is an RFC too, I'm looking forward to discussing this with everyone :) WDYT about the emoji ☀️🌤⛅️🌥☁️🌧? What if we just showed the emoji? I agree that not inferring significance when there isn't much is a problem; I do want it to be transparent why a crate is ranked higher than another, though. I also want to be able to show crate authors how they can make their crates better, if they choose to. Definite tension here. |
For me the only way to know whether a crate is well maintained is the average time between when a bug report is opened and when it gets fixed (or marked as wontfix for example). For me it makes sense that a crate gets frequent releases before it reaches 1.0. A crate blocked at version 0.1.0 for six months clearly isn't maintained, because if there was nothing to change in the crate then it should be at version 1.0 already. But once you reach 1.0 maybe you don't release anything because there is nothing to release. |
That would be handled by rating >=1.0 versions higher than <1.0, right? |
this category easier.

[nom]: https://crates.io/crates/nom
[peresil]: https://github.com/docopt/docopt.rs
This links the wrong library and can be confusing. I looked up the library mentioned and it makes more sense since this is a parsing library.
https://github.com/shepmaster/peresil
ooooops lol. I was using different examples at first and missed updating this link. Good catch!
Is gamification necessarily bad? I mean, if it isn't overly competitive and inaccessible, I don't think it is a bad thing. It is important though that it doesn't end up being something causing more issues (stress, discomfort, etc.) than benefits (fun, quality, etc.).
@tomaka @badboy one thing to remember is that the proposed ranking is based on what current, actual humans who use crates (and contributed to the survey) believe makes a crate better. It may or may not align with what authors think makes their crate better. To create a strawman example, if I believe that placing an image of a cat 🐈 in my docs makes a crate better but the majority of users of crates believe that a dog 🐕 signifies a good crate, then it doesn't really matter what I think; the majority of people won't use it. |
@badboy I believe that @tomaka is suggesting that the "# releases in the last X time periods" score could be adjusted or rebalanced if the crate has reached 1.0, as they believe the number of releases should be lower than a pre-1.0 crate. |
But whatever ranking we choose will definitely have an effect on both what users think is good and what authors think is good, so we should choose carefully! Some people might not bother with the analyses they do now; if we have a ranking, they might just trust the ranking. One example of something that was mentioned in the survey a lot was looking at the number of dependencies a crate has, and preferring crates with fewer dependencies. While I have had bad experiences managing dependencies and dependencies of dependencies, Cargo makes it so much easier to manage dependencies than in other systems programming languages. For this reason, I don't think this is a measure we should use to rank crates, so I didn't include it in the formula. |
As a frequent Stack Overflow contributor, I'm also fine with gamification in general. My main concerns are around meta-gaming: If people want "100% doc coverage", they can just add Is this currently a problem in our community? I don't believe so. Will it be in 6, 12, 24, 36 months? That's harder to answer. Perhaps the right risk to take is to try out something now that can be changed later, based on new information gained along the way. |
- Maintenance
- Quality

Feeding those signals are related measures of:
“Feeding” in this sentence seems to make no sense to me.
It refers to this idiom: http://idioms.thefreedictionary.com/feed+into
where there isn't really one. We've considered using letter grades, but those
often have emotional associations (F means you're a failure), when it should be
just an indicator of reality and not a value judgment. So we're also proposing
an option of an emoji scale and are open to other proposals:
Seems like a 5-star system would be more easily recognizable and fit the same level of coarseness.
<tr>
<th>Crate</th>
<th>Downloads all time (~year)</th>
<th>Downloads in last 90 days (~6 mo)</th>
6 months is 180 days not 90
<th>Crate</th>
<th>Releases in last year</th>
<th>Releases in last 6 mo</th>
<th>Releases in last 1 mo</th>
Do you think that 3 months serves the same purpose as 1 month here? Rating based on releases/commits in a 30 day period punishes projects where the maintainers take vacations.
It would also mean that projects that release on the same cycle as Rust itself (6 weeks) would see this score fluctuate wildly.
@shepmaster I might have overlooked something then. I thought the ranking here was manual.
So the ranking that survey responders made was entirely manual based on their own metrics. We ranked these 5 crates pseudo-manually as well, to show how our proposed formula would automatically rank them. This RFC's goal is to not create extra work for anyone on an ongoing basis; the "Alternatives" section does mention that we could add curation or social voting/rating/reviewing features instead. Does that clear anything up? I'm not sure exactly which statement of @shepmaster's you're replying to... |
It would be nice to at least get rid of the HTML interjections.
<td>F</td>
<td>🌧</td>
</tr>
</table>
This is very obnoxious for readers of the plain-text form. Please use one of the supported markdown table formats instead:
| Percentage | Letter grade | Emoji |
| ----------- | ------------ | ----- |
| ≥ 90% | A | ☀️ |
| 80-89% | B | 🌤 |
| 70-79% | C | ⛅️ |
| 60-69% | D | 🌥 |
| 50-59% | E | ☁️ |
| ≤ 49% | F | 🌧 |
which would end up looking like this:
| Percentage | Letter grade | Emoji |
|---|---|---|
| ≥ 90% | A | ☀️ |
| 80-89% | B | 🌤 |
| 70-79% | C | ⛅️ |
| 60-69% | D | 🌥 |
| 50-59% | E | ☁️ |
| ≤ 49% | F | 🌧 |
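For concreteness, the bucketing in this table could be sketched in Rust roughly as follows. The function name `grade` and the use of a whole-number percentage are assumptions made for illustration; nothing quoted in this thread prescribes an implementation.

```rust
/// Map a whole-number percentage to the (letter grade, emoji) pair from the
/// table above. `grade` is a hypothetical helper, not part of crates.io.
pub fn grade(percent: u8) -> (char, &'static str) {
    match percent {
        0..=49 => ('F', "🌧"),
        50..=59 => ('E', "☁️"),
        60..=69 => ('D', "🌥"),
        70..=79 => ('C', "⛅️"),
        80..=89 => ('B', "🌤"),
        // Everything from 90 upward falls into the top bucket.
        _ => ('A', "☀️"),
    }
}
```

Two crates at 89% and 90% land in different buckets here even though they are practically tied, which is the same concern raised earlier about comparing close percentages.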
I’m considerably against using any letter association, mostly because they have no meaning to me, but I’d also HATE the academic-gradey feeling it has.
Emoji is a bad idea primarily because I, with good sight, am having great trouble discerning between 3 of the pictures, and rain doesn’t necessarily associate with bad to me, nor do I find sunny to associate with “perfect”.
Some alternatives that come to mind:
- colour coding; is terrible because of colour blindness concerns;
- instead of binning crates into buckets, just explicitly note qualities a crate has: “popular parsing crate”, “credible serialization crate”, “easy to use high performance computing crate” etc. While not every crate would receive a nice label, you’d achieve similar results and avoid issues with developers of crates which do not make the cut.
- Even then if you insist on bucketing, just write out the percentage explicitly. Beats either of the letter or emoji-coding.
Agree on all that, never used letter rating anywhere in my life and I have no idea what the emoji are really conveying. Also the >= 90% looks completely different from the others on my machine.
What's wrong with a 5 stars rating if you really want to show some kind of rating?
> This is very obnoxious for readers of the plain-text form. Please use one of the supported markdown table formats instead:
Sorry about that! It was easier for me to write, without having to worry about the column widths, and I thought it wouldn't matter since HTML is valid Markdown. I'll convert them tomorrow.
The column widths don't actually matter for markdown (though for plain text readers of course it's quite helpful)
> instead of binning crates into buckets, just explicitly note qualities a crate has: “popular parsing crate”, “credible serialization crate”, “easy to use high performance computing crate” etc. While not every crate would receive a nice label, you’d achieve similar results and avoid issues with developers of crates which do not make the cut.

Maybe instead of a third party giving a crate a label, which would most likely be argued ad infinitum, crate owners/creators themselves could be given the option to tag their crate as something like "Production ready", "work in progress, intended for production", or "not intended for production". So if you searched for a serialization crate you'd only get hits like Serde and not something that was never intended to be used by anyone else. Better to completely forget about the abandoned, unusable ones than have them clog up the search results where the majority have a tag of "not unusable" or "abandoned" and only ones tagged for production get reviewed for quality.
Having the crate owner declare the current project state as you describe would allow for a better solution to the varying activity problem. For instance, a project declared as "work in progress" should be getting more activity than a crate that has been in "maintenance" for a year.
As for whether grade, letter or cloud emoji,
I think we all could strive to get 5 Krustys! 🦀🦀🦀🦀🦀
For accessibility and clarity, please don't use the emoji as the primary indicator. I think I'd rather see separate badges for each point, not a fuzzy representation of the aggregate score.
top-level items are extremely well documented, 170/195 or 87%. Our
definition of "top-level" counts the overall crate as an item. We think our
doc coverage POC can be modified to report this number.
- Would need to unpack and run this on each package version in a background
This is tricky due to `#[cfg()]`. docs.rs currently builds every single crate fully, although unnecessarily, but the cargo model isn’t exactly compatible with the desire to avoid running arbitrary code in order to figure out documentation for a crate.
job started by a publish; then save the percentage in crates.io's database.

- In the crate root documentation, presence of a section headed with the word
"Example" and containing a codeblock
What about “Examples”, “Use cases”, “Common usage” and similar synonymous phrases? What if the example is, say, for some GUI tool provided by the crate (e.g. a video tutorial demonstrating use of a UI builder of some sort)?
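To make the heuristic under discussion concrete, here is a minimal sketch of a check for "a section headed with the word Example and containing a codeblock". The function name `has_example_section`, the prefix match on the heading text, and the restriction to Markdown `#` headings and backtick fences are all assumptions for illustration, not something the RFC text quoted here specifies.

```rust
/// Three backticks, spelled with escapes so this sketch can itself sit inside
/// a Markdown code fence.
const FENCE: &str = "\u{60}\u{60}\u{60}";

/// Hypothetical check: does `docs` (crate root documentation as Markdown)
/// have a heading starting with "Example" whose section opens a code fence?
pub fn has_example_section(docs: &str) -> bool {
    let mut in_example_section = false;
    let mut in_code_fence = false;
    for line in docs.lines() {
        let trimmed = line.trim_start();
        if trimmed.starts_with(FENCE) {
            if in_example_section {
                return true;
            }
            // Track fences elsewhere so `#` lines inside code are not
            // mistaken for headings.
            in_code_fence = !in_code_fence;
        } else if !in_code_fence && trimmed.starts_with('#') {
            let title = trimmed.trim_start_matches('#').trim();
            in_example_section = title.starts_with("Example");
        }
    }
    false
}
```

With a prefix match, "Example" and "Examples" both count, but "Use cases" or "Common usage" do not, which is exactly the gap raised in the comment above.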
opportunity to encourage at least one to be present reliably.
- Increases the doc percentage score by 5%

- Presence of files in `/examples`
Similar comment. What if people put their “examples” into directories with semantically equivalent meaning? What if the example files are in root of the repository? In `src/` and linked to from documentation?
`examples/` is already a standardized directory, as `cargo run --example <name>` runs `examples/<name>.rs`.
[cargo doc-coverage]: https://crates.io/crates/cargo-doc-coverage
[examples]: https://github.com/rust-lang/cargo/issues/2760

<table>
Another HTML table T_T
- Number of releases in the last year - 10%
- Number of releases in the last 6 mo - 30%
- Number of releases in the last month - 60%
- Yanked versions are not counted.
What about the essentially “completed” utility crates such as, say, wrapper crates for machine instructions? Lack of releases does not imply lack of maintenance.
I second this concern.
Yes, this one is tough. There are a number of small, focused crates that will rarely ever get an update, but are good crates and should be used.
Such crates would be compared against alternatives for the same purpose, right? It doesn't seem like a big issue, because all of them would have similar scores.
Is there any competition in that space? Are we worried about how `libc` or `winapi` are going to be ranked?
I don't think we should use turnover/churn as a metric of goodness or badness. Some crates have frequent releases due to instability and churn; other crates have frequent releases because of regular maintenance. Some crates have rare releases because nobody maintains them; other crates have rare releases because they do their job and don't need further changes.
👌
- Stable version number
- >= 1.0.0 ranks higher than < 1.0.0
- >= 1.0.0 increases the maintenance score by 5%.
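A minimal sketch of this bonus, assuming "5%" means five percentage points added to a 0-100 maintenance score and capped at 100 (both are assumptions; the text could also be read as a multiplier). The function names are hypothetical, and the version check only looks at the leading major number, so it ignores pre-release tags such as `1.0.0-beta.1`.

```rust
/// True if the leading major version number is at least 1.
/// Simplification: pre-release suffixes are not inspected.
pub fn is_stable(version: &str) -> bool {
    version
        .split('.')
        .next()
        .and_then(|major| major.parse::<u64>().ok())
        .map_or(false, |major| major >= 1)
}

/// Add the stable-version bonus to a 0-100 maintenance score.
/// Additive and capped at 100: both are assumptions for this sketch.
pub fn apply_stable_bonus(maintenance_percent: f64, version: &str) -> f64 {
    if is_stable(version) {
        (maintenance_percent + 5.0).min(100.0)
    } else {
        maintenance_percent
    }
}
```

Because the check only looks at the major number, a crate that starts at `1.0.0` and bumps the major version with every breaking change collects the bonus immediately, which is the objection raised in the next comment.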
I fail to see the benefit of this scoring trigger. It is trivial to set the version number to `1.0.0` on first release and publish breaking releases every day by increasing the left-most digit instead.
The best this will achieve is pushing the crates.io ecosystem to move from the currently quite common `0.x.y` versions to `x.y.0`.
I'm a big proponent of this. And if it doesn't matter, and `0.x.y` is the same as `x.y.0`, then I don't know why there's an objection.
It would be ideal to downrank crates with overfrequent major version increments, but that could encourage breaking changes in 1.x releases, which is the opposite of what we want.
hover, with a link to a more thorough explanation. We like the information
density in the way [npms][] displays scores:

<img src="http://i.imgur.com/yadRNyy.png" src="Example npms score circles" width="442" />
Markdown syntax please. At least you’ll avoid nonsensical HTML where alt text goes into the `src` attribute :)
![Example npms score circles](http://i.imgur.com/yadRNyy.png)
Very well written RFC. Thank you. |
Some ideas of improvement:
I think overall, the score shouldn't be something overemphasized, as the methodology is quite flawed in comparison to manual involvement, like the manual moderation of Stack Overflow.
The big concern with gamification is 'min/maxing' behavior. We don't want people writing crappy doc comments or releasing pointless updates just to keep their 'score' up. Obscuring exactly what your "score" is can help avoid people seeing that if they released another version, they could go up a ranking. About 10 years ago I played a great online game called Puzzle Pirates which did its best to discourage min/maxing by obscuring your raw score with fuzzy statements; all the games were ranked with an ELO system but none of them showed your score. I think this is pretty effective.
These needn't just be a raw equivalence to the ranking algorithm but could even have a certain degree of 'badginess', like there could be a special note about being above 1.0. This gamification creates a tension with everything we measure, of course. For example, one thing I value in a crate is infrequent breaking changes, but if we measure that we might encourage users to release breaking changes using non-breaking semver! (We could give people an interface for "committing to semver," giving them a 'badge' and upranking crates with that commitment. This doesn't need to be policed - like taking their badge away if they violate semver - but it creates additional buy-in to the idea that following semver matters.)
I was summoned here via IRC. Unicode has major screen reader concerns, to wit that you can't expect it to read. Of the options proposed by @nagisa, just showing the percent is the one I'd go for. If we want to use symbols, the only way to make them reliably accessible is to use graphics with alt text. While I wish I did live in a world with full unicode awareness from my AT, this is not going to happen anytime soon. |
@tomaka I don't know if you can really assume a crate pre-1.0 that doesn't change is unmaintained, at least without looking at its dependencies. For example I wrote a PRNG crate and I don't think that I should go 1.0 because the Rand crate isn't 1.0 yet. It doesn't make sense for me to go "stable" if my dependencies are unstable. So my crate doesn't change much any more. |
@mgattozzi I'm not claiming it's a magic bullet in any way. I agree it's hard, but I believe it's easier than the manual approach. |
Please open a new RFC if you'd like to propose adding machine learning to recommend crates; I won't be changing this RFC to add it.
I'm not going to spend time implementing and maintaining a data collection mechanism if we don't have a specific use for that data :-/ |
To echo what others have said, machine learning would constitute an entirely separate proposal, with a lot of its own complexities and downsides. As I emphasized in the motion to FCP, we can and should take an iterative approach here, and the RFC as it currently stands is something relatively straightforward to implement, that we can land within the constraints of this year's roadmap. I think it's also a clear improvement over the status quo, and has been through a number of rounds of iteration in this discussion. The RFC includes extensive research, both in terms of prior art, and in terms of what the Rust community specifically values. It also lays out several assumptions/design constraints at the outset. After these many rounds of discussion, I personally think it's reached a quite reasonable point in the design space balancing between the goals suggested by the research, technical feasibility for a first-cut implementation, and various concerns about gaming and proxy measures. In the interest of trying to drive things toward some kind of conclusion here, I think it's best to keep discussion to any final concerns about (1) the takeaways from the research or (2) the proposed design, in terms of its balance between the research-derived goals and what makes sense to try in a first-round effort here. |
🔔 This is now entering its final comment period, as per the review above. 🔔 |
Thanks @carols10cents for the great (and much needed) RFC! This is a really long thread, so I just wanted to post a few links to the summary posts scattered therein for those just reading the thread now (like me)... @ruuda's comments here and here summarize the majority of the thread pretty well, I think. @aturon's FCP summary of the RFC is here. I was also intrigued by scattered proposals to take influence from ranking systems already in common use, such as Amazon, StackOverflow, and GitHub stars/trending. I won't bikeshed here, but I think these are worth considering for inspiration as this goes through iterations... |
If the goal is that "Rust should provide easy access to high quality crates" then badges and ranking are really just bells and whistles. The guts of the problem is to find a way to present information about all relevant crates so that a potential user can evaluate them easily. If the front page of crates.io had a keyword search that produced a list, ranked by the last 90 days' downloads, with a five-line explanation drawn from the top of README.md, this would be ideal for me. The keyword search might give me 10 or 20 options, and at 5 lines each it wouldn't be much trouble at all to scan through. The first 5 lines would probably give a good general indication of the quality of the documentation.
The rendered RFC only talks about Coveralls. I personally have had a lot of issues with Coveralls configuration and recently have moved to Codecov in all my projects. They even provide Rust templates, and of course, badges. |
Interesting! I didn't know about Codecov. The implementation will definitely be extensible-- for example, someone sent in a PR to add a badge for GitLab CI status to go along with the Travis and Appveyor badges, and it was a pretty straightforward change. |
The final comment period is now complete. |
Huzzah! This RFC has been merged, and is tracked here. As has been emphasized throughout, this design is a starting point, and we'll expect to do follow-up evaluation and iteration. Thanks everyone who participated in this in-depth discussion, and especially @carols10cents for writing such a thoughtful RFC and sticking it out through 258 comments :-) |
Couple of thoughts. A crate should receive a higher score based on the number of other crates on crates.io that depend upon it. A crate should receive a lower score based on the number of other crates it depends upon. Depending on lower-ranked vs higher-ranked crates should have a larger negative impact on a crate's rating. Having higher-ranked crates vs. lower-ranked crates depending upon a crate should have a larger positive impact on the ranking of the crate.
That would be really biased in favor of libraries and against binaries.
…On Jun 4, 2017 12:52 AM, "Gerald E Butler" ***@***.***> wrote:
Couple of thoughts. A crate should receive a higher score based on the
number of other crates on crates.io that depend upon it. A create should
receive a lower score based on the number of other crates it depends upon.
Depending on lower-ranked vs higher-ranked crates should have a larger
negative impact on a crate's rating. Having higher-ranked crates vs.
lower-ranked crates depending upon a crate should have a larger positive
impact on the ranking of the crate.
Those two categories should be separate anyway. Unless you are searching the database just for fun (or reviewing, or whatever), you either want specifically a library, or specifically an application. I think @gbutler69 has the right idea. |
But the score interaction would have to be broken up by individual influences, or otherwise stratified. Naive implementation could lead to nasty feedback cycles. |
@gbutler69
Agreed.
This will encourage bad practices:
In other words, this will encourage people to work against the crate ecosystem, not work with it.
Again. This will encourage bad practices. For example, this will discourage people from releasing modular crates. Fearing the lower-level crates will have a low-score which will reflect poorly on the higher-level ones.
Could be useful. But this might have complex interactions with other factors. And the implementation details are not clear. For example:
I think the straight forward factors I took into consideration in |
Yes, since this RFC has been merged, large changes proposed to this should get a new RFC. Small changes can be proposed in an issue on crates.io's repo, and the re-evaluation and iteration discussions will probably take place on crates.io's repo as well. I'm unsubscribing from this thread, thank you all for your discussion! |
I wanted to pass along a comment I saw in a private forum discussing Rust:
Rendered!
@shepmaster worked on this too :)