-
Notifications
You must be signed in to change notification settings - Fork 607
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create a "Well documented" badge for crates in the top 20% #703
Comments
Ok, so in thinking about the implementation of this a bit more I'm starting to wonder if this is worth implementing right now. It might be, but I have concerns. So implementing this as proposed requires looking at all the files in a Theoretically, once we implement however we're going to count documentation, that computation shouldn't change (at least I don't expect it to change as often as README rendering does). So I don't think we'll need a "re-assess documentation" script, or at least it won't be as essential as the render-readmes script is. Ideally, processing a version to count its documentation should mean getting its I don't necessarily want this to happen on the publish thread, though, and if this calculation fails, it shouldn't reject the publishing of a crate. So perhaps what we need is a script that we'd schedule to run every so often (every few hours? once a day?), finds all the versions that don't have this data, computes it, and stores it. Thinking longer term, though, there are potentially other analyses we'd like to do on the files in a crate, and it'd be nice if we only had to have/untar the TL;DR I'm struggling with whether adding the complexity of this feature to crates.io's codebase, or adding the complexity of support for this feature and others like it to crates.io's codebase, is a good idea at this point in crates.io's scale/maturity or not. |
So I've taken a look at this issue in RFC 1824, and the degree to which Rust code is well document is a score out of 1, with scores closer to 1 being better document. The calculation in the RFC was: ( @carols10cents, I assume the LOC in Rust includes blank lines? I'm a bit worried that the RFC uses a precision of 2, I feel it should be higher, maybe we could even use Regarding your points @carols10cents:
Think we could leave the recalculation script out of scope? Or did we want that as a task in this issue?
Me being me, I was thinking we could run this script or service every time we published a crate to keep the metrics up to date ASAP :). Would that be a tad ambitious? I can already see an issue in that if we continuously update the DB with scores, and new versions and crates are being published past a certain rate, that'd mean that it becomes difficult to impossible to keep a real time tracking of which crates are in the 20% since there's new information changing that are just above the cutoff point for measuring this.
I would like to learn how to set up a microservice :). That and perhaps it'd make the crates.io backend less monolithic? However, would this also mean that it becomes harder to set this up on private crates.io instances and local development environments? If this is a problem, maybe we could use Docker or another containerisation technology to manage this for us. Then again, a senior architect once told me of the YAGNI philosophy (you ain't gonna need it), which could be applied to this microservice idea. What other analyses were you thinking of running on extracted codebases? npm's followers idea sounds a bit like this issue regarding push infrastructure for crates.io :). Would you say there's a link? Hmm this task is starting to look bigger than I thought it'd be :). And yes, I'd be happy to take this task up once we've decided what needs to be done :). |
I just finished my last exam :). What were you thoughts on this so far @carols10cents? |
So it's been pointed out to me in IRC by insaneinside that the core logic is prone to being gamed by uploading a project with many empty doc comments, and that a potential way around that is to potentially strip the spaces out and then see if there is still anything in the line. However, this can be circumvented by someone uploading files with gibberish doc comments e.g. Lorem Ipsum. |
Until we see evidence of crates gaming this, do you think it's worth worrying about at all? Any crate that has a bunch of empty lines as documentation, or lorem ipsum is going to end up avoided either way |
I don't think we're going to end up doing this. |
Part of RFC 1824, rust-lang/rust#41616.
"Well Documented" badge implementation
For each crate published, in a background job, unpack the crate files and
calculate the ratio of lines of documentation to lines of code as follows:
grep -r "//[!/]" --binary-files=without-match --include=*.rs . | wc -l
find . -name '*.rs' | xargs wc -l
We would then add the lines in the README to the lines of documentation,
subtract the lines of documentation from the total lines of code, and divide
the lines of documentation by the lines of non-documentation in order to get
the ratio of documentation to code. Test code (and any documentation within
test code) is part of this calculation.
Any crate getting in the top 20% of all crates would get a badge saying "well
documented".
Please let me know if you have any questions, potential implementers!
The text was updated successfully, but these errors were encountered: