Inaccuracies with language plugins stats #531

Closed
lowlighter opened this issue Sep 9, 2021 · 6 comments

@lowlighter
Owner

While investigating #513, I noticed some flaws in the languages plugin.

It is not well defined whether it should count your effective language stats (stats from the current state of each repo) or your cumulative language stats (stats that also include previous states of each repo, including code that has been refactored, moved or deleted).

Currently, the languages indepth analyzer does the following:

  1. Clone the repository locally
  2. Run linguist on it
  3. Run git log --patch on each user commit to find which lines were added by the user, and update the byte count using the language detected by linguist
  4. Remove the local clone

For now, it leans more toward cumulative stats than effective stats.
However, tracking deleted files is currently impossible: they no longer exist when linguist analyzes the repository, so the detected language for a deleted file is always null. As a result, deleted or moved files are never taken into account and some lines are lost.
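
To illustrate step 3 and the deleted-file limitation, here is a minimal Python sketch (not the plugin's actual code). It assumes the patch text comes from git log --patch and that language_of is a hypothetical path-to-language mapping standing in for linguist's results on the current checkout. The function name count_added_bytes is made up for this example.

```python
from collections import defaultdict


def count_added_bytes(patch_text, language_of):
    """Sum the bytes of added lines per language in `git log --patch` output.

    language_of is a hypothetical dict {path: language} standing in for
    linguist's results; paths missing from it (for example files deleted
    since) have no known language, so their bytes cannot be attributed.
    """
    totals = defaultdict(int)
    unattributed = 0
    path, in_hunk = None, False
    for line in patch_text.splitlines():
        if line.startswith("diff --git "):
            # A new file section starts: forget the previous file.
            path, in_hunk = None, False
        elif not in_hunk and line.startswith("+++ "):
            # "+++ b/<path>" names the file after the change;
            # "+++ /dev/null" means the commit deleted the file.
            target = line[4:]
            path = target[2:] if target.startswith("b/") else None
        elif line.startswith("@@"):
            in_hunk = True
        elif in_hunk and line.startswith("+") and path is not None:
            size = len(line[1:].encode("utf-8"))
            language = language_of.get(path)
            if language is None:
                unattributed += size  # these are the "lost" lines
            else:
                totals[language] += size
    return dict(totals), unattributed
```

With a patch that adds lines to both app.py and old.cs, and a mapping that only knows app.py (because old.cs was deleted later), only the app.py bytes end up in the totals. The old.cs bytes land in the unattributed counter.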

Also, renamed files may be counted multiple times. Need to check whether an option exists to detect them (maybe --follow?).
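
One possible direction, sketched below in Python under stated assumptions: git's --follow only works when a single file is given, but rename detection (-M / --find-renames) combined with --name-status reports renames as lines of the form R<score><TAB>old<TAB>new. The hypothetical helper latest_names parses such lines (assumed newest-first, git log's default order) and maps every historical path to its most recent name, so patches for old names could be attributed to the file that still exists. Whether this fits the plugin is untested.

```python
def latest_names(name_status_lines):
    """Map each historical path to its most recent name.

    Expects lines such as those printed by
    `git log -M --name-status --format=` (newest commit first),
    where a rename looks like "R100\told/path\tnew/path".
    """
    latest = {}
    for line in name_status_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 3 and fields[0].startswith("R"):
            _, old, new = fields
            # Newer renames were seen first, so `new` may already
            # point to an even more recent name: follow it.
            latest[old] = latest.get(new, new)
    return latest
```

This sketch ignores the case where an old path is later reused by an unrelated new file, which would need per-commit tracking.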

@Nixinova
Contributor

Nixinova commented Sep 20, 2021

Other issues:

Screenshot_20210920-182056

@Napolitain

Napolitain commented Sep 26, 2021

Hey,

I'm seeing inaccuracies in the indepth stats, as we can guess from this image.
Capture d’écran 2021-09-26 à 19 12 07
Out of 1155 commits, the plugin only looks at 186.

My token has these rights:
Capture d’écran 2021-09-26 à 19 57 29

Normally, it should show much more C# than it currently does. I don't think the token is lacking permissions (although I suspect it could be a token issue, since I use organizations and private repositories a lot for school).

What do you think?

@lowlighter
Owner Author

@Napolitain Could you try setting repositories_affiliations to owner, collaborator, organization_member?
Organization repositories should be included with this configuration.
Private repositories are indeed affected, depending on the token rights.
It may improve the accuracy of your stats, apart from the points mentioned above in this issue 🙂

@Napolitain

@lowlighter I tried regenerating the token before I saw your response, which turned out to be a mistake (it logs fewer total commits), then tried your suggestion. It is clearly better, but commits are still missing.
Capture d’écran 2021-09-28 à 11 58 07

Now I'm at 345 commits analyzed out of 955 total.
Should the plugin_indepth parameter make metrics look at all commits ever made? Or is there a limit under the hood?

@Napolitain

OK, so I removed "indepth" and the result seems more representative of the repositories.

I think that's linked to what you describe in this issue, as it only shows the current state. I also think it reads the last commit of each file (or something like that), so only a few lines are registered (and not the whole file).
A typo fix would then count as 1 line for a whole file.

@lowlighter
Owner Author

Superseded by #857

@github-actions github-actions bot locked and limited conversation to collaborators Feb 23, 2022
3 participants