Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scope churn calculation to provided paths #419

Merged
merged 2 commits into from
May 13, 2023

Conversation

Fito
Copy link
Contributor

@Fito Fito commented Nov 23, 2022

The churn calculation was hard-coded to .
This meant that rubycritic would run a git log command for the entire repository regardless of the paths specified with running the rubycritic command.

Running git log for large codebases with many years of commits can take over 30 minutes. This is particularly wasteful in mono-repositories with distinct projects that might be better analyzed separately (large parts of the repository might not even be Ruby code!)

The time to generate a report decreases significantly when scoping churn to the paths provided to the rubycritic command.

Other things to note:

  • Git log's --follow option can only handle a single path at a time. Therefore we need to run multiple git log commands when multiple path arguments are provided. The results are still merged together.

  • The initializer for SourceControlSystem::Git::Churn was already at the maximum number of statements allowed. I extracted renames into a memoized method to make room for the new parameter.

  • Added tests for SourceControlSystem::Git#churn to ensure SourceControlSystem::Git::Churn is instantiated with the correct options. This also adds a bit of coverage on Config.paths needing to exist.

Check list:

Copy link
Collaborator

@etagwerker etagwerker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fito Thanks for submitting this PR! I'm curious about these changes. How do they work in terms of the existing ways of specifying a path in the command line or in the ruby critic configuration file? It seems that there is a disconnect there.

Could you please explain a little bit about that?

@@ -30,7 +30,7 @@ def self.to_s
end

def churn
@churn ||= Churn.new(churn_after: Config.churn_after)
@churn ||= Churn.new(churn_after: Config.churn_after, paths: Config.paths)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fito I'm not sure about this. How does this work with the .rubycritic.yml configuration file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@etagwerker This change works both for the paths options defined in .rubycritic.yml and for paths provided via the command line.

The RubyCritic::Cli::Options#to_h method merges all options provided via the command line and via the .rubycritic.yml configuration file. here

RubyCritic::Cli::Application instantiates an RubyCritic::Cli::Options object and calls to_h on it before passing it into RubyCritic::CommandFactory.create to execute the reporter here.

The RubyCritic::CommandFactory sets the options hash on the global config object via RubyCritic::Config.set(options) here.

This means Config.paths will contain the paths provided to RubyCritic via either CLI or Config File options.


def set(options)
self.mode = options[:mode] || :default
self.paths = options[:paths] || ['.']
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fito Maybe this should default to this?

Copy link
Contributor Author

@Fito Fito Nov 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@etagwerker the path (singular) option currently refers to the path where the reports will get written. The paths (plural) option refers to the paths to be analyzed by RubyCritic. There might be an opportunity to rename either of those concepts for clarity, but it feels outside the scope of this change. The names were already in the codebase.

@Fito
Copy link
Contributor Author

Fito commented Nov 29, 2022

@etagwerker Thank you for reviewing the changes! I replied to your comments on specific changes. The high-level answer is that this change respects and works with existing ways of specifying paths to be analyzed. My change is only concerned with the paths option which refers to paths in the given codebase that should be analyzed. Example:

paths: # Files to analyse.
  - 'app/controllers/'
  - 'app/models/'

The changes are not concerned with the singular path option which refers to the path where RubyCritic reports are written. Example:

path: '/tmp/mycustompath' # Set path where report will be saved (tmp/rubycritic by default)

Let me know if you have any other questions!

@Fito Fito requested a review from etagwerker November 30, 2022 00:24
@Fito Fito force-pushed the churn-specific-path branch from 608743b to 866169a Compare December 1, 2022 05:19
@Fito
Copy link
Contributor Author

Fito commented Dec 1, 2022

I pushed a small change to address a couple of Rubocop violations my commit added.

@Fito
Copy link
Contributor Author

Fito commented Dec 3, 2022

The remaining CI failures don't seem related to my change. Is it normal for some checks to fail?

@Fito
Copy link
Contributor Author

Fito commented Dec 16, 2022

@etagwerker Do you have any more feedback on this?

@Fito Fito force-pushed the churn-specific-path branch from eedb0c0 to 46e6640 Compare February 25, 2023 04:04
Copy link
Collaborator

@etagwerker etagwerker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fito I think the changes look good. The last thing I would ask is that you document this new parameter somewhere in the README.

Considering this PR adds a new parameter that is used only with the Git SCM, I think it should be clear that this parameter will not have any effect if you are using another SCM (e.g. Mercurial)

Other than that it looks good 👍🏼

@etagwerker etagwerker force-pushed the churn-specific-path branch from 46e6640 to 1409edc Compare April 29, 2023 18:04
@etagwerker
Copy link
Collaborator

@Fito I just resolved the conflict in this PR but now it seems to be failing in a couple scenarios that are related to the new parameter. Could you please take a look? Thanks!

@Fito
Copy link
Contributor Author

Fito commented May 2, 2023

@etagwerker Thanks for resolving the conflicts, and for the suggestion about the README. I will take a look at the failing scenarios.

The churn calculation was hard-coded to '.'
This meant that rubycritic would run a `git log`
command for the entire repository regardless of
the paths specified with running the `rubycritic`
command.

Running `git log` for large codebases with many
years of commits can take over 30 minutes. This
is particularly wasteful in mono-repositories with
distinct projects that might be better analyzed
separately (large parts of the repository might
not even be Ruby code!).

The time to generate a report decreases significantly
when scoping churn to the paths provided to the
`rubycritic` command.

Other things to note:
- Git log's --follow option can only handle a single
  path at a time. Therefore we need to run multiple
  git log commands when multiple path arguments
  are provided. The results are still merged together.

- The initializer for SourceControlSystem::Git::Churn
  was already at the maximum number of statements
  allowed. I extracted `renames` into a memoized
  method to make room for the new parameter.

- Added tests for SourceControlSystem::Git#churn
  to ensure SourceControlSystem::Git::Churn is
  instantiated with the correct options. This also
  adds a bit of coverage on Config.paths needing
  to exist.
@Fito Fito force-pushed the churn-specific-path branch from 1409edc to eea9528 Compare May 7, 2023 08:38
@Fito
Copy link
Contributor Author

Fito commented May 7, 2023

@etagwerker I fixed the failing test and added a bit more about the "paths" option to the README.
I want to clarify though that this is NOT at new parameter. The "paths" parameter was already part of the gem before my change, you can see it documented in the Readme on the main branch -- However, it was getting ignored by the churn analyzer for Git. This commit makes the analyzer use the option when provided. Does that make sense?

Thanks again for the review and for fixing the merge conflicts! Please let me know if you have any other questions.

@Fito Fito requested a review from etagwerker May 7, 2023 08:45
Copy link
Collaborator

@etagwerker etagwerker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fito Thank you! 👍🏼

@etagwerker etagwerker merged commit 3efb884 into whitesmith:main May 13, 2023
@Fito
Copy link
Contributor Author

Fito commented May 16, 2023

Thanks for reviewing this @etagwerker !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants