Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

typos: unset pass_filenames #387

Merged
merged 1 commit into from
Jan 12, 2024
Merged

typos: unset pass_filenames #387

merged 1 commit into from
Jan 12, 2024

Conversation

phip1611
Copy link
Contributor

Typos is supposed to run on the whole tree. If this is set to true, the system gets stuck for large projects due to very high memory consumption. The restriction on with files typos run, should be specified in the typos config file.

Tested this locally with a large project.

Typos is supposed to run on the whole tree. If this is set to true,
the system gets stuck for large projects due to very high memory
consumption. The restriction on with files typos run, should be
specified in the typos config file.
@domenkozar
Copy link
Member

Could we link to some upstream issue about memory consumption?

@phip1611
Copy link
Contributor Author

phip1611 commented Jan 11, 2024

Relevant upstream issues/PRs are:

TL;DR: typos is 26MiB in size. In a large project of mine (thousand of files), my system always freezes without pass_filenames = false. I suppose that dozens to hundreds processes are spawned in parallel. Unfortunately, it was not possible to see how the typos utility is effectively invoked when pass_filenames = true. And pre-commit run --verbose doesn't print that. My guess is that hundreds of processes are started.

I didn't investigate it with strace or so. Just looked for a quick fix.
Other tools in this repo that are supposed to run on the tree such as rustfmt also set pass_filenames = false.

@domenkozar domenkozar merged commit 769a234 into cachix:master Jan 12, 2024
4 checks passed
@phip1611 phip1611 deleted the typos branch January 12, 2024 09:09
@phip1611
Copy link
Contributor Author

phip1611 commented Jan 12, 2024

Update. I did some more investigation and better understand what happens.

So again, the project is very huge. If I run $ typos ., it takes a couple of seconds, but that's it. When pre-commit runs typos with pass_filenames = true, pre-commit wants to pass all 74000 files of the project to typos. I guess there is some limit in pre-commit hooks so that it splits these huge amount of arguments to 8 instances of typos. This whole process takes ages and consumes more than 16GiB of memory.

I suppose a combination of the relatively big memory usage of typos in combination with the ridiculous amount of parameters passed to it, results in an enormous amount of memory usage.

TL;DR: It is correct to use pass_filenames = false, as typos is supposed to run on the whole tree and excludes are specified in .typos.toml. This aligns the behavior of typos via $ pre-commit run with $ typos ., if the right config file is passed.

@totoroot
Copy link
Collaborator

@phip1611 I don't think this change is desirable. While I understand the issue at hand, changing the default behaviour of the hook to align with the call of typos . is not a good idea IMO.

I disagree with your statement "Typos is supposed to run on the whole tree.". pre-commit hooks are supposed to be run on changes that are being committed, not for the whole repository every time you commit changes.

If you wished to run the pre-commit hook on the whole directory/repository, you always had the option to run pre-commit run typos --all.

The way the hook is set up now, there is no option to just check a single file that you'd like to commit and ignore the rest of the repository.

This change 'broke' the typos hook in one of our repositories, as now all files get checked, even when just wanting to commit a single new file. This repository had hundreds of files with typos in them and while it is planned to check and fix them eventually, the hook should not care about any files beside the one that I'm trying to commit.

After all, it is a pre-commit hook.

@domenkozar I'd suggest that this gets reverted and we discuss how we can otherwise fix the issue that @phip1611 was running into.

@totoroot
Copy link
Collaborator

totoroot commented Jan 16, 2024

Update. I did some more investigation and better understand what happens.

So again, the project is very huge. If I run $ typos ., it takes a couple of seconds, but that's it. When pre-commit runs typos with pass_filenames = true, pre-commit wants to pass all 74000 files of the project to typos. I guess there is some limit in pre-commit hooks so that it splits these huge amount of arguments to 8 instances of typos. This whole process takes ages and consumes more than 16GiB of memory.

I suppose a combination of the relatively big memory usage of typos in combination with the ridiculous amount of parameters passed to it, results in an enormous amount of memory usage.

TL;DR: It is correct to use pass_filenames = false, as typos is supposed to run on the whole tree and excludes are specified in .typos.toml. This aligns the behavior of typos via $ pre-commit run with $ typos ., if the right config file is passed.

@phip1611 I don't quite understand why pre-commit would be running on 74000 files in the first place...

Were you committing the whole project at once? By default pre-commit passes only files to each configured hook that are about to be committed, so it should not care about the rest of your sizable repository, as long as you do not run pre-commit run --all.

If you indeed wished to check all files in your repository, I guess you have to expect it taking a while and it consuming quite a lot of resources. After all, that is something that should not be necessary for every commit. If you know that it is quicker to run typos . on the whole repository, then why not do that, commit all those fixes once and then run the pre-commit with pass_filenames set to true to only check newly committed files with changes going forward.

Running a pre-commit hook on the whole repository every time you commit is IMO a waste of resources and is not what a pre-commit hook is supposed to be for.

If you want to make absolutely sure, that your project at no point contains a typo, you will have to use a pipeline test running typos . anyways, as pre-commit hooks can always be skipped with git commit --no-verify.

TL;DR A pre-commit hook is supposed to only check files that are about to be committed and not the whole repository.

@phip1611
Copy link
Contributor Author

phip1611 commented Jan 16, 2024

In the meantime, I had more time to better understand the issue and thought about it again @totoroot. But first of all, I'm sorry if I broke your project's CI with that change.

Were you committing the whole project at once?

No. But I want to be able to run pre-commit run --all-files. In that case, typos should run the same as $ typos . --config .typos.toml. But I think we can combine both worlds.

We can set hooks.typos.excludes (pseudo code):

excludes =
          let
            configRaw = "...";
            config = builtins.fromTOML configRaw;
          in
          config.files.extend-exclude;

What do you think of that idea? I'd like to do this in a generic way and don't manually do it in my project. We might force typos users to provide the path to .typos.toml so that it is in the Nix store. Then we can extend the exlucdes. I think this might be a solution

Actually, I don't use git hooks because I don't like them as part of my workflow 🤣 . But pre-commit run --all-files is just such a handy way of running all style checks at once.

TL;DR A pre-commit is supposed to only check files that are about to be committed and not the whole repository.

I'd like to extend this to:

  • A pre-commit is supposed to only check files that are about to be committed and not the whole repository.
  • pre-commit run --all-files is just such a handy way of running all style checks at once. It should work like the default invocation of the tools

@phip1611
Copy link
Contributor Author

phip1611 commented Jan 16, 2024

Update. I have a solution that is compatible with all goals. What do you think? @domenkozar @totoroot

With that, users must specify either settings.typos.config or settings.typos.configFile but I think this is perfectly fine as this justifies a proper execution that works as expected in all scenarios.

diff --git a/modules/hooks.nix b/modules/hooks.nix
index f162b44..bf5689b 100644
--- a/modules/hooks.nix
+++ b/modules/hooks.nix
@@ -1111,12 +1111,11 @@ in
               '';
             };
 
-          configPath =
+          configFile =
             mkOption {
-              type = types.str;
-              description = lib.mdDoc "Path to a custom config file.";
-              default = "";
-              example = ".typos.toml";
+              type = types.path;
+              description = lib.mdDoc "Path to the config file.";
+              default = null;
             };
 
           diff =
@@ -2418,18 +2417,27 @@ in
           entry = "${settings.treefmt.package}/bin/treefmt --fail-on-change";
         };
       typos =
+        let
+          configAsFile =
+            if settings.typos.config != ""
+            then builtins.toFile "config.toml" settings.typos.config
+            else settings.typos.configFile;
+          excludesFromConfig =
+            let
+              toml = builtins.fromTOML (builtins.readFile configAsFile);
+            in
+              (toml.files or { }).extend-exclude or [ ];
+        in
         {
           name = "typos";
           description = "Source code spell checker";
           entry =
             let
-              configFile = builtins.toFile "config.toml" "${settings.typos.config}";
               cmdArgs =
                 mkCmdArgs
                   (with settings.typos; [
                     [ (color != "") "--color ${color}" ]
-                    [ (configPath != "") "--config ${configPath}" ]
-                    [ (config != "" && configPath == "") "--config ${configFile}" ]
+                    [ (true) "--config ${toString configAsFile}" ]
                     [ (exclude != "") "--exclude ${exclude} --force-exclude" ]
                     [ (format != "") "--format ${format}" ]
                     [ (locale != "") "--locale ${locale}" ]
@@ -2438,11 +2446,7 @@ in
             in
             "${tools.typos}/bin/typos ${cmdArgs}${lib.optionalString settings.typos.diff " --diff"}${lib.optionalString settings.typos.hidden " --hidden"}";
           types = [ "text" ];
-          # Typos is supposed to run on the whole tree. If this is set to true,
-          # the system gets stuck for large projects due to very high memory
-          # consumption. The restriction on with files typos run, should be
-          # specified in the typos config file.
-          pass_filenames = false;
+          excludes = excludesFromConfig;
         };
       typstfmt = {
         name = "typstfmt";

@Stunkymonkey
Copy link

Stunkymonkey commented Jan 26, 2024

I can report this PR breaks my config having:

pre-commit.settings.hooks.typos = {
    enable = true;
    excludes = [ "secrets\\.yaml" "\\.sops\\.yaml" ];
};

sure I can workaround with:

pre-commit.settings.settings.typos.exclude = "(secrets|\.sops)\.yaml";

But the excludes is always ignored. We should revert the change with pass_filenames.

@totoroot said correctly:

A pre-commit hook is supposed to only check files that are about to be committed and not the whole repository.

phip1611 added a commit to phip1611/pre-commit-hooks.nix that referenced this pull request Mar 12, 2024
Use excludes from .typos.toml when running `pre-commit run typos --all-files`.

Since we don't set "pass_filenames = false" for typos (see
upstream discussions [0]), we must ensure that pre-commit never
passes the files specified as excludes in `.typos.toml` to
`typos` as argument, if possible. Otherwise,
`$ pre-commit run typos --all-files` has another behaviour than
`$ typos .`, which is confusing and wrong.

To not break anything, and to be compliant with the existing
API, we encourage users to provide the `.typos.toml` as Nix
path, so the excludes can be read via evaluation time.

The `configAsFile` variable is the path where the configuration
file can be found by the `typos` utility when it executes in the
user environment. Not necessarily in Nix store.

[0]: cachix#387 (comment)
phip1611 added a commit to phip1611/pre-commit-hooks.nix that referenced this pull request Mar 19, 2024
Use excludes from .typos.toml when running `pre-commit run typos --all-files`.

Since we don't set "pass_filenames = false" for typos (see
upstream discussions [0]), we must ensure that pre-commit never
passes the files specified as excludes in `.typos.toml` to
`typos` as argument, if possible. Otherwise,
`$ pre-commit run typos --all-files` has another behaviour than
`$ typos .`, which is confusing and wrong.

To not break anything, and to be compliant with the existing
API, we encourage users to provide the `.typos.toml` as Nix
path, so the excludes can be read via evaluation time.

The `configAsFile` variable is the path where the configuration
file can be found by the `typos` utility when it executes in the
user environment. Not necessarily in Nix store.

[0]: cachix#387 (comment)
phip1611 added a commit to phip1611/pre-commit-hooks.nix that referenced this pull request Mar 19, 2024
Use excludes from .typos.toml when running `pre-commit run typos --all-files`.

Since we don't set "pass_filenames = false" for typos (see
upstream discussions [0]), we must ensure that pre-commit never
passes the files specified as excludes in `.typos.toml` to
`typos` as argument, if possible. Otherwise,
`$ pre-commit run typos --all-files` has another behaviour than
`$ typos .`, which is confusing and wrong.

To not break anything, and to be compliant with the existing
API, we encourage users to provide the `.typos.toml` as Nix
path, so the excludes can be read via evaluation time.

The `configAsFile` variable is the path where the configuration
file can be found by the `typos` utility when it executes in the
user environment. Not necessarily in Nix store.

[0]: cachix#387 (comment)
phip1611 added a commit to phip1611/pre-commit-hooks.nix that referenced this pull request Apr 2, 2024
Enable typos.configPath to be of type path. This way, we can
use excludes from .typos.toml when running
`pre-commit run typos --all-files`. Otherwise, the execution
differs from a normal invocation of typos, which is unexpected
and leads to wrong results.

To not break anything, and to be compliant with the existing
API, I modified configPath to be either a Nix path or a string.

[0]: cachix#387 (comment)
phip1611 added a commit to phip1611/pre-commit-hooks.nix that referenced this pull request Apr 2, 2024
Enable typos.configPath to be of type path. This way, we can
use excludes from .typos.toml when running
`pre-commit run typos --all-files`. Otherwise, the execution
differs from a normal invocation of typos, which is unexpected
and leads to wrong results.

To not break anything, and to be compliant with the existing
API, I modified configPath to be either a Nix path or a string.

[0]: cachix#387 (comment)
phip1611 added a commit to phip1611/pre-commit-hooks.nix that referenced this pull request Apr 8, 2024
Enable typos.configPath to be of type path. This way, we can
use excludes from .typos.toml when running
`pre-commit run typos --all-files`. Otherwise, the execution
differs from a normal invocation of typos, which is unexpected
and leads to wrong results.

To not break anything, and to be compliant with the existing
API, I modified configPath to be either a Nix path or a string.

[0]: cachix#387 (comment)
phip1611 added a commit to phip1611/pre-commit-hooks.nix that referenced this pull request Apr 19, 2024
Enable typos.configPath to be of type path. This way, we can
use excludes from .typos.toml when running
`pre-commit run typos --all-files`. Otherwise, the execution
differs from a normal invocation of typos, which is unexpected
and leads to wrong results.

To not break anything, and to be compliant with the existing
API, I modified configPath to be either a Nix path or a string.

[0]: cachix#387 (comment)
phip1611 added a commit to phip1611/pre-commit-hooks.nix that referenced this pull request Apr 19, 2024
Enable typos.configPath to be of type path. This way, we can
use excludes from .typos.toml when running
`pre-commit run typos --all-files`. Otherwise, the execution
differs from a normal invocation of typos, which is unexpected
and leads to wrong results.

To not break anything, and to be compliant with the existing
API, I modified configPath to be either a Nix path or a string.

The main motivation for this change is a repository on my machine
where pre-commit ignores directory foobar (which is excluded
by .typos.toml) but passes all 65,000 files of foobar as argument
to typos. In these situation, typos goes nuts and often the
system freezes.

So with this change, I prevent that possibly tens of thousands
of files that should not be checked in any case are passed to
typos, which results in a smooth experience.

[0]: cachix#387 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants