Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speedup pre-commit run by excluding python files from trailing hooks #11777

Conversation

Avasam
Copy link
Collaborator

@Avasam Avasam commented Apr 17, 2024

trailing-whitespace and end-of-file-fixer are handled by both black and ruff check --fix
This reduces the amount of files these hooks have to run on (by a lot considering typeshed is mostly sutbs), and even skips the hooks entirely when running protobuf generation scripts or as a pre-commit hook on changed files.

trailing-whitespace and end-of-file-fixer are handled by both black and ruff check --fix
This reduces the amount of files these hooks have to run on, and even skips the hooks entirely when running protobuf generation scripts or as a pre-commit hook on changed files.
@@ -3,7 +3,11 @@ repos:
rev: v4.5.0 # must match requirements-tests.txt
hooks:
- id: trailing-whitespace
# speedup hook by leaving python files to Black & Ruff
exclude: .*\.pyi?$
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Black does not necessarily fix trailing whitespace in string literals

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would only be multiline string literals right?
Do we even have those in stubs? Given we don't have docstrings (outside maybe generated protobuf files)

After checking, yes we do, in deprecated multiline messages, but it seems think Ruff does catch those, although it's considered an "unsafe-fix" (which is true and why black doesn't touch them):
image

That being said, if it was removed anyway by another pre-commit hook, I can just add W291 to our unsafe fixes hook.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On that note, @AlexWaygood any reason not to use https://docs.astral.sh/ruff/settings/#lint_extend-safe-fixes instead of a separate pre-commit hook?

Copy link
Member

@AlexWaygood AlexWaygood Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no reason other than that I forgot about that option, I guess 😄

@AlexWaygood
Copy link
Member

AlexWaygood commented Apr 17, 2024

Do you have a rough estimation/numbers for how much time these hooks take as a % of the total time it takes for pre-commit run -a to run? Or how much this speeds things up

@Avasam
Copy link
Collaborator Author

Avasam commented Apr 17, 2024

Do you have a rough estimation/numbers for how much time these hooks take as a % of the total time it takes for pre-commit run -a to run?

I'm currently heading out, but I can try to whip something up later to have actual numbers rather than just relying on "what seems to make sense"

@JelleZijlstra
Copy link
Member

Did you get any numbers?

My intuition would be that these hooks are really fast and the extra complexity in our configuration is not worth the minor performance saving.

@Avasam
Copy link
Collaborator Author

Avasam commented May 10, 2024

Note that in all my tests, I've ignored the first run to let the cache populate correctly, and avoid issues where a first run fixes EOL from switching back & forth between branches on Windows vs WSL.
"all files" command: (Measure-Command { pre-commit run --all }).TotalSeconds

This PR main
All hooks
all files
39.2376706
38.665187
40.8236747
38.5270161
38.1521478
41.0012081
Only the 2 modified hooks
all files
1.6530308
1.6606674
1.8906664
2.5119982
2.5735033
2.7313761

The difference is indeed really small. But I come to realize why I had the impression the first hook was slow: Under WSL pre-commit is just extremely slow to get going, and it made it look like the first hook was really slow. The pre-commit command at the end of the proto scripts nearly 2m to run. --all with all hooks takes over 6m with black taking the vast majority of that time.

"after tensorflow" command: time (pre-commit run --files $(git ls-files -- "stubs/tensorflow/**_pb2.pyi"))

This PR main
All hooks
all files
real 5m37.778s real 5m42.821s
Only the 2 modified hooks
all files
real 0m23.215s real 0m50.211s
All hooks
after tensorflow
real 0m55.892s real 1m4.777s

(the variance between runs, even after ignoring the first one, is also pretty bad)

Anyway, I'll close with

the extra complexity in our configuration is not worth the minor performance saving.

and consider finishing porting the proto scripts to Windows git bash (or as Python script) so I don't have to run them under WSL.

After looking it up, it's apparently a known thing that in WSL2 specifically, performance for access across OS (ie files under mnt/<drive letter>) is really bad.
Ref microsoft/WSL#4197 & https://learn.microsoft.com/en-us/windows/wsl/compare-versions#comparing-features

@Avasam Avasam closed this May 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants