flake8 runs about 50% slower than pyflakes + pycodestyle #732

Closed
asottile opened this issue Apr 3, 2021 · 6 comments

asottile commented Apr 3, 2021

In GitLab by @andersk on May 13, 2020, 24:00

Please describe how you installed Flake8

pip install flake8 in a fresh virtualenv.

Please provide the exact, unmodified output of flake8 --bug-report

{
  "dependencies": [],
  "platform": {
    "python_implementation": "CPython",
    "python_version": "3.8.2",
    "system": "Linux"
  },
  "plugins": [
    {
      "is_local": false,
      "plugin": "mccabe",
      "version": "0.6.1"
    },
    {
      "is_local": false,
      "plugin": "pycodestyle",
      "version": "2.6.0"
    },
    {
      "is_local": false,
      "plugin": "pyflakes",
      "version": "2.2.0"
    }
  ],
  "version": "3.8.1"
}

Please describe the problem or feature

I would expect flake8 (with default configuration) to run about as fast as pyflakes plus pycodestyle. However, it turns out to be about 50% slower. Here’s an easily reproducible benchmark that demonstrates this discrepancy.

$ seq 200000 > s.py

$ 'time' pyflakes s.py
6.18user 0.14system 0:06.33elapsed 99%CPU (0avgtext+0avgdata 329016maxresident)k
0inputs+0outputs (0major+145654minor)pagefaults 0swaps

$ 'time' pycodestyle s.py
21.49user 0.00system 0:21.52elapsed 99%CPU (0avgtext+0avgdata 27088maxresident)k
0inputs+0outputs (0major+5587minor)pagefaults 0swaps

$ 'time' flake8 s.py
42.50user 0.14system 0:42.69elapsed 99%CPU (0avgtext+0avgdata 332792maxresident)k
0inputs+0outputs (0major+147133minor)pagefaults 0swaps

I also see a similar discrepancy on real code (https://github.com/zulip/zulip) using -j1. Parallelism compensates for the gap in real time but seems to waste a lot more CPU time than it saves in real time, which doesn’t help when the goal is to run many linters in parallel (mypy, eslint, tsc, etc.).
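The "about 50%" figure can be checked from the user-CPU times in the `time` output above. This is only a sketch of the arithmetic, assuming user time is the fair basis for comparison; the helper name `overhead_percent` is made up for illustration.

```python
# User-CPU seconds copied from the `time` output for s.py above.
pyflakes_user = 6.18
pycodestyle_user = 21.49
flake8_user = 42.50


def overhead_percent(combined, *parts):
    """Return how much slower `combined` is than the sum of `parts`, in percent."""
    baseline = sum(parts)
    return (combined / baseline - 1.0) * 100.0


slowdown = overhead_percent(flake8_user, pyflakes_user, pycodestyle_user)
print(f"flake8 is {slowdown:.1f}% slower than pyflakes + pycodestyle")
```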

asottile commented Apr 3, 2021

In GitLab by @andersk on May 13, 2020, 03:27

I found two easy optimizations in !430 that cut some of the extra time; now it’s only 30% slower than pyflakes plus pycodestyle.

asottile commented Apr 3, 2021

In GitLab by @asottile on May 13, 2020, 07:25

You'll also want to run mccabe and factor that time in; it's not just pyflakes and pycodestyle. (Plus, flake8 adds a bunch of features that extend beyond the capabilities of pyflakes and pycodestyle, and I expect those aren't zero-cost.)

asottile commented Apr 3, 2021

In GitLab by @sigmavirus24 on May 13, 2020, 07:28

Also, generating a sequence here isn't a remotely reasonable thing to be benchmarking against.

asottile commented Apr 3, 2021

In GitLab by @andersk on May 13, 2020, 13:45

changed the description

asottile commented Apr 3, 2021

In GitLab by @andersk on May 13, 2020, 13:45

Like I said, I’m using flake8 with the default configuration, so mccabe is being skipped. Because I installed flake8 in a freshly created virtualenv, there are no other plugins installed. I really am trying to be fair and compare apples to apples. Please let me know if there’s some difference I missed.

I benchmarked the sequence not because it’s a shining specimen of Python wizardry, but because it’s easily reproducible and doesn’t lead to questions about whether some huge unfamiliar codebase is hiding some configuration or plugins that might slow things down.

But I also said that I see the discrepancy on real code (https://github.com/zulip/zulip). Here are the benchmarks to demonstrate that.

$ cd /tmp; rm -r ve; virtualenv ve; . ve/bin/activate; pip install flake8

$ git clone https://github.com/zulip/zulip.git; cd zulip

$ 'time' pyflakes . > /dev/null
Command exited with non-zero status 1
17.09user 0.06system 0:17.18elapsed 99%CPU (0avgtext+0avgdata 73152maxresident)k
0inputs+0outputs (0major+38254minor)pagefaults 0swaps

$ 'time' pycodestyle . > /dev/null
Command exited with non-zero status 1
26.39user 0.02system 0:26.45elapsed 99%CPU (0avgtext+0avgdata 19116maxresident)k
0inputs+0outputs (0major+3839minor)pagefaults 0swaps

$ 'time' flake8 -j1 . > /dev/null
Command exited with non-zero status 1
57.85user 0.19system 0:58.11elapsed 99%CPU (0avgtext+0avgdata 444304maxresident)k
0inputs+0outputs (0major+153604minor)pagefaults 0swaps

$ 'time' flake8 . > /dev/null
Command exited with non-zero status 1
81.83user 0.58system 0:11.36elapsed 725%CPU (0avgtext+0avgdata 119504maxresident)k
0inputs+0outputs (0major+251965minor)pagefaults 0swaps

This is all running from an in-memory tmpfs on an otherwise idle Ryzen 1800X desktop to guarantee stable benchmark results. The parallelism numbers are much worse on a laptop CPU, or with other linters running at the same time. But I want to start with a focus on the serial performance, because I want to compare apples to apples, and I understand that parallelism comes with a whole host of separate challenges.
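To put numbers on the serial gap and on the cost of parallelism in the zulip runs above, here is a rough sketch of the arithmetic. It assumes the user-CPU and elapsed figures from `time` are the right quantities to compare; the variable names are illustrative only.

```python
# Timings copied from the zulip runs above (user-CPU and elapsed seconds).
serial_user, serial_wall = 57.85, 58.11      # flake8 -j1 .
parallel_user, parallel_wall = 81.83, 11.36  # flake8 .
parts_user = 17.09 + 26.39                   # pyflakes . plus pycodestyle .

# Serial flake8 compared with the two tools run separately.
serial_overhead = (serial_user / parts_user - 1.0) * 100.0

# Parallel flake8 compared with serial flake8.
extra_cpu = (parallel_user / serial_user - 1.0) * 100.0
wall_speedup = serial_wall / parallel_wall

print(f"serial flake8 uses {serial_overhead:.0f}% more CPU than the parts")
print(f"parallel flake8 uses {extra_cpu:.0f}% more CPU than serial flake8")
print(f"parallel flake8 finishes {wall_speedup:.1f}x sooner in wall-clock time")
```

On these figures the parallel run trades noticeably more total CPU for a shorter wall-clock time, which is the trade-off that matters when other linters are competing for the same cores.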

asottile commented

after #1545 flake8 is now (somehow) faster than the sum of its parts :D

$ time pyflakes s.py 

real	0m7.552s
user	0m7.432s
sys	0m0.116s
$ time pycodestyle s.py 

real	0m22.766s
user	0m22.751s
sys	0m0.024s
$ time flake8 s.py

real	0m26.938s
user	0m26.777s
sys	0m0.172s
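
The comparison can be recomputed from the `user` lines above. This is a small sketch that parses the shell's `NmS.SSSs` time format; the helper `parse_bash_time` is hypothetical and not part of flake8 or any of its plugins.

```python
import re


def parse_bash_time(text):
    """Convert a bash `time` value such as '0m7.432s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# `user` values copied from the output above.
user_times = {
    "pyflakes": "0m7.432s",
    "pycodestyle": "0m22.751s",
    "flake8": "0m26.777s",
}
seconds = {tool: parse_bash_time(value) for tool, value in user_times.items()}

baseline = seconds["pyflakes"] + seconds["pycodestyle"]
ratio = seconds["flake8"] / baseline
print(f"flake8 uses {ratio:.1%} of the combined pyflakes + pycodestyle user time")
```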
