E0401 (import-error) checks perform a lot of repeated stat calls #9310
Comments
Thank you for analyzing the problem and opening an issue.
Ran the commands as specified, seems like I was able to reproduce it. Here is an excerpt:
Can't promise anything but will take a look into it. Besides comparing results against this test repository itself, I wonder if there is a set of standard repos to use for performance analysis? Was thinking of double checking any changes against some repos used in the primer 🤔
Certain checkers upstream on pylint like import-error heavily use find_spec. This method is IO intensive as it looks for files across several search paths to return a ModuleSpec. Since imports across files may repeat themselves it makes sense to cache this method in order to speed up the linting process. Local testing shows that caching reduces the total amount of calls to find_module methods (used by find_spec) by about 50%. Linting the test repository in the related issue goes from 439 to 351 seconds. Closes pylint-dev/pylint#9310.
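To illustrate the idea in the commit message above, here is a minimal sketch of memoizing a module lookup. It does not reproduce astroid's code: `uncached_find_spec` and `cached_find_spec` are hypothetical names, and the standard library's `importlib.machinery.PathFinder.find_spec` is used only as a stand-in for an IO-heavy finder.

```python
import functools
import importlib.machinery

lookup_count = 0  # how many times the uncached (IO-heavy) lookup actually ran


def uncached_find_spec(modname, search_path=None):
    """Hypothetical stand-in for an expensive lookup that walks search paths."""
    global lookup_count
    lookup_count += 1
    return importlib.machinery.PathFinder.find_spec(modname, search_path)


@functools.lru_cache(maxsize=1024)
def cached_find_spec(modname, search_path=None):
    # Arguments form the cache key, so search_path must be hashable
    # (None or a tuple, not a list).
    return uncached_find_spec(modname, search_path)


# The same import appearing in 100 files triggers a single real lookup.
for _ in range(100):
    spec = cached_find_spec("json")

print(lookup_count)
print(cached_find_spec.cache_info())
```

A cache like this trades freshness for speed: if files appear or disappear during the run, cached results can be stale, so a real implementation would need a way to clear it.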
Certain checkers upstream on pylint like import-error heavily use find_spec. This method is IO intensive as it looks for files across several search paths to return a ModuleSpec. Since imports across files may repeat themselves it makes sense to cache this method in order to speed up the linting process. Local testing shows that caching reduces the total amount of calls to find_module methods (used by find_spec) by about 50%. Linting the test repository in the related issue goes from 40 seconds to 37 seconds. This was on an NVMe disk and after warmup, so timing gains may be bigger on slower file systems like the one mentioned in the referenced issue. Closes pylint-dev/pylint#9310.
Certain checkers upstream on pylint like import-error heavily use find_spec. This method is IO intensive as it looks for files across several search paths to return a ModuleSpec. Since imports across files may repeat themselves it makes sense to cache this method in order to speed up the linting process. Local testing shows that caching reduces the total amount of calls to find_module methods (used by find_spec) by about 50%. Linting the test repository in the related issue goes from 58 seconds to 52 seconds. This was on an NVMe disk and after warmup, so timing gains may be bigger on slower file systems like the one mentioned in the referenced issue. Closes pylint-dev/pylint#9310.
If possible it is desirable to look for modules with no context file as it results in no search paths being given to astroid's find_spec(). This makes calls to it more uniform and opens up the possibility of effective caching. Refs pylint-dev#9310.
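A small sketch of why uniform calls matter for caching, under the assumption that the cache is keyed on the module name plus the search paths. The `lookup` function, the file names, and the returned string are all hypothetical; no real module resolution happens here.

```python
import functools
import os

calls = []  # records every call that misses the cache


@functools.lru_cache(maxsize=None)
def lookup(modname, search_path):
    # Hypothetical lookup: only records the cache miss and returns a dummy value.
    calls.append((modname, search_path))
    return f"spec-for-{modname}"


files = ["pkg/a.py", "pkg/sub/b.py", "tools/c.py"]

# With a context file, each importing file passes its own directory,
# so the same module name produces three different cache keys.
for path in files:
    lookup("requests", (os.path.dirname(path),))
misses_with_context = len(calls)

calls.clear()
lookup.cache_clear()

# Without a context file, no search paths are passed and the key is identical.
for path in files:
    lookup("requests", None)
misses_without_context = len(calls)

print(misses_with_context, misses_without_context)
```

In this toy setup the context-dependent calls never hit the cache, while the context-free calls miss only once.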
Certain checkers upstream on pylint like import-error heavily use find_spec. This method is IO intensive as it looks for files across several search paths to return a ModuleSpec. Since imports across files may repeat themselves it makes sense to cache this method in order to speed up the linting process. Closes pylint-dev/pylint#9310.
* Avoid search paths for ImportChecker when possible
If possible it is desirable to look for modules with no context file as it results in no search paths being given to astroid's find_spec(). This makes calls to it more uniform and opens up the possibility of effective caching. Refs #9310.
Bug description
I run pylint on a repo that is mounted via SSHFS, which leads to slow I/O speeds.
While profiling a run, I noticed that the import-error checks perform a lot of repeated stat calls because they check for the presence of various .py, .pyc, .so, .cpython-311-x86_64-linux-gnu.so, etc. files.
Many of these presence checks are repeated, so I'm wondering if it would be possible to improve performance by eliminating repeated checks or caching the results of previous calls.
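The effect described above can be sketched with a toy resolver that probes one candidate file per suffix. This is not pylint's or astroid's code: `find_candidate`, `isfile_counted`, `isfile_cached`, and the module name `missing_mod` are hypothetical, and only a subset of the suffixes is used.

```python
import functools
import os
import tempfile

SUFFIXES = (".py", ".pyc", ".so")  # subset of the suffixes named above

probe_count = 0  # number of real file-system checks performed


def isfile_counted(path):
    """os.path.isfile wrapper that counts each real check."""
    global probe_count
    probe_count += 1
    return os.path.isfile(path)


@functools.lru_cache(maxsize=None)
def isfile_cached(path):
    # Only cache misses reach the file system.
    return isfile_counted(path)


def find_candidate(directory, modname, isfile):
    """Return the first existing candidate file for modname, or None."""
    for suffix in SUFFIXES:
        candidate = os.path.join(directory, modname + suffix)
        if isfile(candidate):
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    # The module does not exist, so every suffix is probed on every lookup.
    for _ in range(10):
        find_candidate(tmp, "missing_mod", isfile_counted)
    uncached_probes = probe_count

    probe_count = 0
    for _ in range(10):
        find_candidate(tmp, "missing_mod", isfile_cached)
    cached_probes = probe_count

print(uncached_probes, cached_probes)
```

On a slow mount each avoided probe saves a round trip, which is why the repeat count matters more there than on a local disk.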
I have prepared a repo that illustrates the issue. (The example repo contains ~60 files, whereas the repo I noticed the performance issue with contains ~2000 files.)
I noticed that pylint's performance can be improved by adding "missing" __init__.py files to the repo, but I'm hoping pylint itself can be tuned to increase performance even further.
Configuration
Command used
Steps to reproduce
git clone --branch import-error-stats https://github.com/correctmost/pylint-corpus.git
cd pylint-corpus
python ./profile_pylint.py
head -n 20 profiler_stats
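The contents of profile_pylint.py are not shown in this issue. As a rough illustration of how such a report can be produced, here is a sketch that uses only the standard library's cProfile and pstats; `workload` is a hypothetical stand-in for a lint run, not the actual script.

```python
import cProfile
import io
import os
import pstats


def workload():
    # Hypothetical stand-in for a lint run: many repeated file-presence checks.
    for _ in range(1000):
        os.path.isfile("does_not_exist.py")


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Write the 20 most expensive entries (by own time) into a string buffer.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("tottime").print_stats(20)
report = buffer.getvalue()
print(report)
```

Calling `stats.print_callers("isfile")` on the same `Stats` object lists which functions invoked `isfile`, which is one way to trace presence checks back to their caller.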
Analysis
Notice that one of the top results is for posix.stat:
posix.stat is called by isfile, which is called most often by find_module in astroid:
There is evidence of repeated stats from strace:
Pylint output
There is no output, just reduced performance
Expected behavior
Improved performance via caching or reduced file-presence checks
Pylint version
OS / Environment
Arch Linux
Additional dependencies
No response