
E0401 (import-error) checks perform repeated file reads #9603

Open
correctmost opened this issue May 6, 2024 · 1 comment
Labels
Astroid (Related to astroid), Enhancement ✨ (Improvement to a component), Needs PR (This issue is accepted, sufficiently specified and now needs an implementation), performance

Comments

@correctmost
Contributor

correctmost commented May 6, 2024

Bug description

This is a follow-up to #9310, where I reported slowness with import-error checks due to repetitive I/O over SSHFS.

While profiling the new code, I noticed that the _is_setuptools_namespace checks in astroid cause the same files to be read over and over.

My public example repo shows the following reads:

  • 109 reads - pylint-corpus/src/__init__.py
  • 50 reads - pylint-corpus/src/resources/sites/pages/page.py/__init__.py
  • 50 reads - pylint-corpus/src/resources/results/result.py/__init__.py

I'm hoping these repeated reads can be avoided to speed up pylint. (My private repo has ~2,200 files and shows >20,000 repeated reads.)

Configuration

[MAIN]
jobs=1

[MESSAGES CONTROL]
disable=all
enable=E0401

[REPORTS]
reports=no
score=no

Command used

Steps to reproduce

git clone --branch import-error-stats https://github.com/correctmost/pylint-corpus.git
cd pylint-corpus

python ./profile_pylint.py

Analysis

strace shows the same files being opened repeatedly:

$ strace -e trace=openat python ./profile_pylint.py 2>&1 | sort | uniq -c | sort -nr

109 openat(AT_FDCWD, "pylint-corpus/src/__init__.py", O_RDONLY|O_CLOEXEC) = 3
 50 openat(AT_FDCWD, "pylint-corpus/src/resources/sites/pages/page.py/__init__.py", O_RDONLY|O_CLOEXEC) = -1 ENOTDIR (Not a directory)
 50 openat(AT_FDCWD, "pylint-corpus/src/resources/results/result.py/__init__.py", O_RDONLY|O_CLOEXEC) = -1 ENOTDIR (Not a directory)
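For anyone repeating this analysis on a larger tree, the `sort | uniq -c | sort -nr` pipeline can also be done in Python. This is a minimal sketch that assumes the strace output was first saved to a file (for example with `strace -o trace.txt -e trace=openat ...`); `count_openat_paths` is a hypothetical helper, not part of pylint or astroid. Unlike `uniq -c`, it groups by path only, so successful and failed opens of the same path are counted together.

```python
import collections
import re

# Captures the path argument of an strace openat line, e.g.
# openat(AT_FDCWD, "pkg/__init__.py", O_RDONLY|O_CLOEXEC) = 3
OPENAT_PATH = re.compile(r'openat\([^,]+, "([^"]+)"')


def count_openat_paths(strace_lines):
    """Count how often each path is opened, most frequent first."""
    counts = collections.Counter()
    for line in strace_lines:
        match = OPENAT_PATH.search(line)
        if match:  # ignore signals, exit lines, and other syscalls
            counts[match.group(1)] += 1
    return counts.most_common()
```

Passing an open file handle (`with open("trace.txt") as fh: count_openat_paths(fh)`) yields `(path, count)` pairs sorted by count, which makes it easy to keep only paths opened more than once.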

It seems possible to avoid most of these reads by caching the results of _is_setuptools_namespace, but I also wonder whether _is_setuptools_namespace should be called with a non-directory path at all (note the ENOTDIR errors).
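A minimal sketch of both ideas (caching the result per path, and skipping paths that are not directories) follows. It assumes the check opens `<location>/__init__.py`, as the strace paths suggest, and looks for `pkgutil`/`extend_path` or `pkg_resources`/`declare_namespace(__name__)` markers; the function name `is_setuptools_namespace_cached` and the exact marker tests are illustrative assumptions, not astroid's actual code.

```python
import functools
import pathlib


# Hypothetical stand-in for astroid's check, with a per-path cache.
@functools.lru_cache(maxsize=1024)
def is_setuptools_namespace_cached(location: pathlib.Path) -> bool:
    """Return True if location/__init__.py looks like an old-style namespace package."""
    # Guard: a module file such as page.py cannot contain __init__.py,
    # so skip the open() that would otherwise fail with ENOTDIR.
    if not location.is_dir():
        return False
    try:
        with open(location / "__init__.py", "rb") as stream:
            data = stream.read(4096)  # markers are expected near the top
    except OSError:
        return False
    extend_path = b"pkgutil" in data and b"extend_path" in data
    declare_namespace = (
        b"pkg_resources" in data and b"declare_namespace(__name__)" in data
    )
    return extend_path or declare_namespace
```

The cache lives for the whole process, so it assumes `__init__.py` files do not change during a single pylint run; equal paths spelled differently (relative vs. absolute) are also cached separately. The `is_dir()` guard still costs one `stat` per new path, but with the cache in place that happens at most once per location.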


Python profiling:

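import pstats

# Load previously saved cProfile data from the file "stats"
stats = pstats.Stats('stats')
# Show which functions call the built-in _io.open, with call counts and times
stats.print_callers('_io.open')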

ncalls  tottime  cumtime
   206    0.017    0.023  astroid/interpreter/_import/spec.py:329(_is_setuptools_namespace)
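The `stats` file loaded above is presumably written by `profile_pylint.py`, whose contents are not shown here. As a hedged sketch, `cProfile` can produce a file in the same format for any callable; `read_init_repeatedly` and `profile_to_file` are hypothetical helpers, with the workload mimicking an uncached check. In practice the profiled callable would be the pylint run itself.

```python
import cProfile
import pathlib
import pstats
import tempfile


def read_init_repeatedly(path: pathlib.Path, times: int) -> None:
    """Hypothetical workload: re-read the same file, like an uncached check."""
    for _ in range(times):
        with open(path, "rb") as stream:
            stream.read(4096)


def profile_to_file(func, *args, output: str = "stats") -> pstats.Stats:
    """Profile func(*args), save the data to `output`, and load it back."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    profiler.dump_stats(output)  # same binary format pstats.Stats reads
    return pstats.Stats(output)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        init = pathlib.Path(tmp) / "__init__.py"
        init.write_bytes(b"")
        stats = profile_to_file(
            read_init_repeatedly, init, 50, output=str(pathlib.Path(tmp) / "stats")
        )
        # Who calls the built-in open(), and how often
        stats.print_callers("open")
```

The built-in's display name varies across Python versions (the issue's query uses `_io.open`), so the looser `"open"` pattern is used here.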

Pylint output

There is no output, just reduced performance.

Expected behavior

Improved performance via caching or fewer filesystem accesses.

Pylint version

astroid @ git+https://github.com/pylint-dev/astroid.git@a4a9fcc44ae0d71773dc3bff6baa78fc571ecb7d
pylint @ git+https://github.com/pylint-dev/pylint.git@500774ae5a4e49e2aa0c8d3f2b64613e21aa676e
Python 3.12.3

OS / Environment

Arch Linux

Additional dependencies

No response

correctmost added the Needs triage 📥 label (Just created, needs acknowledgment, triage, and proper labelling) May 6, 2024
Pierre-Sassoulas added the Enhancement ✨ (Improvement to a component) and performance labels and removed the Needs triage 📥 label May 7, 2024
@Pierre-Sassoulas
Member

Love those issues, keep them coming ❤️ !

Pierre-Sassoulas added the Needs PR label (This issue is accepted, sufficiently specified and now needs an implementation) May 7, 2024
jacobtylerwalls added the Astroid label (Related to astroid) May 7, 2024