
Type inference performance issue #579

Closed
7omb opened this issue Jul 2, 2018 · 8 comments

7omb commented Jul 2, 2018

In pylint 1.9.2, astroid's type inference performs very badly on the code below:

args = []

if True:
    args += []

if True:
    args += []

if True:
    args += []

if True:
    args += []

if True:
    args += []

if True:
    args += []

if True:
    args += []

if True:
    args += []

if True:
    args += []

if True:
    args += []

if True:
    args += []

if True:
    args += []

if True:
    args += []

if True:
    args += []

if True:
    args.append('')

If the args.append('') appears in the first if statement, pylint finishes almost immediately. If you move it to the end, it already takes more than 7 s. Profiling shows that the running time and the number of calls to the function _infer_binary_operation grow exponentially.

The issue can be reproduced in a virtual environment with only pylint installed and an empty pylintrc.
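To time the slowdown for different sizes, a small sketch that generates the reproduction file for any number of guarded augmented assignments (the helper name is made up, not part of the report):

```python
def make_repro(n: int) -> str:
    """Build source with n `if True: args += []` blocks and a final append."""
    blocks = ["args = []\n"]
    blocks += ["if True:\n    args += []\n"] * n
    blocks.append("if True:\n    args.append('')\n")
    return "\n".join(blocks)

if __name__ == "__main__":
    # Write repro_14.py, then run `pylint repro_14.py` to observe the blow-up;
    # compare against smaller n (e.g. 4 or 8) to see the exponential growth.
    with open("repro_14.py", "w") as fh:
        fh.write(make_repro(14))
```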


brycepg commented Jul 3, 2018

The inference result of args at the end of the above code:

  • has 16384 items
  • includes multiple duplicates with lines closer to the bottom being referenced more

log2(16384) is 14, which equals the number of AugAssigns in the above code, so this is definitely a 2^n perf issue: the number of inferred values doubles with every AugAssign.


It looks like inference is (correctly?) inferring every possible combination of additions to the list:

import astroid

code = """
args = []

if True:
    args += ['a']

if True:
    args += ['b']

if True:
    args += ['c']

args #@
"""

results = astroid.extract_node(code).inferred()
[result.as_string() for result in results]
['[]',
 "['a']",
 "['b']",
 "['a', 'b']",
 "['c']",
 "['a', 'c']",
 "['b', 'c']",
 "['a', 'b', 'c']"]
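The doubling is easy to see combinatorially. A quick sketch (plain Python, no astroid) that enumerates which of the n `if True:` branches ran, mirroring the inference result above:

```python
from itertools import product

def possible_values(additions):
    """Every value `args` can hold if each guarded `+=` may or may not run."""
    values = []
    # One combination per subset of branches taken: 2**n in total.
    for taken in product((False, True), repeat=len(additions)):
        value = []
        for took, addition in zip(taken, additions):
            if took:
                value = value + addition
        values.append(value)
    return values

print(len(possible_values([['a'], ['b'], ['c']])))  # 8 results, as above
print(len(possible_values([[]] * 14)))              # 16384 == 2**14
```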

Not too sure what to do about this

@PCManticore
Contributor

Is this an actual example of something that can happen in the wild, or just an artificial example? I wonder if we'd need to introduce some sort of recursion guard in the inference that stops after a certain number of recursively inferred branches. This should prevent a couple of performance issues where we'd try to infer all the potential values that a complex function could generate, although I'm not yet exactly sure how the mechanics of that guard would work.


7omb commented Jul 3, 2018

Yes, this is an actual example which happened a few days ago. It appeared in a function which takes a settings dictionary and constructs the command line arguments for a monitoring plugin. Basically, one commit increased the pylint running time of our project from about 0:30 h to 5:30 h.

@PCManticore
Contributor

Ouch, that's quite the jump! Curious how big your project is in terms of LOC if it already takes half an hour without this bug.


7omb commented Jul 3, 2018

According to cloc roughly 2000 Python files with 250000 lines of code. Without this issue and a bit of tuning it's now down to 0:10 h again.

@svenpanne

Just for clarification: spell checking in pylint seems to take ages, so we took out the checks wrong-spelling-in-comment and wrong-spelling-in-docstring. This brought the pylint runtime down from roughly 30-40 min to 11 min. Just in case anybody wonders...
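For anyone hitting the same thing, one way to switch those two checks off in a pylintrc (these are pylint's real spelling-checker message IDs):

```ini
[MESSAGES CONTROL]
disable=wrong-spelling-in-comment,
        wrong-spelling-in-docstring
```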


brycepg commented Jul 3, 2018

I think we could open an issue on pylint for that performance problem

@PCManticore
We could limit the number of possible inferences in cache_generator by counting the number of yielded inferences and, if it hits some limit (like 1000), just appending Uninferable and returning from the generator. The limit could even be an environment variable, set lower for performance on bigger projects.
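A minimal sketch of that idea (the names here are assumptions, not astroid's actual internals): wrap the inference generator, count what it yields, and cut it off with a stand-in for Uninferable once the limit is reached.

```python
import os

UNINFERABLE = object()  # stand-in for astroid's Uninferable sentinel

def limit_inference(results, limit=None):
    """Yield at most `limit` inference results, then an Uninferable marker."""
    if limit is None:
        # Hypothetical knob, per the proposal above.
        limit = int(os.environ.get("ASTROID_MAX_INFERABLE", "1000"))
    for count, result in enumerate(results):
        if count >= limit:
            yield UNINFERABLE
            return
        yield result

print(list(limit_inference(iter(range(5)), limit=3)))
```

Callers that already handle Uninferable results would then degrade gracefully instead of walking all 2^n values.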

@PCManticore
Contributor

@brycepg Good catch, that might work!

brycepg added a commit to brycepg/astroid that referenced this issue Jul 6, 2018
spot performance issues.

Add a new environment variable called ASTROID_MAX_INFERABLE to tune
the maximum number of inferable values at a time.

Close pylint-dev#579
Close pylint-dev/pylint#2251
brycepg added a commit that referenced this issue Jul 6, 2018
spot performance issues.

Add a new environment variable called ASTROID_MAX_INFERABLE to tune
the maximum number of inferable values at a time.

Close #579
Close pylint-dev/pylint#2251
4 participants