Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Making Pylint faster #497

Merged
merged 3 commits into from
Mar 2, 2018
Merged

Making Pylint faster #497

merged 3 commits into from
Mar 2, 2018

Conversation

nickdrozd
Copy link
Contributor

I recently learned how to do Python profiling with yappi. pylint has
always seemed slow to me, so I decided to see if it could be sped up.

Here is the call graph from running pylint against
https://github.com/PyCQA/pycodestyle/blob/master/pycodestyle.py:

pycodestyle-master

On these graphs, nodes represent function calls, and the brighter the
node, the more time spent in that function. Each node has three
numbers: 1) the total time spent in the function, including its
subcalls; 2) the total time spent in that function but not its
subcalls; and 3) the total number of times the function was called.

I was somewhat surprised to learn (although it seems obvious in
retrospect) that the pylint code itself is not especially slow and
that most time is being spent in astroid functions. Looking at that
graph, it's obvious that nodes_of_clas and get_children are
bottlenecks, and optimizing those functions could have a big impact.
(To explain the numbers a bit, nodes_of_class is taking up almost
60% of total CPU time, and more than a third of that time is being
spent in get_children.)

First, I timed three runs of pylint with astroid on master to get
a benchmark:

real	1m17.996s
user	0m31.987s
sys	0m45.806s

real	1m17.828s
user	0m31.704s
sys 	0m45.936s

real	1m16.928s
user	0m31.327s
sys 	0m45.416s

Next, I timed three runs with astroid on the commit Add type-specific get_children. This commit gives each (or almost each) node class its
own get_children method instead of having them all use the same
generic function. This sped things up:

real	1m11.266s
user	0m30.778s
sys	0m40.306s

real	1m11.503s
user	0m31.215s
sys	0m40.036s

real	1m11.761s
user	0m31.376s
sys	0m40.194s

Here is the call graph with that change:

pycodestyle-children

As compared with the previous graph, nodes_by_class takes slightly
less total CPU time, but of the time it does take, less of it is spent
in its subcalls. This is because the type-specific get_children
calls go much faster.

Okay, so nodes_of_class is now the sole bottleneck. First, I applied
the commit Move nodes_of_class null check out of inner loop, which,
as the name suggests, just shuffles around the null check logic in
that function. This provided a modest speedup:

real	1m10.510s
user	0m30.712s
sys	0m39.624s

real	1m11.590s
user	0m31.072s
sys	0m40.328s

real	1m11.274s
user	0m31.055s
sys	0m40.049s

According to the call graph (which I won't bother to post), this
change saved about .2% of total CPU time, which is not bad for such a
small change.

Finally, I applied a much larger change, the commit Add type-specific nodes_of_class. This commit adds a few functions that are just like
nodes_of_class except that they apply only to specific node
searches. This eliminates the need to do expensive runtime type
checking. These replacing calls to nodes_of_class within astroid
with these new functions sped things up significantly:

real	0m58.005s
user	0m25.634s
sys	0m32.200s

real	0m57.874s
user	0m25.870s
sys	0m31.838s

real	0m58.177s
user	0m26.026s
sys	0m31.984s

Here is the call graph

pycodestyle-nodes

It's a little hard to interpret, but by my count nodes_of_class went
from taking ~58% of total CPU time to ~48%.

Note that all of this data comes from running pylint, as I said
earlier, against just a single file, pycodestyle.py. Larger pylint
targets (projects with lots of subdirectories, for instance) have
different profiles, and different functions become more prominent
(get_children and nodes_of_class are slow in all circumstances,
however). I have ideas for more optimizations, which I will come back
with after these first changes are taken care of.

@rogalski
Copy link
Contributor

Wow, an effort put to this is very impressive. Thank you!

@brycepg
Copy link
Contributor

brycepg commented Feb 19, 2018

Sorry ignore my comments if you saw them, I remembered that we haven't removed 2.7 yet from tox

@brycepg
Copy link
Contributor

brycepg commented Feb 20, 2018

@nickdrozd What do you think we can do to help prevent performance regressions? At first glance, some of this code would make me want to refactor it away because of DRY (but I understand the trade off)

@@ -967,6 +967,10 @@ def pytype(self):
:rtype: str
"""

def get_children(self):
for elt in self.elts:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could use yield from as @brycepg mentioned. We still have some bits of Python 2 compatibility in astroid, but we are in the process of removing it from both pylint and astroid (so if your PR fails on Python 2 for now, feel free to ignore that)

@@ -1172,6 +1176,9 @@ def __init__(self, name=None, lineno=None, col_offset=None, parent=None):

super(AssignName, self).__init__(lineno, col_offset, parent)

def get_children(self):
return (_ for _ in ())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

@@ -1287,7 +1297,49 @@ class Arguments(mixins.AssignTypeMixin, NodeNG):

:type: NodeNG
"""

def get_children(self):
if self.args is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one would also benefit from yield from (like the entire PR). But I'd also like to have these grouped, as in:

yield from self.defaults
yield from self.kwonlyargs
yield from ...

The reason is that the blocks seem to be independent only in the nature of the value that gets yielded, but other than that, it's all the same.

@@ -644,6 +645,30 @@ def nodes_of_class(self, klass, skip_klass=None):
for matching in child_node.nodes_of_class(klass, skip_klass):
yield matching

def get_assign_nodes(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer these to be private. Some comments before them mentioning why they are like this would also help in the future in case someone wonders why these couldn't have been made more DRY

@PCManticore
Copy link
Contributor

Thank you for doing this amazing work @nickdrozd ! There's not much to be commented about this PR, left a couple of comments with things that we can improve, but overall looks like a pretty good gain in performance. Also I noticed from your yippi output that there might be other places where there are tons of calls, where we should have less, such as in the transforms for instance.
Also I support Bryce's question. Before we start optimising further, we should think about measuring the performance first; I imagine it would be super easy to introduce regressions for it.

@nickdrozd
Copy link
Contributor Author

Thanks for the kind words! I was somewhat worried that these changes
wouldn't be well-received, so I'm glad that everyone is on board 🐈

A few points:

yield from

I was going to say that I don't feel comfortable being the one to
cause the Python 2 tests to fail, but it looks like all that stuff is
already getting ripped out. I will update everything to use yield from.

Performance Regression Testing

I have no experience testing for performance regression, so I don't
have any clever ideas. This may be a naive approach, but the simplest
thing I can think to do would be to add a pass in the CI that runs
some fixed version of Pylint (say, 1.8) with the version of Astroid in
master or the PR being tested against some fixed file (say,
pycodestyle.py). If the approximate speed of that run in the test
environment can be determined, then the test will fail if the run of
Pylint is not within some reasonable bound of the target time.

That's a general (and again, possibly naive) approach to performance
regression testing. As far as preventing regression in the specific
areas affected by this PR, there are a few options. One possibility
would be to delete the generic NodeNG.get_children entirely and
require that each node type implements its own. A variation of this
idea would have NodeNG.get_children raise a NotImplementedError,
accompanied by a comment that says something like

# DON'T REFACTOR THIS, I KNOW IT LOOKS INELEGANT
# BUT IT'S LIKE THIS FOR PERFORMANCE REASONS

(NodeNG could be made into an actual abstract base class, but that
seems a little dramatic to me.)

Use of nodes_of_class within Astroid code is always bad, so that
should be prohibited. Astroid's pylintrc could probably be modified
to ensure this. In the worst case, a custom Pylint plugin could be
written to check for its use.

Making get_assign_nodes, etc, private

@PCManticore suggests that the type-specific nodes_of_class
replacements be made private. This is reasonable, as making them
"public" functions increases the API surface. On the other hand,
Pylint makes lots of calls to nodes_of_class, and these calls are,
of course, slow:

~/pylint $ grep -r nodes_of_class pylint
pylint/checkers/async.py:        for child in node.nodes_of_class(astroid.Yield):
pylint/checkers/base.py:    for import_node in try_block.nodes_of_class((astroid.ImportFrom, astroid.Import)):
pylint/checkers/base.py:        _node for _node in loop.nodes_of_class(loop_nodes,
pylint/checkers/base.py:        _node for _node in loop.nodes_of_class(astroid.Break,
pylint/checkers/base.py:        starred = list(node.targets[0].nodes_of_class(astroid.Starred))
pylint/checkers/base.py:        returns = node.nodes_of_class(astroid.Return,
pylint/checkers/base.py:            for child in node.nodes_of_class(astroid.Global)
pylint/checkers/base.py:        for node_name in node.nodes_of_class(astroid.Name):
pylint/checkers/base.py:            child.names for child in node.nodes_of_class(astroid.Nonlocal)
pylint/checkers/base.py:            child.names for child in node.nodes_of_class(astroid.Global)
pylint/checkers/base.py:        if any(expr.nodes_of_class(astroid.Call)):
pylint/checkers/classes.py:            for call in infer_method.nodes_of_class(astroid.Call):
pylint/checkers/classes.py:    for call in fundef_node.nodes_of_class(astroid.Call):
pylint/checkers/classes.py:        for stmt in node.nodes_of_class(astroid.Call):
pylint/checkers/imports.py:                any(node.nodes_of_class((astroid.Import, astroid.ImportFrom))):
pylint/checkers/imports.py:            if any(root.nodes_of_class((astroid.Import, astroid.ImportFrom))):
pylint/checkers/newstyle.py:        for stmt in node.nodes_of_class(astroid.Call):
pylint/checkers/refactoring.py:        for defined_argument in scope.args.nodes_of_class(astroid.AssignName):
pylint/checkers/refactoring.py:        for name in node.target.nodes_of_class(astroid.AssignName):
pylint/checkers/refactoring.py:            for name in names.nodes_of_class(astroid.AssignName):
pylint/checkers/refactoring.py:        return_nodes = node.nodes_of_class(astroid.Return)
pylint/checkers/refactoring.py:            for subscript in child.nodes_of_class(astroid.Subscript):
pylint/checkers/typecheck.py:    for name in statement.nodes_of_class(astroid.Name):
pylint/checkers/typecheck.py:        returns = list(function_node.nodes_of_class(astroid.Return,
pylint/checkers/utils.py:            for ass_node in _node.nodes_of_class(astroid.AssignName):
pylint/checkers/utils.py:            for ass_node in _node.target.nodes_of_class(astroid.AssignName):
pylint/checkers/utils.py:        for ass_node in _node.nodes_of_class(astroid.AssignName):
pylint/checkers/utils.py:        for imp_node in _node.nodes_of_class((astroid.ImportFrom, astroid.Import)):
pylint/checkers/utils.py:            for default_name_node in default_node.nodes_of_class(astroid.Name):
pylint/checkers/utils.py:        if node in base.nodes_of_class(astroid.Name):
pylint/checkers/variables.py:    imports = frame.nodes_of_class((astroid.Import, astroid.ImportFrom))
pylint/checkers/variables.py:    assign_stmts = name_node.scope().nodes_of_class(astroid.AssignName)
pylint/checkers/variables.py:        assigned_to = [var.name for var in node.target.nodes_of_class(astroid.AssignName)]
pylint/checkers/variables.py:        global_names = _flattened_scope_names(node.nodes_of_class(astroid.Global))
pylint/checkers/variables.py:        nonlocal_names = _flattened_scope_names(node.nodes_of_class(astroid.Nonlocal))
pylint/extensions/_check_docs_utils.py:            for ret in target.nodes_of_class(astroid.Return):
pylint/extensions/docparams.py:        return_nodes = node.nodes_of_class(astroid.Return)
pylint/extensions/redefined_variable_type.py:                          redef_parent in orig_parent.nodes_of_class(astroid.If)):

A whole slew of these functions could be added to the API:
get_assign_nodes, get_assignname_nodes, get_global_nodes, etc.
Using these instead of the generic nodes_of_class would make Pylint
faster still. Doing that is a lot more work, however, and would
require coordinating changes in the two repos.

As I write this, I've convinced myself that they should be made
"private" for now, but this decision should certainly be revisited in
the near future.

get_children is elegant and flexible and slow.

    def get_children(self):
        for field in self._astroid_fields:
            attr = getattr(self, field)
            if attr is None:
                continue
            if isinstance(attr, (list, tuple)):
                for elt in attr:
                    yield elt
            else:
                yield attr

It iterates over a list, dynamically accesses attributes, does null
checks, and does type checking. This function gets called a lot, and
all that extra work is a real drag on performance.

In most cases there isn't any need to do any of these checks. Take an
Assign node for instance:

    def get_children(self):
        for elt in self.targets:
            yield elt

        yield self.value

It's known in advance that Assign nodes have a list of targets and a
value, so just yield those without checking anything.
The check was being repeated unnecessarily in a tight loop.
nodes_of_class is a very flexible method, which is great for use in
client code (e.g. Pylint). However, that flexibility requires a great
deal of runtime type checking:

    def nodes_of_class(self, klass, skip_klass=None):
        if isinstance(self, klass):
            yield self

        if skip_klass is None:
            for child_node in self.get_children():
                for matching in child_node.nodes_of_class(klass, skip_klass):
                    yield matching

            return

        for child_node in self.get_children():
            if isinstance(child_node, skip_klass):
                continue
            for matching in child_node.nodes_of_class(klass, skip_klass):
                yield matching

First, the node has to check its own type to see whether it's of the
desired class. Then the skip_klass flag has to be checked to see
whether anything needs to be skipped. If so, the type of every yielded
node has to be check to see if it should be skipped.

This is fine for calling code whose arguments can't be known in
advance ("Give me all the Assign and ClassDef nodes, but skip all the
BinOps, YieldFroms, and Globals."), but in Astroid itself, every call
to this function can be known in advance. There's no need to do any
type checking if all the nodes know how to respond to certain
requests. Take get_assign_nodes for example. The Assign nodes know
that they should yield themselves and then yield their Assign
children. Other nodes know in advance that they aren't Assign nodes,
so they don't need to check their own type, just immediately yield
their Assign children.

Overly specific functions like get_yield_nodes_skip_lambdas certainly
aren't very elegant, but the tradeoff is to take advantage of knowing
how the library code works to improve speed.
@nickdrozd
Copy link
Contributor Author

The PR has been updated to

  • add get_children to every node,
  • use yield from, and
  • make the nodes_of_class replacements "private".

NodeNG.get_children could be deleted at this point.

@brycepg
Copy link
Contributor

brycepg commented Feb 24, 2018

Looks pretty good. I think it's a good idea to keep NodeNG.get_children for yet to be added nodes and as an interface

@PCManticore
Copy link
Contributor

I agree with Bryce that NodeNG.get_children could still be here for newly added nodes. Regarding pylint's usage of nodes_of_class I would say to not add those here, but rather implement them right into pylint, astroid should only provide fast nodes_of_class alternatives for what it's already using, while pylint could have its own variants.

Nevertheless, this was a fun patch @nickdrozd ! Thank you so much for contributing this work.
Regarding the performance regression, that sounds good, but I wonder if we're going to have potential varying results due to, say, CI being too slow at a given time. I wonder if there is any service that out there that provides this information somehow and that's free for open source projects. Anyway, I think we can merge this PR as is and we can revisit later how to measure performance.

@PCManticore PCManticore merged commit 9b5aa97 into pylint-dev:master Mar 2, 2018
@nickdrozd nickdrozd mentioned this pull request Mar 17, 2018
AWhetter added a commit that referenced this pull request May 5, 2018
52e6765 introduced a new type of
child iteration.
Some classes started yielding children in the incorrect order,
which was causing test failures in pylint.
This change corrects the ordering to match the _astroid_fields
of the class.

Relates to a change from #497
@nickdrozd nickdrozd mentioned this pull request May 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants