Slowdown in >=0.981 #14867

JMMarchant · 2023-03-10T18:57:45Z

Bug Report

There has been a sizeable slowdown (33x) in areas of our codebase in recent versions of mypy (since 0.981); using the latest version I was finally able to track down and profile the culprits to create a minimal example which seems to be related to Union typechecking.

To Reproduce

from __future__ import annotations

from typing import Dict, Union, Mapping, Protocol, Any, Optional, List

import numpy as np

class _TensorLike(Protocol):
    """Protocol defining what methods and operations a Generic Tensor can perform."""

    def __mul__(self: _TensorLike, other: Any) -> _TensorLike:
        """Multiplies the self tensor with the other tensor and returns the result."""
        ...

    def __sub__(self: _TensorLike, other: Any) -> _TensorLike:
        """Subtracts the other tensor from the self tensor and returns the result."""
        ...

    def squeeze(self: _TensorLike, axis: Optional[Any] = None) -> _TensorLike:
        """Returns a tensor with all the dimensions of input of size 1 removed."""
        ...

_Weights = Mapping[str, _TensorLike]

class Test:
    _other_worker_shares: List[int]

    def _encode_secret(self, secret: Union[_TensorLike, np.ndarray]) -> np.ndarray:
        return np.ones(5)

    def _reconstruct_secret(self, shares: List[Union[np.ndarray, int]]) -> np.ndarray:
        return np.ones(5)

    def _encode_and_reconstruct_state_dict(
        self, secret_state_dict: Union[_Weights, Mapping[str, np.ndarray]]
    ) -> Dict[str, np.ndarray]:
        """Encodes and reconstructs state dict from own and worker shares.

        Encrypts `secret_state_dict` using `own_shares` then reconstructs it using
        `self._other_worker_shares`.
        """
        encrypted_state_dict = [
            self._encode_secret(v) for _, v in secret_state_dict.items()
        ]
        reconstructed = [
            self._reconstruct_secret([*self._other_worker_shares, value])
            for value in encrypted_state_dict
        ]
        x = list(secret_state_dict)
        z = zip(x, reconstructed)
        d = dict(z)
        return d

Expected Behavior

Timings between versions are substantially different:

0.971 = 3.042s
1.1.1 = 1m44.30s

Per-line timings give:

    7    106.4
    8     10.9
   11      2.0
   12      1.9
   15      1.4
   16      1.0
   18     10.4
   19      4.8
   20      2.1
   22     58.1
   24     46.5
   25      2.0
   28    186.1
   31     83.3
   36      2.0
   41 78880690.2
   44    416.6
   48 8945003.5
   49    322.2
   50    458.1
   51      1.3

Given this (fairly) minimal example in which no typechecking errors are found I would expect there to not be such a slowdown.

Is there a paradigm shift or issue that would explain this and how can we go about making the typechecker happier and faster?

Actual Behavior

Substantial slowdown after 0.971

Your Environment

Mypy version used: 0.971/1.1.1
Mypy command-line flags: --line-checking-stats for 1.1.1 run
Mypy configuration options from mypy.ini (and other config files): None
Python version used: 3.8.16
numpy: 1.23.5 (though same behaviour in 1.24.2)

The text was updated successfully, but these errors were encountered:

ikonst · 2023-03-11T04:37:35Z

I tried narrowing it down more:

from __future__ import annotations

from typing import Any
from typing import Protocol

import numpy as np


class _TensorLike(Protocol):
    # Operator must be __add_, __sub__, __mul__.
    # Does not reproduce with binary operators (__add__, __xor__ etc.)
    def __mul__(self, other: Any) -> _TensorLike:
        ...


def f(arg: dict[str, _TensorLike] | dict[str, np.ndarray]) -> None:
    arg.get('foo')  # or most other dict methods -- SLOW LINE HERE

ikonst · 2023-03-11T05:33:40Z

See reproduction without numpy.

The trigger appears to be the 10 overloads that ndarray.__mul__ has, each with a self type, so I'm replicating this.
The fact that B is a protocol (not class) is significant.

from __future__ import annotations

from typing import Any
from typing import Generic
from typing import Protocol
from typing import TypeVar
from typing import overload


class V1:
    pass


class V2:
    pass


class V3:
    pass


class V4:
    pass


class V5:
    pass


class V6:
    pass


class V7:
    pass


class V8:
    pass


class V9:
    pass


class V10:
    pass


T = TypeVar("T")


class A(Generic[T]):
    @overload
    def foo(self: A[V1], other: V1) -> A[V1]:
        return A()

    @overload
    def foo(self: A[V2], other: V2) -> A[V2]:
        return A()

    @overload
    def foo(self: A[V3], other: V3) -> A[V3]:
        return A()

    @overload
    def foo(self: A[V4], other: V4) -> A[V4]:
        return A()

    @overload
    def foo(self: A[V5], other: V5) -> A[V5]:
        return A()

    @overload
    def foo(self: A[V6], other: V6) -> A[V6]:
        return A()

    @overload
    def foo(self: A[V7], other: V7) -> A[V7]:
        return A()

    # @overload
    # def foo(self: A[V8], other: V8) -> A[V8]:
    #     return A()
    #
    # @overload
    # def foo(self: A[V9], other: V9) -> A[V9]:
    #     return A()
    #
    # @overload
    # def foo(self: A[V10], other: V10) -> A[V10]:
    #     return A()

    def foo(self, other: Any) -> Any:
        return A()


class B(Protocol):
    def foo(self, other: Any) -> B:
        ...


x: A | B
x.__repr__()

AlexWaygood · 2023-03-11T16:54:47Z

I used git bisect with @ikonst's benchmark, and d27bff6 seems to be the "first bad commit" here. Mypy @ d27bff6 appears to be significantly slower when checking this snippet than mypy @ d2063d2. Cc. @ilevkivskyi.

(I bisected with non-compiled mypy btw, so my results might not be 100% accurate I suppose.)

JMMarchant · 2023-03-11T18:07:27Z

I used git bisect with @ikonst's benchmark, and d27bff6 seems to be the "first bad commit" here. Mypy @ d27bff6 appears to be significantly slower when checking this snippet than mypy @ d2063d2. Cc. @ilevkivskyi.

(I bisected with non-compiled mypy btw, so my results might not be 100% accurate I suppose.)

That matches with the previous commit I bisected to when trying to find the "bad" one; wasn't able to identify what was slower at that point though

ilevkivskyi · 2023-03-11T18:16:36Z

Hm, just to understand this better. What is the performance of something like

x: B = A()

(using the code by @ikonst) before and after the bad commit? IIUC this should be very slow even before the bad commit.

AlexWaygood · 2023-03-11T18:23:59Z

Hm, just to understand this better. What is the performance of something like
x: B = A()
(using the code by @ikonst) before and after the bad commit? IIUC this should be very slow even before the bad commit.

Yup, the time taken on my machine checking that is virtually identical between the two commits. (It seems to be much faster checking that code than it is checking x: A | B, apparently).

ilevkivskyi · 2023-03-11T18:36:08Z

Hm, interesting. Before that commit is_proper_subtype() (inconsistently) used some old simplistic logic for overloads, while the commit made it use the full/correct subtyping logic that was written in is_subtype(). So I thought this exposed bad performance in a corner case to more scenarios. Can you please also check what is the performance of

y: A | B
x: B = y

before vs after (and vs the original example by @ikonst)?

AlexWaygood · 2023-03-11T19:16:01Z

Can you please also check what is the performance of
y: A | B
x: B = y
before vs after (and vs the original example by @ikonst)?

That might be very slightly slower between the two commits, but still nothing like the dramatic slowdown on @ikonst's benchmark (which appears to get around 3x slower between the two commits).

ikonst · 2023-03-11T19:36:35Z

I think it gets exponentially slower with more overrides.

ilevkivskyi · 2023-03-11T22:39:58Z

OK, I figured it out, it indeed exposes pre-existing bad performance to more cases. Consider same example as @ikonst but with

class B(Protocol):
    def foo(self, other: object) -> B:  # Note object here
        ...

y: A | B
x: B = y

this is extremely slow even before 0.981, and the bad commit simply exposes the problem to is_proper_subtype() as well. I spent some time thinking about how to avoid the exponential growth here, but don't have any ideas yet.

ilevkivskyi · 2023-03-11T23:05:48Z

OK, well, there is a solution: negative subtype cache (tested it locally, works very fast for this case), but IIRC that was dangerous for some reason, @JukkaL what do you think?

ilevkivskyi · 2023-03-11T23:23:04Z

Also negative cache doesn't seem to cause any slow down on self-check. Here I opened a PR if someone wants to play with it #14884. cc @JMMarchant

JMMarchant · 2023-03-13T10:19:13Z

Tested this on my toy example and it's substantially better than 1.1.1 (12.313s). Still not as fast as 0.971 but much improved.

A possible solution for #14867 (I just copy everything from positive caches).

ilevkivskyi · 2023-04-11T14:09:23Z

@JMMarchant could you please check the current master? If the perf is acceptable, you can close the issue.

JMMarchant · 2023-04-17T10:21:28Z

Current master works well enough; there's an approximately 3x slowdown on a cold start (3min vs 1min), a 10x slowdown on a warm start (20s vs 2s) but this is substantially better than before this fix!

Thank you very much for all your work on this!

JMMarchant added the bug mypy got something wrong label Mar 10, 2023

JelleZijlstra added the performance label Mar 10, 2023

ilevkivskyi mentioned this issue Mar 11, 2023

Add negative subtype caches #14884

Merged

JukkaL pushed a commit that referenced this issue Apr 11, 2023

Add negative subtype caches (#14884)

b43e0d3

A possible solution for #14867 (I just copy everything from positive caches).

JMMarchant closed this as completed Apr 17, 2023

charlesfrye mentioned this issue May 5, 2024

Run examples on change modal-labs/modal-examples#735

Merged

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Slowdown in >=0.981 #14867

Slowdown in >=0.981 #14867

JMMarchant commented Mar 10, 2023 •

edited

Loading

ikonst commented Mar 11, 2023 •

edited

Loading

ikonst commented Mar 11, 2023 •

edited

Loading

AlexWaygood commented Mar 11, 2023 •

edited

Loading

JMMarchant commented Mar 11, 2023

ilevkivskyi commented Mar 11, 2023

AlexWaygood commented Mar 11, 2023 •

edited

Loading

ilevkivskyi commented Mar 11, 2023

AlexWaygood commented Mar 11, 2023 •

edited

Loading

ikonst commented Mar 11, 2023

ilevkivskyi commented Mar 11, 2023

ilevkivskyi commented Mar 11, 2023

ilevkivskyi commented Mar 11, 2023

JMMarchant commented Mar 13, 2023

ilevkivskyi commented Apr 11, 2023

JMMarchant commented Apr 17, 2023

Slowdown in >=0.981 #14867

Slowdown in >=0.981 #14867

Comments

JMMarchant commented Mar 10, 2023 • edited Loading

ikonst commented Mar 11, 2023 • edited Loading

ikonst commented Mar 11, 2023 • edited Loading

AlexWaygood commented Mar 11, 2023 • edited Loading

JMMarchant commented Mar 11, 2023

ilevkivskyi commented Mar 11, 2023

AlexWaygood commented Mar 11, 2023 • edited Loading

ilevkivskyi commented Mar 11, 2023

AlexWaygood commented Mar 11, 2023 • edited Loading

ikonst commented Mar 11, 2023

ilevkivskyi commented Mar 11, 2023

ilevkivskyi commented Mar 11, 2023

ilevkivskyi commented Mar 11, 2023

JMMarchant commented Mar 13, 2023

ilevkivskyi commented Apr 11, 2023

JMMarchant commented Apr 17, 2023

JMMarchant commented Mar 10, 2023 •

edited

Loading

ikonst commented Mar 11, 2023 •

edited

Loading

ikonst commented Mar 11, 2023 •

edited

Loading

AlexWaygood commented Mar 11, 2023 •

edited

Loading

AlexWaygood commented Mar 11, 2023 •

edited

Loading

AlexWaygood commented Mar 11, 2023 •

edited

Loading