Process superclass methods before subclass methods in semanal #18723

Merged
ilevkivskyi merged 4 commits from order-funcs-semanal into python:master on Feb 28, 2025

Conversation

ilevkivskyi
Member

Fixes #7162

See also the discussion in #18674 for another situation where this causes problems (deferrals). In general this problem is probably quite rare, but it bugs me, so I decided to go ahead with a simple and explicit (even though a bit ugly) solution.

@ilevkivskyi
Member Author

Hm, for some reason the tests didn't start; I will try closing and re-opening.

@ilevkivskyi ilevkivskyi reopened this Feb 22, 2025

@ilevkivskyi
Member Author

Oh well, it looks like this PR causes mypyc-compiled mypy to segfault when running some tests.

@ilevkivskyi
Member Author

And as I guessed, the error happens in one of those Bogus things, more precisely in CPyDef_semanal___SemanticAnalyzer___qualified_name (in PyUnicode_Concat, one of the args is NULL or something).

@ilevkivskyi
Member Author

Actually it is much trickier than that: fullname etc. are no longer Bogus; empty strings are used instead. Looking at gdb, it seems that two perfectly valid strings are passed to PyUnicode_Concat, yet it segfaults. Maybe something is wrong with refcounting? I will try to dig a bit more with a Python debug build.

Review comment on this snippet:

        return -1
    if right_info in left_info.mro:
        return 1
    return 0

Collaborator

This seems to change the order of processing targets even when derived classes already come after their base classes (i.e. the current ordering is already fine). I suspect that this will break the current SCC ordering algorithm, which we probably rely on in a bunch of places, and that could explain why things are failing. I think we must mostly follow the SCC ordering, or we will have a bunch of weird regressions and generally a bad time.

Here's one potential way to fix this so that it only changes the order when necessary:

  • Create a linear list of targets, similar to what you currently have.
  • Collect a set of all TypeInfos in the targets (e.g. all active_type values).
  • Iterate over the targets, and keep track of which TypeInfos we've processed by removing them from the set created in the previous step. If we encounter a TypeInfo that has some MRO item still in the set, move that target to a separate list (deferred) instead of processing it now.
  • After iterating over all targets, process the deferred items.

The above approach could possibly be made even better by processing deferred nodes immediately after all the MRO entries have been processed, instead of waiting for all targets to be processed.

This has the benefit of not changing the processing order when it's already correct; when it's incorrect, only the impacted targets get rescheduled. It could also be a bit faster, since we perform a linear scan instead of a sort (a rough sketch of this idea is included below).
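
A minimal sketch of the deferral idea, assuming a simplified target model; the Target tuple, the string-based MRO, and order_targets are hypothetical illustrations, not mypy's actual data structures:

from typing import NamedTuple, Optional


class Target(NamedTuple):
    name: str
    active_type: Optional[str]  # class the target is defined in, if any
    mro: tuple[str, ...]        # base classes of active_type within the SCC


def order_targets(targets: list[Target]) -> list[Target]:
    # TypeInfos (here just class names) in this SCC that are not yet processed.
    unprocessed = {t.active_type for t in targets if t.active_type is not None}
    result: list[Target] = []
    deferred: list[Target] = []
    for t in targets:
        if t.active_type is not None and any(b in unprocessed for b in t.mro):
            # Some base class in the SCC appears later in the list: defer.
            deferred.append(t)
            continue
        result.append(t)
        if t.active_type is not None:
            unprocessed.discard(t.active_type)
    # Deferred targets are handled after everything else.
    return result + deferred


print([t.name for t in order_targets(
    [Target("Sub.meth", "Sub", ("Base",)), Target("Base.meth", "Base", ())]
)])
# ['Base.meth', 'Sub.meth'] -- only the mis-ordered target was rescheduled

Targets whose bases come earlier keep their positions, so an already-correct order passes through unchanged.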

Member Author

@JukkaL

This seems to change the order of processing targets, even if derived classes are always after base classes (i.e. current ordering is already fine)

I don't think I am following. Can you give an example of when this happens? I actually did a diff on the full target list for mypy self-check (including stdlib), and it is tiny: only a few things that actually matter were changed (e.g. a couple of visitors in mypy.types vs mypy.type_visitor).

Even then, how can the order of processing of method bodies be so important? (All the top levels, including ClassDefs, are already processed at this point.)

Collaborator

Ah, I think I misunderstood how the ordering works, so it's probably fine. Changing the ordering of methods "shouldn't" change much, but it's just a very scary change that could trigger some pre-existing bugs or limitations. If this only changes the ordering very slightly, though, it should be fine.

Can you also manually test this when you import torch and numpy? At least torch has a massive import cycle, which should be a good test case.

Member Author

@JukkaL As we discussed, I now use an explicit ordering algorithm. Btw, in the meantime I read a bit more about this: although the subclassing order is consistent as a total preorder, common sorting algorithms can be fooled by preorders (they usually expect a total order). Also, the documented properties of Python's sort are actually not enough to guarantee it will always work in this case.
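
For illustration, here is a toy example (hypothetical class names, not mypy code) of how a comparator over such an ordering can leave a subclass before its base, because the sort never compares the two directly:

from functools import cmp_to_key

# Toy "hierarchy": map each class name to its bases within the SCC.
bases = {"A": set(), "B": set(), "C": {"A"}}

def cmp(left: str, right: str) -> int:
    if left in bases[right]:
        return -1  # right subclasses left, so left should come first
    if right in bases[left]:
        return 1   # left subclasses right, so right should come first
    return 0       # unrelated classes compare as "equal"

print(sorted(["C", "B", "A"], key=cmp_to_key(cmp)))
# ['C', 'B', 'A'] -- the subclass C stays ahead of its base A

Because C compares equal to B and B compares equal to A, the sort never compares C with A directly and accepts the input as already sorted.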

Member Author

Oh, and I also tried numpy and torch with the latest version of this PR and didn't find any issues.

@JukkaL
Collaborator

JukkaL commented Feb 24, 2025

Maybe something is wrong with refcounting? I will try to dig a bit more with Python debug build.

Using a debug build is a good idea. Reference counting has been quite stable for a long time, but it's possible that something is still misbehaving.

@ilevkivskyi
Member Author

@JukkaL

Using a debug build is a good idea. Reference counting has been quite stable for a long time, but it's possible that something is still misbehaving.

It looks like something is wrong with the unpacking of tuples. Replacing unpacking with indexing fixes the segfaults (see the last commit). I still don't have a small repro, but looking at this comment

# Special-case multiple assignments like 'x, y = expr' to reduce refcount ops.

it seems to me that this may be caused by #16022.
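
To illustrate the workaround mentioned above (a hypothetical example, not the actual diff): replacing tuple unpacking with plain indexing avoids the special-cased multiple-assignment form quoted in that comment.

def pick_second_unpacked(pair: tuple[str, int]) -> int:
    # Multiple assignment ('x, y = expr'): the form that is special-cased by
    # mypyc and that was involved in the segfault in compiled mode.
    _, value = pair
    return value

def pick_second_indexed(pair: tuple[str, int]) -> int:
    # Plain indexing: the same result without the tuple-unpacking code path.
    value = pair[1]
    return value

print(pick_second_unpacked(("x", 1)), pick_second_indexed(("y", 2)))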

@ilevkivskyi
Member Author

@JukkaL Here is a somewhat weird test that reproduces the segfault:

[case testTupleUnpackingInCallback]
def f(x: tuple[str, int], y: tuple[str, int]) -> int:
    _, xi = x
    _, yi = y
    return 0

[file driver.py]
from native import f
from functools import cmp_to_key

xs = [("x" * i, i) for i in range(100)]
assert sorted(xs, key=cmp_to_key(f))[-1][1] == 99

@ilevkivskyi
Member Author

Another test case (a bit less sketchy) shows that the problem appears if one of the unpacking targets is unused in the function:

[case testTupleUnpackingInCallback]
def f(x: tuple[str, int], y: tuple[str, int]) -> int:
    a, xi = x
    _, yi = y
    if a == "":
        return 0
    return 0

[file driver.py]
from native import f
from functools import cmp_to_key

xs = [("x" * i, i) for i in range(100)]
xs = sorted(xs, key=cmp_to_key(f))
print(xs[1])
print(xs[2])

@ilevkivskyi
Member Author

OK, sorry for spamming; this is the last message until I (or you) fix this. Finally, a self-contained repro for the segfault:

[case testTupleUnpackingInCallback]
def f(x: tuple[str, int]) -> int:
    a, xi = x
    return 0

[file driver.py]
from native import f

xs = [("x" * i, i) for i in range(100)]
xs = [x for x in xs if f(x) == 0]
print(xs[1])
print(xs[2])

@ilevkivskyi
Member Author

@JukkaL unless I am missing some other edge case, I think #18732 should fix it.

Contributor

Diff from mypy_primer, showing the effect of this PR on open source code:

ignite (https://github.com/pytorch/ignite)
+ ignite/metrics/precision_recall_curve.py:100: error: Unused "type: ignore" comment  [unused-ignore]
+ ignite/metrics/precision_recall_curve.py:129: error: Unused "type: ignore" comment  [unused-ignore]

pandas (https://github.com/pandas-dev/pandas)
+ pandas/io/parsers/python_parser.py:168: error: Unused "type: ignore" comment  [unused-ignore]
+ pandas/io/parsers/python_parser.py:291: error: Unused "type: ignore" comment  [unused-ignore]
+ pandas/io/parsers/python_parser.py:328: error: Unused "type: ignore" comment  [unused-ignore]
+ pandas/io/parsers/python_parser.py:335: error: Unused "type: ignore" comment  [unused-ignore]
+ pandas/io/parsers/python_parser.py:671: error: Unused "type: ignore" comment  [unused-ignore]
+ pandas/io/parsers/python_parser.py:1135: error: Unused "type: ignore" comment  [unused-ignore]
+ pandas/io/parsers/python_parser.py:1189: error: Unused "type: ignore" comment  [unused-ignore]
- pandas/io/parsers/c_parser_wrapper.py:123: error: Cannot determine type of "names"  [has-type]
+ pandas/io/parsers/c_parser_wrapper.py:75: error: Unused "type: ignore" comment  [unused-ignore]
+ pandas/io/parsers/c_parser_wrapper.py:103: error: Unused "type: ignore" comment  [unused-ignore]
+ pandas/io/parsers/c_parser_wrapper.py:111: error: Unused "type: ignore" comment  [unused-ignore]
+ pandas/io/parsers/c_parser_wrapper.py:117: error: Unused "type: ignore" comment  [unused-ignore]
+ pandas/io/parsers/c_parser_wrapper.py:122: error: Unused "type: ignore" comment  [unused-ignore]
+ pandas/io/parsers/c_parser_wrapper.py:134: error: Unused "type: ignore" comment  [unused-ignore]
+ pandas/io/parsers/c_parser_wrapper.py:148: error: Unused "type: ignore" comment  [unused-ignore]
+ pandas/io/parsers/c_parser_wrapper.py:150: error: Unused "type: ignore" comment  [unused-ignore]
+ pandas/io/parsers/c_parser_wrapper.py:153: error: Unused "type: ignore" comment  [unused-ignore]
+ pandas/io/parsers/c_parser_wrapper.py:158: error: Unused "type: ignore" comment  [unused-ignore]
+ pandas/io/parsers/c_parser_wrapper.py:162: error: Unused "type: ignore" comment  [unused-ignore]
+ pandas/io/parsers/c_parser_wrapper.py:166: error: Unused "type: ignore" comment  [unused-ignore]
+ pandas/io/parsers/c_parser_wrapper.py:170: error: Unused "type: ignore" comment  [unused-ignore]
+ pandas/io/parsers/c_parser_wrapper.py:174: error: Unused "type: ignore" comment  [unused-ignore]
+ pandas/io/parsers/c_parser_wrapper.py:179: error: Unused "type: ignore" comment  [unused-ignore]
+ pandas/io/parsers/c_parser_wrapper.py:183: error: Unused "type: ignore" comment  [unused-ignore]
+ pandas/io/parsers/c_parser_wrapper.py:185: error: Unused "type: ignore" comment  [unused-ignore]
+ pandas/io/parsers/c_parser_wrapper.py:216: error: Unused "type: ignore" comment  [unused-ignore]
+ pandas/io/parsers/c_parser_wrapper.py:220: error: Unused "type: ignore" comment  [unused-ignore]
+ pandas/io/parsers/c_parser_wrapper.py:239: error: Unused "type: ignore" comment  [unused-ignore]
+ pandas/io/parsers/c_parser_wrapper.py:247: error: Argument 1 to "dedup_names" has incompatible type "Sequence[Hashable] | None"; expected "Sequence[Hashable]"  [arg-type]
+ pandas/io/parsers/c_parser_wrapper.py:248: error: Argument 1 to "is_potential_multi_index" has incompatible type "Sequence[Hashable] | None"; expected "Sequence[Hashable] | MultiIndex"  [arg-type]
+ pandas/io/parsers/c_parser_wrapper.py:274: error: Unused "type: ignore" comment  [unused-ignore]

Tanjun (https://github.com/FasterSpeeding/Tanjun)
- tanjun/dependencies/limiters.py:1127: error: Cannot determine type of "default_value"  [has-type]

@ilevkivskyi
Member Author

@JukkaL If you don't have further comments, I would prefer to merge this today or tomorrow, as there are a couple of other PRs that depend on this.

@JukkaL
Collaborator

JukkaL commented Feb 28, 2025

Can you run the perf_compare script on this? It would also be good to test with perf_compare.py -c 'import torch', since torch has a massive import cycle which might trigger some worst-case behavior. I'm mostly worried about some O(n**2) behavior that might be triggered when processing large SCCs.

@ilevkivskyi
Member Author

@JukkaL
I did a run for torch and the results are within the noise level:

=== Results ===

master                    36.492s (0.0%)
order-funcs-semanal       36.558s (+0.2%)

This is not surprising, since the new complexity is additive, not multiplicative: even assuming we have 1000 classes in an SCC and hit the worst-case scenario, we will do 1,000,000 iterations, but we will simply be "busy-looping" very fast. It is not as if we are going to analyze the methods 1,000,000 times; we still analyze each function/method exactly once.
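
A toy model of that argument (hypothetical code, not the PR's implementation): the outer sweep may pass over the remaining classes O(n) times, so there are O(n**2) cheap membership checks, but the expensive analysis runs exactly once per class.

from typing import Callable

def process_scc(classes: dict[str, set[str]], analyze: Callable[[str], None]) -> None:
    """classes maps each class name to the set of its bases inside the SCC."""
    done: set[str] = set()
    pending = dict(classes)
    while pending:
        progressed = False
        for name, bases in list(pending.items()):
            if bases <= done:   # all bases analyzed: safe to process now
                analyze(name)   # the expensive part, run once per class
                done.add(name)
                del pending[name]
                progressed = True
        if not progressed:      # defensive: a cycle or missing base, just flush
            for name in list(pending):
                analyze(name)
            break

process_scc({"C": {"B"}, "B": {"A"}, "A": set()}, analyze=print)
# prints A, B, C: three cheap sweeps over the pending dict, but each class is
# analyzed exactly once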

JukkaL (Collaborator) left a comment

Thanks for making the measurements and refactoring the implementation! Looks good now.

ilevkivskyi merged commit 5fcca77 into python:master on Feb 28, 2025 (18 checks passed)
ilevkivskyi deleted the order-funcs-semanal branch on February 28, 2025 at 18:01

Successfully merging this pull request may close these issues.

Processing order of methods affects inferred attribute types