feat(index): append #1282

Merged
merged 11 commits into from
Jul 30, 2025

Conversation

Contributor

@cmp0xff cmp0xff commented Jul 18, 2025

  • Tests added: Please use assert_type() to assert the type of any return value

Index.append was previously untyped. This PR adds type annotations for Index.append and tests them.

@overload
def append(self, other: Index[Never]) -> Index: ...
@overload
def append(self, other: Index[S1] | Sequence[Index[S1]]) -> Index[S1]: ...

Member

It might be nicer to have a second TypeVar here (S2), as the resulting Index can contain a mix of different types.

def append(self, other: Index[S2] | Sequence[Index[S2]]) -> Index[S1 | S2]: ...

Contributor Author

Done in 45e0ad3, but this one works less well.

  • mypy is not happy with Index[int].append(Index[int | str]) and gives Index[Any]
  • pyright is not happy with Index[int | str].append([Index[int], Index[str]]) and gives Index[int | Any]. In particular, the typing for [Index[int], Index[str]] seems to be list[Index[int] | Index[str]], instead of list[Index[int | str]].

Member

> • mypy is not happy with Index[int].append(Index[int | str]) and gives Index[Any]
> • pyright is not happy with Index[int | str].append([Index[int], Index[str]]) and gives Index[int | Any]. In particular, the typing for [Index[int], Index[str]] seems to be list[Index[int] | Index[str]], instead of list[Index[int | str]].

While that is annoying for testing in CI, I think it is the safer choice for users: better to expect a wider type that includes Any than to suggest a narrower type. This needs input from @Dr-Irv.

If S1 and S2 were covariant, it seems to work for at least pyright in a simple toy example (but they are invariant)

from __future__ import annotations
from typing import TypeVar, reveal_type, Generic, Sequence

S1 = TypeVar("S1", bound=int | str, covariant=True)
S2 = TypeVar("S2", bound=int | str, covariant=True)

class Index(Generic[S1]):
    def __init__(self, data: list[S1]) -> None: ...

    def append(self: Index[S1], other: Sequence[Index[S2]]) -> Index[S1 | S2]: ...

strings = Index(["a"])
reveal_type(strings)
ints = Index([1])
reveal_type(ints)

reveal_type(strings.append([ints]))
reveal_type(ints.append([strings]))

string_ints = Index(["a", 1])
reveal_type(string_ints)
reveal_type(string_ints.append([ints]))
reveal_type(strings.append([string_ints]))

reveal_type(strings.append([ints]))
reveal_type(strings.append([strings, ints]))

reveal_type(strings.append([ints, strings, string_ints]))

Contributor Author

> If S1 and S2 were covariant, it seems to work for at least pyright in a simple toy example (but they are invariant)
>
> from __future__ import annotations
> from typing import TypeVar, reveal_type, Generic, Sequence
>
> S1 = TypeVar("S1", bound=int | str, covariant=True)
> S2 = TypeVar("S2", bound=int | str, covariant=True)
>
> class Index(Generic[S1]):
>     def __init__(self, data: list[S1]) -> None: ...
>
>     def append(self: Index[S1], other: Sequence[Index[S2]]) -> Index[S1 | S2]: ...

Hi, I am new to covariance/contravariance, but I read PEP 484 (the covariance-and-contravariance section), which says that covariant type variables are meant for generic classes, not for functions, where they are prohibited. In your example, S1 is fine, but S2 is not. Could you help me and explain a bit? Thanks.

B_co = TypeVar('B_co', covariant=True)

def bad_func(x: B_co) -> B_co:  # Flagged as error by a type checker
    ...
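
For contrast with bad_func, here is a minimal sketch of a place where a covariant type variable is accepted: as the type parameter of a generic class that only hands values out. ReadOnlyBox and describe are hypothetical names made up for this illustration; they are not part of the stubs under review.

```python
from __future__ import annotations

from typing import Generic, TypeVar

# Hypothetical names for illustration only.
T_co = TypeVar("T_co", covariant=True)


class ReadOnlyBox(Generic[T_co]):
    """A container that only hands its value out, so covariance is sound."""

    def __init__(self, value: T_co) -> None:
        # Type checkers generally exempt __init__ from the variance check.
        self._value = value

    def get(self) -> T_co:
        # A covariant type variable may appear in return position.
        return self._value


def describe(box: ReadOnlyBox[object]) -> str:
    # Because T_co is covariant, ReadOnlyBox[int] is usable where
    # ReadOnlyBox[object] is expected.
    return f"box holding {box.get()!r}"


int_box: ReadOnlyBox[int] = ReadOnlyBox(1)
print(describe(int_box))  # box holding 1
```

With an invariant type variable, the call describe(int_box) would be reported as a type error, which is the trade-off behind keeping S1 invariant for a container whose methods also accept S1 as an argument.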

Member

We can't change S1 to be covariant. While the following is not exactly what we would like to have, it is probably the closest we can get (but it doesn't work with mypy unless the caller casts).

from __future__ import annotations
from typing import TypeVar, reveal_type, Generic, Sequence

S1 = TypeVar("S1", bound=int | str)
IndexT = TypeVar("IndexT", bound="Index")


class Index(Generic[S1]):
    def __init__(self, data: list[S1]) -> None: ...

    def append(self: Index[S1], other: Sequence[IndexT]) -> Index[S1] | IndexT: ...


strings = Index(["a"])
reveal_type(strings)
ints = Index([1])
reveal_type(ints)

reveal_type(strings.append([ints]))
reveal_type(ints.append([strings]))

string_ints = Index(["a", 1])
reveal_type(string_ints)
reveal_type(string_ints.append([ints]))
reveal_type(strings.append([string_ints]))

reveal_type(strings.append([ints]))
reveal_type(strings.append([strings, ints]))

reveal_type(strings.append([ints, strings, string_ints]))

Contributor Author

Hi, I ran the script myself. In the most complicated case, I see Index[int | str] | Index[int] | Index[str]. To be honest, as a user I would rather see Index[Unknown], because it is simpler, and in both cases I would probably still need a manual cast. Nevertheless, see 3844062.
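
To make the "manual cast" concrete, here is a small runnable sketch. The toy Index class follows the example above, but its runtime bodies and the data attribute are assumptions added so the script executes; the real stubs contain signatures only.

```python
from __future__ import annotations

from typing import Generic, Sequence, TypeVar, Union, cast

S1 = TypeVar("S1", bound=Union[int, str])
IndexT = TypeVar("IndexT", bound="Index")


class Index(Generic[S1]):
    """Toy stand-in for the real Index, with bodies so the example runs."""

    def __init__(self, data: list[S1]) -> None:
        self.data = list(data)

    def append(self: Index[S1], other: Sequence[IndexT]) -> Index[S1] | IndexT:
        # Concatenate the elements of self and of every index in `other`.
        merged = list(self.data)
        for idx in other:
            merged.extend(idx.data)
        return Index(merged)


strings = Index(["a"])
ints = Index([1])

# Statically, the result is a union of Index types rather than one Index.
mixed = strings.append([ints])

# The caller states the element type explicitly; cast() is a no-op at runtime.
as_mixed = cast("Index[int | str]", mixed)
print(as_mixed.data)  # ['a', 1]
```

The cast changes nothing at runtime, so it is only as correct as the caller's knowledge of what the appended indexes contain.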

@cmp0xff cmp0xff requested a review from twoertwein July 18, 2025 16:43
@overload
def append(self, other: Index[S2] | Sequence[Index[S2]]) -> Index[S1 | S2]: ...
@overload
def append(self, other: Sequence[_T_INDEX]) -> Self | _T_INDEX: ...

Collaborator

Why can't we use Sequence[Index] here without the TypeVar?

Contributor Author

The idea came from #1282 (comment). Removed in 67e6bde.

@cmp0xff cmp0xff requested a review from Dr-Irv July 29, 2025 11:13
Collaborator

@Dr-Irv Dr-Irv left a comment

Let's not return things like Index[str | int] and just make that Index

def append(self, other: Index[S2] | Sequence[Index[S2]]) -> Index[S1 | S2]: ...
@overload
def append(self, other: Sequence[_T_INDEX]) -> Self | _T_INDEX: ...
def append(self, other: Index[S2]) -> Index[S1 | S2]: ...

Collaborator

Suggested change
- def append(self, other: Index[S2]) -> Index[S1 | S2]: ...
+ def append(self, other: Index[S2]) -> Index: ...

I really want to avoid having the union types here.

Contributor Author

@cmp0xff cmp0xff requested a review from Dr-Irv July 29, 2025 15:36
Comment on lines 1047 to 1053
third.append([]), "pd.Index[int | str]"
),
pd.Index,
)
check(
assert_type( # type: ignore[assert-type]
third.append(cast("list[Index[Any]]", [])), "pd.Index[int | str]"

Collaborator

Change these to pd.Index in the assert_type statements, and then I don't think you need the # type: ignore statements, i.e., remove the union part of the generic parameter.

Contributor Author

I added comments in fe5f1a9.

  • We have the following overloads
    @overload
    def append(self, other: Index[S1] | Sequence[Index[S1]]) -> Self: ...
    @overload
    def append(self, other: Index | Sequence[Index]) -> Index: ...
  • pyright uses the first overload and gives Index[str | int]
  • mypy uses the second overload and gives Index[Any]
  • Even if I add the following as the first overload
    @overload
    def append(self, other: Sequence[Never]) -> Self: ...
    mypy still uses the last overload and gives Index[Any]

I tend to believe this is a mypy bug.
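
For anyone who wants to reproduce the disagreement, the overload layout described above can be sketched as a self-contained script. The toy Index class, its data attribute, and the runtime implementation are assumptions made for illustration; the first overload returns Index[S1] here rather than Self to keep the sketch short. Results from mypy and pyright on this toy class may differ from their results on the real stubs.

```python
from __future__ import annotations

from typing import Any, Generic, Sequence, TypeVar, Union, overload

S1 = TypeVar("S1", bound=Union[int, str])


class Index(Generic[S1]):
    """Toy stand-in; only the overload layout mirrors the comment above."""

    def __init__(self, data: list[S1]) -> None:
        self.data = list(data)

    @overload
    def append(self, other: Index[S1] | Sequence[Index[S1]]) -> Index[S1]: ...
    @overload
    def append(self, other: Index[Any] | Sequence[Index[Any]]) -> Index[Any]: ...
    def append(self, other: Any) -> Any:
        # Single runtime implementation shared by both overloads.
        others = [other] if isinstance(other, Index) else list(other)
        merged = list(self.data)
        for idx in others:
            merged.extend(idx.data)
        return Index(merged)


third = Index(["a", 1])
same = third.append(Index(["b"]))
empty = third.append([])  # the call on which the two checkers disagreed in this thread
print(same.data, empty.data)
```

Adding reveal_type(third.append([])) and running each checker over the file shows which overload that checker selects for the empty list.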

Contributor Author

After all, Index(["a", 1]) was created as the type Index[str | int] from the beginning.

Contributor Author

How about 1c75df6? I am trying to get rid of Any by using constraints instead of a bound in the TypeVar.
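
The difference between a bound and constraints can be sketched in isolation. Bounded, Constrained, and the two helper functions are hypothetical names for this example; they are not the S1 and C2 type variables defined in the project's stubs.

```python
from typing import TypeVar, Union

# Hypothetical type variables for illustration only.
# A bound accepts any subtype of int | str, including the union itself.
Bounded = TypeVar("Bounded", bound=Union[int, str])
# Constraints must solve to exactly one of the listed types.
Constrained = TypeVar("Constrained", int, str)


def first_bounded(items: list[Bounded]) -> Bounded:
    return items[0]


def first_constrained(items: list[Constrained]) -> Constrained:
    return items[0]


mixed: list[Union[int, str]] = [1, "a"]

first_bounded(mixed)         # accepted: Bounded solves to int | str
# first_constrained(mixed)   # rejected by type checkers: int | str is neither int nor str
print(first_constrained([1, 2]))  # accepted: Constrained solves to int
```

Because a constrained type variable can never resolve to the union, a checker has fewer ways to end up with a partially unknown type argument, at the cost of rejecting mixed-type inputs outright.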

Collaborator

That is fine. Have to wonder if we should change Index to be based on C2 instead of S1, but that can be another PR.

@cmp0xff cmp0xff requested a review from Dr-Irv July 30, 2025 08:04
Collaborator

@Dr-Irv Dr-Irv left a comment

thanks @cmp0xff

@Dr-Irv Dr-Irv merged commit bf1221e into pandas-dev:main Jul 30, 2025
13 checks passed