feat(index): append #1282
base: main
pandas-stubs/core/indexes/base.pyi
Outdated
@overload
def append(self, other: Index[Never]) -> Index: ...
@overload
def append(self, other: Index[S1] | Sequence[Index[S1]]) -> Index[S1]: ...
It might be nicer to use a second TypeVar here (S2, alongside S1), as the resulting Index can contain a mix of different types.
def append(self, other: Index[S2] | Sequence[Index[S2]]) -> Index[S1 | S2]: ...
45e0ad3, but this one works less well. mypy is not happy with Index[int].append(Index[int | str]) and gives Index[Any]. pyright is not happy with Index[int | str].append([Index[int], Index[str]]) and gives Index[int | Any]. In particular, the typing for [Index[int], Index[str]] seems to be list[Index[int] | Index[str]], instead of list[Index[int | str]].
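The list-literal observation above can be reproduced outside pandas. The following is a minimal sketch, assuming a hypothetical toy Index class with an invariant type parameter (it is not the pandas-stubs declaration); the inferred types mentioned in the comments are what the comment above reports for pyright, and other checkers may differ.

```python
from __future__ import annotations

from typing import Generic, TypeVar, Union

T = TypeVar("T")  # invariant by default, like S1 in the discussion


class Index(Generic[T]):
    """Toy stand-in for pandas.Index, for illustration only."""

    def __init__(self, data: list[T]) -> None:
        self.data = data


ints = Index([1])      # Index[int]
strs = Index(["a"])    # Index[str]

# Without a declared type, the element type is inferred from the items,
# reportedly as list[Index[int] | Index[str]] under pyright.
mixed = [ints, strs]

# Because T is invariant, Index[int] is not an Index[int | str], so a checker
# is expected to reject an annotation such as:
#     bad: list[Index[int | str]] = [ints, strs]

# Building each element with the wider parameter up front avoids the mismatch.
wide: list[Index[Union[int, str]]] = [
    Index[Union[int, str]]([1]),
    Index[Union[int, str]](["a"]),
]
```

The design point is that a union of two parameterised generics and a generic parameterised by a union are different types once the parameter is invariant.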
> mypy is not happy with Index[int].append(Index[int | str]) and gives Index[Any]. pyright is not happy with Index[int | str].append([Index[int], Index[str]]) and gives Index[int | Any]. In particular, the typing for [Index[int], Index[str]] seems to be list[Index[int] | Index[str]], instead of list[Index[int | str]].

While that is annoying for testing on the CI, I think it is the safer choice for users: better to report a wider type that includes Any than to suggest a narrower one. This needs input from @Dr-Irv.
If S1 and S2 were covariant, it seems to work for at least pyright in a simple toy example (but they are invariant):

from __future__ import annotations
from typing import TypeVar, reveal_type, Generic, Sequence

S1 = TypeVar("S1", bound=int | str, covariant=True)
S2 = TypeVar("S2", bound=int | str, covariant=True)

class Index(Generic[S1]):
    def __init__(self, data: list[S1]) -> None: ...
    def append(self: Index[S1], other: Sequence[Index[S2]]) -> Index[S1 | S2]: ...

strings = Index(["a"])
reveal_type(strings)
ints = Index([1])
reveal_type(ints)
reveal_type(strings.append([ints]))
reveal_type(ints.append([strings]))
string_ints = Index(["a", 1])
reveal_type(string_ints)
reveal_type(string_ints.append([ints]))
reveal_type(strings.append([string_ints]))
reveal_type(strings.append([ints]))
reveal_type(strings.append([strings, ints]))
reveal_type(strings.append([ints, strings, string_ints]))
> If S1 and S2 were covariant, it seems to work for at least pyright in a simple toy example (but they are invariant)
>
>     from __future__ import annotations
>     from typing import TypeVar, reveal_type, Generic, Sequence
>     S1 = TypeVar("S1", bound=int | str, covariant=True)
>     S2 = TypeVar("S2", bound=int | str, covariant=True)
>     class Index(Generic[S1]):
>         def __init__(self, data: list[S1]) -> None: ...
>         def append(self: Index[S1], other: Sequence[Index[S2]]) -> Index[S1 | S2]: ...
Hi, I am new to covariance / contravariance, but I read PEP 484 (covariance-and-contravariance), and it says covariant is for classes, not for functions; the latter case is prohibited. In your example, S1 is fine, but S2 is not. Could you help me and explain a bit? Thanks.

B_co = TypeVar('B_co', covariant=True)

def bad_func(x: B_co) -> B_co:  # Flagged as error by a type checker
    ...
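For contrast with bad_func, here is a minimal sketch of the case PEP 484 does allow: a covariant TypeVar used as the parameter of a generic class that only produces values of that type. The ReadOnlyBox class and total_length function are hypothetical names introduced for illustration and are unrelated to pandas-stubs.

```python
from __future__ import annotations

from typing import Generic, Iterator, TypeVar

T_co = TypeVar("T_co", covariant=True)


class ReadOnlyBox(Generic[T_co]):
    """Hypothetical immutable container: it yields T_co but never accepts one after construction."""

    def __init__(self, *items: T_co) -> None:
        self._items = items

    def __iter__(self) -> Iterator[T_co]:
        return iter(self._items)


def total_length(box: ReadOnlyBox[object]) -> int:
    # Covariance lets a ReadOnlyBox[int] be passed where ReadOnlyBox[object] is expected.
    return sum(1 for _ in box)


ints: ReadOnlyBox[int] = ReadOnlyBox(1, 2, 3)
count = total_length(ints)
```

Variance describes how a generic class relates to its parameter, which is why the declaration belongs on the class's TypeVar rather than on a TypeVar scoped to a single function.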
We can't change S1 to be covariant. While the following is not exactly what we would like to have, it is probably the closest we can get (but it doesn't work with mypy, unless the caller casts).

from __future__ import annotations
from typing import TypeVar, reveal_type, Generic, Sequence

S1 = TypeVar("S1", bound=int | str)
IndexT = TypeVar("IndexT", bound="Index")

class Index(Generic[S1]):
    def __init__(self, data: list[S1]) -> None: ...
    def append(self: Index[S1], other: Sequence[IndexT]) -> Index[S1] | IndexT: ...

strings = Index(["a"])
reveal_type(strings)
ints = Index([1])
reveal_type(ints)
reveal_type(strings.append([ints]))
reveal_type(ints.append([strings]))
string_ints = Index(["a", 1])
reveal_type(string_ints)
reveal_type(string_ints.append([ints]))
reveal_type(strings.append([string_ints]))
reveal_type(strings.append([ints]))
reveal_type(strings.append([strings, ints]))
reveal_type(strings.append([ints, strings, string_ints]))
Hi, I ran the script myself. In the most complicated case, I see Index[int | str] | Index[int] | Index[str]. To be honest, as a user I would rather see Index[Unknown], because it's simpler, and in both cases I would probably still need a manual cast. Nevertheless, 3844062.
pandas-stubs/core/indexes/base.pyi
Outdated
@overload
def append(self, other: Index[S1] | Sequence[Index[S1]]) -> Self: ...
@overload
def append(self, other: Index[S2] | Sequence[Index[S2]]) -> Index[S1 | S2]: ...
I don't think that's right. I think the result would be Index[S1] | Index[S2].
I'm not sure how the downstream Index stuff would work with the union type inside the generic.
When we append Index([1]) to Index(["a"]), we actually get Index(["a", 1]), which has the type Index[str | int].
If we can't narrow it to a specific type that is one of the types in S1, then I think it will need to be just Index without specifying the subtype, or Index[Any], or even UnknownIndex.
Right now, if you did Index(["a", 1]), the revealed type by pyright is Index[str | int], but with mypy it is Index[Any], so I think I want to avoid placing unions inside the generic type because I don't know what else will break and there could be inconsistencies in the type checkers in handling this.
67e6bde, except that mypy is unable to handle pd.Index([1, "a"]).append(pd.Index([1, "a"])) and gives Index[Any].
pandas-stubs/core/indexes/base.pyi
Outdated
@overload
def append(self, other: Index[S2] | Sequence[Index[S2]]) -> Index[S1 | S2]: ...
@overload
def append(self, other: Sequence[_T_INDEX]) -> Self | _T_INDEX: ...
Why can't we use Sequence[Index] here without the TypeVar?
The idea came from #1282 (comment). Removed in 67e6bde.
tests/test_indexes.py
Outdated
check(assert_type(first.append(third), "pd.Index[int | str]"), pd.Index)  # type: ignore[assert-type]
check(assert_type(first.append([third]), "pd.Index[int | str]"), pd.Index)  # type: ignore[assert-type]
check(
    assert_type(  # type: ignore[assert-type]
        first.append([second, third]),  # pyright: ignore[reportAssertTypeFailure]
        "pd.Index[int | str]",
    ),
    pd.Index,
)

check(assert_type(third.append([]), "pd.Index[int | str]"), pd.Index)  # type: ignore[assert-type]
check(
    assert_type(third.append(cast("list[Index[Any]]", [])), "pd.Index[int | str]"),  # type: ignore[assert-type]
    pd.Index,
)
check(assert_type(third.append([first]), "pd.Index[int | str]"), pd.Index)  # type: ignore[assert-type]
check(
    assert_type(  # type: ignore[assert-type]
        third.append([first, second]),  # pyright: ignore[reportAssertTypeFailure]
        "pd.Index[int | str]",
    ),
    pd.Index,
These tests need to work without having ignore in them. So you'll need to fix the types in the append() declarations to make that happen.
Three type: ignore comments remain, which I believe are caused by mypy bugs. I tried to add

@overload
def append(self, other: Sequence[Never]) -> Self: ...

as the first overload, which did not help.
tests/test_indexes.py
Outdated
    """Test pd.Index[list[str]].append"""
    first = pd.Index([["str", "rts"]])
    second = pd.Index([["srt", "trs"]])
    check(assert_type(first.append([]), "pd.Index[list[str]]"), pd.Index, list)
    check(assert_type(first.append(second), "pd.Index[list[str]]"), pd.Index, list)
    check(assert_type(first.append([second]), "pd.Index[list[str]]"), pd.Index, list)
I don't think we should be supporting Index[list[str]], because list[str] is not hashable and labels in an Index should be hashable.
But this is a bug in pandas, I think. See pandas-dev/pandas#61937.
So can you remove this test?
Then we will have to separate out list[str] from S1 and have an I1 that includes everything in S1 except list[str], while S1 includes everything.
So can you make that change as well?
Removing Index[list[str]] is hard. The following code runs:

import pandas as pd
pd.Index(["a_b"]).str.split("_")

It produces Index([['a', 'b']], dtype='object'). We have Index[S1] all over pandas-stubs.
Yes, but it really isn't supported to have list[str] in an Index:

ind = pd.Index(["a_b", "c_d"])
spl = ind.str.split("_")
spl.duplicated()

This will raise an exception because the list is unhashable.
Might be best in this PR to just leave out the test with append and Index[list[str]], and not worry for now about splitting it out until the issue in pandas is straightened out.
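The hashability point above does not depend on pandas. As a minimal stdlib-only sketch (the variable names are illustrative, and how pandas hashes labels internally is not shown here), a Python list cannot be hashed, while the equivalent tuple can:

```python
# Labels shaped like the output of str.split("_") on ["a_b", "c_d"].
labels_as_lists = [["a", "b"], ["c", "d"]]

try:
    hash(labels_as_lists[0])
    list_is_hashable = True
except TypeError:
    # Lists are mutable, so Python refuses to hash them.
    list_is_hashable = False

# Tuples with hashable elements are hashable, so they work as set members or dict keys.
labels_as_tuples = [tuple(parts) for parts in labels_as_lists]
seen = set(labels_as_tuples)
```

Any operation that relies on hashing the labels would hit the same TypeError for list-valued labels.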
Let's not return things like Index[str | int] and just make that Index.
pandas-stubs/core/indexes/base.pyi
Outdated
def append(self, other: Index[S2] | Sequence[Index[S2]]) -> Index[S1 | S2]: ...
@overload
def append(self, other: Sequence[_T_INDEX]) -> Self | _T_INDEX: ...
def append(self, other: Index[S2]) -> Index[S1 | S2]: ...
def append(self, other: Index[S2]) -> Index[S1 | S2]: ...
def append(self, other: Index[S2]) -> Index: ...
I really want to avoid having the union types here.
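One way to read this request is as a pair of overloads: keep the element type when the other index has the same one, and otherwise fall back to an Index parameterised by Any. The following is a minimal sketch under that assumption, using a hypothetical toy Index class with a runtime implementation; it is not the declaration that was merged into pandas-stubs.

```python
from __future__ import annotations

from typing import Any, Generic, Sequence, TypeVar, overload

S1 = TypeVar("S1")


class Index(Generic[S1]):
    """Toy stand-in for pandas.Index, for illustration only."""

    def __init__(self, data: Sequence[S1]) -> None:
        self.data = list(data)

    # Same element type on both sides: the element type is preserved.
    @overload
    def append(self, other: Index[S1] | Sequence[Index[S1]]) -> Index[S1]: ...

    # Anything else: no union inside the generic, just an Index of Any.
    @overload
    def append(self, other: Index[Any] | Sequence[Index[Any]]) -> Index[Any]: ...

    def append(self, other: Index[Any] | Sequence[Index[Any]]) -> Index[Any]:
        # Accept either a single Index or a sequence of them.
        others = [other] if isinstance(other, Index) else list(other)
        merged = list(self.data)
        for idx in others:
            merged.extend(idx.data)
        return Index(merged)


same = Index([1]).append([Index([2]), Index([3])])   # intended: Index[int]
mixed = Index(["a"]).append(Index([1]))              # intended: Index[Any]
```

The trade-off is the one discussed in the thread: the mixed case loses precision, but checkers no longer have to agree on how a union inside the type parameter is inferred.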
tests/test_indexes.py
Outdated
check(
    assert_type(  # type: ignore[assert-type]
        first.append([second, third]),  # pyright: ignore[reportAssertTypeFailure]
        "pd.Index[int | str]",
        first.append(third), "pd.Index[int | str]"
Should be Index[Any] (or just pd.Index).
d7a538d. Now just two # type: ignore[assert-type] comments remain.
        third.append([]), "pd.Index[int | str]"
    ),
    pd.Index,
)
check(
    assert_type(  # type: ignore[assert-type]
        third.append(cast("list[Index[Any]]", [])), "pd.Index[int | str]"
Change these to pd.Index in the assert_type statements, and then I don't think you need the # type: ignore statements, i.e., remove the union part of the generic parameter.
assert_type() to assert the type of any return value
Index.append used not to be typed. In this PR, typings for Index.append are added and tested.