Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ABCMeta.register support #2922

Open
miedzinski opened this issue Feb 28, 2017 · 18 comments
Open

ABCMeta.register support #2922

miedzinski opened this issue Feb 28, 2017 · 18 comments

Comments

@miedzinski
Copy link
Contributor

class A(metaclass=abc.ABCMeta): pass
class B: pass
A.register(B)
a: A = B()  # currently E: Incompatible types in assignment (expression has type "B", variable has type "A")
print(isinstance(a, A))  # -> True

Mypy could support cases like this by special casing ABCMeta.register calls. __subclasshook__, __subclasscheck__, and __instancecheck__ are more troublesome and could be left ignored.

@JukkaL
Copy link
Collaborator

JukkaL commented Feb 28, 2017

This was on my radar early in mypy development. I didn't work on this back then because this is a tricky feature. For example, if module x has A.register(B), it can affect a seemingly unrelated module y if it tries to use B as a subclass of A. This is 'non-modular' behavior, and mypy would somehow have to figure out that y depends on x even if it doesn't import it, even indirectly. Currently dependencies between modules are completely described by the import graph.

There are few potential ways to support this:

  1. Have A.register(B) invalidate all modules that have been already type checked and that may use A and B. This can be tricky to implement and could slow down things, as we may have to process modules multiple times. Conservatively, A.register(B) would invalidate every module that uses A or B or a subclass of B.
  2. Like (1), but also cache all register calls from the previous run and somehow use these to tweak the order of processing modules. The goal is to avoid reprocessing modules multiple times. It's unclear if this idea would work.
  3. Make A.register(B) visible/active only in modules that (recursively) import the module that has this declaration. Mypy would have to keep track of these register calls defined in each module and which of them are active in any given module. This would deviate from Python semantics and could be a little confusing. However, there would no 'spooky action at a distance'.
  4. Have a new pass that only finds A.register(B) declarations in the entire program. This would get run before type checking. We can probably make this reasonably fast in incremental mode by being clever with caching etc. [This would probably result in mypy having to keep the entire program in memory when doing a non-incremental run, which would be unfortunate. Currently we could strip most nodes in ASTs of already processed modules to lower RAM use.]
  5. Only respect those register calls that mypy has seen at any given stage of processing. This would approximate how these work at runtime, but since mypy can process modules in a different order from Python, this wouldn't always work as expected and could be confusing. This would be easy to implement, at least. We'd probably want to process all register calls inside a module before processing the rest of the module, though, as otherwise the order of definitions within a module would affect type checking results, which would be confusing.

@JukkaL
Copy link
Collaborator

JukkaL commented Feb 28, 2017

It seems likely that mypy will support structural subtyping before it supports registering ABCs. Structural subtyping/protocols can be used for a similar effect, but that should be easier to implement. See python/typing#11 for more information.

@ddfisher
Copy link
Collaborator

Alternatively, instead of doing something very complicated to make sure A.register(B) works everywhere, just make A.register(B) a type error in modules that don't define either A or B, because you really shouldn't be doing that. (And it's pretty common to disallow such things, see for example Rust's rules against orphaned impls.) Solution 3 also seems potentially reasonable. Spooky action at a distance rules out all the other potential solutions IMO.

@pkch
Copy link
Contributor

pkch commented Apr 21, 2017

I just ran into a use case where support for A.register(B) would be the only way to make things work:

from typing import *
class A('Iterable[A]'):
    _a: 'List[A]'

    def __iter__(self) -> 'Iterator[A]':
        return iter(self._a)

a = A()
reveal_type(a)  # Revealed type is 'test.A'
for x in a:
    reveal_type(x)  # Revealed type is 'test.A*'

mypy is happy with this - surprisingly given that we don't support recursive types yet (#731). Of course, python runtime isn't too happy that we inherit from a string literal.

The basic problem is that we hacked the class inheritance syntax to do typing-related stuff. But this means we are limited in what we can do - and in particular, we can't use forward declarations there (as well as other constructs that would mess up inheritance; at present, I'm not aware of any specific situations where this comes up).

Since forwarding syntax is already set in stone, the only solution I see is to make this work:

class A:
    _a: 'List[A]'
    def __iter__(self) -> 'Iterator[A]':
        return iter(self._a)
Iterable[A].register(A)```

It looks a little ugly, but at least it's logical.

True, this particular use case will disappear once Protocols (#3132) is merged. But it won't fully solve the problem because sometimes the user might want to rely on nominal rather than structural typing.

Incidentally, this situation already appears in builtins.pyi:

class str(Sequence[str]):
    # ...

It works there because it's a stub, not the original source code, so python runtime doesn't need to look at it.

(Btw, we should probably make mypy catch inheritance from a string literal before the runtime complains.)

@ilevkivskyi
Copy link
Member

@pkch

class A(Iterable['A']):
    ...

works both in mypy and at runtime.

Btw, what are the cases where you think people will want nominal checks instead of protocols?

@pkch
Copy link
Contributor

pkch commented Apr 21, 2017

Duh 😆

Nominal checks: if the method naming is too generic (validate, save, etc.) so the user has (or is afraid to have) a bunch of unrelated classes that would look compatible based on purely structural checks.

@mensinda
Copy link

Any news/updates on this? I would have really liked support for .register() in mesonbuild/meson#8885.

@ktbarrett
Copy link

ktbarrett commented Dec 20, 2021

@ilevkivskyi

Btw, what are the cases where you think people will want nominal checks instead of protocols?

Numeric tower inter-operating with the builtin types requires nominal typing and requires this functionality. People wanting to create nominal subtypes that do not want inherit their parent's implementation also requires this functionality. I have a need for the second reason.

class Logic:
    """4-value logic type as seen in Verilog and VHDL: 0, 1, X, and Z"""

class Bit(Logic):
    """2-value logic type: 0 and 1"""

class LogicArray(Sequence[Logic]):
    """Array of Logics that constructs literals or subtypes to enforce the invariant"""

class Sfixed(Sequence[Bit]):
    """Two's complement arbitrarily-bounded fixed point number as seen in VHDL"""

# can't inherit from LogicArray since the generic base class needs to be re-parameterized
# don't want to anyway since Sfixed is implemented with an integer since that's a whole lot faster than a list of user-defined types
LogicArray.register(Sfixed)

Of course the real issue here is that inheritance is a sucky mechanism since it violates separation of concerns. I'd much rather say "Sfixed is a subtype of LogicArray without implementation inheritance". Even if that concern is addressed ABC.register would still be necessary since there is occasionally a need to add supertypes post-hoc. But of course ABC.register is a bad mechanism to achieve that since it totally destroys modularity (similarly to what is stated here #11727 (comment)).

@pranavrajpal
Copy link
Contributor

Most of the use cases I'm seeing here are for cases where people are using ABCMeta.register on classes they control instead of trying to register externally-defined classes (usually because they want to register a class with an ABC without implementing everything). It would probably be easier to support that case than handling full support for ABCMeta.register.

Special casing uses of register as a class decorator would most likely be the easiest to implement since that automatically ensures that a class is registered as soon as it's defined and we already have some code for special casing decorators. We'd then probably:

  • Emit an error if the class didn't implement all the attributes of the ABC being registered (but we'd only do this when the class is being registered, not when the class is instantiated)
  • Treat the class as if that ABC is a base class, even if that earlier unimplemented attributes check failed

@NeilGirdhar
Copy link
Contributor

Special casing uses of register as a class decorator would most likely be the easiest to implement

Isn't that the least important case since anyone who wants to support this can simply make the ABC a base class?

The biggest use case, I think, for this feature would be the numeric tower working as intended.

@pranavrajpal
Copy link
Contributor

Isn't that the least important case since anyone who wants to support this can simply make the ABC a base class?

The important difference between using register and making the ABC a base class is when an error about the class not properly implementing all abstract methods gets emitted. If the ABC is a base class, then that happens when the class is actually instantiated, instead of when you initially subclass the ABC, because mypy has to assume that you might be creating an ABC which would implement those methods later in a subclass. However, if you use register, we can emit the error at that register call itself because the registered class doesn't become an ABC, which means that there's no real reason to register a class with an ABC and then later implement any unimplemented attributes instead of just registering the final class when it has all the methods implemented.

The biggest use case, I think, for this feature would be the numeric tower working as intended.

My understanding of #3186 is that the builtin number types don't properly implement the numeric tower types that they claim to implement (like in python/typeshed#3195 (comment)), and runtime works around that by registering the classes with the correct ABCs.

In that context, having those builtin number types directly subclass the relevant ABCs isn't a realistic option because that would most likely cause errors about unimplemented methods everywhere that those number types are used, even though essentially all of that code is safe.

Using register in the way that I mentioned, however, would be an option because we'd emit an error only once when the class was first registered with the ABC. That means that typeshed could register each builtin number type with the correct ABC by adding register as a decorator, type: ignore the error generated, and anyone who wants to use those builtin number types use them without getting an error.

@NeilGirdhar
Copy link
Contributor

NeilGirdhar commented Jul 9, 2022

the builtin number types don't properly implement the numeric tower types that they claim to implement (like in python/typeshed#3195 (comment)),

Is this still true? float.__ceil__ exists in Python 3.10. Are there any other examples?

t there's no real reason to register a class with an ABC and then later implement any unimplemented attributes instead of just registering the final class when it has all the methods implemented.

Are you sure about this? You maybe defining a class hierarchy and want to define an ABC, which you will then have your final classes inherit from it.

Using register in the way that I mentioned, however, would be an option because we'd emit an error only once when the class was first registered with the ABC.

So you want to change the semantics of register so that it does the checks that happen at class instantiation time? Won't that break anyone's code who's using register to define ABCs?

@NeilGirdhar
Copy link
Contributor

NeilGirdhar commented Jul 9, 2022

@ktbarrett

Of course the real issue here is that inheritance is a sucky mechanism since it violates separation of concerns. I'd much rather say "Sfixed is a subtype of LogicArray without implementation inheritance".

I think you can just add in the appropriate interface class, and your example will work:

class Logic:
    """4-value logic type as seen in Verilog and VHDL: 0, 1, X, and Z"""

class Bit(Logic):
    """2-value logic type: 0 and 1"""

class LogicArrayInterface(Sequence[Logic]):  ...  # define any members you want to share.

class LogicArray(LogicArrayInterface):
    """Array of Logics that constructs literals or subtypes to enforce the invariant"""

class Sfixed(LogicArrayInterface, Sequence[Bit]):
    """Two's complement arbitrarily-bounded fixed point number as seen in VHDL"""

No need for ABC registration. Ordinary inheritance works and the concerns are perfectly separated.

@ktbarrett
Copy link

@NeilGirdhar That's not equivalent. Sfixed is a subtype of LogicArray by its nature. Not only is it intuitively not correct, it will create more work.

For example, lets say you have a bitwise OR operation that should work on all Sequence[Logic] types. The example above shows only two classes, in reality there are 10. What's the resulting type of a bitwise OR operation between a LogicArray and an Sfixed? Intuitively, it would be the less derived type, essentially doing implicit upcasting on the more derived type, so it would return a LogicArray. How would you write that?

class LogicArray(LogicArrayInterface):
    """Array of Logics that constructs literals or subtypes to enforce the invariant"""

    # I don't know the concrete type of the other argument so I don't know the correct type to return.
    def __or__(self, other: LogicArrayInterface) -> ???:
        ...

    # This would work, but it requires that Sfixed be a subtype of LogicArray, that is no longer the case.
    # I can get this to work at runtime using `ABC.register`, but it isn't supported statically.
    def __or__(self: Self, other: Self) -> Self:
        ...
    def __ror__(self: Self, other: Self) -> Self:
        ...

    # Oh dear... this does work though.
    @overload
    def __or__(self, other: LogicArray) -> LogicArray: ...
    @overload
    def __or__(self, other: X01ZArray) -> LogicArray: ...
    @overload
    def __or__(self, other: BitArray) -> LogicArray: ...
    @overload
    def __or__(self, other: Signed) -> LogicArray: ...
    @overload
    def __or__(self, other: Unsigned) -> LogicArray: ...
    @overload
    def __or__(self, other: Sfixed) -> LogicArray: ...
    @overload
    def __or__(self, other: Ufixed) -> LogicArray: ...
    @overload
    def __or__(self, other: Float) -> LogicArray: ...
    # "let's add another type!"
    # *audible groans*
    # *someone posts something about the expression problem*

Also, there's a problem in Sfixed with the inheritance hierarchy you have set, Sequence[Logic] comes before Sequence[Bit], so those methods will not be correct. If you swap the order of the classes in the inheritance list, you get an ordering issue. You've found another issue with the type system I have also seen playing around with this stuff, where you can't "re-bind" generic classes in subtypes.

@NeilGirdhar
Copy link
Contributor

NeilGirdhar commented Jul 9, 2022

For example, lets say you have a bitwise OR operation that should work on all Sequence[Logic] types. The example above shows only two classes, in reality there are 10. What's the resulting type of a bitwise OR operation between a LogicArray and an Sfixed? Intuitively, it would be the less derived type, essentially doing implicit upcasting on the more derived type, so it would return a LogicArray. How would you write that?

I see now. Your problem is analogous to numpy arrays, which are generic on the "dtype". Essentially, LogicArray is like numpy.ndarray and should be generic. Something like,

X = TypeVar('X', bound=Logic)
class LogicArray(Sequence[X], Generic[X]):
    @overload
    def __or__(self, other: Self) -> Self:
        ...
    @overload
    def __or__(self, other: LogicArray[Any]) -> LogicArray[Any]:
        ...

If you need more clever type promotion, then I'd look at how numpy did it.

@hauntsaninja
Copy link
Collaborator

hauntsaninja commented Jul 17, 2022

Summarizing the state of the world + conversation here:

Jukka's comment here still very much applies: #2922 (comment). There is not a great way to support register performantly

If you control your type, you can simply inherit from the ABC you want:

class MyInt(numbers.Integral): ...

If you can't actually inherit at runtime, you can lie to the type checker:

if TYPE_CHECKING:
    MyIntBase = numbers.Integral
else:
    MyIntBase = object

class MyInt(MyIntBase): ...

If you want mypy to check that your MyInt is actually a conformant implementation, you can simply instantiate it:

class MyInt(numbers.Integral): ...

if TYPE_CHECKING:
    _ = MyInt()

Curious if that covers all the use cases where you control your type, or if there's additional functionality that register as a class decorator provides (beyond ergonomics).

@ktbarrett
Copy link

ktbarrett commented Jul 18, 2022

@NeilGirdhar

LogicArray is like numpy.ndarray and should be generic

That will also not work, unfortunately. Subtypes have more operations on them. For example, Sfixed + int is defined, LogicArray + int is not. My situation is exactly the same as Python's numeric tower and number types (in fact, if I cared I could integrate my types into the numeric tower). If there was a workaround beyond hacking it into type system like all type checkers currently do with int and float, or adding support for ABC.register no one has figured that out yet.

@hauntsaninja
If one of Jukka's solutions is on the table, it's fairly obvious that 3 is the safest choice and would not have an appreciable effect on performance. The type checker being more strict than runtime already has precedent in typeshed.

@NeilGirdhar
Copy link
Contributor

NeilGirdhar commented Jul 18, 2022

If one of Jukka's solutions is on the table, it's fairly obvious that 3 is the safest choice and would not have an appreciable effect on performance.

Totally agree with shooting for number 3.

That will also not work, unfortunately. Subtypes have more operations on them. For example, Sfixed + int is defined, LogicArray + int is not. My situation is exactly the same as Python's numeric tower and number types

Yes, okay, that makes sense!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants