Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upstream integration of some new features #78

Closed
12 tasks
francesco-ballarin opened this issue Mar 30, 2018 · 18 comments
Closed
12 tasks

Upstream integration of some new features #78

francesco-ballarin opened this issue Mar 30, 2018 · 18 comments

Comments

@francesco-ballarin
Copy link
Contributor

Dear multipledispatch developers,
I have done some customization to multipledispatch to fit the need of one of my libraries.
While some of the changes are very specific to my needs, I think that most of the changes might be interesting to commit upstream.

The patched version of @dispatch decorator is available at this link, as well as some unit tests.

I would like to have your suggestions on which (if any) of my changes you would deem interesting upstream. I try to summarize them here:

  • new features:
    • do not store Dispatchers in a dict, but rather in a module. Note that this BREAKS backward compatibility. The motivation for doing this are strongly related to my specific needs (there is some summary in the multiline comment at line 366, but nonetheless I think that it might be a more natural way to store Dispatchers and being able to import them. There are a few tests on this: lines 366, 394.

    • allow @dispatch to be applied to classes. Tests at line 500, 647. This feature depends on the backward incompatible change concerning Dispatcher storage inside a module (rather than a dict).

    • allow dispatched functions to be replaced through @dispatch kwargs replaces=... and replaces_if=... . Implementation from line 79. Test at line 459, 669, 715. This feature depends on the backward incompatible change concerning Dispatcher storage inside a module (rather than a dict).

    • implementation of dispatching based on subtypes for dict, lists, tuple etc. This is already discussed in issue Dispatching to list subtypes #72, and there is also a WIP pull request WIP: Experiment to use pytypes to add support for python type hints #69. My implementation is very different from the two aforementioned approaches, and relies on custom defined classes named list_of(TYPES), tuple_of(TYPES), set_of(TYPES), dict_of(TYPES_FROM, TYPES_TO), where TYPES could be a tuple of types. I had laid out this part before learning about pytypes, so I am open to suggestions on replacing my custom made classes with pytypes. However, I personally do not like the square brackets in their notations: when declaring a @dispatch to take either int or floats one would use @dispatch((int, float)), where "OR" is denoted by a parenthesis, while for pytypes object on would need to write @dispatch(Tuple[int, float]), where "OR" is denoted by a square bracket. The notation of my implementation is instead list_of((int, float)). More implementation detail:

      • the definition of custom *_of classes goes from line 428 to line 522
      • I have a strong assumption in my code that, whenever I dispatch on iterable, I will always know the subtype. For this reason, I have disabled the plain iterables in a validation stage at line 524. This assumption might be too strong for general purposes, and is definitely NOT backward compatible.
      • when calling the Dispatcher iterable input arguments are converted to the corresponding *_of types in the auxiliary function at line 553. In a similar way tuple expansion has been patched at line 599
      • the implementation of conflict operators for *_of classes is custom made from line 635 to line 729

      There are a few tests on this part:

      • adaptation of upstream tests at lines 128, 144, 158
      • custom conflict operators for *_of classes are tested at lines 224, 286, 342
      • validation that standard iterable types have been disabled is on the test at line 552
      • additional few tests related to class initialization at lines 564, 572, 596
      • tests that subtypes provided to *_of properly account for inheritance at lines 731, 754, 776, 798, 822, 845, 866, 888, 910, 934
      • one additional minor test at line 953
    • have MethodDispatcher account for inheritance. In the current master version, dispatched methods are not inherited. The work from line 192 to line 307 allows dispatched class methods to be inherited and possibly overridden. Tests for this are at line 967, 993, 1002, 1040.

    • allow lambda functions to be passed instead of hardcoded types. The typical use case for which I needed this was in the definition of arithmetic operators, e.g.

      class F(object):
          def __init__(self, arg):
              self.arg = arg
          
          @dispatch(F)
          def __mul__(self, other):
              return self.arg*other.arg
      

      Python interpreter fails on @dispatch(F) because F is not fully defined yet. My proposed solution has thus been to use

      class F(object):
          def __init__(self, arg):
              self.arg = arg
          
          @dispatch(lambda cls: cls)
          def __mul__(self, other):
              return self.arg*other.arg
      

      Note that in this case the evaluation of the signature is delayed to the first time the dispatched method is called.
      Tests for this feature are:

      • basic unit test at 1070
      • a test to show a caveat of the delayed signature computation, line 1096
      • compatibility with inheritance: lines 1113, 1137
    • compatibility with optional default None argument. Implementation at line 732, test at line 1193

  • minor changes:
    • slightly changed the implementation for what concerns annotations. There are a few tests on this:
      • adaptation of upstream tests at lines 196, 208
      • new test for function dispatchers at line 416, 424, 439, 530
      • new test on wrong number of annotation being provided, line 1218
    • new overload decorator, which can be called as @overload''' rather than @overload()'' if the signature is provided through annotations. Implementation at line 419, test at line 1169. This is essentially a duplicate of @dispatch, so I might be willing to rename it back to dispatch before pushing upstream.
    • additional minor tests related to calling method from class rather than instance at line
  • minor changes that break backward compatibility, and thus may not be worthy to push upstream:
    • enforce an error rather than a warning in case of ambiguity. See lines 47, 58 for the implementation. See test at line 104.
    • slight customization of the error which is raised in case a signature is not available. See lines 36, 161 for the implementation.

Apologies for the long post, necessary to clearly present all features of the patched version of the library in order to understand from you which features might be of general interest.

ping @mrocklin @hameerabbasi @llllllllll @mariusvniekerk @shoyer who were involved in the aformentioned issues and pull requests.

@hameerabbasi
Copy link

This is definitely some great work. I'm not sure how important backward-compatibility is, but I'll hazard a guess and say "not very important".

In any case, if you could break these features into chunks and send separate PRs I'd be all for it. Monster commits tend to be hard to manage.

@francesco-ballarin
Copy link
Contributor Author

francesco-ballarin commented Mar 30, 2018

Yes, I'll definitely try to split as much as I can. That's one of the reasons for which I want to have feedback of the people involved: if you think that something is not interesting I will skip preparing that PR.
I'll wait to hear more opinions on this.

I forgot to mention two things:

  • the code is compatible with current master
  • the library in which I used this code is python 3 only. I do think that this is compatible with python 2 as well (because when I first started my patched version I was still using python 2), but I haven't tested it with python 2 in a long time.

@hameerabbasi
Copy link

I do think that this is compatible with python 2 as well (because when I first started my patched version I was still using python 2), but I haven't tested it with python 2 in a long time.

Python 2 support is definitely needed. I believe @shoyer planned to use this in XArray if a suitable solution was found, and XArray needs Python 2 support.

However, I'm just a passer-by.

@francesco-ballarin
Copy link
Contributor Author

So, I tried running the tests with python 2 and the only issues were with annotations (python 3 only), nonlocal keyword (which I can easily fix), and a few times with inspect.signature (something similar to that is already used in the master version, so I can look there). Overall, I am confident that I can make the code python 2 compatible.

@llllllllll
Copy link
Collaborator

Thank you for itemizing the changes, it makes it much easier to understand all
the differences!

store dispatched objects in module

I am not sure why you need to store the new dispatches in a module, it wasn't
clear to me from reading this code. Also, I am not sure why dispatching on types
depends on this. Does

dispatch decorator for types

This seems like a nice to have, but I think there might be a way to do this in a
backwards compatible way.

replace dispatches

I personally don't think this is a good idea from a maintainability
standpoint. I think that dispatches should be mostly static and defined at
module scope. The only time I dynamically register dispatches is to extend the
arity of generic functions or to defer registration until needed for performance
reasons.

homogeneous collections dispatching

I still don't think dispatching on homogeneous collections require modifications
to multipledispatch itself, it can be written as a separate class which uses
virtual inheritance to resolve in the correct order. An example of this is in
blaze
and is discussed in #72. By relying on issubclass for the partial ordering, we
prevent a lot of weird edge cases that would occur if you mixed the new custom
collection types with regular list and set.

I would like to say again that I don't think that we should rely on pytypes for
things like this because they have their own notion of what the python type
hierarchy is which doesn't align with isinstance. Also, I have doubts that it
will ever be fast enough to be invoked every on dispatch and keep the overhead
of multipledipatch reasonable.

methoddispatcher inheritance

What is broken about the inheritance case now, shouldn't subclasses dispatch
like the superclass by construction? Maybe I am not sure what you meant, but I
see dispatched methods in subclasses:

In [1]: from multipledispatch import dispatch

In [2]: class C:
   ...:     @dispatch(int)
   ...:     def f(self, a):
   ...:         print('int')
   ...:     @dispatch(float)
   ...:     def f(self, a):
   ...:         print('float')
   ...:

In [3]: class D(C):
   ...:     pass
   ...:

In [4]: D.f(1)
int

In [5]: D.f(2.0)
float

allow lambdas for types

This is interesting, but I am not sure how useful it is. If this is a method
dispatching on its own type, why not just use an isinstance check inside the
method? Who else is going to register a dispatch?

strip None from tail

This could cause a change in behavior because it affects how existing functions
are dispatched. I don't know if this feature is important enough to break
existing code.

ambiguity warning/error

You can already turn this just this warning into an error using a warning
context. I think that you almost certainly do want to fix this, and even support
making this an error by default, but I would want to do a deprecation cycle
first.

custom exception for dispatch failures

This seems fine assuming it is a subclass of NotImplementedError and has the
same message. Any code that did a try/except should still succeed. I vaguely
remember multiple dispatching having something like this in the past,
MDNotImplemented, but it was removed. cc @mrocklin

general backwards compatibility

I'm not sure how important backward-compatibility is, but I'll hazard a guess
and say "not very important".

I think it is very important. I use multipledispatch in multiple productions
services across many repos, a major breaking change would be very
frustrating. This doesn't mean we can't break the interface, but it needs to be
justified.

@francesco-ballarin
Copy link
Contributor Author

francesco-ballarin commented Mar 30, 2018

@llllllllll

I will reply to you in several posts.

Concerning methoddispatcher inheritance: this one does not work with current master

from multipledispatch import dispatch

class C:
    @dispatch(int)
    def f(self, a):
        print('int')
    
    @dispatch(float)
    def f(self, a):
        print('float')

class D(C):
    @dispatch(str)
    def f(self, a):
        print('str')
    
d = D()
d.f(1)

@francesco-ballarin
Copy link
Contributor Author

@llllllllll

allow lambdas for types
This is interesting, but I am not sure how useful it is. If this is a method
dispatching on its own type, why not just use an isinstance check inside the
method? Who else is going to register a dispatch?

To be honest I am also heavily using it for actual static class attributes, as in the tests.

However, (even though the example is a bit silly) one could use it like this

class F(object):
    def __init__(self, arg):
        self.arg = arg
    
    @dispatch(lambda cls: cls)
    def __mul__(self, other):
        return self.arg*other.arg

class G1(object):
    pass
    
class G2(object):
    pass

to make sure that G1 objects can only be multiplied by G1 objects, and not by F or G2 objects.
An assert isinstance(..., F) would not catch that, one would need to have an assert isinstance(..., type(self))

@francesco-ballarin
Copy link
Contributor Author

@llllllllll

on ambiguity warning/error: I do agree on error by default, let's wait on other opinions.

@francesco-ballarin
Copy link
Contributor Author

@llllllllll

replace dispatches
I personally don't think this is a good idea from a maintainability
standpoint. I think that dispatches should be mostly static and defined at
module scope. The only time I dynamically register dispatches is to extend the
arity of generic functions or to defer registration until needed for performance
reasons.

OK, I can limit this feature to my code where I need it. I thought this could have been a feature of general interest. I'll stay tuned to hear from anybody else if they might be interested in this.

@francesco-ballarin
Copy link
Contributor Author

francesco-ballarin commented Mar 30, 2018

@llllllllll

custom exception for dispatch failures
This seems fine assuming it is a subclass of NotImplementedError and has the
same message. Any code that did a try/except should still succeed. I vaguely
remember multiple dispatching having something like this in the past,
MDNotImplemented, but it was removed.

It is indeed a subclass of NotImplementedError

@mrocklin
Copy link
Owner

This seems fine assuming it is a subclass of NotImplementedError and has the
same message. Any code that did a try/except should still succeed. I vaguely
remember multiple dispatching having something like this in the past,
MDNotImplemented, but it was removed. cc @mrocklin

I don't recall, but hopefully git history would have something. I suspect that it was just annoying to have a special type. Presumably subclassing would avoid this concern but for some reason, historically, it didn't.

With any change like this I always ask "why is making our own way of doing things necessary?" I think that the default should be not to do this if possible and that any attempt to do it should be accompanied with concrete applications where it is necessary.

@francesco-ballarin
Copy link
Contributor Author

@llllllllll

homogeneous collections dispatching
I still don't think dispatching on homogeneous collections require modifications
to multipledispatch itself, it can be written as a separate class which uses
virtual inheritance to resolve in the correct order. An example of this is in
blaze and is discussed in #72. By relying on issubclass for the partial ordering, we
prevent a lot of weird edge cases that would occur if you mixed the new custom
collection types with regular list and set.

I would like to say again that I don't think that we should rely on pytypes for
things like this because they have their own notion of what the python type
hierarchy is which doesn't align with isinstance. Also, I have doubts that it
will ever be fast enough to be invoked every on dispatch and keep the overhead
of multipledipatch reasonable.

Your point here is that my custom implementation of supercedes() and consistent() functions
could be move instead to __ subclasscheck __? However, one would still need to add
the various list_of, tuple_of etc. (with appropriate __ subclasscheck __) to multipledispatch, right?

@francesco-ballarin
Copy link
Contributor Author

francesco-ballarin commented Mar 30, 2018

@llllllllll

strip None from tail
This could cause a change in behavior because it affects how existing functions
are dispatched. I don't know if this feature is important enough to break
existing code.

I am sorry, I do not get how this could break existing codes. None would get stripped from both
the signature in cache when the dispatch is defined, and from the actual arguments
when the dispatched method is called. Can you send me an example of a case that
would be broken, because I must be missing the key point of your comment. Thanks.

@francesco-ballarin
Copy link
Contributor Author

@llllllllll

store dispatched objects in module
I am not sure why you need to store the new dispatches in a module, it wasn't
clear to me from reading this code. Also, I am not sure why dispatching on types
depends on this. Does

dispatch decorator for types
This seems like a nice to have, but I think there might be a way to do this in a
backwards compatible way.

I'll admit that this was originally driven by my need to programmatically generate modules, so there might be a
way to do this in a backwards compatible way.

I have to point out, however, that (unless I am mistaken)
the original @dispatch decorator (designed for functions) behaves like this:

from multipledispatch import dispatch

@dispatch(int)
def f(arg):
    return arg
    
print(type(f)) # <class 'multipledispatch.dispatcher.Dispatcher'>

This is no good for classes, because the following pseudo-code would not work

from multipledispatch import dispatch

@dispatch(int)
class F(object):
    def __init__(self, arg):
        self.arg = arg
    
print(F) # if this were running as in the case of functions, this would have
         # returned <class 'multipledispatch.dispatcher.Dispatcher'>
         
@dispatch(F) # here I am planning on dispatching w.r.t. the specific F defined at the fourth line,
             # not the Dispatcher object, but I cannot do it, since I have lost the capability to refer to it
def g(obj):
    pass

This is the reason why, for classes, I am requiring to store the Dispatcher somewhere else
(through the mandatory module argument).
The Dispatcher will be stored in the provided module argument, while the module in which F
is defined will continue to store F as if there were no decorators applied to it.
Then, when you need to fetch the Dispatcher, you should import it from the module you provided
to the module argument. Importing was definitely easier if the underlying object in which
Dispatcher are stored were modules, rather than dicts.

I attach a complete, yet simple, example.

example.zip

@francesco-ballarin
Copy link
Contributor Author

@llllllllll

Let me comment a bit more on backwards compatibility for what concerns storing Dispatchers in module.

The obvious compatibility brekage is the name in kwarg and the underlying storage. If you were willing to accept that, these would be the compatibility for each case (class, method, function, with/without kwarg):

  1. dispatch on class, without kwarg:
@dispatch(int)
class F(object):
    def __init__(self, arg):
        self.arg = arg
    
print(F) # this would print Disptacher. The Dispatcher object would be stored in this module.

Right now this case is disabled because I deem it confusing if one is willing to dispatch on F (see second snippet in my previous post). However, for the sake of compatibility with the current interface, I can enable it.

  1. dispatch on class, with kwarg:
@dispatch(int, module=other)
class F(object):
    def __init__(self, arg):
        self.arg = arg
    
print(F) # this would print F. The Dispatcher object would be stored in other.

This is the use case that I have in mind for my second snippet in the previous post. The only compatibility breakage is the obvious one, the kwarg name.

  1. dispatch on class method, without kwarg
class F(object):
    @dispatch(int)
    def f(self, arg):
        self.arg = arg
    
print(F.f) # this would print Disptacher. The Dispatcher object would be stored among the attributes of F.

This is the same behavior as current master.

  1. dispatch on class method, with kwarg. This case does not make any sense, and my code asserts that no one will ever try it.

  2. dispatch on function, without kwarg

@dispatch(int)
def f(self, arg):
   return arg
    
print(f) # this would print Disptacher. The Dispatcher object would be stored in this module.

This is the same behavior as current master.

  1. dispatch on function, with kwarg
@dispatch(int, module=other)
def f(self, arg):
   return arg
    
print(f) # this would print <function f>. The Dispatcher object would be stored in other.

Apart from the obvious compatibility breakage on kwarg name/value, this introduces another incompatibility with current behavior. Indeed, if I remember correctly, in current master print(f) would return Dispatcher , rather than function f. If we were willing to go down the module kwarg road, I would strongly argue in favor of my proposed behavior (even though it introduces another incompatibility) because otherwise cases 2 and 6 would be asymmetric.

@llllllllll
Copy link
Collaborator

You can get the behavior you want for classes by creating the Dispatcher ahead of time and registering types into it; for example:

In [1]: from multipledispatch import Dispatcher

In [2]: literal = Dispatcher('literal')

In [3]: @literal.register(int)
   ...: class IntLiteral:
   ...:     def __init__(self, value):
   ...:         self.value = value
   ...:     def __repr__(self):
   ...:         return f'{self.value} :: int'
   ...:     

In [4]: @literal.register(float)
   ...: class FloatLiteral:
   ...:     def __init__(self, value):
   ...:         self.value = value
   ...:     def __repr__(self):
   ...:         return f'{self.value} :: float'
   ...:     

In [5]: literal(1)
Out[5]: 1 :: int

In [6]: literal(1.5)
Out[6]: 1.5 :: float

In [7]: isinstance(literal(1), IntLiteral)
Out[7]: True

In [8]: isinstance(literal(1.5), FloatLiteral)
Out[8]: True

In this case, literal, the dispatcher object itself, is stored where ever you want. There is no need to mutate another module at the decoration site.

@francesco-ballarin
Copy link
Contributor Author

Thanks @llllllllll for the example.

Please allow me to play devil's advocate. Wouldn't your last statement

There is no need to mutate another module at the decoration site.

hold as well for the namespace kwarg? (replacing the word module with the word dict)

@francesco-ballarin
Copy link
Contributor Author

Closing due to inactivity.

@francesco-ballarin francesco-ballarin closed this as not planned Won't fix, can't repro, duplicate, stale Sep 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants