Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow user to customize how an attribute is compared (#435) #627

Merged
merged 21 commits into from
Feb 22, 2021
Merged

Allow user to customize how an attribute is compared (#435) #627

merged 21 commits into from
Feb 22, 2021

Conversation

botant
Copy link
Contributor

@botant botant commented Feb 22, 2020

Apologies for resubmitting a PR. I had made a mess in my fork and this was a side-effect.

The idea of this PR is to add a comparator argument to attr.ib( ), in which the user could provide a simple class implementing equality and ordering methods to suit their needs.

  • Added tests for changed code.
  • New features have been added to our Hypothesis testing strategy.
  • Changes or additions to public APIs are reflected in our type stubs (files ending in .pyi).
    • ...and used in the stub test file tests/typing_example.py.
  • Updated documentation for changed code.
    • New functions/classes have to be added to docs/api.rst by hand.
    • Changes to the signature of @attr.s() have to be added by hand too.
    • Changed/added classes/methods/functions have appropriate versionadded, versionchanged, or deprecated directives. Find the appropriate next version in our __init__.py file.
  • Documentation in .rst files is written using semantic newlines.
  • Changes (and possible deprecations) have news fragments in changelog.d.

@botant
Copy link
Contributor Author

botant commented Feb 22, 2020

The name comparator is not great but we're struggling to come up with an alternative.
Other ideas were cmpspec, cmp_class, compare...
Any suggestions?

@hynek
Copy link
Member

hynek commented Mar 7, 2020

Heyyy. :)

Thanks for taking this on and sorry for the long time waiting.

As a quick idea, we do have the precedence of passing a callable:

In [1]: import attr

In [2]: @attr.s
   ...: class C:
   ...:     x = attr.ib(repr=lambda v: f">>> { v } <<<")
   ...:

In [3]: C(42)
Out[3]: C(x=>>> 42 <<<)

So I think we should stay consistent and not introduce a new argument.

Does anything speak against something like this:

from typing import Any

import abc

import attr


class Equality(abc.ABC):
    @abc.abstractmethod
    def eq(self, o1: Any, o2: Any):
        pass

    def ne(self, o1: Any, o2: Any):
        return not self.eq(o1, o2)


class Order(abc.ABC):
    @abc.abstractmethod
    def lt(self, o1: Any, o2: Any):
        pass

    def le(self, o1: Any, o2: Any):
        if o1 == o2:
            return True

        return self.lt(o1, o2)

    # ...


class NumpyEquality(Equality):
    def eq(self, o1: Any, o2: Any):
        if type(o1) is not type(o2):
            return NotImplemented

        return (o1 == o2).all()


@attr.s
class C:
    x = attr.ib(eq=NumpyEquality)

?


Other than that: cmp is dead and shouldn't appear anywhere in new APIs.

@botant
Copy link
Contributor Author

botant commented Mar 7, 2020

I think that passing custom "comparators" via eq and order works and is cleaner than adding a 4th equality-related argument (in addition to cmp, eq and order). See new comment below.

As it is, we have not used cmp in the new API but added a new argument comparator.

About the snippet you suggest above, I'm not sure how we would integrate this with the tuple comparison utilized in _make_eq:

class NumpyEquality(Equality):
    def eq(self, o1: Any, o2: Any):
        if type(o1) is not type(o2):
            return NotImplemented
        return (o1 == o2).all()

We took a slightly different approach: the comparison class wraps the attribute when we build the tuple:

        for a in attrs:
            lines.append("        self.%s," % (a.name,))
            others.append("        other.%s," % (a.name,))
            if a.comparator:
                cmp_name = "_%s_comparator" % (a.name,)
                # Add the cmp class to the global namespace of the evaluated
                # function, since local does not work in all python versions.
                global_vars[cmp_name] = a.comparator
                lines.append("        %s(self.%s)," % (cmp_name, a.name,))
                others.append("        %s(other.%s)," % (cmp_name, a.name,))
            else:
                lines.append("        self.%s," % (a.name,))
                others.append("        other.%s," % (a.name,))

Another question we have to answer is what kind of comparison functions or classes we provide off-the-shelf, and whether we enforce some sort of hierarchy or protocol. Comparing complex objects such as numpy arrays or pandas dataframes is a bit domain/application-specific, and I'm concerned that nothing we provide will be generic enough.

With that in mind, we have been experimenting (here https://github.com/botant/attrs/pull/4) with a comparators module that (loosely) follows the pattern of validators. Two functions have been implemented: using_function (uses a function to compare values) and using_key (compares values after calling the key function, a la sort).

Usage would look like this:

@attr.s
class C:
    # case insensitive
    name = attr.ib(eq=attr.comparators.using_key(lambda x: x.lower()))
    # absolute value
    value = attr.ib(eq=attr.comparators.using_key(abs))
    # numpy array
    data = attr.ib(eq=attr.comparators.using_function(np.array_equal))

What do you think?

Merge attrs/master onto fork
@botant
Copy link
Contributor Author

botant commented Mar 7, 2020

@hynek I'm having second thoughts about reusing eq and order. Whatever way you choose to compare an attribute, it is likely that equality and ordering operators will be applied in a consistent manner - absolute value, case insensitive, all, any, etc., which means specifying eq and order in a redundant way.

I'm trying to avoid something like this:

@attr.s
class C:
    x = attr.ib(eq=abs, order=abs)
    y = attr.ib(eq=lambda s: s.lower(), order=lambda s: s.lower())

I would suggest we keep eq and order as they are today - booleans that tell the library whether an attribute should be included in the operators or not. Introduce a new argument comparator (or something better named) to specify how the comparison should be done.

@wsanchez
Copy link

wsanchez commented Mar 7, 2020

There are two approaches being discussed here. @hynek is proposing an API in which you define comparison, and @botant points our that you may have to express the same thing twice, which might be solved if the Order ABC is a subclass of Equality, such that order=abs implies eq=abs?

We could have a helper function that build an Equality or Order for you:

def equality(eq=object.__eq__):
    class ConcreteEquality(Equality):
        def eq(self, o1, o2):
            return eq(o1, o2)
    return ConcreteEquality

def order(eq=object.__eq__, lt=object.__lt__):
    class ConcreteOrder(Order):
        def eq(self, o1, o2):
            return eq(o1, o2)
        def lt(self, o1, o2):
            return lt(o1, o2)
    return ConcreteOrder

x = attr.ib(eq=equality(eq=lambda a, b: a.lower() == b.lower()))
y = attr.ib(order=order(lt=lambda a, b: a.value < b.value, eq=lambda a, b: a.value == b.value)

Then @botant is proposing a new argument called comparator, which I think my be the wrong term, since I think it's actually a normalization function to be use when comparing things. I think you can do that on top of the the comparison API idea:

def normalized_equality(f):
    return equality(eq=lambda a, b: f(a) == f(b))

def normalized_order(f):
    return order(eq=lambda a, b: f(a) == f(b), lt=lambda a, b: f(a) < f(b))

x = attr.ib(eq=normalized_equality(lambda a: a.lower())
y = attr.ib(order=normalized_order(lambda a: a.value)

@botant
Copy link
Contributor Author

botant commented Mar 7, 2020

The API we're proposing is a bit different.

The two use cases we're trying to address are:

  • Using a normalization function such as abs or lower.
  • Using a function to compare objects such as numpy.array_equal or pandas.frame_equal.

In order to do this and keep the current _make_eq implementation, we added an argument called comparator to attrib (please bear with the horrible comparator name for now), to which you pass a class like this:

@attr.s
class MyComparator:
   value = attr.ib()
   def __eq__ ...
   def __lt__ ...

This specification fits _make_eq nicely because MyComparator wraps the attribute value and implements eq, lt, etc itself, and therefore can be added to the attribute tuple and compared like any other attribute value. No other assumption is made about MyComparator.

To facilitate the use of this API, we provide two functions that cover the use cases described above:

  • using_key(key: Callable[[Any], Any]) which normalizes the attribute value using the function key:
def __eq__(self, other):
   return self.key(self.value) == self.key(other.value)
  • using_function(func: Callable[[Any, Any], bool]) which caters for the numpy/pandas use case:
def __eq__(self, other):
   return self.func(self.value, other.value)

Example:

@attr.s
class C:
    # case insensitive
    name = attr.ib(compare=attr.compare.using_key(lambda x: x.lower()))
    # absolute value
    value = attr.ib(compare=attr.compare.using_key(abs))
    # numpy array
    data = attr.ib(compare=attr.compare.using_function(np.array_equal))

In a nutshell, the API can be described like this:

  • eq and order determine whether the attribute is included in the attrs-generated eq and lt dunders.
  • comparator determines how that comparison is done.

Things we wanted achieved:

  • Keep the API around eq and order intact.
  • Easy integration with current code, especially _make_eq.
  • Simple specification for Comparator class, and avoid inheritance, abstract methods or protocols.
  • Provide viable solutions for the use cases mentioned above, and hopefully other use cases as well.

Things we didn't achieve:

  • Find a good name for the new argument; compare is probably better than comparator.

Hope this makes our suggestion a bit clearer.

Thanks!

@hynek
Copy link
Member

hynek commented Mar 8, 2020

A few quick questions:

           # Add the cmp class to the global namespace of the evaluated
           # function, since local does not work in all python versions.

What Python versions are those? I'm happy to make this feature Python 3-only.


I understand your concern about consistency between eq and order, and since you seem to come from the scientific community that makes a lot of sense, however I consider having order=True by default a mistake in hindsight and plan on making it False in import attrs. Most people don't need it, so it's eating up precious resources (for example, I don't think I have ever used it).

I think this is something that could be solved either with a warning or some kind of syntactic sugar/helper class that combines both? But it shouldn't be mushed together again by default.

Maybe we could keep cmp for that but moving it from a prime API to syntactic sugar.


I agree that we couldn't ever give people what everybody needs. I could live with not giving them anything at all, as long as we give them a good API to build it themselves. :). I have only glanced at the APIs but I think it has legs!

It just needs to be split up with an helper. How would you (and this includes @wsanchez) feel about this:

@attr.s
class C:
    # Only change equality
    x = attr.ib(eq=attr.eq.using_key(lambda x: x.lower()))
    # Only change order
    y = attr.ib(order=attr.order.using_key(abs))
    # Change both
    data = attr.ib(cmp=attr.cmp.using_function(np.array_equal))

This would give me the flexibility I want and you'd get safety/comfort (cmp would become an comparator-only keyword in new APIs).

@botant
Copy link
Contributor Author

botant commented Mar 8, 2020

           # Add the cmp class to the global namespace of the evaluated
           # function, since local does not work in all python versions.

This should work on all versions from 2.7 to 3.8. We did try using locals but that didn't work in 2.7, thus the comment.


Yes! I explicitly set order=False on almost all my classes. That would be a most welcome change, but it also seems a bit dangerous.

How would you achieve that change in behaviour? A transition period where calls to __lt__ raise a deprecation warning if order=True was implicitly assumed?

What do you mean by:

making it False in import attrs?


This looks like a nice compromise between consistency and flexibility.

    x = attr.ib(eq=attr.eq.using_key(lambda x: x.lower()))
    y = attr.ib(order=attr.order.using_key(abs))
    data = attr.ib(cmp=attr.cmp.using_function(np.array_equal))

My only concern is that we're talking about potentially breaking changes, even though they would suit me. As a first-time contributor, I'd never have suggested that myself.

I'm looking forward to seeing how an API change like this is managed.

@wsanchez
Copy link

wsanchez commented Mar 8, 2020

@hynek I'm good with that lat proposal from you, though I find the terms using_key and using_function rather non-useful.

That is, if using_key is a function that normalizes the things to compare, and using_function is a function that does the comparison, then those terms don't imply either of those things.

So I'd suggest using_normalizer and using_comparator, or something like that.

@botant
Copy link
Contributor Author

botant commented Mar 9, 2020

@wsanchez using_key was a reference to sort's key argument:

>>> sorted("This is a test string from Andrew".split(), key=str.lower)
['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']

And using_function was, well, the first I could think of :-)

I agree the names I chose aren't great, but the truth is that it will be hard to pick intuitive and self-explanatory names given that this is a new API. Unfortunately we can't reuse a standard library name like in validators.isinstance...

@botant
Copy link
Contributor Author

botant commented Mar 9, 2020

@hynek @wsanchez what are the next steps then? Should I update the code to fit the API you described?

@wsanchez
Copy link

wsanchez commented Mar 9, 2020

@botant I think so. I'm not going to object loudly based on the names, so the priority I'd suggest is what @hynek described above, though I do think that "normalizer" and "comparator" are much better than "key" and "function", as these are terms we can use in the documentation and it would be good for the API to match the terms we use in the docs.

Better terms are welcome, but I don't think "key" and "function" make much sense.

@botant
Copy link
Contributor Author

botant commented Mar 9, 2020 via email

@hynek
Copy link
Member

hynek commented Mar 10, 2020

No, wait don’t. :)

Normalizer and comparator may be technically correct, but it goes too far into characteristic territory that is easy to forget and hard to type. And key is totally a Pythonic thing, permeating throughout the stdlib so I think it’s good to keep it. 🤔

We can still bikeshed over the names later (I would prefer with_ instead of using_ for example) but let’s focus on everything else first. I hope I’ll find the time this week.

@wsanchez
Copy link

@hynek: characteristic wasn't all bad. ;-)

If we want stdlib-flavors for names, I'd say use key and cmp, eg: with_key and with_cmp.

I'll put down the paint bucket now.

@hynek
Copy link
Member

hynek commented Mar 10, 2020

Can you give me quick examples of cmp so I don't have to search myself? 😇

@botant
Copy link
Contributor Author

botant commented Mar 10, 2020 via email

@wsanchez
Copy link

Can you give me quick examples of cmp so I don't have to search myself? 😇

https://docs.python.org/2.7/library/functions.html#cmp

@hynek
Copy link
Member

hynek commented Mar 10, 2020

lol: https://docs.python.org/3.8/library/functools.html#functools.cmp_to_key

Yeah I think those two are good names.

@hynek
Copy link
Member

hynek commented Mar 29, 2020

@botant just to be clear: do you need anything from us or are you good

@botant
Copy link
Contributor Author

botant commented Mar 29, 2020 via email

@hynek
Copy link
Member

hynek commented Mar 29, 2020

No problem, just checking whether you're blocked unbeknownst to me. Stay healthy!

Merge python-attrs/master onto botant/master
@botant
Copy link
Contributor Author

botant commented Apr 3, 2020

Hello @hynek and @wsanchez, hope you are well.

Now that I've started working on this, I'm worried that the proposed API may lead to bugs very difficult to spot and debug.

Consider the following code:

@attr.s
class Test:
    x = attr.ib(eq=attr.eq.using_key(str.lower))
    y = attr.ib(order=attr.eq.using_key(abs))

That is clearly wrong. Surely the user wanted one of the following:

    y = attr.ib(eq=attr.eq.using_key(abs))

or

    y = attr.ib(order=attr.order.using_key(abs))

My concern is that a bug like this might be invisible to unit tests unless the user is actively testing for that and might show up at run time only.

I see a few options:

  1. Continue as planned.

  2. Add validation around what can be passed to eq and order. Certainly doable, but it does add a layer of complexity to the API.

  3. Change the API slightly. How about this:

@attr.s
class Test:
    x = attr.ib(eq=True, cmp=attr.cmp.using_key(str.lower))
    y = attr.ib(order=True, cmp=attr.cmp.using_key(abs))

In this case, we would preserve the current meaning of eq and order as flags that determine whether an attribute is used in the equality and comparison dunders, respectively. We would "upcycle" cmp to allow the users to specify how comparisons should be done. This is also a bit easier to explain.

Looking forward to hearing what you think.

@botant
Copy link
Contributor Author

botant commented Apr 12, 2020

Hi @hynek,

Whether we go with this:

@attr.s
class Test:
    x = attr.ib(eq=True, cmp=attr.cmp.using_key(str.lower))
    y = attr.ib(order=True, cmp=attr.cmp.using_key(abs))

Or this:

@attr.s
class Test:
    x = attr.ib(eq=attr.eq.using_key(str.lower))
    y = attr.ib(order=attr.order.using_key(abs))

What are we doing about cmp? Are we breaking completely with the previous API where the user could specify a bool, or are we maintaining backwards compatibility?

@Julian
Copy link
Member

Julian commented May 21, 2020

I am late to the party, and have clearly thought about this less than others earlier, so apologies if this comment is completely unhelpful --

But I don't see why things ended up needing helpers (i.e. why eq=attr.cmp.with_key(str.lower) isntead of just eq=str.lower?

I see what the helpers/full API does -- they let someone override everything about equality even beyond the transformation of the attribute, including specifically whether a type check is first performed. Is that really a common enough case to support? (I'm asking a leading question obviously, and I think the answer is "no" and that those cases should just define __eq__ fully manually, but just want to make sure I follow the reasoning).

I also see the back and forth about how cmp and order are really related (i.e. that usually you want to normalize them the same way), but if that's the primary reason for the helpers, I guess my vote if it counts would be for the API in the later comments, but without the helper, i.e. y = attr.ib(order=True, cmp=abs), which triggers "generate an order, use the same normalization as for cmp".

@botant
Copy link
Contributor Author

botant commented May 21, 2020 via email

@Julian
Copy link
Member

Julian commented May 21, 2020

I saw that about numpy too, but that to me instinctively should be supported in the same API -- i.e. just via a key -- so someone should have a array-to-eq wrapper thing that takes a numpy array and instead returns an object whose __eq__ does the right thing (i.e. what any other python object would do for equality).

It's not a common thing outside of the numpy ecosystem, so I'd expect it be supported by allowing numpy things to be wrapped into objects that behave natively, and then you can just use the "normal" way. That may be my numpy-is-weird bias showing though in full disclosure :).

@botant
Copy link
Contributor Author

botant commented May 21, 2020 via email

@ali5h
Copy link

ali5h commented Jan 29, 2021

this is exactly what we need, just wanted to say thanks

Copy link
Member

@hynek hynek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get this over the liiiineeee!

src/attr/_make.py Outdated Show resolved Hide resolved
src/attr/_make.py Show resolved Hide resolved
src/attr/_make.py Show resolved Hide resolved
src/attr/_make.py Outdated Show resolved Hide resolved
src/attr/_make.py Outdated Show resolved Hide resolved

# cmp takes precedence due to bw-compatibility.
if cmp is not None:
warnings.warn(_CMP_DEPRECATION, DeprecationWarning, stacklevel=3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. yes
  2. I think we do? It's still a derived attribute that can be used as syntactic sugar, to set eq and order at once – agreed?

if a.eq_key:
cmp_name = "_%s_key" % (a.name,)
# Add the cmp class to the global namespace of the evaluated
# function, since local does not work in all python versions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would strongly prefer to make this Python 3-only then. pip just dropped Python 2, we don't need to bend backward anymore when it comes to new features.

Unless the code becomes even more complicated? I guess I'll leave this to you, but I'm -0 on global namespace shenanigans.

@hynek
Copy link
Member

hynek commented Feb 11, 2021

Ping? 😇

@botant
Copy link
Contributor Author

botant commented Feb 11, 2021 via email

@hynek
Copy link
Member

hynek commented Feb 11, 2021

No need for apologies! Just making sure we’re not both waiting for each other. :)

@botant
Copy link
Contributor Author

botant commented Feb 12, 2021

On cmp: not sure what you mean. :-)

@botant
Copy link
Contributor Author

botant commented Feb 12, 2021

On global: I was under the impression that passing the key function as global was required for PY2, but local doesn't work for me in Python 3.8. Here is some code to exemplify the issue:

Python 3.8.5 (default, Jul 28 2020, 12:59:40) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> def func():
...     return 2
... 
>>> x = eval('lambda: f()', {'f': func}, {})
>>> x()
2
>>> x = eval('lambda: f()', {}, {'f': func})
>>> x()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1, in <lambda>
NameError: name 'f' is not defined
>>> x = eval('lambda: f()', {'f': func}, {})
>>> x()
2
>>> 

Do you know if there is an alternative? Can you confirm on your side?

PS: _make_attr_tuple_class and _make_init also use eval + global to achieve similar things.

@hynek
Copy link
Member

hynek commented Feb 18, 2021

On cmp: not sure what you mean. :-)

What part? 😅

On global: I was under the impression that passing the key function as global was required for PY2, but local doesn't work for me in Python 3.8.

Just leave it then. We can look if we can improve the code later (narrator: they won't).

@hynek
Copy link
Member

hynek commented Feb 19, 2021

Hey, I'm sorry but #760 added some conflicts. But I suspect it might actually simplify some of your code around eval?

@botant
Copy link
Contributor Author

botant commented Feb 19, 2021

On cmp, I didn't understand what you meant by this:

It's still a derived attribute that can be used as syntactic sugar, to set eq and order at once – agreed?

On globals: I did a bit more digging and I don't think it will ever work with locals, because __eq__ is not a closure. Either the key functions are in __eq__.__globals__, or __eq__ doesn't know about them. I can see two alternatives:

  • Turn __eq__ into a closure.
  • Turn __eq__ into a callable object.

I think we're better off leaving it as is, after cleaning my nonsensical comment. I can only imagine that I had inverted the eval arguments when I concluded that locals worked.

@hynek
Copy link
Member

hynek commented Feb 19, 2021

Ehh I'm not sure either anymore, but removing the property would break backward compatibility, so we definitely still need it. Or am I misunderstanding?

@botant
Copy link
Contributor Author

botant commented Feb 19, 2021

Ehh I'm not sure either anymore, but removing the property would break backward compatibility, so we definitely still need it. Or am I misunderstanding?

Yes, I think we do need it. Or at least I don't want to be the one to take it off. 😃

@botant
Copy link
Contributor Author

botant commented Feb 19, 2021

Hey, I'm sorry but #760 added some conflicts. But I suspect it might actually simplify some of your code around eval?

Merged!

Copy link
Member

@hynek hynek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a few minor things that need to be done (a bit too keen on resolving! :D), but it's not worth to ping-pong them so I'll do it myself real quick.

So let's get this merged, literally on its first birthday! :D Thanks for staying on the ball!

@hynek hynek merged commit aefdb11 into python-attrs:master Feb 22, 2021
@hynek
Copy link
Member

hynek commented Feb 22, 2021

Thank you everyone! It took a year to merge, but I think we've found a great solution!

Now ~someone~ has to write some narrative docs. 🤪

hynek added a commit that referenced this pull request Feb 22, 2021
@botant
Copy link
Contributor Author

botant commented Feb 22, 2021 via email

@hynek
Copy link
Member

hynek commented Feb 22, 2021

Wait, I’ll have to think about it how to integrate it first. I’ll ping you on the PR. :)

@botant
Copy link
Contributor Author

botant commented Feb 22, 2021 via email

@hynek hynek mentioned this pull request Feb 25, 2021
@botant botant deleted the comparator branch February 28, 2021 12:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants