
Caching dynamic properties when using slots=True. #164

Closed
AndreLouisCaron opened this issue Mar 6, 2017 · 33 comments

@AndreLouisCaron

Hey there!

This is more of a question than an issue :-)

Some of the classes that I'm trying to migrate to attrs have attributes that are expensive to compute. These attributes are either never accessed or accessed multiple times in quick succession. In those cases, I prefer delaying the cost of computing the attribute in case it is not needed -- but I also never want to compute the attribute twice for the same instance. I use pydanny/cached-property for this, which works fine until you use slots=True.

Any advice?
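For context, the core conflict can be reproduced without attrs or any caching library at all: a slotted class without a `__dict__` slot gives a cached-property descriptor nowhere to stash its result (class and attribute names here are illustrative):

```python
class Slotted:
    __slots__ = ("a",)  # no instance __dict__ is created

s = Slotted()
s.a = 1
print(hasattr(s, "__dict__"))  # False: nowhere for a cache to live

try:
    s.b = 2  # arbitrary attributes can't be set either
except AttributeError as e:
    print("AttributeError:", e)
```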

@Tinche
Member

Tinche commented Mar 7, 2017

I've run into this as well. Unfortunately there's no easy solution as of yet. My advice for now is to just use dict-based classes if you need cached properties.

@hynek
Member

hynek commented Mar 7, 2017

Can’t we implement something ourselves? Technically it shouldn’t be that big of a deal: a private var + a property in front of it?
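A minimal sketch of that idea, assuming one dedicated private slot per cached attribute (all names are made up):

```python
class A:
    __slots__ = ("_x_cache",)

    @property
    def x(self):
        try:
            # The slot exists but is unset until the first access.
            return self._x_cache
        except AttributeError:
            self._x_cache = 42  # stand-in for an expensive computation
            return self._x_cache

a = A()
print(a.x)  # 42, computed on first access
print(a.x)  # 42, read straight from the slot
```

Unlike the `__dict__`-shadowing trick, every access still goes through the property, so there is a small constant indirection cost even after the value is cached.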

@Tinche
Member

Tinche commented Mar 7, 2017

Well, yes, of course. Let's leave this open to track the issue.

Consider the elegance of cached properties when applied to dict classes. Suppose you have a class A and an instance of it, a. There's a cached property p on A.

The first time you do a.p, Python looks in a.__dict__, finds nothing, then looks in A.__dict__, finds the cached property descriptor and executes it. The descriptor calculates the value and puts it in a.__dict__.

Next time you do a.p, Python looks in a.__dict__ and finds the calculated value. So subsequent lookups are actually faster. Also you can trivially clear the cache with del a.__dict__['p'].

This is really clean and lends itself well to being refactored out into a simple library, which is what we have today.
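The mechanism described above can be sketched as a non-data descriptor; because it defines no `__set__`, the entry it writes into the instance `__dict__` shadows the descriptor on every later lookup (a minimal sketch, not any particular library's implementation):

```python
class cached_prop:
    """Minimal dict-backed cached property (a non-data descriptor)."""
    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, obj, owner=None):
        if obj is None:
            return self
        value = self.func(obj)
        obj.__dict__[self.name] = value  # shadows this descriptor from now on
        return value

class A:
    @cached_prop
    def p(self):
        print("computing...")
        return 42

a = A()
print(a.p)  # prints "computing..." then 42
print(a.p)  # 42, straight from a.__dict__
del a.__dict__["p"]  # trivially clears the cache, as described above
```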

@hynek
Member

hynek commented Mar 7, 2017

Absolutely. I think we can simulate that using a simple placeholder class that does almost the same. It might make sense to have a blessed way for that which would also work with immutable classes, even though it's a bit iffy. (not arguing, just sketching :))

@AndreLouisCaron
Author

I like where this is going! I played around a bit with a cached_property descriptor and __slots__ today, only to realize what the absence of __dict__ really means... (hint: it breaks the lookup mechanism that @Tinche described earlier).

If someone can sketch a basic structure that would work, I can put some work into a patch :-)

@graingert
Contributor

fyi a class can be both slotted and have a __dict__ slot:

class Ham:
    __slots__ = ["__dict__", "spam"]

    def __init__(self):
        self.__dict__ = {"spam": 3}
        self.eggs = 1
        self.spam = 2


print(f"{Ham().eggs=}, {Ham().spam=}, {Ham().__dict__=}")

It seems you get the speedup for accessing slotted attributes

@graingert
Contributor

graingert commented Oct 8, 2020

you can even replace the self.__dict__ slot with a slotted dict-like object that proxies its attrs, e.g. for all the attrs that use @cached_property:

class _Ham(dict):
    __slots__ = ["ham", "spam", "eggs"]

    def __getitem__(self, key):
        return getattr(self, key)

    def __setitem__(self, key, value):
        return setattr(self, key, value)


class Ham:
    __slots__ = ["__dict__", "spam"]

    def __init__(self):
        self.__dict__ = _Ham()
        self.eggs = 1
        self.spam = 2


print(f"{Ham().eggs=}, {Ham().spam=}, {Ham().__dict__=}")

edit: ah, this doesn't work; Python cheats and accesses the self.__dict__'s data directly

@wbolster
Member

fwiw, functools has .cached_property() these days. it works using a normal attribute and a fancy descriptor to populate it when it's first accessed. that means zero overhead the 2nd time it's accessed.

@hynek
Member

hynek commented Nov 22, 2021

Yeah, the problem is that it doesn't work with slotted classes:

from functools import cached_property
import time
import attr


@attr.define
class A:
    @cached_property
    def x(self):
        return 42

a = A()

print(a.x)

throws:

Traceback (most recent call last):
  File "/Users/hynek/FOSS/attrs/t.py", line 14, in <module>
    print(a.x)
  File "/Users/hynek/.asdf/installs/python/3.10.0/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/functools.py", line 963, in __get__
    raise TypeError(msg) from None
TypeError: No '__dict__' attribute on 'A' instance to cache 'x' property.

Guess it could work if it stored the result in the decorator instead of the class?

@Tinche
Member

Tinche commented Nov 22, 2021

Does the stdlib cached property work with normal (non-attrs) slotted classes?

@graingert
Contributor

Does the stdlib cached property work with normal (non-attrs) slotted classes?

nope:

import functools

class Ham:
    __slots__ = "eggs", "__weakref__"

    def __init__(self, eggs):
        self.eggs = eggs

    @functools.cached_property
    def x(self):
        return 42

spam = Ham("example")
print(spam.eggs)
print(spam.x)
example
Traceback (most recent call last):
  File "foo.py", line 15, in <module>
    print(spam.x)
  File "/usr/lib/python3.8/functools.py", line 960, in __get__
    raise TypeError(msg) from None
TypeError: No '__dict__' attribute on 'Ham' instance to cache 'x' property.

@RonnyPfannschmidt

the cached property in the stdlib uses __dict__ as storage space
there would be a need for a variant that uses a slot and/or an identity weak mapping in the descriptor
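A sketch of the weak-mapping variant mentioned here: results live in a per-descriptor `WeakKeyDictionary`, so they disappear together with the instance. The instance needs a `__weakref__` slot for this to work; all names are illustrative:

```python
import weakref

class weak_cached_property:
    """Caches per instance in a WeakKeyDictionary instead of __dict__."""
    def __init__(self, func):
        self.func = func
        self.cache = weakref.WeakKeyDictionary()

    def __get__(self, obj, owner=None):
        if obj is None:
            return self
        try:
            return self.cache[obj]
        except KeyError:
            value = self.cache[obj] = self.func(obj)
            return value

class Ham:
    __slots__ = ("eggs", "__weakref__")  # __weakref__ makes weak refs possible

    def __init__(self, eggs):
        self.eggs = eggs

    @weak_cached_property
    def x(self):
        return self.eggs * 2

print(Ham(21).x)  # 42
```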

@hynek
Member

hynek commented Nov 22, 2021

Yeah this is not an attrs bug, but something where attrs might shine.

@wbolster
Member

wbolster commented Nov 22, 2021

the cached property in the stdlib used __dict__ as storage space

not sure if this is entirely correct. functools.cached_property uses a normal instance attribute as its storage space. for non-slotted classes, that is inside .__dict__ indeed.

this means cached_property will only trigger once in normal situations: as a fallback when the attribute is not set. after running the function once, the result is set as the attribute value. subsequent accesses do not flow through cached_property anymore, which means the cost of the property indirection is paid only once. clearing the cache is achieved by deleting the attribute using del x.y: it will cause the next access to go through the fallback path again, via a descriptor.

that said, slotted classes can also have (slotted) attributes that are not set:

>>> class C:
...     __slots__ = ('a',)
... 
>>> c = C()
>>> hasattr(c, 'a')  # Not yet set
False
>>> c.a = 12
>>> hasattr(c, 'a')  # Now it is set
True
>>> del c.a
>>> c.a  # Now it is unset again
Traceback (most recent call last):
[...]
AttributeError: a

however, trying to combine that with cached_property fails like this:

>>> import functools
>>> class C:
...     __slots__ = ('a',)
...     @functools.cached_property
...     def a(self):
...         return 123
... 
Traceback (most recent call last):
[...]
ValueError: 'a' in __slots__ conflicts with class variable

not sure if there's something that can be done about this 🤷

@kdebrab

kdebrab commented Aug 31, 2022

functools.cached_property can be used in slotted classes if __dict__ is added as a field:

import functools
import attrs


@attrs.define()
class A:
    __dict__: dict = attrs.field(default=attrs.Factory(dict), init=False, repr=False, eq=False)

    @functools.cached_property
    def x(self):
        return 42


a = A()
a.__dict__  # {}
a.x  # 42
a.__dict__  # {'x': 42}

It works also for frozen slotted classes (replace attrs.define by attrs.frozen).

A (possibly undesired) side effect of adding __dict__ to a slotted class is that monkeypatching is enabled again. E.g. the following simply works:

a.y = "test"

In case such a side effect is undesired, one may opt to add the field __cache__ instead of __dict__, while adapting cached_property such that it uses __cache__ instead of __dict__ to cache properties.
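A sketch of such an adaptation, using a plain slotted class for brevity; `cache_backed_property` and the `__cache__` slot handling are illustrative, not attrs API:

```python
class cache_backed_property:
    """Like functools.cached_property, but stores results in
    instance.__cache__ instead of instance.__dict__."""
    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, obj, owner=None):
        if obj is None:
            return self
        cache = obj.__cache__
        if self.name not in cache:
            cache[self.name] = self.func(obj)
        return cache[self.name]

class A:
    __slots__ = ("__cache__",)  # dedicated cache slot, no general __dict__

    def __init__(self):
        self.__cache__ = {}

    @cache_backed_property
    def x(self):
        return 42

a = A()
print(a.x)          # 42
print(a.__cache__)  # {'x': 42}
```

Because there is still no `__dict__`, monkeypatching stays blocked: `a.y = "test"` raises AttributeError.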

@diabolo-dan
Contributor

I've been playing around with possibilities for getting some kind of cached property / lazy value on slotted (attrs) classes.

For my use case, I'd prefer to avoid adding a dictionary to the class, to keep the memory usage small, so adding in __dict__ isn't very appealing.

It would be great if attrs provided a way to do this behind the scenes, with something like:

@attrs.frozen
class X:
    x: float
    y: float
    z: float = attrs.field(lazy_value = lambda self: self.x + self.y)

and/or

@attrs.frozen
class X:
    x: float
    y: float
    z: float = attrs.field(init=False)

    @z.lazy_value
    def _z(self):
        return self.x + self.y

Or perhaps just by detecting cached_property objects on slotted classes, and adding the slot + whichever backend solution is preferred. That would leave the typical user completely unaware that this behaviour exists, at the expense of tying tightly to functools.cached_property.

Anyway, I've come across 2 general approaches that seem promising. The first, option-A, is through __getattr__:

import attrs

calls = []

@attrs.frozen
class A:
    x: float
    y: float
    z: float = attrs.field(init=False)

    def _z(self) -> float:
        calls.append(self.__class__.__name__)
        return self.x + self.y

    __lazy_values = {
        'z': _z
    }

    def __getattr__(self, item):
        if item in self.__lazy_values:
            result = self.__lazy_values[item](self)
            object.__setattr__(self, item, result)
            return result
        raise AttributeError(item)


a = A(10, 15)
assert a.x == 10
assert a.y == 15
assert a.z == 25
assert a.z == 25
assert a.z == 25

assert len(calls) == 1
assert calls == ['A']

The second, option-B, is through a custom descriptor for the field:

from typing import Callable
import attrs


@attrs.frozen
class B:
    x: float
    y: float
    z: float = attrs.field(init=False)

    def _z(self) -> float:
        calls.append(self.__class__.__name__)
        return self.x + self.y


@attrs.frozen
class LazyDescriptor:
    _old_descriptor: Callable
    _callable: Callable

    def __get__(self, instance, owner):
        try:
            return self._old_descriptor.__get__(instance, owner)
        except AttributeError:
            result = self._callable(instance)
            self._old_descriptor.__set__(instance, result)
            return result

    def __set__(self, instance, value):
        return self._old_descriptor.__set__(instance, value)

    def __delete__(self, instance):
         return self._old_descriptor.__delete__(instance)

B.z = LazyDescriptor(old_descriptor=B.z, callable=B._z)

calls = []
b = B(10, 15)
assert b.x == 10
assert b.y == 15
assert b.z == 25
assert b.z == 25
assert b.z == 25

assert len(calls) == 1
assert calls == ['B']

Of these, option-A has the advantage of not adding overhead except:

  • when calculating the cached value
  • when attempting to access non-existent attributes

Neither of which seems unreasonable.

It does mean adding/modifying __getattr__, which attrs doesn't appear to currently do, but __setattr__ is modified, so it's plausibly acceptable.

It also works the same way for non-slotted classes, so would be potentially easier to keep consistent.

Option-B in contrast adds consistent overhead for accessing the cached value (i.e. even after calculating it), but doesn't spill any overhead or complexity onto anything other than the cached value. It doesn't work out of the box for non-slotted classes, so an alternate would need to be found in that case, but that shouldn't be too difficult. It's also closer in implementation to functools.cached_property which may be advantageous.

I'd be happy to take a stab at an MR for something like this when I get some time, if either of the approaches would be considered acceptable, and the interface can be roughly agreed.

@hynek
Member

hynek commented Sep 14, 2022

yeah B is too complicated and overhead's not cool.

I think I could live with an option-A-based solution, but maybe start with a PoC and take it from there.

A few notes:

  • It's fair game to detect a custom __getattr__ and error out.
  • the attribute's name should be __attrs_cached_properties__ or something similar.
  • and finally and most importantly: would it be possible to make it work along with @property?

It's an extra line, but I suspect it's a) clearer what's going on and b) less offensive to type checkers.

IOW I was thinking something along the lines of:

@attrs.define
class C:
    z: int = attrs.field(init=False)  # init=False would be optional but might mollify pyright etc

    @z.cached
    @property
    def _z(self) -> int:
        return ...

But maybe it's possible to type an Attribute.lazy_property in a way that type checkers would swallow it too?

@diabolo-dan
Contributor

What are your thoughts on picking up cached_property directly? It's the most invisible to the typical user, and should play reasonably with type checkers etc. I guess it might cause surprise if someone is relying on specifics of the cached_property on the class level, but someone doing that should be able to work out what's going on.

I also realised this could actually be done outside of attrs, as a pre-step with an additional decorator:

T = TypeVar('T', bound=type)
def convert_cached_property_for_attrs_slots(cls: T) -> T:
    cached_properties = {
        name: prop.func
        for name, prop in cls.__dict__.items()
        if isinstance(prop, functools.cached_property)
    }
    original_getattr = cls.__dict__.get('__getattr__')

    def __getattr__(instance, item):
        if item in cached_properties:
            result = cached_properties[item](instance)
            object.__setattr__(instance, item, result)
            return result
        elif original_getattr is not None:
            return original_getattr(instance, item)
        else:
            raise AttributeError(item)

    for name in cached_properties:
        setattr(cls, name, attrs.field(init=False, hash=False, eq=False, repr=False, order=False))
    setattr(cls, '__getattr__', __getattr__)
    return cls

Applied before attrs:

@attrs.frozen
@convert_cached_property_for_attrs_slots
class C:
    x: float = attrs.field()
    y: float = attrs.field()

    @cached_property
    def z(self) -> float:
        calls.append(self.__class__.__name__)
        return self.x + self.y


calls = []
c = C(10, 15)
assert c.x == 10
assert c.y == 15
assert c.z == 25
assert c.z == 25
assert c.z == 25


assert len(calls) == 1
assert calls == ['C']

I don't mean this as a long term solution, but it allows some exploration of the approach before touching attrs itself, and maybe unblocks others stumbling on this thread.

(This version is also obviously not thread safe, and it would make sense to use an RLock in the same way as functools.cached_property)

@EpicWink

One quick fix that I'm happy with is allowing attrs.define to keep existing __slots__: right now, if slots=True, attrs overwrites the class's __slots__ instead of updating it. If it were to update, I could specify __slots__ = ("__dict__",) and cached properties would work again (I don't mind the performance penalty of having a __dict__).
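The workaround described here can already be verified on a plain (non-attrs) slotted class: adding "__dict__" to __slots__ gives instances a dict again, and functools.cached_property then works as usual:

```python
import functools

class Ham:
    __slots__ = ("eggs", "__dict__")  # the __dict__ slot restores an instance dict

    def __init__(self, eggs):
        self.eggs = eggs

    @functools.cached_property
    def x(self):
        return self.eggs * 2

h = Ham(21)
print(h.x)         # 42
print(h.__dict__)  # {'x': 42} -- eggs still lives in its slot
```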

@diabolo-dan
Contributor

It seems that my proposed decorator only works when all the existing attributes are explicitly marked with = attrs.field(). I think that's because attrs will only infer fields when all attributes have type hints.

Adding in a:

cls.__annotations__[name] = inspect.signature(
    cached_properties[name]
).return_annotation

seems to resolve this, at least so long as the cached_property has a return annotation. I guess one could do:

    if annotation is not inspect.Parameter.empty:
        cls.__annotations__[name] = annotation

for robustness.

@RonnyPfannschmidt

Btw, as properties/cached properties don't support slots either, why not simply require the use of a cached property decorator that uses an alternative cache attribute name?

@AdrienPensart

It is maybe naive, but for an unknown reason, inheriting from a class fixes this behavior:

#!/usr/bin/env python3
from functools import cached_property
from attr import define

class Works:
    pass


@define(frozen=True)
class Whatever(Works):
    @cached_property
    def it_works(self) -> int:
        return 4


t = Whatever()
print(t.it_works)

print(t.__dict__)
print(t.__slots__)
4
{'it_works': 4}
()

@nrbnlulu

@AdrienPensart well you have a __dict__ there...

@RonnyPfannschmidt

slots only works as intended when all base classes use it as well; else it's moot and shouldn't be used to begin with ...

@mmerickel

Ran into this issue again, and passing @attrs.define(slots=False) fixes things, but attrs really needs a nice way to support this case and preserve slots. I was expecting something along the lines of the following:

@attrs.define
class Foo:
    x: int

    @attrs.lazy_field
    @cached_property
    def double_x(self) -> int:
        return self.x * 2

foo = Foo(x=5)
print(foo.double_x)  # 10

@hynek
Member

hynek commented Mar 11, 2023

If we add another decorator, couldn't it be @attrs.cached_property that handles all edge cases correctly, instead of requiring two decorators, one of which we don't control? Like, serious question: would it be a lot of work to reimplement or at least wrap?

@RonnyPfannschmidt

Depends on whether one wants the cached property to be thread safe

However it's still going to require a rename of the field as to avoid having to deal with shadowed or hidden slot descriptors

@diabolo-dan
Contributor

I found some time to put together a proof of concept of an approach to this: #1200

This works similarly to the decorator I posted above, in that it picks up use of the functools cached_property implementation and converts it and the class to work in a similar way.

It would be good to get some feedback on whether this seems like an acceptable approach at a high level. The advantage of basing it on the functools cached property is that it will just work for most people, and it potentially doesn't require any API changes. An argument against is that it makes the conversion hidden, which could be surprising, but pragmatically, that doesn't seem like it would be a common problem.

There's also an issue with locking, because the current approach is per-class, which is unlikely to be acceptable in most cases. It would be good to get some feedback on what the best approach would be here. The options I see are:

  • Class based locking (current approach), inefficient for many objects across many threads.
  • No locking: Simplest, lowest overhead, potential for duplicated work (and potential races).
  • Object based locking: add a lock to each object during init -- extra overhead at construction, permanent extra memory consumption per object.
  • Some kind of key based lock on the class, keyed on instances: Still some overhead, still some class based locking (but with much more limited duration), no standard implementation so probably needs extra code in the repo.

@mmerickel

Does attrs deal with locking in any other area? If so, this strategy should conform to that pattern. However, I think it does not - thus I would expect this to also not handle locking either, leaving room to grow out a full plan across all of attrs in the future consistently. Note that in Python 3.12+ the locking has been removed from cached_property, and this should not try to bring it back.

In use cases that I think are most common, you'd want per-object locks, never per-class locks. And in those cases I'd expect to declare the entire attrs class that way, not just the cached_property but any custom setters as well.

@diabolo-dan
Contributor

Does attrs deal with locking in any other area? If so, this strategy should conform to that pattern. However, I think it does not - thus I would expect this to also not handle locking either, leaving room to grow out a full plan across all of attrs in the future consistently.

It doesn't that I could find, my concern was in matching cached_property as closely as possible, to minimise potential surprise.

Note that in Python 3.12+ the locking has been removed from cached_property, and this should not try to bring it back.

Ah, I hadn't realised that. I think that makes a fairly good argument in favour of leaving the locking out (along with the point about inconsistency with attrs).

@diabolo-dan
Contributor

I've removed the locking, and tried to bring the MR up to contribution guidelines. I wasn't sure where the most appropriate place to document this behaviour would be. I've added a section to how-does-it-work.md, which seems like a reasonable place, but there might be a better option.

For hypothesis testing, I've added something to strategies, but there isn't prior provision for methods/non-attributes, so it might not be worth keeping the addition I've done.

As a general note, I like this approach for being transparent to users, but if there's a worry about potential confusion, it should be straightforward to have a separate decorator that acts as a sentinel instead of using functools.cached_property.

@n-splv

n-splv commented Jan 31, 2024

FYI this works without error now (Python 3.10, 3.11, 3.12; attrs 23.2.0).

from functools import cached_property

from attrs import define


@define
class A:
    @cached_property
    def x(self):
        return 42


a = A()
print(a.x)  # 42

However, there is a nasty little bug with AttributeError propagation. Let's start with a regular class:

class A:

    @cached_property
    def x(self):
        return self.y

    @property
    def y(self):
        raise AttributeError("Message")
        
a = A()
print(a.x)  # AttributeError: Message

Now let's use attrs with __dict__:

@define(slots=False)
class A:
    ...

print(a.x)  # AttributeError: Message

So far so good. BUT:

@define  # slots=True by default
class A:
    ...

print(a.x)  # AttributeError: 'A' object has no attribute 'y'

This made me spend quite some time wondering where my attribute went :)
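The masking can be reproduced without attrs at all. When a property's getter raises AttributeError, Python's attribute machinery treats the lookup as failed and falls back to `__getattr__` (which slotted cached-property implementations define), replacing the original message (a minimal sketch):

```python
class C:
    __slots__ = ()

    @property
    def x(self):
        # Any AttributeError raised in here is swallowed by the
        # attribute machinery and rerouted to __getattr__.
        raise AttributeError("Message")

    def __getattr__(self, name):
        raise AttributeError(
            f"{type(self).__name__!r} object has no attribute {name!r}"
        )

try:
    C().x
except AttributeError as e:
    print(e)  # 'C' object has no attribute 'x' -- the original Message is lost
```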

@hynek what do you think, shall I open a separate issue?

@hynek
Member

hynek commented Jan 31, 2024

ah yes yay this was fixed by #1200!

shall I open a separate issue?

If this is somehow caused by us, yes.

@hynek hynek closed this as completed Jan 31, 2024