Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Small changes to List and Dict data types API #4495

Closed
mbercx opened this issue Oct 21, 2020 · 14 comments
Closed

Small changes to List and Dict data types API #4495

mbercx opened this issue Oct 21, 2020 · 14 comments
Assignees
Labels
priority/critical-blocking must be resolved before next release topic/data-types type/feature request status undecided
Milestone

Comments

@mbercx
Copy link
Member

mbercx commented Oct 21, 2020

I'd like to discuss two small changes to the API of the List and Dict classes:

Initialization

Is your feature request related to a problem? Please describe

When initializing a List or Dict node, you have to pass dict or list as a keyword argument:

l = List(list=[1, 2, 3])

If you try to instead just pass the list as a positional argument, you are faced with an error that doesn't clarify proper usage:

l = List([1, 2, 3])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-47-398676db57d5> in <module>
----> 1 List([1, 2, 3])

TypeError: __init__() takes 1 positional argument but 2 were given

and the docstring doesn't explain that you have to use the list keyword argument either:

In [46]: List?
Init signature: List(**kwargs)
Docstring:      `Data` sub class to represent a list.
Init docstring:
:param backend_entity: the backend model supporting this entity
:type backend_entity: :class:`aiida.orm.implementation.entities.BackendEntity`
File:           ~/envs/aiida-docs/code/aiida-core/aiida/orm/nodes/data/list.py
Type:           AbstractNodeMeta
Subclasses:    

Describe the solution you'd like

I think it would be more intuitive if the user could just pass the list/dict as the first positional argument. For example, we could change the current constructor:

def __init__(self, **kwargs):
    data = kwargs.pop('list', list())
    super().__init__(**kwargs)
    self.set_list(data)

to something like (thanks to @giovannipizzi for the suggestion to import the builtins module to avoid conflicts between the list argument and builtin, see this discussion):

def __init__(self, list=None, **kwargs):
    import builtins
    list = list or builtins.list()
    super().__init__(**kwargs)
    self.set_list(input_list)

Similarly for Dict. If for some reason we cannot make this change, we should at least add a docstring that clearly explains the API to the user.

Using .value

Is your feature request related to a problem? Please describe

For mose data types in AiiDA derived from builtin Python types, you can use the value property to get the corresponding Python builtin. For List and Dict nodes, however, you have to use the get_list() and get_dict() methods. Adding the
value property to these classes would make the API between the "builtin" AiiDA data types more consistent.

Describe the solution you'd like

Add the value property to both the List and Dict classes.

@mbercx mbercx added type/feature request status undecided good first issue Issues that should be relatively easy to fix also for beginning contributors priority/nice-to-have labels Oct 21, 2020
@greschd
Copy link
Member

greschd commented Oct 21, 2020

Completely agree on the first point.

For the .value, one thing to consider is what the behavior of the following should be:

In [1]: from aiida import orm

In [2]: x = orm.List(list=[1, 2, 3])

In [3]: x.value.append(4)

In [4]: x.store()
Out[4]: ...

In [5]: x.value.append(5)

In [6]: x
Out[6]: ...

To be consistent with e.g. Float:

  • 3 should be possible, and modify the content of the node
  • 5 should raise

Here's what currently happens with get_list():

In [1]: from aiida import orm

In [2]: x = orm.List(list=[1, 2, 3])

In [3]: x.get_list().append(4)

In [4]: x.store()
Out[4]: <List: uuid: 412e131b-60c7-47fa-a76e-48812541a179 (pk: 61311) value: [1, 2, 3, 4]>

In [5]: x.get_list().append(5)

In [6]: x
Out[6]: <List: uuid: ac068ff6-4182-477c-82c3-59f034b539d2 (pk: 61312) value: [1, 2, 3, 4]>

I think this could be a source of subtle bugs - maybe we should take the opportunity of introducing .value to change this, while keeping .get_list() the same for backwards-compatibility.

@mbercx
Copy link
Member Author

mbercx commented Oct 21, 2020

That's a good point, I hadn't considered this. I agree with your proposed behavior, i.e. make .value fully consistent with the other base data types and keep .get_list()/.get_dict() as is.

Initially I was inclined to suggest making List and Dict a subclass of BaseType, but this raises similar questions as those discussed in #1917 regarding the equality comparison ==. As @sphuber mentioned there, currently Dict(dict={'a': 1}) == Dict(dict={'a': 1}) returns False, which make this a backwards incompatible change. It's a little confusing though that currently List does have the same behavior as Int etc in this regard.

Maybe for now we can just copy the value property from BaseType to Dict, and make List a subclass of BaseType? We can consider making Dict a subclass of BaseType once #1917 has come to a resolution.

@mbercx
Copy link
Member Author

mbercx commented May 6, 2021

Just another note regarding the BaseType nodes, based on a discussion with @sphuber here on passing a node to the constructor of a node:

Currently, when you try to pass a e.g. Float node to the constructor of a Float node, this works fine:

In [1]: f1 = Float(1)

In [2]: f2 = Float(f1)

However, trying to do the same with a List or Dict fails:

In [3]: d1 = Dict(dict={'a': 1})

In [4]: d2 = Dict(dict=d1)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-158c214430d4> in <module>
----> 1 d2 = Dict(dict=d1)

~/envs/aiida-dev/code/aiida-core/aiida/orm/nodes/data/dict.py in __init__(self, **kwargs)
     60         super().__init__(**kwargs)
     61         if dictionary:
---> 62             self.set_dict(dictionary)
     63 
     64     def __getitem__(self, key):

~/envs/aiida-dev/code/aiida-core/aiida/orm/nodes/data/dict.py in set_dict(self, dictionary)
     81             # Clear existing attributes and set the new dictionary
     82             self.clear_attributes()
---> 83             self.update_dict(dictionary)
     84         except exceptions.ModificationNotAllowed:  # pylint: disable=try-except-raise
     85             # I reraise here to avoid to go in the generic 'except' below that would raise the same exception again

~/envs/aiida-dev/code/aiida-core/aiida/orm/nodes/data/dict.py in update_dict(self, dictionary)
     98         :param dictionary: a dictionary with the keys to substitute
     99         """
--> 100         for key, value in dictionary.items():
    101             self.set_attribute(key, value)
    102 

AttributeError: 'Dict' object has no attribute 'items'

I'm inclined to allow this, creating a new Dict node with the same value.

Once again, the equality comparisons are relevant here. For the Float nodes, we obtain the behaviour I would intuitively expect:

In [5]: f1 == f2
Out[5]: True

In [6]: f1 is f2
Out[6]: False

I.e. their value is the same, but the node is different. Note that this doesn't correspond to the Python base type behaviour for float:

In [9]: f3 = float(3)

In [10]: f4 = float(f3)

In [11]: f3 == f4
Out[11]: True

In [12]: f3 is f4
Out[12]: True

In [13]: id(f3)
Out[13]: 140577203382032

In [14]: id(f4)
Out[14]: 140577203382032

But this does correspond to the behaviour for dict:

In [15]: d3 = {'a': 1}

In [16]: d4 = dict(d3)

In [17]: d3 == d4
Out[17]: True

In [18]: d3 is d4
Out[18]: False

In [19]: id(d3)
Out[19]: 140577170248320

In [20]: id(d4)
Out[20]: 140577168878912

@mbercx
Copy link
Member Author

mbercx commented Aug 11, 2021

To move forward with this, I'd suggest three changes:

  • Change the constructor so the list and dict no longer need to be added.
  • Add .value to List and Dict, but keep get_list() and get_dict() in line with @greschd's comment above.
  • Make the equality behaviour for all Python basetype data types consistent. I.e. this means that going forward Dict nodes will also be equal when their content is equal, similar to Int, Float, List, etc.

I think all of these can be achieved by making both List and Dict subclasses of BaseType. So I'd try to implement this strategy unless there are clear objections.

@sphuber
Copy link
Contributor

sphuber commented Aug 11, 2021

Change the constructor so the `list` and `dict` no longer need to be added.

Since this is backwards compatible and makes the interface easier, I am fully in favor of this.

Add .value to List and Dict, but keep get_list() and get_dict() in line with @greschd's comment above.

Would be fine with this, but what do we do with the dict property of Dict? This already operates somewhat like what .value would do, except that it wraps the content in an AttributeManager.

Make the equality behaviour for all Python basetype data types consistent. I.e. this means that going forward Dict nodes will also be equal when their content is equal, similar to Int, Float, List, etc.

I wouldn't touch this for now. There is a reason that Python base types also behave differently when it comes to equality, as you have noticed. Since what the "correct" behavior should be is not at all trivial (as evidenced by the long discussions in #1917 ) I wouldn't hastily change this in this PR. I would keep it limited to the changes listed above.

@giovannipizzi
Copy link
Member

but what do we do with the dict property of Dict?

I would keep it, having one more method/property (that actually returns something different) does not seem a big issue to me

@sphuber
Copy link
Contributor

sphuber commented Aug 12, 2021

I would keep it, having one more method/property (that actually returns something different) does not seem a big issue to me

Also fine by me, but I am already anticipating the questions why there are two and they operate differently :)

@sphuber
Copy link
Contributor

sphuber commented Sep 22, 2021

Marking this as critical-blocking as this should go in v2.0. Even if nothing is changed, than that decision should be documented such that we don't come back to this discussion everytime

@mbercx
Copy link
Member Author

mbercx commented Oct 5, 2021

I've opened #5165 to adapt the constructor, but adding the .value property in line with @greschd suggestion might be trickier than I initially expected. To recap: we would want any changes to the value property of e.g. List to raise an error in case the List node is already stored:

In [1]: x = List(list=[1, 2, 3])

In [2]: x.store()
Out[2]: ...

In [3]: x.value.append(5)
--> This should raise!

However, the .value property of e.g. List should return a list instance, so I'm not sure how we can replicate this behaviour. @greschd indicated that this should be the behaviour to be consistent with e.g. Float, but I think that's only because the returned float by Float.value doesn't have any methods that change the instance. For another BaseType, Str, applying a method that changes the instance to Str.value doesn't raise either, nor does it actually change the value of the node:

In [1]: s = Str('test')

In [2]: s.store()
Out[2]: <Str: uuid: ad95e837-d158-48c1-af9a-f3b0b67e0377 (pk: 5) value: test>

In [3]: s.value.capitalize()
Out[3]: 'Test'

In [4]: s.value
Out[4]: 'test'

So I'm not sure how we would fix this, and if this is even desirable. 😅

@sphuber
Copy link
Contributor

sphuber commented Oct 5, 2021

To me this is not surprising. That is to say, the exact same happens with node attributes. After all, the value property merely returns the value of a particular attribute. For example:

In [1]: d = Data()

In [2]: d.set_attribute('l', [1, 2])

In [3]: d.get_attribute('l')
Out[3]: [1, 2]

In [4]: d.get_attribute('l').append(3)

In [5]: d.get_attribute('l')
Out[5]: [1, 2, 3]

In [6]: d.store()
Out[6]: <Data: uuid: bebd05e6-0d3f-4328-8056-5bba433698ee (pk: 3)>

In [7]: d.get_attribute('l').append(4)

In [8]: d.get_attribute('l')
Out[8]: [1, 2, 3]

This is on purpose though. The aiida.orm.entities.EntityAttributesMixin, which provides the methods to operate on node attributes, intentionally returns a deepcopy of the attributes of the underlying model instance if the node is stored. This is exactly to prevent that the content is modified if the node is stored.

    def get_attribute(self, key, default=_NO_DEFAULT):
        """Return the value of an attribute.

        .. warning:: While the entity is unstored, this will return a reference of the attribute on the database model,
            meaning that changes on the returned value (if they are mutable themselves, e.g. a list or dictionary) will
            automatically be reflected on the database model as well. As soon as the entity is stored, the returned
            attribute will be a deep copy and mutations of the database attributes will have to go through the
            appropriate set methods.

        :param key: name of the attribute
        :param default: return this value instead of raising if the attribute does not exist
        :return: the value of the attribute
        :raises AttributeError: if the attribute does not exist and no default is specified
        """
        try:
            attribute = self.backend_entity.get_attribute(key)
        except AttributeError:
            if default is _NO_DEFAULT:
                raise
            attribute = default

        if self.is_stored:
            attribute = copy.deepcopy(attribute)

        return attribute

So the problematic behavior that @greschd describes is actually intentional behavior and has been part of AiiDA for a long time. I would say this is simply a question of documenting if it hasn't already been done.

@greschd
Copy link
Member

greschd commented Oct 7, 2021

intentionally returns a deepcopy of the attributes of the underlying model instance if the node is stored

Wouldn't it then be more consistent to always return a deepcopy, regardless of store status? For modifying, it would make more sense to me to have methods (e.g. .append() directly on the List).
My main point above was that making the behavior dependent on whether .store() had already been called can lead to subtle differences where the same code doesn't have the same meaning, depending on non-local context.

I definitely see @sphuber's point though that this has been the behavior for a long time now, so there's a significant risk associated with changing it.

@unkcpz unkcpz removed the good first issue Issues that should be relatively easy to fix also for beginning contributors label Nov 10, 2021
@mbercx
Copy link
Member Author

mbercx commented Dec 8, 2021

To move towards closing this issue: the constructor has been adapted in #5165, and the issue of comparing node equality by content for the Dict class has been hashed out in discussion #5187 and adjusted in #5251. The only thing to resolve is whether/how to add the value property to Dict and List nodes. If we want to do this, I see a couple options:

  1. We make sure the usage of value is consistent among all base types by leaving BaseType.value untouched and adding value as a property for Dict and List. This would mean that the same issue regarding difference in behaviour whether or not the node is stored is propagated to the Dict and List classes.
  2. We add the value property to Dict and List , but have it return a deepcopy. This would fix the inconsistent behaviour depending on whether the node is stored or not, but would make the behaviour of value inconsistent across the base types.
  3. We make the value property always return a deepcopy for all base types. This is a backwards-incompatible change, however, that might require quite a bit of changes in the plugins if they rely on changing nodes using value.

For v2.0, I would suggest implementing [1] so the API across the base data types is consistent. After v2.0 we can look into implementing [3], but the issue of inconsistent behaviour between nodes before and after being stored is a more general one that will require more extensive discussion to see the implications of changing this.

@sphuber
Copy link
Contributor

sphuber commented Dec 8, 2021

I vote for approach 1. When in the future we go for 3, we should also make value raise when it is being used as a setter.

@sphuber
Copy link
Contributor

sphuber commented Jan 17, 2022

Since the bulk of this issue has been handled and only the value remains, I have split this off in a separate issue #5313 Discussion will continue there

@sphuber sphuber closed this as completed Jan 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority/critical-blocking must be resolved before next release topic/data-types type/feature request status undecided
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants