-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[minor] Recursive to/from_dict #1602
Conversation
Adds a create_from_dict function that takes the output of HasDict.to_dict and turns it into a live object, similar to our earlier to_object() method on ProjectHDFio. Adds an instantiate class method to HasDict to support this, this allows HasDictfromHDF to work and will be useful for dataclasses in the future. HasDict.to_dict now goes over the contents of what is returned from _to_dict and automatically converts any HasDict/HasHDF objects it finds. I haven't used this in downstream code yet to keep the change small, but in principle this will allow GenericJob/DataContainer to stop calling to_dict on their children explicitly and let the generic interface handle it. The rest of the changes are renaming everything to _from_dict/_to_dict and normalizing the argument name to obj_dict.
if "executable" in obj_dict.keys() and obj_dict["executable"] is not None: | ||
self._executable = obj_dict["executable"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit surprised about this part, when is the dictionary of the executable converted to the executable object? Is this already happening when reading the executable from the HDF5 file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this new setup HasDict.from_dict converts nested objects back from their dictionary form before the implementations HasDict._from_dict is called. So by the time GenericJob._from_dict runs and looks for the executable Executable._from_dict already ran.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle the reverse also works, i.e. it's not always necessary anymore to call to_dict on ones children that are returned in the obj_dict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to document this at least in the HasDict
class, so developers understand how to derive their own classes from HasDict
and that they only have to implement the _to_dict()
and _from_dict()
interface. Finally, I am wondering if it makes sense to overload the __getstate__()
and __setstate__()
function in the HasDict
class. Then we can slowly transition from calling to_dict()
and from_dict()
to using the __getstate__()
and __setstate__()
interface to reload the pyiron objects. This would simplify the interface for developers who just want to integrate a new code. As long as their classes can be pickled, then we can store their classes in our jobs. If they want to optimise the performance and benefit from the hierarchical nature of the HDF5 file, then they can overload the __getstate__()
and __setstate__()
method, for example by attaching a dataclass to their class which they use for data storage or by attaching a data container.
def load(inner_dict): | ||
if not isinstance(inner_dict, dict): | ||
return inner_dict | ||
if not all( | ||
k in inner_dict for k in ("NAME", "TYPE", "OBJECT", "DICT_VERSION") | ||
): | ||
return {k: load(v) for k, v in inner_dict.items()} | ||
return create_from_dict(inner_dict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I understand that the conversion is handled here, still I am wondering if this is what we want. I thought the intention was to get towards using __getstate__()
and __setstate__()
. Now with the executable and the server object using datacalsses for data storage internally, I was hoping that we could use a simple dictionary representation there where the class is stored by storing the attached dataclass.
pyiron_base/interfaces/has_dict.py
Outdated
for k, v in self._to_dict().items(): | ||
if isinstance(v, HasDict): | ||
child_dict[k] = v.to_dict() | ||
elif isinstance(v, HasHDF): | ||
child_dict[k] = HasDictfromHDF.to_dict(v) | ||
else: | ||
data_dict[k] = v |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify this, this part currently does not work for the Atoms
object in pyiron_atomistics
, as it is neither derived from HasDict
nor from HasHDF
, correct? It only works when the Atoms
object was stored as part of a DataContainer
, correct?
Adds a create_from_dict function that takes the output of HasDict.to_dict and turns it into a live object, similar to our earlier to_object() method on ProjectHDFio.
Adds an instantiate class method to HasDict to support this, this allows HasDictfromHDF to work and will be useful for dataclasses in the future. HasDict.to_dict now goes over the contents of what is returned from _to_dict and automatically converts any HasDict/HasHDF objects it finds. I haven't used this in downstream code yet to keep the change small, but in principle this will allow GenericJob/DataContainer to stop calling to_dict on their children explicitly and let the generic interface handle it.
The rest of the changes are renaming everything to _from_dict/_to_dict and normalizing the argument name to obj_dict.
This allows to store "abitrary" objects again in the input/outputs of TemplateJobs.
Separate PR for easier review. I will add extra unit tests, but it worked already in my local notebook tests.