-
Notifications
You must be signed in to change notification settings - Fork 200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Node
: add the is_valid_cache
setter property
#5207
Node
: add the is_valid_cache
setter property
#5207
Conversation
@ramirezfranciscof @unkcpz @chrisjsewell @greschd your comments please :) This is a very rough first sketch of what I think we need. Once we agree on the mechanism, I can work on more tests, documentation etc. |
Codecov Report
@@ Coverage Diff @@
## develop #5207 +/- ##
===========================================
+ Coverage 81.22% 81.22% +0.01%
===========================================
Files 532 532
Lines 37337 37344 +7
===========================================
+ Hits 30323 30329 +6
- Misses 7014 7015 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sphuber , I have no objection to this. Just one minor request of seting it as property object.
aiida/orm/nodes/node.py
Outdated
|
||
return (node for node in builder.all(flat=True) if node.is_valid_cache) | ||
|
||
def set_is_valid_cache(self, valid: bool) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that possible to have it like this, make it a property?
def set_is_valid_cache(self, valid: bool) -> None: | |
@is_valid_cache.setter | |
def is_valid_cache(self, valid: bool) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that, would make it easier for Node
subclasses to override the behavior. Maybe we should document that as an "official" way to influence caching?
Of course, they would mostly implement the getter, not the setter.
That is then also where the RemoteData
implementation should likely go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It certainly is possible and probably the right thing to do. Only reason I already didn't do it was because originally the method was called something completely different. Then I saw that is_valid_cache
exists and that is essentially the getter of what I was implementing, however, that was currently being documented as a property to override by classes (not instances). So then it felt a bit weird to make the setter really only operate on instances.
But then again, that is what the property is now doing: it is now an instance level concept, and so your suggestion makes sense. Will change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should document that as an "official" way to influence caching?
If people agree that this is indeed the way to go to officially control caching on a per-node basis, then I would definitely add this to the docs. I first wanted to get a sense if people agree before writing this though. So I take @greschd is on board. Then I will just wait a bit longer for the others to have the chance to respond.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Given only |
Not sure. I see your objection to the node namespace being busy, but the entire caching logic is currently implemented on the node level and the fact that
That is not really meant with |
37994b5
to
5c33bfc
Compare
Node
: add the set_is_valid_cache
methodNode
: add the is_valid_cache
setter property
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Just a minor detail that might affect performance, but I agree with the concept.
I also agree that this should be the documented way to enable/disable caching; and the docs should also suggest how to extend the behaviour in subclasses (and when it's recommended to do it, or not to do it). I guess calling the super, and then using and
if one wants to possibly disable caching e.g. in remote data? Or is there a better way?
Two suggestions to be opened as issues, if you agree:
- have a
verdi data remote check-valid-cache NODE1 NODE2 ...
that (opening a single connection) checks all nodes in some way - e.g. if they are empty - and sets the valid cache to False. (However, I'm thinking if this is something general enough to be done in AiiDA... maybe we should move it toaiida-quantumespresso check-remotedata-valid-cache
or something like this, and once implemented suggest this as a reference to the other plugins, since the plugin will know what to check) - In the long term (not for now), if we realise that the performance of checking these hashing/caching extras is not good, we might want to move these to a table column that is indexed, since this will be much more efficient in searching for nodes with the same hash (since extras are not indexed, so it's always a table scan).
node_hash = self._get_hash() | ||
|
||
if not node_hash or not self._cachable: | ||
return iter(()) | ||
|
||
builder = QueryBuilder(backend=self.backend) | ||
builder.append(self.__class__, filters={'extras._aiida_hash': node_hash}, project='*', subclassing=False) | ||
nodes_identical = (n[0] for n in builder.iterall()) | ||
builder.append(self.__class__, filters={f'extras.{self._HASH_EXTRA_KEY}': node_hash}, subclassing=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we just delegate the query builder to already filter by node.is_valid_cache, rather than doing it in python below via the getter? I imagine this can only be more efficient (one needs to be careful to the default value if not set, of course)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be possible if the extra is the only thing that determines whether the node is a valid caching source. However, the method can and actually is already overridden by the ProcessNode
class. It also considers the exit code that is stored, which if it has the invalidates_cache
set to True
on the associated ProcessSpec
then is_valid_cache
also returns False
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good to me! Thanks.
5c33bfc
to
e90ca9e
Compare
This is really not relevant for data classes, because a data node is never considered as a caching source, since caching is disabled for data nodes. Note that the value returned by
I think for ensuring that
Fully agree. This ties in to the discussion of having a concept of "internal attributes" that control various AiiDA mechanisms but are not really the same as normal node attributes. Having them in extras is too fragile as they can be accidentally changed or removed by users. When we have this new private extras, we should think which ones should be indexed and how. |
e90ca9e
to
6ff7897
Compare
This property allows to mark an individual node as invalid for use as a caching source. It works by setting the `_aiida_valid_cache` extra to either `True` or `False`. Previously, the unofficial way of ensuring a node was no longer used as a cache source, was to remove the `_aiida_hash` extra, containing the hash of the node. This would achieve the desired effect, since without a hash, the node would never be found when looking for identical nodes to cache from. However, this not an ideal approach because the hash may be useful for other purposes. In addition, if the node were to be rehashed at some point, which could happen by accident if all nodes of a particular class got rehashed, it would all of a sudden be used in caching again. The solution is to have a way to persistently mark node instances as invalid for caching. Note that the `is_valid_cache` property of the `Node` before was intended as a hook to allow changing the caching behavior of subclasses of `Node` as a whole, and not on the instance level. This is now changed as the property has a setter that operates on the instance level. Subclasses should take care to respect this base class implementation if it makes sense. Since `Node` can only be subclassed by `aiida-core` and not in plugins, we can easily enforce this system.
6ff7897
to
f8a6753
Compare
Fixes #5189
This property allows to mark an individual node as invalid for use as a
caching source. It works by setting the
_aiida_valid_cache
extra toeither
True
orFalse
.Previously, the unofficial way of ensuring a node was no longer used as
a cache source, was to remove the
_aiida_hash
extra, containing thehash of the node. This would achieve the desired effect, since without a
hash, the node would never be found when looking for identical nodes to
cache from. However, this not an ideal approach because the hash may be
useful for other purposes. In addition, if the node were to be rehashed
at some point, which could happen by accident if all nodes of a
particular class got rehashed, it would all of a sudden be used in
caching again.
The solution is to have a way to persistently mark node instances as
invalid for caching. Note that the
is_valid_cache
property of theNode
before was intended as a hook to allow changing the cachingbehavior of subclasses of
Node
as a whole, and not on the instancelevel. This is now changed as the property has a setter that operates on
the instance level. Subclasses should take care to respect this base
class implementation if it makes sense. Since
Node
can only besubclassed by
aiida-core
and not in plugins, we can easily enforcethis system.