-
Notifications
You must be signed in to change notification settings - Fork 200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move aiida.cmdline.utils.common.get_database_summary
to storage backend
#5387
Move aiida.cmdline.utils.common.get_database_summary
to storage backend
#5387
Conversation
4ae0d77
to
eecf759
Compare
The things I think are definitely open for discussion are:
|
Hey, thanks for considering my concerns! Indeed this seems like a quite interesting solution to the problem of balancing the agnosticity of using the ORM and allowing for further improvement for each specific backend. However, what I have now noticed after thinking a bit about this, is that we seem to be mixing high level concepts from aiida (like nodes and links) with "low" level stats from the storage backend (size, number of files/tables, etc). Mind you, I'm not saying this is being introduced here: I'm just noticing it now that the backend has been more cleanly separated, making this aspect stick out more. In other words, I think the number and type of nodes, groups, etc are not a "database" stat and should perhaps not be reported as such. Maybe we have another command for BTW just as a reference, database stats could be stuff like the number of tables, rows in each table, size in mb, etc. It can also be useful to have these for contrasting with the results for the more "general" content stats to see if they are consistent. What do you think? Does this make sense? |
I think your analysis makes sense. But if I understand it correctly, you agree that it still makes sense to have For me that would be fine, just not sure what the other key should be named. Had similar problems with naming the method |
Yeh lets not over-complicate too much 😬 Just to add context, I only added |
Mmm 🤔, I'm still unsure of where I think would be the best place for these higher level stats, but I guess for now it could go there. Naming is annoying, yes. I think |
1e9d0ce
to
77e39c0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good for me, just maybe one comment / question below.
Also, just to check (because github interface is still a bit of a PITA to check moved content): the get_orm_entities
has the exact same content that used to be in get_database_summary
, right? You didn't change anything inside the method itself.
def get_info(self, statistics: bool = False, **kwargs) -> dict: | ||
raise NotImplementedError | ||
return {'Objects': len(list(self.list_objects()))} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why this one doesn't use super
as well? (also "objects" maybe in lowercase and with a nested dict to make it consistent with the others?)
def get_info(self, statistics: bool = False, **kwargs) -> dict:
results = super().get_info(statistics=statistics)
results['objects'] = {'count': len(list(self.list_objects()))}
return results
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine for me to make it lower case, but DiskObjectStoreRepositoryBackend.get_info
actually uses Objects
with a capital.
Not using super was just an omission, will add it now.
Regarding the get_orm_entities
, yes I kept the implementation, just changing the verbose
variable to statistics
.
77e39c0
to
787e6fb
Compare
Mmm true, it actually does a weird thing where it alternates capitalizing and lowercase...
Should we set a standard for this? Or leave it free? In which case apologies for the pedantry 😅 (BTW good to go for me now, we can think about standardizing this later, I'm only waiting for the test to pass to approve) |
…kend This utility was used in `verdi archive inspect` and `verdi storage info` to give information about the contents of an archive or normal storage backend. Historically, a distinction used to be made between the contents of the database and the repository, but in v2.0 these are unified by the storage backend. The information about the repository is already retrieved through `StorageBackend.get_info` but the contents of the database were still built through the `get_database_summary` function. The implementation is moved to the `get_orm_entity_overview` method of the `StorageBackend` base class. This is done because the implementation is currently still independent of the storage backend implementaiton. So for now the implementations can simply call to this method in their `get_info` implementation, but it leaves the option to override it with a more performant implementation that doesn't go through the shared ORM of AiiDA.
787e6fb
to
9f02956
Compare
def get_info(self, statistics: bool = False, **kwargs) -> dict: | ||
raise NotImplementedError | ||
return {'objects': {'count': len(list(self.list_objects()))}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You still forgot to add the super
here, no? Oh bah, perhaps you were not finished with the modifications yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't have one. Note this is the repository implementation, the base abstract class of which doesn't define a get_info
. Only the implementation for the disk object store does.
The base class with a get_info
implementation is the one for the storage backend. I think you may be confusing it with that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh ok, I see. Then my previous comment was also wrong, haha. Ok, all good now then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You were confused 😉 you put get_info
and maintain
under SqliteBackendQueryBuilder
not SqliteZipBackend
Its ok, I fixed it in #5375
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You were confused 😉 you put get_info
and maintain
under SqliteBackendQueryBuilder
not SqliteZipBackend
Its ok, I fixed it in #5375
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Fixes #5384
This utility was used in
verdi archive inspect
andverdi storage info
to give information about the contents of an archive or normal storage
backend. Historically, a distinction used to be made between the contents
of the database and the repository, but in v2.0 these are unified by the
storage backend. The information about the repository is already
retrieved through
StorageBackend.get_info
but the contents of thedatabase were still built through the
get_database_summary
function.The implementation is moved to the
get_orm_entity_overview
method ofthe
StorageBackend
base class. This is done because the implementationis currently still independent of the storage backend implementation. So
for now the implementations can simply call to this method in their
get_info
implementation, but it leaves the option to override it witha more performant implementation that doesn't go through the shared ORM
of AiiDA.