IMPROVE: Lock mechanism and migration to storage
class for the maintain operation
#5331
6eafc61
to
ed06bac
Compare
Codecov Report
    @@             Coverage Diff             @@
    ##           develop    #5331      +/-   ##
    ===========================================
    + Coverage    79.45%   82.13%   +2.68%
    ===========================================
      Files          513      533      +20
      Lines        36747    38497    +1750
    ===========================================
    + Hits         29195    31617    +2422
    + Misses        7552     6880     -672
Thanks @ramirezfranciscof . Agree with your discussion on where to put the lock. Just have some minor suggestions on implementation.
aiida/backends/control.py
Outdated
    if full:
        with ProfileAccessManager(profile).lock():
            perform_tasks()
    else:
        perform_tasks()
For this kind of paradigm where the contextmanager is conditional or optional, I personally prefer the following approach:
    from contextlib import nullcontext

    if full:
        context = ProfileAccessManager(profile).lock
    else:
        context = nullcontext

    with context():
        unreferenced_objects = get_unreferenced_keyset(aiida_backend=backend)
        MAINTAIN_LOGGER.info(f'Deleting {len(unreferenced_objects)} unreferenced objects ...')
        if not dry_run:
            repository.delete_objects(list(unreferenced_objects))
        MAINTAIN_LOGGER.info('Starting repository-specific operations ...')
        repository.maintain(live=not full, dry_run=dry_run, **kwargs)
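The optional-context pattern suggested here can be tried in isolation. What follows is only a minimal sketch under stated assumptions, not the code of this PR: fake_lock is a hypothetical stand-in for ProfileAccessManager(profile).lock, and maintain and events are illustrative names that do not exist in aiida-core.

```python
from contextlib import contextmanager, nullcontext

# Records what happens, so the order of operations can be inspected.
events = []


@contextmanager
def fake_lock():
    """Hypothetical stand-in for ``ProfileAccessManager(profile).lock()``."""
    events.append('acquire')
    try:
        yield
    finally:
        events.append('release')


def maintain(full: bool) -> list:
    """Run the same body with or without the lock, depending on ``full``."""
    events.clear()
    # Pick the context manager factory once; nullcontext() does nothing.
    context = fake_lock if full else nullcontext
    with context():
        events.append('work')
    return list(events)
```

The body of the with block is written only once, so the locked and unlocked code paths cannot drift apart.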
Uhm, I didn't know this nullcontext option. That is neat, thanks!
This seems slightly off, and maybe not fitting into #5172; please don't merge until we can discuss a bit, cheers
If I understand correctly, after #5172 the backend will have the link to its profile via the

Also, now that I think of it, I am working under the assumption that the relationship between backends and profiles is 1:1. If a profile can have multiple backends I think all of this still works, but if a single backend instance can service multiple profiles then that could be problematic.

Do we need to schedule a zoom call to discuss this? @sphuber @chrisjsewell
Probably a good idea
yep 👍
Let's schedule a meeting anyway and we can discuss both. If it is such a big PR as you are making it appear to be, probably it would be best to go through it in a call and review that way, rather than me going line by line. This will also be useful for @ramirezfranciscof to understand and we can then also discuss this PR right after. @ramirezfranciscof can you send out a doodle?
yep, just not today; no point trying to explain it when it's half done
FYI @chrisjsewell I have changed this following the discussion with @sphuber here. Now I pass the

Would that solve this particular incompatibility or is it still problematic with the other changes?
ermm, this seems worse 😬 you could pass a profile, but then end up with a backend that does not actually correspond to that profile.
To note, these are obviously still very helpful for users, i.e. that are just working with a globally loaded profile/backend, and don't want to have to worry about this kind of thing.
Mmm unless I misunderstood what @sphuber told me, after you use the
The other key thing here is that, after #4985, it will be perfectly possible to use multiple backends during the same Python session; you don't have to load anything.

    with create_sqlalchemy_engine(profile).connect() as connection:
        # do whatever I want to do with my database connection, then relinquish it
        ...
101a2cc
to
77316f4
Compare
Hey @chrisjsewell, couple of questions:
Heya
I would say it should not be merged until after #5364 (that is just moving some things around rather than any logic change)
Very much no 😬; this is a method that is "private" to the

I'll have to have a look through the code here, to understand the best way to implement this;
I agree that with the work to properly abstract the storage we should not start to leak implementation details again. So I would also vote for something like a generic
77316f4
to
22f960e
Compare
Ok, I can move the method to the
2e78834
to
c32328f
Compare
abe9f80
to
f8772ca
Compare
Moved the general purpose functions into methods of the storage class.
Use the lock mechanism for the maintain operation.
f8772ca
to
4e76e51
Compare
Thanks @ramirezfranciscof . Had a quick look and most of it seems fine to me.
7149075
to
c98fb5c
Compare
Alright thanks @ramirezfranciscof looks good to me
      'database': get_database_summary(QueryBuilder, statistics),
    - 'repository': get_repository_info(statistics=statistics),
    + 'repository': storage.get_info(statistics=statistics),
This doesn't quite make sense 😬
get_info docstring: "Return general information on the storage.", i.e. not just about the repository
I realized this, but considered this out of the scope of this PR, which is just introducing the use of the locking mechanism. I would leave this for another PR, which then also moves get_database_summary into the StorageBackend.
can we open an issue for this then
Personally, I wouldn't really say it is out of scope, because it is this PR that is introducing the "semantic" bug that was not there before.
But won't block this PR 👍
I see @sphuber already summarized, better than I can, the problem with get_database_summary using more high-level methods to get its information. I'll just mention that I see usefulness both in being able to get some statistics in a "backend" agnostic way, as well as being able to get more details specific to the backend. We might need to just think a bit if there is some structural design that allows us to have both.
In the meantime, I generalized storage.get_info so now the docstring is correct: it returns the information of the storage, that is, both the repository and the database (it just so happens that the database is currently empty). Then the storage_info command just calls get_database_summary to add a summary key to the database after it is returned. I believe this is the way in which we keep all the prior information with minimal changes, and it still leaves the methods in the best baseline to later re-organize the content of storage.get_info and get_database_summary.
d98e301
    a nested dict with the relevant information (with at least one key for `database`
    and one for `repository`).
Since this is the abstract class, it should not require that these keys are present. There could be some storage that doesn't have a separate database and repository
Mmm ok, true, but as it is right now I need at least database to add the summary entry. Options for this:
- We just accept this description for now and once we deal with get_database_summary we change it.
- I put the output of get_database_summary in a separate output_dict['database_summary'] instead of output_dict['database']['summary'].
- I put the output of get_database_summary in output_dict['database'], risking overwriting what could come in that key from storage.get_info (which is currently empty).
Bah, there is also the option of adding the database key if it is not there. I will default to that now and push, let me know if you prefer one of the other 3.
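The "add the database key if it is not there" option can be illustrated with a plain dictionary. This is a minimal sketch under assumptions, not the code that was pushed: attach_database_summary is a hypothetical helper name, and the example dictionaries only imitate the shape of what storage.get_info and get_database_summary might return.

```python
from copy import deepcopy


def attach_database_summary(storage_info: dict, summary: dict) -> dict:
    """Return a copy of ``storage_info`` with ``summary`` under ['database']['summary'].

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Work on a copy so the dictionary returned by the storage is not mutated.
    output = deepcopy(storage_info)
    # setdefault creates the 'database' entry only when the storage did not
    # provide one, so existing database information is preserved.
    output.setdefault('database', {})['summary'] = summary
    return output
```

With this approach a storage without a separate database still yields a valid result, and anything already stored under the database key is kept alongside the summary.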
Fine for me to merge. I'll take care of moving the database info logic in another PR.
@chrisjsewell you happy as well?
Two relevant notes about the modifications:
- Now repository_maintain receives a Manager instance instead of a Backend instance, because it needs to also get access to the Profile object to acquire the lock.
- Where exactly to acquire the lock is not an obvious decision. I decided to do so inside the repository_maintain method since this is what would be called both from the CLI and from the ORM. Also, although technically the lock is not needed to do the deletion of unreferenced objects, I decided to put this inside of it to reduce the time between the call to the method and the acquisition of the lock to the minimum (users might feel lied to if the method warns that it will block the profile, runs for a couple of minutes, and then fails because a profile was loaded while the deletion of unreferenced objects was taking place).
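
The reasoning for acquiring the lock before the deletion step can be sketched with a toy example. Everything below is an assumption made for illustration: ToyAccessManager, LockingError and maintain are hypothetical names, and the in-memory counter only imitates how a real profile access manager might detect that a profile is in use.

```python
from contextlib import contextmanager


class LockingError(RuntimeError):
    """Raised when the profile is already in use (hypothetical name)."""


class ToyAccessManager:
    """In-memory stand-in for a profile access manager (illustrative only)."""

    def __init__(self):
        self.active_users = 0
        self.locked = False

    @contextmanager
    def lock(self):
        # Refuse the lock if anyone is using the profile or already holds it.
        if self.active_users or self.locked:
            raise LockingError('profile is in use')
        self.locked = True
        try:
            yield
        finally:
            self.locked = False


def maintain(manager: ToyAccessManager, log: list) -> list:
    # Acquire the lock first, so a busy profile fails immediately instead of
    # after minutes spent deleting unreferenced objects.
    with manager.lock():
        log.append('delete unreferenced objects')
        log.append('repository-specific operations')
    return log
```

If the profile is in use, the call fails before any step has run, which is the behaviour the second note argues users would expect.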