IMPROVE: Lock mechanism and migration to `storage` class for the maintain operation #5331

ramirezfranciscof · 2022-01-26T08:52:04Z

Two relevant notes about the modifications:

Now repository_maintain receives a Manager instance instead
of a Backend instance, because it needs to also get access to
the Profile object to acquire the lock.
Where exactly to acquire the lock is not an obvious decission.
I decided to do so inside the repository_maintain method since
this is what would be called both from the CLI and from the ORM.
Also, although technically the lock is not needed to do the
deletion of unreferenced objects, I decided to put this inside
of it to reduce the time between the call to the method and the
acquisition of the lock to the minimum (users might feel lied to
if the method warns that it will block the profile, run for a
couple of minutes, and then fail because a profile was loaded
while the deletion of unreferenced objects was taking place).

codecov · 2022-01-26T10:39:24Z

Codecov Report

Merging #5331 (101a2cc) into develop (250dd96) will increase coverage by 2.68%.
The diff coverage is 100.00%.

❗ Current head 101a2cc differs from pull request most recent head 71e0dc0. Consider uploading reports for the commit 71e0dc0 to get more accurate results

@@             Coverage Diff             @@
##           develop    #5331      +/-   ##
===========================================
+ Coverage    79.45%   82.13%   +2.68%     
===========================================
  Files          513      533      +20     
  Lines        36747    38497    +1750     
===========================================
+ Hits         29195    31617    +2422     
+ Misses        7552     6880     -672

Flag	Coverage Δ
django	`77.22% <100.00%> (?)`
sqlalchemy	`76.52% <100.00%> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
aiida/backends/control.py	`97.78% <100.00%> (ø)`
aiida/cmdline/commands/cmd_storage.py	`97.30% <100.00%> (+1.20%)`	⬆️
aiida/cmdline/params/options/commands/setup.py	`54.55% <0.00%> (-39.33%)`	⬇️
aiida/cmdline/commands/cmd_setup.py	`56.87% <0.00%> (-37.87%)`	⬇️
aiida/manage/external/postgres.py	`63.10% <0.00%> (-20.23%)`	⬇️
aiida/cmdline/params/types/profile.py	`64.45% <0.00%> (-6.66%)`	⬇️
aiida/cmdline/params/types/strings.py	`73.59% <0.00%> (-5.66%)`	⬇️
aiida/orm/utils/log.py	`76.67% <0.00%> (-4.81%)`	⬇️
aiida/tools/archive/implementations/sqlite/main.py	`87.18% <0.00%> (-4.24%)`	⬇️
aiida/orm/implementation/querybuilder.py	`90.20% <0.00%> (-3.42%)`	⬇️
... and 378 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 250dd96...71e0dc0. Read the comment docs.

sphuber

Thanks @ramirezfranciscof . Agree with your discussion on where to put the lock. Just have some minor suggestions on implementation.

aiida/backends/control.py

sphuber · 2022-01-26T20:21:25Z

aiida/backends/control.py

+    if full:
+        with ProfileAccessManager(profile).lock():
+            perform_tasks()
+    else:
+        perform_tasks()


For this kind of paradigm where the contextmanager is conditional or optional, I personally prefer the following approach:

Suggested change

if full:

with ProfileAccessManager(profile).lock():

perform_tasks()

else:

perform_tasks()

from contextlib import nullcontext

if full:

context = ProfileAccessManager(profile).lock

else:

context = nullcontext

with context():

unreferenced_objects = get_unreferenced_keyset(aiida_backend=backend)

MAINTAIN_LOGGER.info(f'Deleting {len(unreferenced_objects)} unreferenced objects ...')

if not dry_run:

repository.delete_objects(list(unreferenced_objects))

MAINTAIN_LOGGER.info('Starting repository-specific operations ...')

repository.maintain(live=not full, dry_run=dry_run, **kwargs)

Uhm, I didn't know this nullcontext option. That is neat, thanks!

tests/cmdline/commands/test_storage.py

chrisjsewell · 2022-01-31T08:40:06Z

Now repository_maintain receives a Manager instance instead of a Backend instance, because it needs to also get access to the Profile object to acquire the lock.

this seems slightly off, and maybe not fitting into #5172, please don't merge until we have can discuss a bit cheers

ramirezfranciscof · 2022-01-31T08:50:05Z

Now repository_maintain receives a Manager instance instead of a Backend instance, because it needs to also get access to the Profile object to acquire the lock.

this seems slightly off, and maybe not fitting into #5172, please don't merge until we have can discuss a bit cheers

If I understand correctly, after #5172 the backend will have the link to its profile via the _profile attribute, right? At that point then passing the backend can suffice. The problem is that currently there is no direct links between profiles and backends, the only thing that holds a reference to both is the Manager, correct?

Also, now that I think of it, I am working under the assumption that the relationship between backends and profiles is 1:1. If a profile can have multiple backends I think all of this still works, but if a single backend instance can service multiple profiles then that could be problematic.

Do we need to schedule a zoom call to discuss this? @sphuber @chrisjsewell

sphuber · 2022-01-31T08:57:00Z

Do we need to schedule a zoom call to discuss this? @sphuber @chrisjsewell

Probably a good idea

chrisjsewell · 2022-01-31T08:58:59Z

Do we need to schedule a zoom call to discuss this? @sphuber @chrisjsewell

yep 👍
Lets wait until #4985 is ready though (hopefully by the end of today), as that will be very relevant

sphuber · 2022-01-31T09:03:32Z

Let's schedule a meeting anyway and we can discuss both. If it is such a big PR as you are making it appear to be, probably it would be best to go through it in a call and review that way, rather than me going line by line. This will also be useful for @ramirezfranciscof to understand and we can then also discuss this PR right after. @ramirezfranciscof can you send out a doodle?

chrisjsewell · 2022-01-31T09:05:31Z

Let's schedule a meeting anyway and we can discuss both.

yep, just not today; no point trying to explain it when its half done

ramirezfranciscof · 2022-01-31T10:15:44Z

Now repository_maintain receives a Manager instance instead of a Backend instance, because it needs to also get access to the Profile object to acquire the lock.

this seems slightly off, and maybe not fitting into #5172, please don't merge until we have can discuss a bit cheers

FYI @chrisjsewell I have changed this following the discussion with @sphuber here.

Now I pass the Profile instance around and get the corresponding Backend it uses by loading that profile and calling get_manager().get_backend().

Would that solve this particular incompatibility or is it still problematic with the other changes?

chrisjsewell · 2022-01-31T10:23:14Z

Now I pass the Profile instance around and get the corresponding Backend it uses by loading that profile and calling get_manager().get_backend().

ermm, this seems worse 😬 you could pass a profile, but then end up with a backend that does not actually correspond to that profile.
the main thing here, is that I'm trying to move away from "hard-coding" the use of global variable (which are the "evil"), i.e. PROFILE,BACKEND,ENGINE,SESSION_FACTORY, where possible you should be explicitly passing profile/backend and never calling get_profile(),get_manager(),get_backend() etc

chrisjsewell · 2022-01-31T10:29:33Z

the main thing here, is that I'm trying to move away from "hard-coding" the use of global variable

To note, these are obviously still very helpful for users, i.e. that are just working with a globally loaded profile/backend, and don't want to have to worry about this kind of thing.
But in our "internal API", we should avoid it like the plague 😉

ramirezfranciscof · 2022-01-31T10:38:41Z

ermm, this seems worse 😬 you could pass a profile, but then end up with a backend that does not actually correspond to that profile.

Mmm unless I misunderstood what @sphuber told me, after you use the load_profile with a given profile the Manager singleton will return the corresponding backend.

chrisjsewell · 2022-01-31T10:44:12Z

The other key thing here is that, after #4985, it will be perfectly possible to use multiple backends during the same python session, you don't have to load anything.
This is because, with django you have to "setup" a database connection, to a single database which can then never be undone (in that python session). But with sqlalchemy, you simply just do, e.g.

with create_sqlalchemy_engine(profile).connect() as connection:
    # do whatever I want to do with my database connection, then reliquish it
   ...

ramirezfranciscof · 2022-02-21T11:42:03Z

Hey @chrisjsewell, couple of questions:

Is this still blocked or is it ok to proceed?
Is the create_sqlalchemy_engine method intended to be the default way to access AiiDA profiles how you used it in the previous message or is that still WIP? (I ask because I only saw it in aiida/cmdline/commands/cmd_devel.py and in some internals of SQLA). I'm wondering if I should wrap the content of repository_maintain in a with create_sqlalchemy_engine statement instead of using load_profile, or if I'm misunderstanding the intended use of this context.

chrisjsewell · 2022-02-21T11:54:00Z

Heya

Is this still blocked or is it ok to proceed?

I would say it should not be merged until after #5364 (that is just moving some things around rather than any logic change)

Is the create_sqlalchemy_engine method intended to be the default way to access AiiDA profiles

Very much no 😬; this is a method that is "private" to the psql_dos, any access to the profile should be storage backend agnostic.

I'll have to have a look through the code here, to understand the best way to implement this;
I'm sceptical that there should even be this "isolated" aiida/backends/control.py::repository_maintain function, and instead there should just be a maintain abstract method on the StorageBackend, that is then implemented for e.g. PsqlDosBackend

sphuber · 2022-02-21T12:44:41Z

I agree that with the work to properly abstract the storage we should not start to leak implementation details again. So I would also vote for something like a generic Storage.maintain which for the PsqlDos implementation then includes the maintenance of the disk-object store.

ramirezfranciscof · 2022-02-21T14:12:52Z

Ok, I can move the method to the PsqlDosBackend. Is there any logger currently being used for storage procedures or do I create a new one there?

Moved the general purpose functions into methods of the storage class Use the lock mechanism for the maintain operation

ramirezfranciscof · 2022-02-22T13:12:52Z

I would say it should not be merged until after #5364 (that is just moving some things around rather than any logic change)

I see #5364 is now merged, does that mean we can proceed with this?

I already moved the methods to PsqlDosBackend class.

sphuber

Thanks @ramirezfranciscof . Had a quick look and most of it seems fine to me.

aiida/cmdline/commands/cmd_storage.py

aiida/orm/implementation/storage_backend.py

tests/cmdline/commands/test_storage.py

tests/storage/psql_dos/test_backend.py

sphuber

Alright thanks @ramirezfranciscof looks good to me

chrisjsewell · 2022-02-23T10:31:12Z

aiida/cmdline/commands/cmd_storage.py

            'database': get_database_summary(QueryBuilder, statistics),
-            'repository': get_repository_info(statistics=statistics),
+            'repository': storage.get_info(statistics=statistics),


This doesn't quite make sense 😬
get_info doctring: "Return general information on the storage.", i.e. not just about the repository

I realized this, but considered this out of the scope of this PR, which is just introducing the use of the locking mechanism. I would leave this for another PR, which then also moves get_database_summary into the StorageBackend.

can we open an issue for this then

Personally, I wouldn't really say it is out of scope, because it is this PR that is introducing the "semantic" bug that was not there before.
But won't block this PR 👍

I see @sphuber already sumarized better than I can the problem with get_database_summary using more high level methods to get its information. I'll just mention that I see usefulness both in being able to get some statistics in a "backend" agnostic way, as well as being able to get more details specific to the backend. We might need to just think a bit if there is some structural design that allows to have both.

In the meantime, I generalized storage.get_info so now the docstring is correct, it returns the information of the storage: that is, both the repository and the database (it just happens so that the database is currently empty). Then the storage_info command just calls get_database_summary to add a summary key to the database after it is returned. I believe this is the way in which we keep all the prior information with minimal changes that still leaves the methods in the best baseline to later re-organize the content of storage.get_info and get_database_summary.

sphuber · 2022-02-23T11:18:37Z

aiida/orm/implementation/storage_backend.py

+            a nested dict with the relevant information (with at least one key for `database`
+            and one for `repository`).


Since this is the abstract class, it should not require that these keys are present. There could be some storage that doesn't have a separate database and repository

Mmm ok, true, but as it is right now I need at least database to add the summary entry. Options for this:

We just accept this description for now and once we deal with get_database_summary we change it.

I put the output of get_database_summary in a separateoutput_dict['database_summary'] instead of output_dict['database']['summary'].

I put the output of get_database_summary in output_dict['database'], risking overwriting what could come in that key from storage.get_info (which is currently empty).

Bah, there is also the option of adding the database key if it is not there. I will default to that now and push, let me know if you prefer one of the other 3.

sphuber

Fine for me to merge. I'll take care of moving the database info logic in another PR.

sphuber · 2022-02-23T12:38:40Z

@chrisjsewell you happy as well?

ramirezfranciscof force-pushed the incorporate_lock branch 3 times, most recently from 6eafc61 to ed06bac Compare January 26, 2022 10:15

ramirezfranciscof requested a review from sphuber January 26, 2022 11:15

sphuber requested changes Jan 26, 2022

View reviewed changes

chrisjsewell added the pr/blocked PR is blocked by another PR that should be merged first label Jan 31, 2022

chrisjsewell removed the pr/blocked PR is blocked by another PR that should be merged first label Feb 17, 2022

ramirezfranciscof force-pushed the incorporate_lock branch from 101a2cc to 77316f4 Compare February 21, 2022 10:19

ramirezfranciscof force-pushed the incorporate_lock branch from 77316f4 to 22f960e Compare February 21, 2022 13:54

ramirezfranciscof force-pushed the incorporate_lock branch 5 times, most recently from 2e78834 to c32328f Compare February 21, 2022 16:49

ramirezfranciscof force-pushed the incorporate_lock branch 4 times, most recently from abe9f80 to f8772ca Compare February 22, 2022 12:29

Improve storage maintain related methods

4e76e51

Moved the general purpose functions into methods of the storage class Use the lock mechanism for the maintain operation

ramirezfranciscof force-pushed the incorporate_lock branch from f8772ca to 4e76e51 Compare February 22, 2022 12:30

ramirezfranciscof changed the title ~~IMPROVE: Use the lock mechanism for the maintain operation~~ IMPROVE: Lock mechanism and migration to storage class for the maintain operation Feb 22, 2022

ramirezfranciscof requested review from chrisjsewell and sphuber February 22, 2022 15:29

sphuber requested changes Feb 22, 2022

View reviewed changes

Apply feedback from PR review

c98fb5c

ramirezfranciscof force-pushed the incorporate_lock branch from 7149075 to c98fb5c Compare February 23, 2022 08:55

ramirezfranciscof requested a review from sphuber February 23, 2022 09:35

sphuber previously approved these changes Feb 23, 2022

View reviewed changes

chrisjsewell suggested changes Feb 23, 2022

View reviewed changes

Merge branch 'develop' into incorporate_lock

98f6bc4

chrisjsewell mentioned this pull request Feb 23, 2022

Move the functionality of aiida.cmdline.utils.common.get_database_summary to StorageBackend.get_info #5384

Closed

chrisjsewell previously approved these changes Feb 23, 2022

View reviewed changes

Generalize the get_info method.

d98e301

ramirezfranciscof dismissed stale reviews from chrisjsewell and sphuber via d98e301 February 23, 2022 11:13

sphuber reviewed Feb 23, 2022

View reviewed changes

Final details

1b0c413

sphuber approved these changes Feb 23, 2022

View reviewed changes

chrisjsewell approved these changes Feb 23, 2022

View reviewed changes

sphuber merged commit f3ffacf into aiidateam:develop Feb 23, 2022

sphuber deleted the incorporate_lock branch February 23, 2022 12:51

chrisjsewell mentioned this pull request Mar 12, 2022

🔀 MERGE: Release v2.0.0b1 #5426

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IMPROVE: Lock mechanism and migration to `storage` class for the maintain operation #5331

IMPROVE: Lock mechanism and migration to `storage` class for the maintain operation #5331

ramirezfranciscof commented Jan 26, 2022

codecov bot commented Jan 26, 2022 •

edited

Loading

sphuber left a comment

sphuber Jan 26, 2022

ramirezfranciscof Jan 31, 2022

chrisjsewell commented Jan 31, 2022

ramirezfranciscof commented Jan 31, 2022

sphuber commented Jan 31, 2022

chrisjsewell commented Jan 31, 2022

sphuber commented Jan 31, 2022

chrisjsewell commented Jan 31, 2022 •

edited

Loading

ramirezfranciscof commented Jan 31, 2022

chrisjsewell commented Jan 31, 2022 •

edited

Loading

chrisjsewell commented Jan 31, 2022

ramirezfranciscof commented Jan 31, 2022

chrisjsewell commented Jan 31, 2022 •

edited

Loading

ramirezfranciscof commented Feb 21, 2022

chrisjsewell commented Feb 21, 2022

sphuber commented Feb 21, 2022

ramirezfranciscof commented Feb 21, 2022

ramirezfranciscof commented Feb 22, 2022

sphuber left a comment

sphuber left a comment

chrisjsewell Feb 23, 2022

sphuber Feb 23, 2022

chrisjsewell Feb 23, 2022

sphuber Feb 23, 2022

chrisjsewell Feb 23, 2022

ramirezfranciscof Feb 23, 2022

sphuber Feb 23, 2022

ramirezfranciscof Feb 23, 2022

ramirezfranciscof Feb 23, 2022

sphuber left a comment

sphuber commented Feb 23, 2022

-    if full:
-        with ProfileAccessManager(profile).lock():
-            perform_tasks()
-    else:
-        perform_tasks()
+    from contextlib import nullcontext
+    if full:
+        context = ProfileAccessManager(profile).lock
+    else:
+        context = nullcontext
+    with context():
+        unreferenced_objects = get_unreferenced_keyset(aiida_backend=backend)
+        MAINTAIN_LOGGER.info(f'Deleting {len(unreferenced_objects)} unreferenced objects ...')
+        if not dry_run:
+            repository.delete_objects(list(unreferenced_objects))
+        MAINTAIN_LOGGER.info('Starting repository-specific operations ...')
+        repository.maintain(live=not full, dry_run=dry_run, **kwargs)

		a nested dict with the relevant information (with at least one key for `database`
		and one for `repository`).

IMPROVE: Lock mechanism and migration to storage class for the maintain operation #5331

IMPROVE: Lock mechanism and migration to storage class for the maintain operation #5331

Conversation

ramirezfranciscof commented Jan 26, 2022

codecov bot commented Jan 26, 2022 • edited Loading

Codecov Report

sphuber left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

chrisjsewell commented Jan 31, 2022

ramirezfranciscof commented Jan 31, 2022

sphuber commented Jan 31, 2022

chrisjsewell commented Jan 31, 2022

sphuber commented Jan 31, 2022

chrisjsewell commented Jan 31, 2022 • edited Loading

ramirezfranciscof commented Jan 31, 2022

chrisjsewell commented Jan 31, 2022 • edited Loading

chrisjsewell commented Jan 31, 2022

ramirezfranciscof commented Jan 31, 2022

chrisjsewell commented Jan 31, 2022 • edited Loading

ramirezfranciscof commented Feb 21, 2022

chrisjsewell commented Feb 21, 2022

sphuber commented Feb 21, 2022

ramirezfranciscof commented Feb 21, 2022

ramirezfranciscof commented Feb 22, 2022

sphuber left a comment

Choose a reason for hiding this comment

sphuber left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sphuber left a comment

Choose a reason for hiding this comment

sphuber commented Feb 23, 2022

IMPROVE: Lock mechanism and migration to `storage` class for the maintain operation #5331

IMPROVE: Lock mechanism and migration to `storage` class for the maintain operation #5331

codecov bot commented Jan 26, 2022 •

edited

Loading

chrisjsewell commented Jan 31, 2022 •

edited

Loading

chrisjsewell commented Jan 31, 2022 •

edited

Loading

chrisjsewell commented Jan 31, 2022 •

edited

Loading