♻️ REFACTOR: Profile storage backend configuration #5320

chrisjsewell · 2022-01-20T09:59:24Z

This PR refactors the Profile class and provides a complementary config migration, in two steps (commits):

Storage backend

Configuration for the storage backend is changed from:

{
      "AIIDADB_BACKEND": "django",
      "AIIDADB_ENGINE": "postgresql_psycopg2",
      "AIIDADB_PASS": "some_random_password",
      "AIIDADB_NAME": "aiidadb_qs_some_user",
      "AIIDADB_HOST": "localhost",
      "AIIDADB_PORT": "5432",
      "AIIDADB_USER": "aiida_qs_greschd",
      "AIIDADB_REPOSITORY_URI": "file:////home/some_user/.aiida/repository-quicksetup/"
}

to

{
  "storage": {
      "backend": "django",
      "config": {
          "database_engine": "postgresql_psycopg2",
          "database_password": "some_random_password",
          "database_name": "aiidadb_qs_some_user",
          "database_hostname": "localhost",
          "database_port": "5432",
          "database_username": "aiida_qs_greschd",
          "repository_uri": "file:////home/some_user/.aiida/repository-quicksetup/"
      }
  }
}
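A minimal sketch of the storage part of the migration, assuming the key mapping can be read directly off the before/after example above. The names `STORAGE_KEY_MAP` and `upgrade_storage` are hypothetical; this is not the migration code in aiida-core.

```python
"""Hypothetical sketch of the storage config upgrade described above."""
from typing import Any, Dict

# Old flat AIIDADB_* keys -> new keys under storage.config (taken from the example above).
STORAGE_KEY_MAP = {
    "AIIDADB_ENGINE": "database_engine",
    "AIIDADB_PASS": "database_password",
    "AIIDADB_NAME": "database_name",
    "AIIDADB_HOST": "database_hostname",
    "AIIDADB_PORT": "database_port",
    "AIIDADB_USER": "database_username",
    "AIIDADB_REPOSITORY_URI": "repository_uri",
}


def upgrade_storage(profile: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of ``profile`` with the storage keys nested under "storage"."""
    profile = dict(profile)  # shallow copy: do not mutate the caller's dict
    storage: Dict[str, Any] = {}
    if "AIIDADB_BACKEND" in profile:
        storage["backend"] = profile.pop("AIIDADB_BACKEND")
    # Move (and rename) every known storage key that is present.
    storage["config"] = {
        new: profile.pop(old) for old, new in STORAGE_KEY_MAP.items() if old in profile
    }
    profile["storage"] = storage
    # Keys not listed in STORAGE_KEY_MAP are left untouched.
    return profile
```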

Storage configuration should be specific to the storage backend (and in fact all operations on storage should go via the backend).
It is envisaged that eventually (as part of #5172) the storage config will be passed directly to the backend for validation/instantiation, rather than being obtained indirectly from the (global) profile, e.g. something like:

backend_cls = get_backend_type(profile["storage"]["backend"])
backend = backend_cls(config=profile["storage"]["config"])
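To illustrate the envisaged flow, here is a self-contained sketch in which `get_backend_type` resolves a name through a plain dictionary. The registry, the `StorageBackend` base class and the placeholder subclasses are all assumptions for illustration; as noted later in the discussion, the real lookup is currently hard-coded and may become an entry-point lookup.

```python
"""Hypothetical sketch: resolve a backend class by name, then hand it its own config."""
from typing import Any, Dict, Type


class StorageBackend:
    """Minimal stand-in for a storage backend base class (assumed, not aiida-core's)."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config


class DjangoBackend(StorageBackend):
    """Placeholder; a real class would connect to the database and repository."""


class SqlaBackend(StorageBackend):
    """Placeholder; a real class would connect to the database and repository."""


# Hypothetical registry; could later be replaced by an entry-point lookup.
_BACKENDS: Dict[str, Type[StorageBackend]] = {
    "django": DjangoBackend,
    "sqlalchemy": SqlaBackend,
}


def get_backend_type(name: str) -> Type[StorageBackend]:
    """Return the backend class registered under ``name``."""
    try:
        return _BACKENDS[name]
    except KeyError:
        raise ValueError(f"unknown storage backend: {name!r}") from None


profile = {"storage": {"backend": "sqlalchemy", "config": {"database_name": "aiidadb"}}}
backend_cls = get_backend_type(profile["storage"]["backend"])
backend = backend_cls(config=profile["storage"]["config"])
```

Keeping the class lookup separate from the config means the backend, not the global profile, owns the meaning of its `config` section.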

Rabbitmq configuration

Configuration for RabbitMQ is changed from:

{
      "broker_protocol": "amqp",
      "broker_username": "guest",
      "broker_password": "guest",
      "broker_host": "127.0.0.1",
      "broker_port": 5672,
      "broker_virtual_host": ""
}

to

{
  "process_control": {
      "backend": "rabbitmq",
      "config": {
          "broker_protocol": "amqp",
          "broker_username": "guest",
          "broker_password": "guest",
          "broker_host": "127.0.0.1",
          "broker_port": 5672,
          "broker_virtual_host": ""
      }
  }
}
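The RabbitMQ part of the migration only moves keys, without renaming them. A hypothetical sketch (`upgrade_process_control` is an illustrative name, and `"rabbitmq"` is assumed as the backend because it is the only one at this point):

```python
"""Hypothetical sketch of the RabbitMQ config upgrade shown above."""
from typing import Any, Dict

# The broker_* key names are kept as-is; they only move under process_control.config.
PROCESS_KEYS = (
    "broker_protocol", "broker_username", "broker_password",
    "broker_host", "broker_port", "broker_virtual_host",
)


def upgrade_process_control(profile: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of ``profile`` with broker settings nested under "process_control"."""
    profile = dict(profile)  # shallow copy: do not mutate the caller's dict
    config = {key: profile.pop(key) for key in PROCESS_KEYS if key in profile}
    # Assumption: RabbitMQ is the only process-control backend at this point.
    profile["process_control"] = {"backend": "rabbitmq", "config": config}
    return profile
```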

It is quite possible that RabbitMQ will be replaced in the future (see aiidateam/AEP#30).
This change begins to move aiida-core away from "hard-coding" its use.
It also makes the purpose of these configuration variables clearer.

This PR also removes the Profile's behaviour of stripping unknown keys from the config (which may then subsequently be written to disk). This stripping is unnecessary, and the keys may be there to aid in a "lossless" upgrade/downgrade of the config.
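A sketch of the intended behaviour, assuming a simplified `Profile` that stores a copy of the whole profile dictionary and exposes known values through properties. The class and its `to_dict` method are hypothetical stand-ins, not aiida-core's `Profile` API; the point is only that keys the current version does not understand survive a read/write round-trip.

```python
"""Hypothetical sketch: a profile wrapper that keeps unknown keys on round-trip."""
import copy
from typing import Any, Dict


class Profile:
    """Simplified stand-in for aiida-core's Profile (assumed API, not the real one)."""

    def __init__(self, name: str, config: Dict[str, Any]) -> None:
        self.name = name
        # Keep everything, including keys this version does not know about.
        self._attributes = copy.deepcopy(config)

    @property
    def storage_backend(self) -> str:
        """A known value, read from the stored dictionary."""
        return self._attributes["storage"]["backend"]

    def to_dict(self) -> Dict[str, Any]:
        """Return the config to write back to disk, unknown keys included."""
        return copy.deepcopy(self._attributes)
```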

codecov · 2022-01-20T10:33:34Z

Codecov Report

Merging #5320 (05aa2d1) into develop (fe1acf9) will decrease coverage by 0.03%.
The diff coverage is 79.32%.

@@             Coverage Diff             @@
##           develop    #5320      +/-   ##
===========================================
- Coverage    82.13%   82.11%   -0.02%     
===========================================
  Files          533      533              
  Lines        38478    38425      -53     
===========================================
- Hits         31601    31548      -53     
  Misses        6877     6877

Flag	Coverage Δ
django	`77.18% <79.32%> (-0.02%)`	⬇️
sqlalchemy	`76.48% <77.94%> (-0.03%)`	⬇️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
aiida/cmdline/commands/cmd_setup.py	`56.87% <0.00%> (+5.99%)`	⬆️
aiida/cmdline/commands/cmd_status.py	`84.77% <ø> (ø)`
aiida/cmdline/params/options/commands/setup.py	`54.55% <0.00%> (-2.29%)`	⬇️
aiida/cmdline/params/types/profile.py	`64.45% <0.00%> (ø)`
aiida/manage/configuration/__init__.py	`83.73% <50.00%> (+0.13%)`	⬆️
aiida/manage/configuration/config.py	`89.48% <71.43%> (+0.30%)`	⬆️
aiida/manage/external/postgres.py	`63.10% <75.00%> (+0.14%)`	⬆️
aiida/backends/utils.py	`93.34% <83.34%> (-6.66%)`	⬇️
aiida/manage/configuration/profile.py	`89.60% <84.91%> (-6.67%)`	⬇️
...iida/manage/configuration/migrations/migrations.py	`93.89% <90.91%> (-0.35%)`	⬇️
... and 10 more


Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fe1acf9...05aa2d1.

sphuber · 2022-01-21T19:14:23Z

I will hold off on a full review until you have merged the other PR and rebased this; that should make it easier. But I would already mention one point that I would consider changing, and that is the structure of the new config. I am perfectly happy with grouping it under a storage key, but I think it would make sense to have separate keys within that for the database and the repository. In short, I would propose the structure:

'storage': {
    'backend': 'django',
    'database': {
        ...
    },
    'repository': {
        ...
    }
}

The reason is that in the future we want to be able to support other repository implementations, and separating those configuration details from the database ones makes this clearer. I don't think we will be introducing any backends that are purely database and no longer have a separate file repository, so to me it makes sense to separate them.

In that same light, it is a bit incomplete to have the "backend" name just reference the database backend. Really it is a combination of a backend for the relational database and the file repository. Should we have a 'backend' key for both the database and repository? For the repository for now it will always be disk-objectstore, or something like that, but that would leave it open to be configured to a different backend implementation at some point.

chrisjsewell · 2022-01-21T19:22:27Z

@sphuber I very much disagree; as we have already discussed, a storage backend is both the database AND the repository, and they are intrinsically linked. For example, there is a single migration for both.
Also, the archive is a backend which does not separate the repository and database; they are a single file.
You cannot swap just the repository implementation within the same backend class; that would be a separate backend type.
The fact that there is a separate repository and database is really just an implementation detail.

chrisjsewell · 2022-01-21T20:22:49Z

In that same light, it is a bit incomplete to have the "backend" name just reference the database backend

yes exactly, that is incorrect, and something I will be looking to "fix" at a later date

chrisjsewell · 2022-01-21T20:35:50Z

I'd also note, we now have verdi storage, for which all commands act on the db+repo as a singular backend "entity".

In fact, I was going to open an issue, saying that all references to "backend" (which is just too nondescript) should be changed to "storage"

sphuber · 2022-01-21T21:31:38Z

a storage backend is both the database AND the repository

I agree that our "storage" comprises two components that form a whole, and that we talk in terms of the storage as one thing from the user's perspective. That being said...

The fact that there is a separate repository and database is really just an implementation detail

That may well be the case, but there are still two different parts to the implementation that need to be configured. There is nothing preventing us from using the PostgreSQL database with a file repository implemented on an actual object store, such as S3. Would you still call that backend "sqlalchemy"? That would be incorrect, and there is no reason to limit ourselves in this way.

So really you seem to be agreeing with me. The storage is one piece from the user's perspective in terms of where data gets stored (so there is one key for it in the configuration), but it comprises two separate data stores whose implementations can be mixed. Of course, once data is stored, they are intrinsically linked. But you can use different backend implementations, and so it makes perfect sense to have two separate config dictionaries within the storage configuration as a whole.

chrisjsewell · 2022-01-21T21:43:39Z

The archive is a storage backend, it has one config variable, the path to the archive. Should that variable then go under database or repository?

chrisjsewell · 2022-01-21T21:49:17Z

The "django" and "sqlalchemy" backend names (again I would not personally have called them this) each point to a single backend class DjangoBackend/SqlaBackend (currently hard-coded, but in the future will likely be an entry-point name). This is what will get initialized with the storage configuration.
Unless you are insinuating that we should split these classes into two classes, then there should be one storage configuration: SqlaBackend(storage_config)

sphuber · 2022-01-21T22:08:25Z

Unless you are insinuating that we should split these classes into two classes, then there should be one storage configuration: SqlaBackend(storage_config)

In my view, they are already separated in different classes. The database and repository both have their own separate interface, their own implementations and their own configuration (per implementation). The storage backend is just a wrapper class that wraps the two.

The "django" and "sqlalchemy" backend names (again I would not personally have called them this) each point to a single backend class DjangoBackend/SqlaBackend (currently hard-coded, but in the future will likely be an entry-point name). This is what will get initialized with the storage configuration.

What I am saying is that the names DjangoBackend and SqlaBackend are misnomers. The Django and Sqla part only refer to the backend implementation of the relational database, but it implicitly assumes the DiskObjectStoreRepositoryBackend for the file repository. This may not be a problem now since we only have one repository implementation, but we will soon have the need for another file repository implementation that is also perfectly compatible with the SqlAlchemy implementation for the database backend.

I don't see the problem to use additional substructuring in the storage config where it makes sense. It doesn't have to be required. For example, why does the following not make sense:

{
    'profile_one': {
        'broker': {},
        'storage': {
            'type': 'aiida.storage:core.archive',
            'filepath': '/some/path/archive.aiida',
        },
        ...
    },
    'profile_two': {
        'broker': {},
        'storage': {
            'type': 'aiida.storage:core.sqla-dos',
            'database': {
                'hostname': 'localhost',
                ...
            },
            'repository': {
                'filepath_container': '/some/filepath/container'
            }
        },
        ...
    },
    'profile_three': {
        'broker': {},
        'storage': {
            'type': 'aiida.storage:core.sqla-s3',
            'database': {
                'hostname': 'localhost',
                ...
            },
            'repository': {
                'hostname': 's3.aiida.net',
                'username': ...,
                'container': '/some/filepath/container'
            }
        },
        ...
    }
}

I think we are really saying the same, I just think it is useful and makes sense to keep the exact structure of the storage key flexible and allow additional structuring where it makes sense.

chrisjsewell · 2022-01-21T22:11:54Z

why does the following not make sense

Because you need to load the storage backend class to validate the schema of the storage config; they are two distinct things.

chrisjsewell · 2022-01-21T22:15:12Z

The storage backend is just a wrapper class that wraps the two.

No. The storage backend is the interface to the storage. You either completely separate repository and database: separate versions, separate migrations, separate CLI commands, etc, or you treat them as one, you can't have it both ways.

chrisjsewell · 2022-01-21T22:17:18Z

What I am saying is that the names DjangoBackend and SqlaBackend are misnomers.

again, you keep fixating on the names, and I keep telling you they are the wrong names, and they should not have been named this in the first place.

sphuber

I really think that we are agreeing here in our long discussion. All I was saying is that I think it may make sense to allow for additional nesting under the storage_config key, that is all. It wouldn't stop anything from what you are doing here. But whatever, let's move on.

Since we are doing this, I would really do the same for the broker key and make that a dictionary. Besides that, there are just some minor suggestions and questions.

aiida/manage/configuration/migrations/migrations.py

aiida/manage/configuration/profile.py

sphuber · 2022-01-23T18:43:20Z

aiida/manage/configuration/profile.py

+        # to-do currently this is not actually used anywhere,
+        # because e.g. the documentation is loaded with an incomplete (dummy) configuration
+        # in actual usage though, this could lead to later key errors, when retrieving an attribute
+        if validate and not set(config.keys()).issuperset(self.REQUIRED_KEYS):


Where would this raise, then, if validate were True? If it is not needed, shouldn't we just get rid of validate?

As it says in the to-do comment above, I think this should always be on, since otherwise you could get obscure failures, such as a KeyError on attribute retrieval of e.g. Profile.storage_backend. The reason it is not on at present is that some testing fixtures and the load_documentation function supply an incomplete config to Profile, so this would fail. We should maybe just add "dummy" config to them (e.g. for the rabbitmq config), so we can always run this validation.

I read the to-do comment but I don't understand it. If you don't currently use the validate keyword anywhere, then why not just remove it. You say it is needed for the load_documentation one, but that means it is being used and so the to-do is incorrect and can be removed?

In ba998ffbc76785c1b5209089dc6ce42c2aa29bc9, I have "fixed" places that were loading incomplete config, and turned on validation by default.

Cheers, I think this still needs to be revisited at some point, but we can do that later.
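The validation discussed above boils down to a superset check on the top-level keys, now on by default. A minimal sketch, assuming `storage` and `process_control` as the required keys (the actual `REQUIRED_KEYS` in aiida-core may differ, and this `Profile` is a simplified stand-in):

```python
"""Hypothetical sketch of required-key validation, enabled by default."""
from typing import Any, Dict, FrozenSet


class Profile:
    """Simplified stand-in; REQUIRED_KEYS here is an assumption, not aiida-core's list."""

    REQUIRED_KEYS: FrozenSet[str] = frozenset({"storage", "process_control"})

    def __init__(self, name: str, config: Dict[str, Any], validate: bool = True) -> None:
        if validate and not set(config.keys()).issuperset(self.REQUIRED_KEYS):
            missing = sorted(self.REQUIRED_KEYS - set(config))
            # Fail early with a clear message instead of a KeyError on later attribute access.
            raise ValueError(f"profile {name!r} is missing required keys: {missing}")
        self.name = name
        self._attributes = config
```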

chrisjsewell · 2022-01-23T19:13:47Z

I really think that we are agreeing here in our long discussion.

yeh just about 😄 I just wanted to stress that anything under the storage_config key will just be passed to the storage_backend as a single "entity" (and validated by it similarly), so having nesting doesn't really matter.

I would really do the same for the broker key and make that a dictionary

Indeed, I was going to do that as a separate PR 👍

sphuber · 2022-01-23T19:16:51Z

Indeed, I was going to do that as a separate PR 👍

But wouldn't that require a new separate config migration? Might as well just do it here.

chrisjsewell · 2022-01-23T19:30:08Z

But wouldn't that require a new separate config migration? Might as well just do it here.

Well not if you just append it to this config migration. After all people should not be using develop for actual work 😉.

It's more than just a config migration, and I don't think the two changes should be conflated; they should be separate commits.
If you want, I can create the PR for that (based on this), and once it is reviewed we merge the two together, i.e. merge-commit a PR with two commits.

There's some other "discussions" I want to have around that as well: I don't think it should be under a broker key: that's too specific to rabbitmq (which we may well be replacing), and also no end users are going to understand that terminology.

I'll probably open this as a separate issue, but basically I think we can do a bit more to standardise our terminology, and make it more understandable for users:

AiiDA is a workflow engine framework, which can be abstracted into five key concerns:

Storage: How do we store generated inputs, outputs, and the provenance between them.

Communication: How do we communicate with compute services (such as HPCs) and transfer data to/from them.

Processing: How do we run calculations and workflows (locally and externally).

Developer interface: How can developers create plugins to extend aiida-core (such as aiida-quantumespresso)

User interface: How do users interact with AiiDA (Python API, CLI, web-based APIs etc.)

In this abstraction, the RabbitMQ configuration would come under processing/process.

sphuber · 2022-01-23T21:35:04Z

Well not if you just append it to this config migration. After all people should not be using develop for actual work

Fair enough, but that does mean it has to happen soon, within the next week or so, since we want to be releasing very soon. If you think you'll add it before then (or want me to do it), fine to have it in separate commits or PRs.

chrisjsewell · 2022-01-24T09:56:39Z

Fair enough, but that does mean it has to happen soon, within the next week or so

Once you "sign-off" the current code in this PR, I'll rebase/squash into one commit, and add this on top of that (in this PR) for you to review

sphuber · 2022-01-24T18:05:21Z

Alright @chrisjsewell, consider this "signed-off", and feel free to squash and do the second part.

chrisjsewell · 2022-01-24T18:12:57Z

cheers! will do it later tonight

sphuber · 2022-01-25T16:08:55Z

aiida/manage/configuration/migrations/migrations.py

-class AbstractStorage(SingleMigration):
-    """Move the storage configuration under a top-level "storage" key.
+class AbstractStorageAndProcess(SingleMigration):
+    """Move the storage config under a top-level "storage" key and rabbitmq config under "processing".


This is not actually true, I think, although that layout would actually be ideal:

{
    'storage': {'backend': 'string', 'config': {}},
    'process_control': {'backend': 'string', 'config': {}}
}

I think that is the clearest.

But at least if you keep the current layout, then please update the docstring.

No I think that seems reasonable 👍

done; don't say I never give you anything 😉

sphuber · 2022-01-25T16:09:32Z

aiida/manage/configuration/migrations/migrations.py

                if new in profile.get('storage_config', {}):
                    profile[old] = profile['storage_config'].pop(new)
            profile.pop('storage_config', None)
            if 'storage_backend' in profile:
                profile['AIIDADB_BACKEND'] = profile.pop('storage_backend')
+            for key in self.process_keys:
+                if key in profile.get('process_control_config', {}):
+                    profile[key] = profile['process_control_config'].pop(key)


Probably add the same warning as for the storage conversion
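One possible shape for that warning, sketched against the final nested `process_control` layout rather than the intermediate `process_control_config` key used in the diff above. The function name and warning texts are hypothetical, and `warnings.warn` stands in for whatever logger the migrations actually use:

```python
"""Hypothetical sketch of the process-control downgrade with a warning."""
import warnings
from typing import Any, Dict

PROCESS_KEYS = (
    "broker_protocol", "broker_username", "broker_password",
    "broker_host", "broker_port", "broker_virtual_host",
)


def downgrade_process_control(profile: Dict[str, Any]) -> Dict[str, Any]:
    """Move broker_* keys back to the top level, warning about anything that would be lost."""
    profile = dict(profile)  # shallow copy: do not mutate the caller's dict
    section = profile.pop("process_control", {})
    config = dict(section.get("config", {}))
    for key in PROCESS_KEYS:
        if key in config:
            profile[key] = config.pop(key)
    if section.get("backend", "rabbitmq") != "rabbitmq":
        warnings.warn(f"process_control backend {section['backend']!r} cannot be represented in the old format")
    if config:
        # Anything left over has no place in the old flat layout.
        warnings.warn(f"dropping unrecognised process_control config keys: {sorted(config)}")
    return profile
```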

sphuber · 2022-01-25T16:11:50Z

aiida/manage/configuration/schema/config-v6.schema.json

+          "description": "The configuration to parse to the storage backend",
+          "type": "object",
+          "properties": {
+            "database_engine": {


Although it is nice to have the explicit config options for the storage and process_control backends, this is problematic in principle since the config schema will be backend specific and so dynamic. Not sure if the JSON schema allows for such a concept. For now it is ok to keep since anyway we don't have a way to dynamically change the backends, but with your idea of making that possible in the future, just wanted to highlight this.

Yeh, I was well aware of this incongruity; I just wanted to still have this "specification" somewhere. We probably want to move this validation into a class method on the storage/process_control backend class, e.g.

storage_cls = load_storage_cls(config["storage_backend"])
storage_cls.validate_config(config["storage_config"])
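A sketch of that idea, in which each backend class declares and checks its own required keys, so the top-level schema only needs to check that `backend` is a string and `config` is an object. `load_storage_cls`, `validate_config`, `REQUIRED_CONFIG` and the registry are hypothetical; the required key names are taken from the config examples in this thread.

```python
"""Hypothetical sketch: each backend class validates its own config section."""
from typing import Any, Dict, Type


class StorageBackend:
    """Assumed base class; subclasses declare which config keys they require."""

    REQUIRED_CONFIG: tuple = ()

    @classmethod
    def validate_config(cls, config: Dict[str, Any]) -> None:
        """Raise ValueError if any key this backend requires is missing."""
        missing = [key for key in cls.REQUIRED_CONFIG if key not in config]
        if missing:
            raise ValueError(f"{cls.__name__} config is missing: {missing}")


class ArchiveBackend(StorageBackend):
    REQUIRED_CONFIG = ("filepath",)


class SqlaBackend(StorageBackend):
    REQUIRED_CONFIG = ("database_hostname", "database_name", "repository_uri")


# Hypothetical registry standing in for an entry-point lookup.
_STORAGE_CLASSES: Dict[str, Type[StorageBackend]] = {
    "archive": ArchiveBackend,
    "sqlalchemy": SqlaBackend,
}


def load_storage_cls(name: str) -> Type[StorageBackend]:
    """Return the storage backend class registered under ``name``."""
    return _STORAGE_CLASSES[name]
```

With this split, the archive backend's single `filepath` setting and the SQLAlchemy backend's database and repository settings never need to share one static schema.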

tests/manage/configuration/migrations/test_migrations.py

chrisjsewell requested a review from sphuber January 20, 2022 09:59

chrisjsewell mentioned this pull request Jan 20, 2022

👌 IMPROVE: Configuration migrations #5319

Merged

chrisjsewell force-pushed the migrate-config-storage branch from ea3b6aa to 99dd5cc January 21, 2022 04:59

chrisjsewell force-pushed the migrate-config-storage branch 2 times, most recently from f6a5046 to 2dd32e2 January 21, 2022 20:20

chrisjsewell force-pushed the migrate-config-storage branch from 262b9cc to 4d032f1 January 21, 2022 20:23

sphuber requested changes Jan 23, 2022


chrisjsewell requested a review from sphuber January 24, 2022 09:55

chrisjsewell mentioned this pull request Jan 24, 2022

Improving AiiDA terminology #5324

Open

♻️ REFACTOR: Profile storage backend configuration

fe28f48

chrisjsewell force-pushed the migrate-config-storage branch from ba998ff to fe28f48 January 25, 2022 12:19

chrisjsewell force-pushed the migrate-config-storage branch from 8706dbe to ab9d156 January 25, 2022 14:10

sphuber requested changes Jan 25, 2022


♻️ REFACTOR: Profile rabbitmq configuration

297da50

chrisjsewell force-pushed the migrate-config-storage branch from dad7a57 to 297da50 January 25, 2022 18:43

chrisjsewell requested a review from sphuber January 25, 2022 19:43

♻️ REFACTOR: Profile configuration top-level keys

05aa2d1

chrisjsewell force-pushed the migrate-config-storage branch from 48817ab to 05aa2d1 January 25, 2022 19:48

sphuber approved these changes Jan 25, 2022


sphuber merged commit 3af6f6a into aiidateam:develop Jan 25, 2022

chrisjsewell mentioned this pull request Mar 12, 2022

🔀 MERGE: Release v2.0.0b1 #5426

Merged
