Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

♻️ REFACTOR: Reorganise Backend class UML #5172

Closed

Conversation

chrisjsewell
Copy link
Member

@chrisjsewell chrisjsewell commented Oct 10, 2021

This PR addresses parts of #5154, in conjunction with #5169 (also relevant are #5145 and #5171)
It is of note that this PR does not change anything for the "front-end" user, i.e. there should be no apparent change when using the AiiDA API or CLI.

The goal of this PR and #5154, is to improve the structure of the Backend; simplifying it and allowing for the possibility of new Backends (such as the archive).
This is an illustration of the current UML of the SQLAlchemy backend for the Node entity (from develop branch).

(the diagrams are available at https://www.figma.com/file/oluW9WVn1v1yDj6pWpTiQ7/aiida-backend-design?node-id=0%3A1)

Current-UML

There are some key issues here:

  1. The RepositoryBackend is implemented on the Profile, meaning that it can not be implemented on a per-backend basis.
  2. The SqlaBackendManager is an implementation detail of the Django/SQLA backends, yet is exposed and accessed via the Manager.
  3. The connection to the database is stored in the ENGINE/SCOPED_SESSION global variables, meaning that only one backend can be used at a time, and access to the database often by-passes the backend; using directly get_scoped_session.

In terms of the global variable use for database access, it is clear that this has risen from the original choice of using Django as the database API. This API has a key shortcoming, in that it also ties a Python process to a single connection configuration (in aiida/backends/djsite/settings.py, via djago.setup).
With sqlalchemy though, this is not necessary and there is a clear separation of concerns between the ORM models, to provide the schema for each database table, and the engine/session, which provide the interface to the actual database.
AiiDA's current implementation, tries to "force" the sqlalchemy implementation to be like the Django one, and thus "inherits" its shortcomings.

This is now the new UML:

New-UML

The key changes then, are ensuring all persistent data access flows through the Backend instance:

  • The RepositoryBackend is only accessed via the backend
  • The BackendManager is only accesses via the backend
  • The sqlalchemy engine/session is only accessed via the backend, which is responsible for it (rather than having a global variable)
  • The ModeWrapper is renamed StorableModel; this class links a DbNode (or other SQLA mapper) to the backend's DB connection (i.e. session).
    • All ad-hoc code has been removed from the DbNode, re-separating its concern to only be "describing" the database table, rather than dealing with the connection.
  • When the Manager is closed, this also calls close() on the backend, which shutdown the session and engine connection to the database.

the Backend is also now instantiated with the Profile, which specifies the configuration for accessing the persistent storage (e.g. the database connection config and repository URI)

For the sqlalachemy backend this actually means that multiple backends can "exist" at the same time (in particular the main backend and the archive backend). For django, because of its design, you are still restricted to one backend.
(note though the Manager still restricts one "main" Backend to be loaded, so users will not notice this)

To accommodate these changes (the backend being responsible for the session), a few changes had to be made to AiidaTestCase:

  • The default User & Computer were both persisted as class variables, meaning they were always using the same backend, despite the manager being reset after each test.
    • These are now reset after each test, and any use of them as a class variable has been removed.
  • The test were "leaking" backends, which were not been correctly closed after each test

Note, these changes were not required for the newer pytest fixtures (which eventually should deprecate AiidaTestCase), since they reset the DB at the same time as resetting the manager and don't store class variables.

The `BackendManager` is an implementation detail of the Django/SQLA backends and should not be accessed directly.
@codecov
Copy link

codecov bot commented Oct 10, 2021

Codecov Report

Merging #5172 (d5bf71e) into develop (23981a5) will decrease coverage by 0.01%.
The diff coverage is 86.20%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #5172      +/-   ##
===========================================
- Coverage    81.00%   81.00%   -0.00%     
===========================================
  Files          535      535              
  Lines        37376    37323      -53     
===========================================
- Hits         30274    30230      -44     
+ Misses        7102     7093       -9     
Flag Coverage Δ
django 75.89% <51.86%> (+0.08%) ⬆️
sqlalchemy 74.98% <76.44%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
aiida/backends/__init__.py 100.00% <ø> (+9.10%) ⬆️
aiida/backends/djsite/__init__.py 100.00% <ø> (ø)
aiida/backends/djsite/manage.py 0.00% <0.00%> (ø)
aiida/backends/sqlalchemy/__init__.py 100.00% <ø> (ø)
aiida/backends/sqlalchemy/manage.py 0.00% <0.00%> (ø)
aiida/backends/utils.py 100.00% <ø> (ø)
aiida/cmdline/commands/cmd_setup.py 50.46% <0.00%> (+0.46%) ⬆️
aiida/common/__init__.py 100.00% <ø> (ø)
aiida/manage/configuration/__init__.py 79.21% <0.00%> (ø)
aiida/orm/implementation/django/__init__.py 100.00% <ø> (ø)
... and 35 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 23981a5...d5bf71e. Read the comment docs.

@chrisjsewell chrisjsewell changed the title ♻️ REFACTOR: Remove direct access to BackendManager ♻️ REFACTOR: Reorganise Backend class UML Oct 15, 2021
@chrisjsewell chrisjsewell marked this pull request as ready for review October 15, 2021 12:32
@muhrin
Copy link
Contributor

muhrin commented Dec 7, 2021

In general all good here, I think this is a positive tidy up of some of the backend classes.

My main question is why rename the SQLA ModelWrapper to StorableModel but the Django version remains ModelWrapper. As I think you've seen, these (at least originally) play the same role i.e. hiding from their clients the 'synchronisation points' at which data is flushed to or retrieved from the backend. While the two hierarchies are disjoint I think there's still a benefit to keeping the roles (and names) of these objects roughly the same (unless of course, the differences in backend architectures mean they diverge significantly).

What do you think?

@chrisjsewell
Copy link
Member Author

Heya, thanks for taking a look. Yeh I think that makes sense (can't remember any reason specifically I didn't change the django one)

@chrisjsewell
Copy link
Member Author

This was folded in to #5330 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants