Skip to content

Commit

Permalink
Profile: only initialise the repository during the migration (#4900)
Browse files Browse the repository at this point in the history
The new disk object store migration that will ship with `v2.0` requires
to be initialised once and only once, and it will generate the necessary
folders and configuration file. This process was being done in the
method `Profile.get_repository` guarded by a check in case it was
already initialised.

The problem with this was that the container could be initialised too
early during the early stages of a profile setup. As soon as the
repository was fetched, it would be initialised generating, among other
things, the UUID. This would then trigger the check that the database
contained the same UUID, which would of course fail, since the database
was empty. This could have been fixed by ignoring the check at this
point, but the real problem is that the repository should not be
initialised at this point. The only point at which the repo should be
initialised is during the corresponding database migration that
introduced the disk object store repository. Both for existing as well
as for new profiles, they will go through this migration and so it and
it alone should be responsible for initialising the repository.

After removing the automatic initialization in `get_repository_container`
of the `Profile` class, it revealed an actual bug in the migration. The
migration initialised the new disk object store container only after
the check whether the database is empty, in which case it would short cut
and skip the repository migration entirely, since there are no nodes to
migrate anyway. However, this would leave the repository uninitialised
and cause problems down the road. This wasn't seen before because the
repository would be initialised anyway as soon as it was fetched the
first time. With the change in behavior of `get_repository_container`
this is no longer the case and so the migration had to be fixed to always
initialise the repository, even for empty databases, which is often the
case for newly created profiles.

This approach did create problems for the unittests though, as they
would sometimes clean the repository. It would this not by just removing
the contents, but it would delete the entire container. This meant it
had to be recreated, but since in normal operations this only happens
during the migration (which also during tests only happens once, unless
maybe during the migration tests themselves) and so an error would be
raised that the repository is not initialised. The solution is to
reinitialise a new repo as soon as the old one was destroyed. Currently
this is done by simply deleting the folder on disk and reinitialising an
entire new instance. In the future, it would be better if the existing
container could be kept and its contents could simply be dropped, but
this would require a feature in the `disk-objectstore` library.
  • Loading branch information
sphuber authored May 4, 2021
1 parent 46eb180 commit 87ab7e3
Show file tree
Hide file tree
Showing 5 changed files with 32 additions and 35 deletions.
25 changes: 13 additions & 12 deletions aiida/backends/general/migrations/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -153,9 +153,21 @@ def migrate_legacy_repository(node_count, shard=None):
basepath = pathlib.Path(profile.repository_path) / 'repository' / 'node'
container = Container(filepath)

# The new container needs to be initialised. Since this function can be called multiple times, it shouldn't
# initialise the repository each time since it will delete all content. By checking the value of the shard, we
# should just initialise it for the first iteration.
if shard is None or shard == '00':
if container.is_initialised and not profile.is_test_profile:
raise exceptions.DatabaseMigrationError(
f'the container {filepath} already exists. If you ran this migration before and it failed simply '
'delete this directory and restart the migration.'
)

container.init_container(clear=True, **profile.defaults['repository'])

if not basepath.is_dir():
# If the database is empty, this is a new profile and so it is normal the repo folder doesn't exist. We simply
# return as there is nothing to migrate
# return as there is nothing to migrate.
if profile.is_test_profile or node_count == 0:
return None, None

Expand All @@ -164,17 +176,6 @@ def migrate_legacy_repository(node_count, shard=None):
'nodes. Aborting the migration.'
)

# When calling this function multiple times, once for each shard, we should only check whether the container has
# already been initialized for the first shard.
if shard is None or shard == '00':
if container.is_initialised and not profile.is_test_profile:
raise exceptions.DatabaseMigrationError(
f'the container {filepath} already exists. If you ran this migration before and it failed simply '
'delete this directory and restart the migration.'
)

container.init_container(clear=True, **profile.defaults['repository'])

node_repository_dirpaths, missing_sub_repo_folder = get_node_repository_dirpaths(basepath, shard)

filepaths = []
Expand Down
10 changes: 10 additions & 0 deletions aiida/backends/testbase.py
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,7 @@ def setUpClass(cls):
cls._class_was_setup = True

cls.refurbish_db()
cls.initialise_repository()

@classmethod
def tearDownClass(cls):
Expand Down Expand Up @@ -136,6 +137,14 @@ def clean_db(cls):

reset_manager()

@classmethod
def initialise_repository(cls):
"""Initialise the repository"""
from aiida.manage.configuration import get_profile
profile = get_profile()
repository = profile.get_repository()
repository.initialise(clear=True, **profile.defaults['repository'])

@classmethod
def refurbish_db(cls):
"""Clean up database and repopulate with initial data.
Expand Down Expand Up @@ -168,6 +177,7 @@ def clean_repository(cls):
# Clean the test repository
shutil.rmtree(dirpath_repository, ignore_errors=True)
os.makedirs(dirpath_repository)
cls.initialise_repository()

@classproperty
def computer(cls): # pylint: disable=no-self-argument
Expand Down
14 changes: 3 additions & 11 deletions aiida/cmdline/commands/cmd_setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -79,7 +79,7 @@ def setup(
# Migrate the database
echo.echo_info('migrating the database.')
manager = get_manager()
backend = manager._load_backend(schema_check=False) # pylint: disable=protected-access
backend = manager._load_backend(schema_check=False, repository_check=False) # pylint: disable=protected-access

try:
backend.migrate()
Expand All @@ -96,19 +96,11 @@ def setup(
repository_uuid_database = backend_manager.get_repository_uuid()
repository_uuid_profile = profile.get_repository().uuid

# If database contains no repository UUID, it should be a clean database so associate it with the repository
if repository_uuid_database is None:
backend_manager.set_repository_uuid(repository_uuid_profile)

# Otherwise, if the database UUID does not match that of the repository, it means they do not belong together. Note
# that if a new repository path was specified, which does not yet contain a container, the call to retrieve the
# repo by `get_repository_container` will initialize the container and generate a UUID. This guarantees that if a
# non-empty database is configured with an empty repository path, this check will hit.
elif repository_uuid_database != repository_uuid_profile:
if repository_uuid_database != repository_uuid_profile:
echo.echo_critical(
f'incompatible database and repository configured:\n'
f'Database `{db_name}` is associated with the repository with UUID `{repository_uuid_database}`\n'
f'However, the configured repository `{repository}` has UUID `{repository_uuid_profile}`.'
f'However, the configured repository has UUID `{repository_uuid_profile}`.'
)

# Optionally setting configuration default user settings
Expand Down
13 changes: 2 additions & 11 deletions aiida/manage/configuration/profile.py
Original file line number Diff line number Diff line change
Expand Up @@ -129,22 +129,13 @@ def __init__(self, name, attributes, from_config=False):
self._test_profile = bool(self.name.startswith('test_'))

def get_repository(self) -> 'Repository':
"""Return the repository configured for this profile.
.. note:: The repository will automatically be initialised if it wasn't yet already.
"""
"""Return the repository configured for this profile."""
from disk_objectstore import Container
from aiida.repository import Repository
from aiida.repository.backend import DiskObjectStoreRepositoryBackend

container = Container(self.repository_path / 'container')
backend = DiskObjectStoreRepositoryBackend(container=container)
repository = Repository(backend=backend)

if not repository.is_initialised:
repository.initialise(clear=True, **self.defaults['repository']) # pylint: disable=unsubscriptable-object

return repository
return Repository(backend=backend)

@property
def uuid(self):
Expand Down
5 changes: 4 additions & 1 deletion aiida/manage/manager.py
Original file line number Diff line number Diff line change
Expand Up @@ -186,7 +186,10 @@ def get_backend_manager(self) -> 'BackendManager':
from aiida.common import ConfigurationError

if self._backend_manager is None:
self._load_backend()

if self._backend is None:
self._load_backend()

profile = self.get_profile()
if profile is None:
raise ConfigurationError(
Expand Down

0 comments on commit 87ab7e3

Please sign in to comment.