From 70790e460ea31d8a1b4d8de7b96b9a081712e4e5 Mon Sep 17 00:00:00 2001
From: Giovanni Pizzi
Date: Tue, 28 Apr 2020 12:01:33 +0200
Subject: [PATCH] Addressing comments by Dominik

---
 .../readme.md | 24 +++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/003_efficient_object_store_for_repository/readme.md b/003_efficient_object_store_for_repository/readme.md
index 81734f6..7fb6be8 100644
--- a/003_efficient_object_store_for_repository/readme.md
+++ b/003_efficient_object_store_for_repository/readme.md
@@ -12,7 +12,7 @@
 ## Background
 AiiDA writes the "content" of each node in two places: attributes in the database, and files
 (that do not need fast query) in a disk repository.
-These files include for instance raw inputs and otputs of a job calculation, but also other binary or
+These files include for instance raw inputs and outputs of a job calculation, but also other binary or
 textual information best stored directly as a file (some notable examples: pseudopotential files,
 numpy arrays, crystal structures in CIF format).
 
@@ -20,7 +20,7 @@ Currently, each of these files is directly stored in a folder structure, where e
 is based on the node UUID with two levels of sharding
 (that is, if the node UUID is `4af3dd55-a1fd-44ec-b874-b00e19ec5adf`,
 the folder will be `4a/f3/dd55-a1fd-44ec-b874-b00e19ec5adf`).
-Files of a nodes are stored within the node repository folder,
+Files of a node are stored within the node repository folder,
 possibly within a folder structure.
 
 While quite efficient when retrieving a single file
@@ -247,6 +247,26 @@ the different requirements, and represent what can be found in the current imple
   As a note, seeking a file to a given position is what one typically does when watching a video and
   jumping to a different section.
 
+- Packing in general, at this stage, is left to the user. We can decide (at the object-store level, or probably
+  better at the AiiDA level) to suggest that the user repack, or to trigger the repacking automatically.
+  This feature can be introduced at a later stage. For instance, the first version we roll out could just suggest,
+  in the docs, to repack periodically.
+  This could be a good approach, also because it ties repacking to backing up (at the moment,
+  backups probably need to be executed using appropriate scripts to back up the DB index and the repository
+  in the "right order", possibly using SQLite functions to get a dump).
+  As a note, even if repacking is never done, the situation is no worse than the current one in AiiDA, and actually
+  a bit better, because getting the list of files for a node without files would no longer need to access the disk,
+  and similarly empty folders would no longer be created for nodes without files.
+
+  In a second phase, we can print messages, e.g. when restarting the daemon,
+  that suggest repacking, for instance if the number of loose objects is too large.
+  We can also provide `verdi` commands for this.
+
+  Finally, if we are confident that this approach works fine, we can also automate the repacking. We need to be careful
+  that two different processes don't start packing at the same time, and that the user is aware that packing will be
+  triggered, that it might take some time, and that the packing process should not be killed
+  (this might be inconvenient, and this is why I would think twice before implementing automatic repacking).
+
 ### Why a custom implementation of the library
 
 We have been investigating if existing codes could be used for the current purpose.
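
As a note on the repacking bullet added by this patch, the two mechanisms it mentions (suggesting a repack when there are too many loose objects, and preventing two processes from packing at the same time) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the object-store API: the function names, the `packing.lock` file name, the flat loose-object folder, and the threshold of 1000 are all hypothetical.

```python
import os
from contextlib import contextmanager
from pathlib import Path

# Hypothetical threshold: the proposal only says "too large".
LOOSE_OBJECTS_THRESHOLD = 1000


def should_suggest_repack(loose_dir, threshold=LOOSE_OBJECTS_THRESHOLD):
    """Return True if the folder holds more loose objects than the threshold.

    Assumes (for illustration) one file per loose object in a flat folder.
    """
    count = sum(1 for entry in os.scandir(loose_dir) if entry.is_file())
    return count > threshold


class PackingInProgress(RuntimeError):
    """Raised when another process already holds the packing lock."""


@contextmanager
def packing_lock(container_dir):
    """Hold an exclusive lock file for the duration of a packing run."""
    lock_path = Path(container_dir) / "packing.lock"  # hypothetical name
    try:
        # O_CREAT | O_EXCL fails atomically if the file already exists,
        # so only one process can create the lock.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise PackingInProgress(f"lock file {lock_path} already exists") from None
    try:
        # Record the PID so that a stale lock can be inspected by hand.
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))
        yield lock_path
    finally:
        os.unlink(lock_path)
```

A caller would wrap the packing step in `with packing_lock(container_dir): ...` and print a suggestion when `should_suggest_repack(...)` returns `True`. If the packing process is killed, the lock file is left behind and must be removed manually, which is one more reason to be cautious about automatic repacking.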