
Efficient object store for the AiiDA repository #11

Conversation

giovannipizzi
Member

  • Used AEP template from AEP 0
  • Status is submitted
  • Added type & status labels to PR
  • Added AEP to README.md
  • Provided github handles for authors

@giovannipizzi giovannipizzi force-pushed the 003_efficient_object_store_for_repository branch from fb51c6a to 774c8d4 Compare April 17, 2020 20:11
Member

@greschd greschd left a comment


I like it, just marked two typos.

A general question I have is this: Would it be conceivable to trigger the packing (and maybe even compressing) somewhat automatically - either at the level of the object store implementation, or at the level of AiiDA itself?

My worry is that if packing / repacking / compressing is left to the user, in many cases it will just never happen. The heuristics / timescales for when these events occur should probably be tweakable, but I think we should provide a default that will perform decently well.
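To make the suggestion concrete, here is a minimal sketch of such a default heuristic (the function name and thresholds are hypothetical, not part of the proposal or of the disk-objectstore API):

```python
import os

def should_pack(loose_dir: str, max_count: int = 10_000, max_bytes: int = 1 << 30) -> bool:
    """Return True when the loose objects warrant an automatic repack.

    Hypothetical heuristic: trigger once there are too many loose objects,
    or once their total size exceeds a limit; both thresholds are tweakable.
    """
    count = 0
    total_bytes = 0
    for entry in os.scandir(loose_dir):
        if entry.is_file():
            count += 1
            total_bytes += entry.stat().st_size
            # Stop scanning as soon as either threshold is crossed.
            if count >= max_count or total_bytes >= max_bytes:
                return True
    return False
```

A hook like this could be called opportunistically, e.g. by the AiiDA daemon when it is otherwise idle, so that packing happens even for users who never run it manually.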

Contributor

@sphuber sphuber left a comment


Thanks @giovannipizzi , looking really good. I just have some questions about the compression of objects.

@espenfl commented Apr 21, 2020

Note comments in this issue: aiidateam/aiida-core#335.

@espenfl commented Apr 21, 2020

Added a few regular comments to the AEP. This is great and I support it. However, it does not really address integration with a proper object store system. I would still like that to be a priority on the roadmap if this is implemented.

@giovannipizzi
Member Author

@greschd I added a note addressing your comment on automatic repacking in a new commit; you can check 70790e4

I will address the other comments later on

@giovannipizzi
Member Author

> I think the packing procedures should provide some guarantees (if at all possible) that it will remain in a valid state even if killed. This could of course happen even if the user triggers it manually.

Agreed. This is true in general, but there are some very weird corner cases to take into account... You can follow the discussion of these corner cases in this issue of the repository.
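One standard building block for this kind of crash safety is the write-to-temporary-then-rename pattern. This is a generic sketch, not the actual disk-objectstore code, and it deliberately glosses over the corner cases mentioned above:

```python
import os
import tempfile

def atomic_write(path: str, data: bytes) -> None:
    """Write data so that `path` holds either the old or the new content,
    never a partial file, even if the process is killed mid-write.

    Illustrative only: the hard corner cases (fsync of the containing
    directory, network filesystems, ...) are exactly what makes the real
    discussion tricky.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # data is on disk before the rename
        os.replace(tmp, path)  # atomic rename on POSIX
    except BaseException:
        os.unlink(tmp)
        raise
```

A killed process leaves at most a stray temporary file, never a truncated pack or index.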

@giovannipizzi
Member Author

@sphuber I addressed your comments with a new commit c9f86f7

@giovannipizzi
Member Author

Hi @espenfl I addressed your comments in 5af4c7d

The only (major) thing that still needs to be addressed is your comment that this is not an object store. This was due to my lack of knowledge of the meaning attached to the word "object store"; I should have called it a "key-value store". I'll wait a bit before renaming.

Just to clarify: after a lot of consideration, we have realised that:

  1. The critical problem now, for large AiiDA profiles, is storing, backing up, and accessing a lot of small files.
  2. This AEP aims at solving the issue of having too many small files.
  3. Unfortunately, object stores are not designed for this goal, and are actually much worse than a filesystem in exactly this respect (accessing a lot of small objects); so they cannot be used as a solution for the specific issue at hand here.
  4. I am trying to see in this issue how to make it easy (at least in read-only mode) to use an object store as a backend to store the files. But due to the intrinsic speed limitations of a real object store, I don't see how one can get good performance there... My most recent findings are that, if we only use the repo in read-only mode, pack everything before putting it on the object store, and are OK with caching the actual data on the local machines, one can get reasonable performance with a relatively easy setup (mount the bucket with the packed repository via an S3 interface and set a disk caching policy; see details in the linked issue). In this way, we can rely on existing solutions (boto, s3fs, rclone) to deal with "real" object stores, and we don't have to maintain code for that in AiiDA.
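The packing idea in point 2 can be illustrated with a toy in-memory sketch (hypothetical, not the actual disk-objectstore implementation, which uses pack files on disk plus an SQLite index): many small objects are concatenated into one pack, and an index maps each object's SHA256 key to an (offset, length) pair.

```python
import hashlib

class MiniPack:
    """Toy content-addressed key-value pack.

    Illustrates the design discussed in the AEP: one big pack instead of
    many small files, with random access via an offset index.
    """

    def __init__(self):
        self._pack = bytearray()
        self._index = {}  # sha256 hex key -> (offset, length)

    def put(self, content: bytes) -> str:
        key = hashlib.sha256(content).hexdigest()
        if key not in self._index:  # identical content is deduplicated for free
            self._index[key] = (len(self._pack), len(content))
            self._pack += content
        return key

    def get(self, key: str) -> bytes:
        offset, length = self._index[key]
        return bytes(self._pack[offset:offset + length])
```

Using the content hash as the key is also what makes a read-only copy on an object store feasible: a `get` only needs one ranged read of the pack at the indexed offset.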

Of course, suggestions are welcome, but at the moment the only way to address the critical performance issues that we encounter, with the human resources we have, was to split the problem, decouple it from the support of "actual" object stores, and try to address the performance issue first. We haven't forgotten the object store case; it just has lower priority with respect to the main issue discussed here, and we hope this work can easily be reused to support it, at least for some scenarios such as serving an AiiDA REST API in read-only mode, with data stored once and for all in an object store.

@espenfl commented Apr 28, 2020

> The only (major) thing that still needs to be addressed is your comment that this is not an object store. This was due to my lack of knowledge of the meaning attached to the word "object store"; I should have called it a "key-value store". I'll wait a bit before renaming.

I think we should at least call it something other than "object store", or at least specify that in this context we do not mean what the larger community associates with that term.

> Just to clarify: after a lot of consideration, we have realised that:
>
> 1. The critical problem now, for large AiiDA profiles, is storing, backing up, and accessing a lot of small files.

Yes, this I understand and support.

> 3. Unfortunately, object stores are not designed for this goal, and are actually much worse than a filesystem in exactly this respect (accessing a lot of small objects); so they cannot be used as a solution for the specific issue at hand here.

In fact, one of the design goals of an object store is precisely to be able to support an essentially unlimited number of files. There is also a large number of systems and ways to host and interact with them, so we should be a bit careful about putting all solutions into one box. After all, high-performance object stores are used in high-performance, critical high-availability scenarios that are more complex and intense than what we (maybe ever will) see in AiiDA. But the problem with these solutions is that they are tailored to a specific task, including the hardware around them. This is a problem: e.g. we should not, at this point, spend time on solutions in AiiDA that only work (at least to a satisfactory degree) on dedicated hardware and setups. However, in the long-term perspective, this is something that needs to be done anyway, as the databases and repositories are going to be massive for larger production runs with many partners. So I certainly agree that we should not address this point now. However, we should try to avoid introducing concepts now that make such initiatives difficult downstream.

> 4. [I am trying to see in this issue](https://github.com/giovannipizzi/disk-objectstore/issues/17) how to make it easy (at least in read-only mode) to use an object store as a backend to store the files. But due to the intrinsic speed limitations of a real object store, I don't see how one can get good performance there... My most recent findings are that, if we only use the repo in read-only mode, pack everything before putting it on the object store, and are OK with caching the actual data on the local machines, one can get reasonable performance with a relatively easy setup (mount the bucket with the packed repository via an S3 interface and set a disk caching policy; see details in the linked issue). In this way, we can rely on existing solutions (boto, s3fs, rclone) to deal with "real" object stores, and we don't have to maintain code for that in AiiDA.

In order to gauge performance versus a local or dedicated remote system on a fast interconnect, one would have to run the test while sitting on the storage network. I am not sure of the details of the test, but in that case the performance is usually rather good and similar to what parallel file systems can support. And of course we cannot do better than this, but that goes for all solutions, including the one that exists today.

Then we have the cases where maximum performance is not necessarily the issue, but stability, scalability and longevity are.

> Of course, suggestions are welcome, but at the moment the only way to address the critical performance issues that we encounter, with the human resources we have, was to split the problem, decouple it from the support of "actual" object stores, and try to address the performance issue first. We haven't forgotten the object store case; it just has lower priority with respect to the main issue discussed here, and we hope this work can easily be reused to support it, at least for some scenarios such as serving an AiiDA REST API in read-only mode, with data stored once and for all in an object store.

I support a solution like this, if that was unclear from my comments. I am just concerned that it might block future implementations that pursue a true object store (which I suspect will have to come at some point). In theory this also goes past just moving the repo there: it means storing the objects themselves in the system, which most likely demands a larger rewrite of some portions of the code. In fact, when using an object store, which anyway relies on a database, it might be that we can remove the AiiDA-specific database altogether and rely on the functionality of the object store.

In fact, thinking more about this, it comes down to what kind of users we would like to target: a local setup of AiiDA, or a more integrated approach. The latter would typically be an enterprise solution. Maybe in the future it would make sense to split AiiDA into two parts, one more tailored to enterprises (which would need a dedicated setup, support, etc.) and one that is more general.

@espenfl commented Apr 28, 2020

Thanks for addressing each comment @giovannipizzi. I agree.

@dev-zero commented May 11, 2021

I am not entirely sure this belongs here, and even if it does it might come a bit late. If so, please ignore.

What seems to be missing from the comparison of the alternatives is storing the files directly in PostgreSQL, either as bytea or text. Even the PostgreSQL wiki gives more or less our use case as an example where it makes sense to store the files in the database (at least the small files). And with PostgreSQL's TOAST feature you get the packing and compression for free. On the other end of the spectrum (of file sizes) you then have the Large Objects feature, which would give streaming access for things like trajectories, wavefunctions, HDF5 files, etc.

The big advantages would be the consistency, and that backing up (even incrementally) would be covered entirely by PostgreSQL's WAL archiving and PITR; all the low-level I/O would rely on PostgreSQL, which is fairly robust and safe. The drawback is the complexity of having to mix two ways of storing files.
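The mixing of the two mechanisms could be hidden behind a simple size-based routing policy. A minimal sketch (the threshold and function name are hypothetical, not anything dev-zero or the AEP specifies):

```python
# Hypothetical cutoff: bytea values must fit in memory on read, so very
# large payloads (trajectories, HDF5 files, ...) would go to large objects,
# which support streaming reads and writes.
LARGE_OBJECT_THRESHOLD = 16 * 1024 * 1024  # 16 MiB, illustrative only

def storage_tier(size_bytes: int, threshold: int = LARGE_OBJECT_THRESHOLD) -> str:
    """Route a file to 'bytea' (TOAST handles compression and out-of-line
    storage transparently) or to 'large_object' (streaming access)."""
    return "bytea" if size_bytes <= threshold else "large_object"
```

For the large-object side, psycopg2 exposes `connection.lobject()` on the raw DBAPI connection; to my knowledge SQLAlchemy has no higher-level wrapper for it, so one would drop down to the DBAPI level for that tier.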

@giovannipizzi
Member Author

Thanks @dev-zero, very interesting!
I was checking, but I couldn't find a way in SQLAlchemy to access the Large Objects feature of PostgreSQL.
In any case, the good news is that in principle it should be relatively easy in the future to switch to a different backend (e.g. directly PostgreSQL, or another one), and the Python API (hopefully) should not change.

Merge branch 'master' into 003_efficient_object_store_for_repository
(of disk-objectstore at version 0.6.0)
@giovannipizzi
Member Author

OK - I have updated this AEP (it is number 006 now, and I have updated the content to the current state).

In particular, I have added a clarification that this is not meant to be documentation of the disk-objectstore package (as the implementation choices might be adapted in the future), but rather a record of the reasons for introducing the package, of the design decisions, and of why they were made.

I have also adapted the text (it referenced an early version of the library, where the key was not the SHA256 hash but a random UUID): I adapted those parts, or removed the discussion that is no longer relevant.

Finally, I have marked the state as "implemented".

I think the best course is to merge this, to avoid this information getting lost.

@sphuber or @chrisjsewell could you please review and, if OK, merge? Thanks!

PS @sphuber: maybe after this is merged you might want to update and merge also #7? (There is a reference to that PR in this AEP; when you merge, you might want to update the link from the PR to the actual AEP number.)

- One sentence per line and one line per sentence
- Consistent enumeration symbols
- Escape special markdown characters
Contributor

@sphuber sphuber left a comment


Thanks @giovannipizzi . I have given it a pass. It reads well and I think all important information is there. There are just a few suggestions with small corrections.

Co-authored-by: Sebastiaan Huber <mail@sphuber.net>
@giovannipizzi
Member Author

Thanks! I've accepted all of them (and just fixed a couple of additional typos I saw). Indeed, the sentence saying that it wasn't needed to track which object is in which pack was wrong, and I removed it (it was from the old text/very first implementation).

@sphuber sphuber merged commit b4b4053 into aiidateam:master Dec 15, 2021
chrisjsewell added a commit to chrisjsewell/AEP that referenced this pull request Dec 16, 2021
commit b4b4053
Author: Giovanni Pizzi <giovanni.pizzi@epfl.ch>
Date:   Wed Dec 15 20:20:05 2021 +0100

    AEP 006 - Efficient object store for the AiiDA repository (aiidateam#11)

commit 0a5675d
Author: Sebastiaan Huber <mail@sphuber.net>
Date:   Fri Sep 10 18:16:30 2021 +0200

    Update README.md (aiidateam#26)

commit 5b45258
Author: Sebastiaan Huber <mail@sphuber.net>
Date:   Fri Sep 10 18:14:31 2021 +0200

    AEP 004: Infrastructure to import completed calculation jobs (aiidateam#12)

commit 4855195
Author: Chris Sewell <chrisj_sewell@hotmail.com>
Date:   Sun Jan 10 15:20:52 2021 +0000

    Add Archive format AEP (aiidateam#21)