-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Efficient object store for the AiiDA repository #11
Efficient object store for the AiiDA repository #11
Conversation
fb51c6a
to
774c8d4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it, just marked two typos.
A general question I have is this: Would it be conceivable to trigger the packing (and maybe even compressing) somewhat automatically - either at the level of the object store implementation, or at the level of AiiDA itself?
My worry is that if packing / repacking / compressing is left to the user, in many cases it will just never happen. The heuristics / timescales for when these events occur should probably be tweakable, but I think we should provide a default that will perform decently well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @giovannipizzi , looking really good. I just have some questions about the compression of objects.
Note comments in this issue: aiidateam/aiida-core#335. |
Added a few regular comments to the AEP. This is great and I support this. But it does not really address an integration to a proper object store system. I would still like that to be on the roadmap as a priority if this is implemented. |
|
Hi @espenfl I addressed your comments in 5af4c7d The only (major) think that still needs to be addressed is your comment that this is not an object store. This was due my lack of knowledge to the meaning attached with the work "object store", and I should have called a "key-value store". Before renaming, I'll wait a bit. Just to clarify: after a lot of consideration, we have realised that:
Of course, suggestions are welcome, but at the moment the only way to address the critical performance issues that we encounter, with the human resources we have, was to: split the problem, decouple it from the support of "actual" object stores, and try to address the performance issue first (hoping that it can be easily used to support also the object store case (that we didn't forget, but has lower priority with respect to the main issue discussed here), at least for some scenarios like serving a REST API of AiiDA in read-only mode, with data stored once and for all in an object store). |
I think we should at least call it something else than object store, or at least specify that in this context we do not mean what the "object store" that the larger community associate with this.
Yes, this I understand and support.
In fact, one of the design goals of the object store is precisely to be able to support an infinite number of files. There are also a large number of systems and ways to host and interact with it, so we should be a bit careful putting all solutions into one box. After all high-performance object store is used in high-performance and critical high-availability scenarios that are more complex and intense than what we (maybe ever will) see in AiiDA. But, the problem with these solutions is that they are tailored to a specific task, including the hardware around it. This is a problem; e.g. at this point spend time on solutions in AiiDA that only works (at least to a satisfactory degree) on dedicated hardware and setups. However, in the long term perspective, this is something that anyway needs to be done as the databases and repositories are going to be massive for larger production runs with a lot of partners. But I certainly agree that at this point we should not address this point. However, we should try to avoid introducing concepts now that make such initiatives downstream difficult.
In order to gauge performance versus a local or dedicated remote system on fast interconnect one would have to test this when sitting on the storage network. Not sure of the details of the test, but then, the performance is usually rather good and similar to what the parallel file systems can support. And of course, we cannot do better than this, but this goes for all solutions, including the solution that exists today. Then we have the cases where maximum performance is not necessary an issue, but stability, scalability and longevity is.
I support a solution like this if that was unclear from my comments. I am just concerned that this might block future implementations that pursue a true object store implementation (which I suspect have to come at some point). And this in theory also goes past just moving the repo there, but in fact storing the objects themselves in the system, which most likely demands a larger rewrite of some portions of the code. In fact, when using the object store, which anyway relies on a database, it might be that we can remove the AiiDA specific database altogether and rely on the functionality of object store. In fact, when thinking more about this, this comes down to what kind of users we would like to target; a local setup of AiiDA, or a more integrated approach. The latter would typically be an enterprise solution. Maybe in the future it would make sense to split AiiDA into two parts, one that is more tailored for enterprises (that would need a dedicated setup, support etc.) and one that is more general. |
Thanks for addressing each comment @giovannipizzi. I agree. |
I am not entirely sure this belongs here, and even if it does it might come a bit late. If so, please ignore. |
Thanks @dev-zero, very interesting! |
Merge branch 'master' into 003_efficient_object_store_for_repository
(of disk-objectstore at version 0.6.0)
OK - I have updated this AEP (number 006 now, updated the content to the current state). In particular I have added a clarification that this is not meant to be a documentation of the I have also adapted the text (it referenced and early version of the library, where the key was not the sha256 hash but a random UUID - so I have adapted those parts, or removed the discussion that is now not relevant anymore. Finally, I have marked the state as "implemented". I think the best course is to merge this to avoid this information gets lost. @sphuber or @chrisjsewell could you please review and, if OK, merge? Thanks! PS @sphuber maybe after this is merged you might want to update and merge also #7? (There is a reference to that PR in this AEP - when you merge you might what to update the link from the PR to the actual AEP number). |
- One sentence per line and one line per sentence - Consistent enumeration symbols - Escape special markdown characters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @giovannipizzi . I have given it a pass. It reads well and I think all important information is there. There are just a few suggestions with small corrections.
Co-authored-by: Sebastiaan Huber <mail@sphuber.net>
Thanks! I've accepted all (just fixed a couple of additional typos I saw). Indeed the sentence saying that it wasn't needed to track which object is in which pack was wrong and I removed it (it was from the old text/very first implementation). |
commit b4b4053 Author: Giovanni Pizzi <giovanni.pizzi@epfl.ch> Date: Wed Dec 15 20:20:05 2021 +0100 AEP 006 - Efficient object store for the AiiDA repository (aiidateam#11) commit 0a5675d Author: Sebastiaan Huber <mail@sphuber.net> Date: Fri Sep 10 18:16:30 2021 +0200 Update README.md (aiidateam#26) commit 5b45258 Author: Sebastiaan Huber <mail@sphuber.net> Date: Fri Sep 10 18:14:31 2021 +0200 AEP 004: Infrastructure to import completed calculation jobs (aiidateam#12) commit 4855195 Author: Chris Sewell <chrisj_sewell@hotmail.com> Date: Sun Jan 10 15:20:52 2021 +0000 Add Archive format AEP (aiidateam#21)
submitted
README.md