
Implement the RepositoryBackend interface for Swift #335

Closed
sphuber opened this issue Jan 27, 2017 · 13 comments

Comments

@sphuber
Contributor

sphuber commented Jan 27, 2017

With the new file repository implementation of aiida-core==2.0.0, it is now possible to implement the AbstractRepositoryBackend interface for Swift, since the repository no longer assumes a local file system.

Below is the original description of this issue before it was redefined to only consider implementing the repository interface for Swift.

Ideally, a profile can specify multiple file storage repositories, together with custom selection rules that decide which files are stored in which repository. Which repository each file is stored in will also be recorded in the database from now on.
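The routing idea can be sketched roughly as follows. This is a minimal, hypothetical illustration only: SelectionRule, select_repository, the glob patterns and the repository names "local"/"swift" are made up for the example, and a plain dict stands in for the database table that would record where each file went.

```python
import fnmatch
from dataclasses import dataclass


@dataclass(frozen=True)
class SelectionRule:
    """Hypothetical rule: files whose name matches `pattern` go to `repository`."""
    pattern: str
    repository: str


def select_repository(filename, rules, default="local"):
    """Return the repository of the first rule matching `filename`, else `default`.

    Rules are checked in order, so more specific patterns should come first.
    """
    for rule in rules:
        if fnmatch.fnmatchcase(filename, rule.pattern):
            return rule.repository
    return default


# Hypothetical profile configuration: large binary outputs go to Swift.
RULES = [
    SelectionRule("*.cube", "swift"),
    SelectionRule("*.wfc*", "swift"),
]

# Stand-in for the database table recording which repository holds each file.
file_locations = {}
for name in ["aiida.in", "aiida.out", "density.cube", "data.wfc1"]:
    file_locations[name] = select_repository(name, RULES)
```

Keeping the rules as ordered data (rather than code) is one way to let each profile configure them, and storing the chosen repository per file means later reads never have to re-evaluate the rules.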

These changes will allow AiiDA to integrate with Swift, an object store, which will be the file storage system of choice at CSCS.

We should define a common API for all classes that access files, shared by both directories and object stores. Some methods will remain specific to folders (e.g. abspath), while others (e.g. open or get_file_content) should be common. The AiiDA code should only use the common API.
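A minimal sketch of such a common API, assuming a hypothetical FileContainer base class (not an existing AiiDA class): both implementations provide open, get_file_content is written once on top of it, and abspath exists only on the folder class. InMemoryObjectStore is a toy stand-in for Swift, not a real client.

```python
import abc
import io
import os


class FileContainer(abc.ABC):
    """Hypothetical common API shared by local folders and object stores."""

    @abc.abstractmethod
    def open(self, name, mode="rb"):
        """Return a file-like object for the file `name`."""

    def get_file_content(self, name):
        """Return the full content of `name` as bytes, implemented via `open`."""
        with self.open(name, "rb") as handle:
            return handle.read()


class LocalFolder(FileContainer):
    def __init__(self, root):
        self.root = root

    def open(self, name, mode="rb"):
        # Inside this method, `open` is the builtin, not FileContainer.open.
        return open(self.abspath(name), mode)

    def abspath(self, name):
        # Folder-only method: an object store has no meaningful absolute path.
        return os.path.join(os.path.abspath(self.root), name)


class InMemoryObjectStore(FileContainer):
    """Toy stand-in for an object store such as Swift: a flat key -> bytes map."""

    def __init__(self):
        self._objects = {}

    def put(self, name, content):
        self._objects[name] = bytes(content)

    def open(self, name, mode="rb"):
        if mode != "rb":
            raise ValueError("this sketch only supports reading in binary mode")
        return io.BytesIO(self._objects[name])
```

Code written against FileContainer (e.g. calling get_file_content) works unchanged for either class; only code that really needs a local path would call abspath.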

@sphuber sphuber self-assigned this Jan 27, 2017
@sphuber sphuber added this to the 1.0 release milestone Jun 10, 2017
@giovannipizzi
Member

giovannipizzi commented Jun 21, 2017

See also #404 (closed as duplicate)

@sphuber
Contributor Author

sphuber commented Nov 24, 2017

In converting aiida_core to handle multiple repositories, I realized that the current repository is used for multiple things. It is not only used to store files from calculations, but also as temporary storage for things like sandbox folders and workflow pickles. It seems, then, that even if we provide the option of an object store (OS) as a repository, we may still want to keep a regular filesystem repository for these temporary files. It would be a little strange if, once a user configures an object store, even these temporary files had to be stored in the OS. Should we start to distinguish between the permanent repository, which can be a filesystem or an OS, and a temporary repository, which is what we have now and would be kept for temporary things such as sandbox folders and workflow-related files?

@giovannipizzi
Member

Yes, I agree that the sandbox should be local, and it is fine to have a separate configuration entry for its location/path.
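One way this split could look, as a sketch only: the PROFILE_CONFIG keys and the sandbox_folder helper are hypothetical names. The point is just that temporary files always go to a local directory taken from their own configuration entry, whatever the permanent backend is.

```python
import os
import tempfile
from contextlib import contextmanager

# Hypothetical profile configuration: the permanent repository may be remote,
# but the sandbox location is a separate entry that always points locally.
PROFILE_CONFIG = {
    "repository_backend": "swift",          # permanent storage (illustrative)
    "sandbox_path": tempfile.gettempdir(),  # local path for temporary files
}


@contextmanager
def sandbox_folder(config):
    """Yield a fresh local temporary directory that is deleted on exit."""
    base = config["sandbox_path"]
    os.makedirs(base, exist_ok=True)
    with tempfile.TemporaryDirectory(dir=base) as path:
        yield path
```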

@ltalirz
Member

ltalirz commented Nov 1, 2018

For people who already want to play around with this (before the AiiDA Swift interface is implemented), note that Swift containers can be mounted locally using the Swift virtual file system (svfs).

We've tested this with swift@CSCS and it seemed to work fine (as long as you don't add tens of thousands of files at once).

Procedure on Ubuntu 16.04

# install svfs
wget https://github.com/ovh/svfs/releases/download/v0.9.1/svfs_0.9.1_amd64.deb
sudo dpkg -i svfs_0.9.1_amd64.deb
sudo apt-get install -f

# set the OpenStack (OS_*) environment variables (these depend on the provider)
pip install --upgrade python-openstackclient lxml oauthlib
source os_env_vars.env

# mounting
mkdir /home/ubuntu/osmount
# mount container 'archive' (for read-only use: -o allow_other=0,ro=1)
mount.svfs osdev /home/ubuntu/osmount -o container=archive,allow_other=0
# mount all containers in project with debug info
mount.svfs osdev /home/ubuntu/osmount -o debug=1,allow_other=0

# unmount
fusermount -u /home/ubuntu/osmount

# Still to figure out: how to make the mount persistent

@giovannipizzi giovannipizzi modified the milestones: v1.0.0, v2.0.0 Dec 3, 2018
@giovannipizzi
Member

I think that before crystallising on an API and/or implementation, we should consider pyfilesystem2 [GitHub], [website], [blog post on why a version 2 has been written, with examples of the API]

There is still some work to do on the AiiDA side, but this would make it easy to use different backends (e.g. it supports Tar and Zip if we want to pack files together, it has a TempFS and an in-memory filesystem, it supports mounting, and it has official support for S3). However, it does not support Swift yet (pyfilesystem1 doesn't, and I couldn't find any information on a plugin for pyfs2 either). But judging by the amount of code in the S3FS repo, implementing one might not be too complex, and we need to do some implementation in any case. Moreover, according to the blog post above and the docs, there are only 7 essential methods to implement:

  • getinfo() Get info regarding a file or directory.
  • listdir() Get a list of resources in a directory.
  • makedir() Make a directory.
  • openbin() Open a binary file.
  • remove() Remove a file.
  • removedir() Remove a directory.
  • setinfo() Set resource information.

and all the rest are optional (implementing them mainly serves to optimise the code, so they might not be needed in a first implementation).
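To get a feel for how small that essential surface is, here is a toy in-memory filesystem exposing those seven method names. It is a sketch under explicit assumptions: ToyMemoryFS is hypothetical, does not subclass pyfilesystem2's fs.base.FS, uses simplified signatures, and returns plain dicts rather than fs.info.Info objects.

```python
import io
import posixpath


class _WriteBuffer(io.BytesIO):
    """BytesIO that passes its content to a callback when it is closed."""

    def __init__(self, on_close):
        super().__init__()
        self._on_close = on_close

    def close(self):
        if not self.closed:
            self._on_close(self.getvalue())
        super().close()


class ToyMemoryFS:
    """Hypothetical in-memory FS with the seven essential pyfilesystem2 method
    names (simplified signatures; not a subclass of fs.base.FS)."""

    def __init__(self):
        self._dirs = {"/"}  # absolute paths of directories
        self._files = {}    # absolute path -> bytes
        self._meta = {}     # absolute path -> extra info set via setinfo()

    @staticmethod
    def _norm(path):
        return posixpath.normpath("/" + path.lstrip("/"))  # "a/b" -> "/a/b"

    def _require_parent(self, path):
        if posixpath.dirname(path) not in self._dirs:
            raise FileNotFoundError(posixpath.dirname(path))

    def getinfo(self, path):
        path = self._norm(path)
        if path in self._dirs:
            info = {"is_dir": True}
        elif path in self._files:
            info = {"is_dir": False, "size": len(self._files[path])}
        else:
            raise FileNotFoundError(path)
        return {"name": posixpath.basename(path), **info, **self._meta.get(path, {})}

    def listdir(self, path):
        path = self._norm(path)
        if path not in self._dirs:
            raise NotADirectoryError(path) if path in self._files else FileNotFoundError(path)
        entries = (self._dirs | set(self._files)) - {"/"}
        return sorted(posixpath.basename(p) for p in entries if posixpath.dirname(p) == path)

    def makedir(self, path):
        path = self._norm(path)
        if path in self._dirs or path in self._files:
            raise FileExistsError(path)
        self._require_parent(path)
        self._dirs.add(path)

    def openbin(self, path, mode="r"):
        path = self._norm(path)
        if "w" in mode:
            if path in self._dirs:
                raise IsADirectoryError(path)
            self._require_parent(path)

            def store(data):  # called once when the write buffer is closed
                self._files[path] = data
            return _WriteBuffer(store)
        if path not in self._files:
            raise FileNotFoundError(path)
        return io.BytesIO(self._files[path])

    def remove(self, path):
        path = self._norm(path)
        if path not in self._files:
            raise FileNotFoundError(path)
        del self._files[path]
        self._meta.pop(path, None)

    def removedir(self, path):
        path = self._norm(path)
        if path == "/" or path not in self._dirs:
            raise OSError(f"cannot remove directory: {path}")
        if self.listdir(path):
            raise OSError(f"directory not empty: {path}")
        self._dirs.remove(path)
        self._meta.pop(path, None)

    def setinfo(self, path, info):
        path = self._norm(path)
        if path not in self._dirs and path not in self._files:
            raise FileNotFoundError(path)
        self._meta.setdefault(path, {}).update(info)
```

A real pyfilesystem2 backend would subclass fs.base.FS and raise the exceptions from fs.errors; the built-in exceptions are used here only to keep the sketch dependency-free.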

I'm not saying that we must use this interface, but when we work on this issue/project, we should investigate it to decide whether it's worth using to simplify our life if we want to easily support more backends in the future, and/or whether our internal "folder" API can be cleaned up, simplified or replaced with this one.

@espenfl
Contributor

espenfl commented Jan 11, 2020

To add to this, supporting, for instance, an interface to rclone would be awesome for the repo. It might simplify things and give great flexibility. However, some work needs to be done to check how uniform it is across the different interfaces. I would expect it to be rather uniform, given its code base. If so, we could easily switch between e.g. Google Drive, OneDrive, local disk, Swift, etc. (even SharePoint for MS-locked organizations). Performance might take a hit compared to a direct interface to Swift etc., but that needs to be checked and weighed against the possible benefit of not having to maintain any file interface (in principle we could use the local disk interface as well for what we do today). Given the possible push towards more general usage, I think supporting something like this would make a lot of sense. @sphuber @ltalirz @giovannipizzi Thoughts?
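As a rough sketch of why this would be flexible: with rclone, each backend is a named remote configured once with rclone config, and the same rclone copy SOURCE remote:path call works for all of them, so switching storage is mostly a matter of changing the remote name. The helper functions and the remote names below (swift-cscs, gdrive) are hypothetical.

```python
import shutil
import subprocess


def rclone_copy_command(local_dir, remote, remote_path):
    """Build an `rclone copy` command line.

    `remote` is the name of a remote previously set up with `rclone config`
    (hypothetical names such as "swift-cscs" or "gdrive" are used below).
    """
    return ["rclone", "copy", local_dir, f"{remote}:{remote_path}"]


def upload(local_dir, remote, remote_path):
    """Copy `local_dir` to the remote; raises if rclone is missing or fails."""
    if shutil.which("rclone") is None:
        raise RuntimeError("rclone executable not found on PATH")
    subprocess.run(rclone_copy_command(local_dir, remote, remote_path), check=True)


# The same call targets different storage systems just by switching the remote.
swift_cmd = rclone_copy_command("/data/repository", "swift-cscs", "aiida/repository")
gdrive_cmd = rclone_copy_command("/data/repository", "gdrive", "aiida/repository")
```

Whether the per-backend behaviour is uniform enough (listing, consistency, error reporting) is exactly the open question raised above; the sketch only shows the uniform calling convention.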

@espenfl
Contributor

espenfl commented Jan 13, 2020

Also note that rclone supports encrypted transfer out of the box, which might be something we would like to offer (without having to maintain it). The question with respect to pyfilesystem2 versus rclone is: which has the most standardized API and the biggest user base? To me it seems that not many people are using pyfilesystem2, and its set of interfaces is rather lacking compared to rclone's. We want to use something that already has a big user base, so that future maintenance/development is as secure as possible. In addition, rclone offers syncing, chunking and caching.

@giovannipizzi giovannipizzi added this to the v2.0.0 milestone Mar 24, 2021
@giovannipizzi
Member

This might be an old issue, now superseded. Adding it to 2.0, but maybe (@sphuber?) we can just close it, as we have a new repository and we don't plan to support the Swift interface anymore?

@espenfl
Contributor

espenfl commented Mar 24, 2021

@sphuber @giovannipizzi Sure, if no support for Swift is planned, please go ahead and close it. Having said that, I strongly believe a proper interface to some standard object store API should be added at some point. A standard S3 interface using one of the existing Python libraries would be sufficient to provide more flexibility. Performance aside: when using an object store, performance is (with some exceptions) never the key point anyway. Do we have a way to put this on the roadmap?

@sphuber
Contributor Author

sphuber commented Mar 24, 2021

This issue was not only meant to discuss adding support for Swift, but also the possibility of having a single database use multiple file repositories. The currently proposed implementation does not consider this, and I don't think we should do it for v2.0, but it might still be an interesting feature to consider in the far future. So in that case I would keep this issue open.

@espenfl
Contributor

espenfl commented Mar 25, 2021

Maybe we should just close it and open specific issues for an object store interface and for multiple repositories? The work on each is probably fairly independent anyway.

@sphuber
Contributor Author

sphuber commented Apr 28, 2021

I think supporting multiple repositories for a single profile is not something there is a need for, and it would require a lot of work, so I don't think it will happen soon. Providing a Swift backend for the new repository implementation in aiida-core==2.0.0 is much more likely and might be useful. Therefore I am renaming this issue accordingly and removing the v2.0.0 milestone.

@sphuber sphuber removed this from the v2.0.0 milestone Apr 28, 2021
@sphuber sphuber changed the title Allow the configuration of multiple repositories and implement Swift interface Implement the RepositoryBackend interface for Swift Apr 28, 2021
@sphuber
Contributor Author

sphuber commented Sep 18, 2023

Since v2, the repository can be replaced in a StorageBackend plugin implementation. This is what aiida-s3 does to add support for S3 and Azure Blob Storage as the file repository.

@sphuber sphuber closed this as completed Sep 18, 2023