
AiiDA cannot create folders in object storage #4895

Closed
JPchico opened this issue Apr 30, 2021 · 11 comments
JPchico commented Apr 30, 2021

Describe the bug

AiiDA is designed to work with traditional file systems, and problems appear when the repository is placed on object storage mounted via tools such as blobfuse.

On object storage, certain operations such as os.chmod fail with [Errno 38] Function not implemented:

with open(filepath, mode=mode, encoding=encoding) as handle:
    shutil.copyfileobj(filelike, handle)
os.chmod(filepath, self.mode_file)

This causes any simulation submitted when one sets the repository file system to be located in an object storage system to fail.

A simple solution is to bypass this specific chmod error by adding the lines

import errno

try:
    os.chmod(filepath, self.mode_file)
except OSError as error:
    # errno.ENOSYS == 38, "Function not implemented"
    if error.errno != errno.ENOSYS:
        raise

This is a bit of a hack and very specific to this kind of application. Maybe there is a more general solution for object storage.
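As a sketch of a slightly more general approach, the tolerant chmod could be wrapped in a small helper that is used wherever aiida-core sets permissions (the name `try_chmod` is an assumption, not part of aiida-core):

```python
import errno
import os


def try_chmod(path, mode):
    """Apply os.chmod, ignoring only 'Function not implemented' errors
    raised by FUSE-mounted object stores such as blobfuse.

    Hypothetical helper; any other OSError is re-raised unchanged.
    """
    try:
        os.chmod(path, mode)
    except OSError as exc:
        if exc.errno != errno.ENOSYS:  # ENOSYS == 38
            raise
```

On a regular filesystem this behaves exactly like os.chmod; on a mount that cannot implement the call it silently becomes a no-op.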

Your environment

  • Operating system: Ubuntu 18.04
  • Python version: 3.8
  • aiida-core version: 1.6.1 (hash 7295bb6)
@JPchico
Contributor Author

JPchico commented Apr 30, 2021

Even if one fixes all the os.chmod calls in the folders.py file, one still runs into the same problem with the shutil.copytree function. However, unlike os.chmod, that call cannot simply be bypassed.

@giovannipizzi
Member

Thanks @JPchico! Just to understand: would calling shutil.copytree fail on your blobfuse mount even independently of AiiDA? I guess this would be an issue to report upstream, either to the implementers of blobfuse or to Python for the shutil.copytree implementation (unless there is a flag to e.g. skip some operations like setting permissions).

It would be good if you could try the most recent develop branch, as the repository functionality was significantly changed, to see which errors you still see (and which new ones you'll find :-) ).

For the new library (disk-objectstore) that we implemented, underlying the new implementation, I did some tests on fuse-mounted object-stores with some report in this issue: aiidateam/disk-objectstore#17

I would be interested in reports of the performance on your specific Object-Store and FUSE implementation.

Out of curiosity, is there any implementation other than blobfuse? I have to admit I already envision slow performance if you work directly on an object store, due to its design (if it works like others, where each command is a REST API call, working on many files will always have a large cost).

@JPchico
Contributor Author

JPchico commented May 6, 2021

> Thanks @JPchico! Just to understand: would calling shutil.copytree fail on your blobfuse mount even independently of AiiDA? I guess this would be an issue to report upstream, either to the implementers of blobfuse or to Python for the shutil.copytree implementation (unless there is a flag to e.g. skip some operations like setting permissions).

> It would be good if you could try the most recent develop branch, as the repository functionality was significantly changed, to see which errors you still see (and which new ones you'll find :-) ).

@giovannipizzi hello! Yes indeed, just shutil.copytree seems to fail, so I'll probably file a bug report upstream. Sadly, when I tried looking into the copytree implementation, I did not see a way to skip things like the chmod operations, which is what I think is the root of the problem.
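As a workaround sketch, one could replace shutil.copytree with a minimal recursive copy that transfers file contents only and never touches permissions. The helper name is an assumption, and symlinks and special files are ignored for brevity:

```python
import os
import shutil


def copytree_no_metadata(src, dst):
    """Recursively copy a directory tree, transferring file contents only.

    Unlike shutil.copytree (which calls copystat/chmod and fails with
    [Errno 38] on blobfuse mounts), this never touches permissions.
    Hypothetical helper; symlinks and special files are not handled.
    """
    os.makedirs(dst, exist_ok=True)
    for entry in os.scandir(src):
        target = os.path.join(dst, entry.name)
        if entry.is_dir(follow_symlinks=False):
            copytree_no_metadata(entry.path, target)
        else:
            # shutil.copyfile copies data only; the default copy2 would
            # also copy metadata via os.chmod and fail on the FUSE mount
            shutil.copyfile(entry.path, target)
```

The trade-off is that ownership, permissions, and timestamps are lost, which is usually acceptable on a mount that cannot represent them anyway.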

I'll be checking the new version soon! I guess that this kind of behavior should not change too much (at least with what I checked in the aiida-core source code) but maybe I'm missing something.

> For the new library (disk-objectstore) that we implemented, underlying the new implementation, I did some tests on fuse-mounted object-stores with some report in this issue: aiidateam/disk-objectstore#17

The issue that you raise has some very interesting implications. Indeed, FUSE systems have certain peculiarities: things like multiple processes opening the same file at the same time can result in information being lost, among other issues.

> Out of curiosity, is there any implementation other than blobfuse? I have to admit I already envision slow performance if you work directly on an object store, due to its design (if it works like others, where each command is a REST API call, working on many files will always have a large cost).

I mostly use the blobfuse system as an archive storage solution (i.e. users have run simulations and want to keep all the files for whatever reason, so we move them to the blob storage). So I do not have performance statistics right now, but I can get them. My guess is that it works similarly: when checking the consumption reports, network consumption is currently basically negligible, but if a framework like AiiDA interacts with it the situation would probably change.

I wonder what the best solution for the repositories is. Should one "refresh" them every so often by creating backups of the db and repository to avoid bloating?

@giovannipizzi
Member

Is there any reason to keep the repository directly on the FUSE FS?

Can you e.g. just keep it on a regular partition, and back it up once a day together with the DB?
The new repository is designed so that incremental backups (mostly with rsync) should be quite efficient. If your backup goes to an object store there are a few more issues to check: the new implementation packs objects into files of ~4 GB (you have to run a "management" packing operation before backup, though; there will be docs), and until a new pack is needed, content is appended to existing packs. This is very efficient (only the delta is transferred) when using rsync on a standard FS, but might need a full retransfer on proper object stores.

Docs and helper tools to do backup with the new repository will appear soon (weeks).

Important: do not use the new repository (i.e. develop) for production; we might need to change the schema a bit. Just test on a copy that you can discard, and continue using aiida<=1.6.x for production.

@JPchico
Contributor Author

JPchico commented May 11, 2021

@giovannipizzi Mostly to ensure that there are no space issues: we have several users running AiiDA instances in the same VM (each one with a different db, different home folders, etc.), and this can cause space issues as the repository grows.

Is it possible to backup the repository to the FUSE system and delete the local files? Or would this cause issues when loading nodes?

@giovannipizzi
Member

You need to have an active copy of the repository - otherwise when AiiDA tries to access files, it will fail. It might work for basic operations (just load_node) but it will fail when reading (or writing) file content, e.g. reading a UPF during submission, storing the raw data, parsing existing data, reading numpy arrays etc.

Can't you create multiple volumes (with a 'standard' filesystem), one per user, attach them to the same VM, and set the home of each user to be on that volume? In this way different users don't risk using up each other's space, performance should be (much) better for live data (you can still use the FUSE mount just to keep a backup copy), and it should also be easy to resize a volume should you need more space (probably without reboot: turn off that user's daemons, log out, connect as root, disconnect the volume, resize it, reconnect/remount it, log in again).

@JPchico
Contributor Author

JPchico commented May 17, 2021

Sure thing, that is what I'm doing right now. Of course, as you say, one can resize the volume, though most of the time one needs to stop the VM (depending on how it was set up). The advantage of FUSE-like systems is that one does not need to worry about that, as in theory one has "infinite" storage. Of course, as you say, one might incur expenses due to network traffic, plus the other issues. So perhaps the small hassle of adding volumes to the VM is preferable to the issues that arise with FUSE-like systems.

@giovannipizzi
Member

Hi @JPchico - quickly getting back to this. Would a solution like aiidateam/disk-objectstore#123 be helpful? You keep a few GB on a local fast HD, and move "archived" packs to some other (mounted?) filesystem.

@JPchico
Contributor Author

JPchico commented Dec 9, 2021

Hi @giovannipizzi, this sounds quite interesting! I think the main issue I had was a blobfuse-related one (basically failures because blobfuse does not allow one to use chown commands, and then several shutil-based commands fail). But maybe this will be of use, as we could mount cheaper HDDs for certain files and keep the data that needs to be fast on SSDs.

@giovannipizzi
Member

You could simply put the archived packs on a mounted blobfuse folder, no? And just keep the fast SSD for loose objects and for the recent packs.

@JPchico
Contributor Author

JPchico commented Jul 10, 2023

I think this is solved by the aiida-s3 plugin, with which object storage for the repository is now possible, e.g. one can store the repository in Azure Blob or AWS S3. So I think this is not needed anymore.

@JPchico JPchico closed this as completed Jul 10, 2023