Use of different volumes for postgres, rabbitmq, collection-sets, etc #1051

Closed
SvenLieber opened this issue Jan 23, 2021 · 2 comments

@SvenLieber
Contributor

Hi,

thanks for SFM, it is really a great framework :-)

We are currently testing the framework for the BeSocial project in Belgium and came across some unwanted behavior in our use case. We found a solution by changing parts of SFM in our forks, e.g. https://github.com/SvenLieber/sfm-ui

The issue of a central data storage

For legal and data management reasons, the machine running the harvests is different from the server that should store the collected data. Our first naive solution was to mount the destination folder via SSHFS and use it for /sfm-data.

Regular reading and writing works perfectly. However, both Postgres and RabbitMQ try to take ownership of their respective subdirectories with the chown command, which results in permission errors during startup.
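
To check in advance whether a mounted directory would run into this, a small probe can attempt the same kind of ownership change. This is only a sketch: the helper name `can_chown` is made up for illustration and is not part of SFM, and it relies solely on Python's standard `os.chown`. To mirror what the containers do, it would have to run as root with the UID/GID the service uses. Note that a successful probe really does change the owner.

```python
import os


def can_chown(path: str, uid: int, gid: int) -> bool:
    """Return True if ownership of `path` can be set to uid:gid.

    Hypothetical helper, not part of SFM. On success the ownership
    is actually changed, so only point it at a directory that is
    meant to belong to that user.
    """
    try:
        os.chown(path, uid, gid)
    except OSError:
        # Covers PermissionError (a subclass of OSError) as well as
        # a missing path or a filesystem that rejects the call.
        return False
    return True
```

Calling `can_chown("/mnt/ssh-drive", uid, gid)` with the service's IDs before starting the stack would show whether the mount accepts the ownership change; the path is the one from our setup and the IDs depend on the image in use.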

A more fine-grained solution

What fixed the issue for us was splitting the use of /sfm-data into finer-grained volumes based on subdirectories, following the SFM directory structure. This lets us outsource sensitive parts, e.g. keeping collection sets on a mounted SSHFS drive and using a remote PostgreSQL database.

We adapted SFM to internally use /sfm-db-data, /sfm-mq-data, /sfm-export-data, /sfm-containers-data and /sfm-collection-set-data instead of /sfm-data. These folders are treated as root directories, and subdirectories are still created inside them, e.g. /sfm-collection-set-data/collection_set. This still allows keeping everything in a single folder, following the SFM default.

Excerpt from .env:

# RabbitMQ is stored locally
DATA_VOLUME_MQ=/sfm-mq-data

# DB is set to a local docker volume, but we do not have a db instance
# we connect to a remote server via POSTGRES_HOST
DATA_VOLUME_DB=/sfm-db-data

# Data from SFM are stored on a remote server
DATA_VOLUME_EXPORT=/mnt/ssh-drive:/sfm-export-data
DATA_VOLUME_CONTAINERS=/mnt/ssh-drive:/sfm-containers-data
DATA_VOLUME_COLLECTION_SET=/mnt/ssh-drive:/sfm-collection-set-data

Excerpt from the docker-compose.yml file:

  data:
    image: local-fork/sfm-data
    volumes:
      - ${DATA_VOLUME_MQ}
      - ${DATA_VOLUME_DB}
      - ${DATA_VOLUME_EXPORT}
      - ${DATA_VOLUME_CONTAINERS}
      - ${DATA_VOLUME_COLLECTION_SET}
    environment:
      - TZ
      - SFM_UID
      - SFM_GID
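
For readers less familiar with the Compose short volume syntax, the difference between the entries above comes down to whether a source precedes the colon. The following sketch classifies a spec string accordingly. The names `parse_volume` and `volume_kind` are hypothetical, and the parsing is deliberately simplified (no Windows drive letters, no long syntax); it is not how Docker Compose itself is implemented.

```python
from typing import NamedTuple, Optional


class VolumeSpec(NamedTuple):
    source: Optional[str]  # host path or volume name; None if omitted
    target: str            # path inside the container
    mode: Optional[str]    # e.g. "ro" or "rw", if given


def parse_volume(spec: str) -> VolumeSpec:
    """Split a short-syntax spec of the form [SOURCE:]TARGET[:MODE]."""
    parts = spec.split(":")
    if len(parts) == 1:
        return VolumeSpec(None, parts[0], None)
    if len(parts) == 2:
        return VolumeSpec(parts[0], parts[1], None)
    if len(parts) == 3:
        return VolumeSpec(parts[0], parts[1], parts[2])
    raise ValueError(f"unsupported volume spec: {spec!r}")


def volume_kind(spec: str) -> str:
    """Classify a spec as anonymous volume, bind mount, or named volume."""
    source = parse_volume(spec).source
    if source is None:
        return "anonymous volume"
    if source.startswith(("/", ".", "~")):
        return "bind mount"
    return "named volume"


# Values taken from the .env excerpt above
print(volume_kind("/sfm-mq-data"))
print(volume_kind("/mnt/ssh-drive:/sfm-export-data"))
```

With the values from our .env, /sfm-mq-data and /sfm-db-data have no source and therefore end up in local Docker volumes, while the three /mnt/ssh-drive entries are bind mounts onto the SSHFS drive.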

This certainly looks more confusing than having a single /sfm-data volume, but it also offers more flexibility. The changes affect all SFM repositories, as they all use /sfm-data.

This is also not the end of the story. What possibly still needs an update are the notifications in the monitoring of used data, as the DB directory might no longer be available.
Another issue might be the database connection to a remote host, which should be configured to use SSL. I hope this is possible somewhere over here or here.
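
As a rough idea of what the SSL part could look like on the Django side, the sketch below builds a database settings dictionary whose `OPTIONS` are handed to the PostgreSQL driver. It is an assumption, not SFM's actual settings code: only `POSTGRES_HOST` comes from our configuration, while the helper `build_database_settings` and the other variable names (`POSTGRES_PORT`, `POSTGRES_DB`, `POSTGRES_USER`, `POSTGRES_PASSWORD`, `POSTGRES_SSLMODE`, `POSTGRES_SSLROOTCERT`) are hypothetical. `sslmode` and `sslrootcert` are standard libpq connection parameters.

```python
import os
from typing import Mapping


def build_database_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Build a Django-style database settings dict from environment values.

    Hypothetical helper; all variable names except POSTGRES_HOST are
    assumptions made for this sketch.
    """
    # Default to "require" so the connection is encrypted unless
    # explicitly configured otherwise (a choice made for this sketch).
    options = {"sslmode": env.get("POSTGRES_SSLMODE", "require")}
    root_cert = env.get("POSTGRES_SSLROOTCERT")
    if root_cert:
        # Needed for the verify-ca / verify-full modes
        options["sslrootcert"] = root_cert

    return {
        "ENGINE": "django.db.backends.postgresql",
        "HOST": env.get("POSTGRES_HOST", "db"),
        "PORT": env.get("POSTGRES_PORT", "5432"),
        "NAME": env.get("POSTGRES_DB", "postgres"),
        "USER": env.get("POSTGRES_USER", "postgres"),
        "PASSWORD": env.get("POSTGRES_PASSWORD", ""),
        "OPTIONS": options,
    }
```

In a settings module this would be used as `DATABASES = {"default": build_database_settings()}`. Whether `require` is sufficient or a verifying mode with a CA certificate is needed depends on the remote server's setup.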

Please let us know what you think about this solution :-)

Sven

kerchner added this to the 2.4.0 milestone on Jan 28, 2021
@lwrubel
Collaborator

lwrubel commented Jan 29, 2021

Sven,
Thanks so much for this detailed description of how SFM could be configured better for an environment such as yours. The team thinks it's a great idea to split out the sfm-data configuration as you've proposed. I imagine you're not the only institution with these requirements. We're planning to put in some time on SFM in late Feb/early March and we'll include this issue in that work. Please feel free to submit a PR if you go ahead with adjusting the notifications or SSL database connection in the meantime.
Laura

@lwrubel
Collaborator

lwrubel commented Jan 29, 2021

Adding this since I didn't say it explicitly earlier: we'd welcome a PR from your fork with the docker-compose.yml and .env changes.
