sjanssen2 (Member) commented Aug 18, 2025

When processing data via plugins, input and output files are located in ONE shared filesystem, i.e. BASE_DATA_DIR (and WORKING_DIR for temporary files). This works well as long as the plugin processes and Qiita pet & DB run on the same machine, or on machines that share a network filesystem, as in Slurm clusters.

We intend to host Qiita in a Kubernetes cloud environment, where each plugin becomes an independent Docker image running as one or multiple pods. It would therefore be advisable to decouple the central filesystem (Qiita pet & DB) from individual plugins: mounting it into every plugin pod would slow down pod startup, AND we might later distribute plugin jobs across multiple clouds. In that case, transferring the whole BASE_DATA_DIR is infeasible.


I understand the current flow as follows:

  1. The user composes a processing / analysis workflow and hits "run". Qiita pet uses a launcher to "submit" a new job for the corresponding plugin.
  2. The plugin command requests information about e.g. an artifact or a prep file via qiita_client, which queries Qiita (backed by the PostgreSQL DB) and receives filepaths in BASE_DATA_DIR.
  3. The plugin command directly accesses the content of the filepaths from 2.

My suggested flow is designed to require minimal changes to plugin code: when reading or writing content, the filepath is wrapped with one of two additional functions, fetch_file_from_central and push_file_to_central. Depending on the configuration, they either access the shared filesystem directly (no change from the current behaviour) or transfer the file content via HTTPS, creating the files in the plugin's local filesystem or in the central filesystem, respectively.
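A minimal sketch of the two wrappers, assuming the HTTPS transport is injected as a plain callable. The `download`/`upload` parameters and the mirrored directory layout below `local_dir` are assumptions for illustration, not the signatures used in jlab/qiita_client#1:

```python
import os
from typing import Callable, Optional

# Hypothetical transport callables; in qiita_client these would wrap the
# authenticated HTTPS requests against the new Qiita pet endpoints.
Downloader = Callable[[str], bytes]      # central filepath -> file content
Uploader = Callable[[str, bytes], str]   # file name, content -> central filepath


def fetch_file_from_central(filepath: str, base_data_dir: str,
                            local_dir: Optional[str] = None,
                            download: Optional[Downloader] = None) -> str:
    """Return a filepath the plugin can open for reading.

    Shared filesystem (download is None): the central filepath is returned
    unchanged. HTTPS mode: the content is downloaded, written to the same
    relative location below local_dir, and that local filepath is returned.
    """
    if download is None:
        return filepath
    if local_dir is None:
        raise ValueError("local_dir is required for HTTPS transfer")
    rel = os.path.relpath(filepath, base_data_dir)
    if rel == os.pardir or rel.startswith(os.pardir + os.sep):
        raise ValueError("%s is not below BASE_DATA_DIR" % filepath)
    local_path = os.path.join(local_dir, rel)
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    with open(local_path, "wb") as fh:
        fh.write(download(filepath))
    return local_path


def push_file_to_central(local_path: str,
                         upload: Optional[Uploader] = None) -> str:
    """Return the filepath under which Qiita can see a file the plugin wrote.

    Shared filesystem (upload is None): the file is already visible, so its
    path is returned unchanged. HTTPS mode: the content is uploaded and the
    central filepath reported by the uploader is returned.
    """
    if upload is None:
        return local_path
    with open(local_path, "rb") as fh:
        content = fh.read()
    return upload(os.path.basename(local_path), content)
```

Calling both wrappers without a transport reproduces today's behaviour, which is what allows plugins to be migrated one at a time.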

This PR adds two endpoints/functions to Qiita pet to send files to and receive files from plugins; the sending side is handled by `def get(self, requested_filepath):`.
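For the receiving side, a minimal sketch of how uploaded content could be written safely below BASE_DATA_DIR. The function name `store_received_file`, the per-job `target_dir` and the path check are assumptions for illustration, not this PR's handler:

```python
import os


def store_received_file(base_data_dir: str, target_dir: str,
                        filename: str, content: bytes) -> str:
    """Write file content received from a plugin below base_data_dir.

    target_dir is a hypothetical, job-specific directory relative to
    base_data_dir; filename is the name supplied by the plugin.
    """
    base = os.path.realpath(base_data_dir)
    dest = os.path.realpath(os.path.join(base, target_dir, filename))
    # Refuse names such as "../../escape.txt" that would leave BASE_DATA_DIR.
    if os.path.commonpath([base, dest]) != base:
        raise ValueError("target outside BASE_DATA_DIR: %s" % dest)
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    with open(dest, "wb") as fh:
        fh.write(content)
    return dest
```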

Both endpoints are deactivated by default (so that Qiita behaves as it does now) and can be activated by setting the following configuration option to True:

```
# Allow BASE_DATA_DIR file content transfer through https (True or False)
# By default, Qiita and its plugins share the filesystem of BASE_DATA_DIR.
# You can less tightly couple selected plugins (=no shared file system) but
# they need to get/push input/output files through https then
ENABLE_HTTPS_PLUGIN_FILETRANSFER = False
```
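A minimal sketch of how the flag could be read with the standard library's `configparser`, treating a missing option as "deactivated". The section name `main` is an assumption about where the option lives in the Qiita configuration file:

```python
import configparser


def https_filetransfer_enabled(config_text: str, section: str = "main") -> bool:
    """Return the value of ENABLE_HTTPS_PLUGIN_FILETRANSFER.

    A missing section or option yields False, i.e. the endpoints stay off.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.getboolean(section, "ENABLE_HTTPS_PLUGIN_FILETRANSFER",
                             fallback=False)
```

A handler would check this once and answer with an error status when the transfer is disabled, so existing deployments see no new behaviour.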

For higher performance, file content is sent through nginx instead of plain Python/Tornado. For this, I had to fix the /Users/username prefix in the example nginx configuration file so that it matches the actual filepaths.
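One common way to let nginx serve the content, assumed here rather than taken from this PR, is the `X-Accel-Redirect` response header: Tornado only validates the request and names the file, and nginx streams it from an `internal` location. The location prefix `/protected/` and the function name are hypothetical:

```python
import os

# Assumed nginx counterpart (hypothetical location name):
#   location /protected/ {
#       internal;
#       alias /path/to/BASE_DATA_DIR/;
#   }


def x_accel_redirect_target(base_data_dir: str, requested_filepath: str,
                            internal_prefix: str = "/protected/") -> str:
    """Map a file below BASE_DATA_DIR to the URI for X-Accel-Redirect."""
    base = os.path.realpath(base_data_dir)
    target = os.path.realpath(os.path.join(base, requested_filepath))
    # Reject paths such as "../etc/passwd" that resolve outside BASE_DATA_DIR.
    if os.path.commonpath([base, target]) != base:
        raise ValueError("requested file is outside BASE_DATA_DIR")
    rel = os.path.relpath(target, base)
    return internal_prefix + rel.replace(os.sep, "/")
```

In a Tornado handler, the result would be passed to `self.set_header("X-Accel-Redirect", ...)` before `self.finish()`, so the Python process never reads the file content itself.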

It is not yet clear to me whether we need to add a mechanism that checks if a plugin has permission to access the requested file(s) in BASE_DATA_DIR. Currently, we "trust" the plugin via its OAuth credentials.
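If such a check turns out to be necessary, one possible shape is a per-job allowlist. This is a sketch assuming Qiita can assemble the list of filepaths a job may read (e.g. the files of its input artifacts and prep information); the function name and that list are hypothetical:

```python
import os
from typing import Iterable


def plugin_may_read(requested_filepath: str,
                    allowed_filepaths: Iterable[str]) -> bool:
    """Return True if the requested file is one of the job's allowed files.

    OAuth credentials only establish that the caller is *some* registered
    plugin; this additionally ties the request to the job's own inputs.
    """
    allowed = {os.path.normpath(p) for p in allowed_filepaths}
    return os.path.normpath(requested_filepath) in allowed
```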

This PR is accompanied by jlab/qiita_client#1, which equips the Qiita Client (part of every plugin) to handle the corresponding requests. The client can select between the current central filesystem mechanism
https://github.com/jlab/qiita_client/blob/d9543c9575f0171620c17f4e87897ed5cf52a905/qiita_client/qiita_client.py#L783-L790
OR the new file transfer through HTTPS
https://github.com/jlab/qiita_client/blob/d9543c9575f0171620c17f4e87897ed5cf52a905/qiita_client/qiita_client.py#L792-L811
Both mechanisms return the actual filepath of the requested (and potentially transferred) files. This allows individual plugins to use different mechanisms, i.e. we don't need to migrate all plugins at once.

As only the plugin functions know which files they request / send as artifact components, we need to "decorate" file access in their individual implementations. Here is an example PR for qp-deblur: jlab/qp-deblur#1
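A toy illustration of what wrapping file access inside a plugin command could look like; it is not the code of jlab/qp-deblur#1. For brevity, the wrappers are simplified to hypothetical path-to-path callables passed in by the caller:

```python
from typing import Callable

# Simplified, hypothetical wrapper signatures: path in, usable path out.
Fetch = Callable[[str], str]
Push = Callable[[str], str]


def count_sequences(fasta_fp: str, out_fp: str,
                    fetch: Fetch, push: Push) -> str:
    """Toy plugin command: count FASTA records and hand back the result file.

    Every file the command reads goes through fetch and every file it
    produces through push; the rest of the command body is unchanged.
    """
    local_in = fetch(fasta_fp)
    with open(local_in) as fh:
        n = sum(1 for line in fh if line.startswith(">"))
    with open(out_fp, "w") as fh:
        fh.write("%d\n" % n)
    return push(out_fp)
```

With identity functions for `fetch` and `push`, the command behaves exactly as it does on a shared filesystem.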

sjanssen2 closed this Aug 29, 2025