Install tasks via SLURM #686
I generally like the idea a lot; I will review it more carefully next week. One first comment:
Just to quickly confirm: the fractal user at FMI does not have access to submit jobs to the cluster.
As of version 2.9.0 (in progress), the task lifecycle features are organized roughly as follows:
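For context, a rough sketch of the kind of layout being described (the module names below are an assumption based on this description, not a copy of the actual tree):

```
fractal_server/tasks/v2/
├── local/
│   ├── collect.py
│   ├── deactivate.py
│   └── reactivate.py
└── ssh/
    ├── collect.py
    ├── deactivate.py
    └── reactivate.py
```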
where the ssh/local modules are quite homogeneous in their structure. In principle we could expose an abstraction for each action (collect, deactivate, reactivate), which then takes a specific form for either the local or ssh version. However, the next step in this area will be to cover a third scenario (the SLURM-based one), which will not fit into this scheme; thus we are keeping all modalities separate. That said, we need to better define the "run task collection as a SLURM job" feature. The main questions are:
Side note: the pattern we choose here (re: SLURM user and relevant folders) will also be used for task-group deactivation and reactivation, on top of task collection. We first need to review the main questions above (cc @jluethi), and then we can proceed with a first implementation. At a first look, the option of letting the
High-level: let's not do this now, but keep it for further discussion (e.g. with Enrico). Some content brainstorming:
On 1: Yes. Can we make the on-disk permissions for this just very broad? => everyone can execute, every user can write to this task folder (see the rough sketch after this list).
On 2: Open question: does the server keep track of whl files separately?
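Purely as an illustration of the "very broad permissions" idea above (the path is hypothetical, and this is not a recommendation), such a setup could look like:

```bash
TASKS_DIR=/path/to/shared/fractal/tasks   # hypothetical shared task folder

# Everyone can read, write and traverse the task folder and its contents
chmod -R a+rwX "$TASKS_DIR"

# Keep new entries group-owned by setting the setgid bit on directories
find "$TASKS_DIR" -type d -exec chmod g+s {} +
```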
Refs: python_version flexibility from task collection? #659

After reviewing these issues, we (me and @mfranzon) propose to switch to another way of collecting tasks (which was already mentioned in the past), namely one where the venv-related commands (venv creation and pip commands) are executed on the same machine where the tasks will be executed.
Briefly, task collection would have three phases. Steps 1 and 3 should remain very similar to what they are now.
Step 2 should be heavily refactored. Right now it consists of a series of subprocess.run commands, with their I/O handled in Python. All of these commands are executed on the machine where fractal-server runs, which is clearly a problem (see the issues above, but also possible incompatibilities in system libraries).
In the future version, step 2 will be transformed into a bash script, similar to the following prototype:
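The prototype is not reproduced here; what follows is a minimal sketch of what such a script could look like, with hypothetical paths and names (the actual script would be templated and filled in by fractal-server):

```bash
#!/bin/bash
# Minimal sketch: create the task venv and install the package on the same
# machine where the tasks will later run. Paths and names are hypothetical.
set -euo pipefail

PACKAGE_DIR=/path/to/tasks/my-package/1.0.0              # task-group folder
PYTHON=/usr/bin/python3.10                               # worker-side interpreter
WHEEL=/path/to/my_package-1.0.0-py3-none-any.whl         # package to install

"$PYTHON" -m venv "$PACKAGE_DIR/venv"
"$PACKAGE_DIR/venv/bin/python" -m pip install --upgrade pip
"$PACKAGE_DIR/venv/bin/python" -m pip install "$WHEEL"

# Record the resulting environment, e.g. for later auditing
"$PACKAGE_DIR/venv/bin/python" -m pip freeze > "$PACKAGE_DIR/pip-freeze.txt"
```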
And, most importantly, this script will be executed via FractalSlurmExecutor (for the SLURM backend) or via a standard ThreadPoolExecutor (for the local backend). Thus a SLURM job will install the tasks while executing on a SLURM node.
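Purely for illustration (the actual submission script would be prepared by the executor, and the resource values below are made up), the resulting SLURM job could look roughly like:

```bash
#!/bin/bash
#SBATCH --job-name=fractal-task-collection
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=00:30:00

# Run the task-collection script on a SLURM node, so that venv creation and
# pip install happen in the same environment where the tasks will later execute.
bash /path/to/collect_task_group.sh
```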
Some notes:
Concerning python_version flexibility from task collection? #659: with this refactor we could then easily switch to a situation where the SLURM configuration file also points to several Python paths (rather than just the single one in FRACTAL_SLURM_WORKER_PYTHON), each one for a given version. For instance, the Fractal admin could include something like the sketch below.
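As a hedged illustration only (these per-version variable names are hypothetical, not existing fractal-server settings):

```bash
# Hypothetical per-version interpreter paths available on the SLURM nodes
FRACTAL_SLURM_WORKER_PYTHON_3_9=/usr/bin/python3.9
FRACTAL_SLURM_WORKER_PYTHON_3_10=/usr/bin/python3.10
FRACTAL_SLURM_WORKER_PYTHON_3_11=/usr/bin/python3.11
```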
This is to be reviewed and re-discussed together, but we think that the current task collection is wrong and that it only works by accident (mainly because the server machine is very similar to the cluster nodes).