
Support for different $HOME on cylc job remote and execution nodes. #2779

Closed
DamianAgius opened this issue Sep 26, 2018 · 20 comments

@DamianAgius

We have an issue where we are trying to submit jobs from a suite server, via an SSH remote (which does the qsub), to an execution host. Both the SSH remote and the execution host have access to all the required filesystems and PBS; however, $HOME on the SSH remote is not the same as $HOME on the execution host.

Cylc uses a relative path for PBS output files and submits the job from $HOME on the SSH remote. Therefore, even though the correct log directory exists on the execution host, PBS cannot copy the output files to the job output directory, as it tries to copy to the path resolved against $HOME on the SSH remote.

Everything works up to the point where PBS attempts that copy: the cylc-run directory, work directory, run directory, etc. are created in the correct locations on the shared filesystem because the Rose configuration specifies the correct directories. However, the PBS directives in the Cylc job files that specify the job output and error destinations are based on the directory the qsub occurs from.

Possible fix

We have tested and determined that if you qsub from the directory (on the SSH remote) that represents $HOME on the execution host, the job runs successfully.

Therefore, a configuration option to specify the qsub starting point would be great. For example, the code would do something equivalent to the following, assuming $JOB_SUBMIT_DIR was our configured directory:

cd $JOB_SUBMIT_DIR;
qsub blah

This would then allow the job to run, and the output file to be copied back by PBS to the correct location on the execution host, which is visible to the SSH remote.

By default $JOB_SUBMIT_DIR would be "$HOME" - but we would like an optional configuration item, per remote, giving the directory used for the qsub.

We also note that we would need to use this directory (or possibly another) for the job log retrieval, as the job logs exist on the SSH remote, but in $JOB_SUBMIT_DIR/cylc-run rather than $HOME/cylc-run (and therefore currently they cannot be copied back to the suite server).
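
To make the request concrete, here is a rough sketch of the two places the configured directory would be used (the name $JOB_SUBMIT_DIR is hypothetical, and the suite, task, and host names are made up for the example):

# On the SSH remote: submit from the configured directory instead of $HOME,
# so that PBS resolves its relative output paths against the execution host's home.
cd "${JOB_SUBMIT_DIR:-$HOME}"
qsub cylc-run/my_suite/log/job/1/my_task/01/job

# On the suite server: retrieve job logs relative to the same directory
# (e.g. with rsync), rather than assuming $HOME/cylc-run on the SSH remote.
rsync -a "boundary-node:${JOB_SUBMIT_DIR}/cylc-run/my_suite/log/job/" "$HOME/cylc-run/my_suite/log/job/"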

@hjoliver
Member

hjoliver commented Sep 26, 2018

@DamianAgius - just to check that I understand the problem, what is the relationship between the "ssh remote" and the "execution host"? Normally (well, in my experience anyway) it would be a login node (or similar) that sees the same home filesystem as the compute nodes. In your case, are both hosts on the same shared filesystem but with different home directories, or on different filesystems with different home directories?

If the "ssh remote" and the "execution host" do see the same filesystem, would it suffice to tell PBS the full - instead of relative - path to the desired job log location? (this would also require a change to Cylc, btw).

@DamianAgius
Author

In this case, the SSH remote is a boundary node (not a login node) for multiple systems - effectively a suite setup and job submission proxy.
The home and data file systems for the SSH remote and the execution host are all visible; the only difference is that $HOME differs - the boundary node's $HOME (which the SSH connection uses) is not the same as the execution host's (which the tasks use).
You could specify the full path to the PBS output files, but does Cylc know how to retrieve them if they are not under $HOME?

@hjoliver
Member

Roger that.

So, the use case (partly from offline conversation) could be summarized as:

The ssh remote is a single "boundary node" that fronts several HPC clusters with different home filesystems.

@hjoliver
Member

hjoliver commented Sep 26, 2018

@DamianAgius - a further clarification request: does the PBS client on the boundary node put jobs on the different HPCs (with different home filesystems) based purely on the resources requested by the jobs? Because if users have to be aware of which HPC host to target then - I have to ask, before we consider modifying cylc for this - is a separate remote for each of the two different HPCs not a simpler option? (VMs are cheap and easy....)

@hjoliver
Member

@matthewrmshin - as the architect of recent cylc job subsystem improvements - is probably best placed to comment on the implications for cylc, if we have to support a single remote that fronts multiple different HPCs.

@DamianAgius
Author

DamianAgius commented Sep 26, 2018

@hjoliver Each suite will be able to submit to one or more of the clusters, by:

  • Specifying the cluster via the SSH remote (via DNS alias)
  • Specifying the PBS server (per task) via PBS directives

This has been tested and works as expected.

We separate the HPC clusters via DNS aliases, which allows the current Cylc configuration to work. It would be nice if Cylc supported not having to set up a DNS alias or round-robin for each cluster, but this is not essential.
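
For readers following along, the per-task PBS directive route looks roughly like this (queue and server names are made up for the example):

# In the task's job script, generated from the suite's per-task PBS directives:
#PBS -q workq@pbs2    # route the job to queue "workq" on the PBS server "pbs2"

# Equivalent command-line form from the boundary node:
qsub -q workq@pbs2 job-script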

@matthewrmshin
Contributor

#2199 and #2565 are related. The alternate home directories setup for the different clusters is definitely going to be a challenging requirement to meet.

@matthewrmshin matthewrmshin added this to the later milestone Sep 26, 2018
@matthewrmshin
Contributor

Can we ask why there is a necessity to have alternate HOME file systems for each cluster?

@DamianAgius
Author

There is not strictly a necessity, but it has certain benefits and is convenient, especially when:

  • the clusters are not binary compatible, and the HOME file system is used to store cluster-specific executable files
  • losing one home file system does not stop all clusters from running workloads
  • the clusters are not co-located, making shared file systems less than ideal (the boundary nodes could be spread across data centres)

In this instance, the decision was made some time ago to have different home file systems - we will review that decision.
However, there could be benefits in modifying Cylc to allow a configuration item for a remote's/cluster's 'true' home, used for job submission and job log retrieval (potentially Rose could also be made aware of it). Having a zoned network with different trust levels, and only exposing the boundary nodes (for all clusters) to VMs, is potentially more secure.
However, I am not sure how feasible this is with Cylc.

@matthewrmshin
Contributor

Understood. It is certainly an interesting design to have multiple clusters sharing the same front end host.

@hjoliver
Member

hjoliver commented Oct 1, 2018

@matthewrmshin - so, is it fair to say this is not a trivial fix and therefore needs to wait on your improved cluster awareness work ... probably after the higher priorities that are now spinning up? (web architecture, authentication, and GUI...)

@DamianAgius - can you confirm you have a workaround for the current setup?

@hjoliver
Member

hjoliver commented Oct 1, 2018

p.s. @DamianAgius - I don't think you answered the 2nd part of my question above: #2779 (comment)

@matthewrmshin
Contributor

Hi @hjoliver, #2199 would help, as we'll migrate most remote-host-based settings to become cluster-based settings. If solving this is important enough, we can in theory raise the priority of #2199 (at least partially) - the change should be mostly orthogonal to the web UI work - but it will distract the team (when it comes to reviewing and testing the changes, etc.).

@DamianAgius
Author

@hjoliver Sorry for the delayed response to #2779 (comment) - I was on a week's leave.

We are already using separate 'remote' configurations for each cluster, but these boundary nodes are not VMs - they are HPC nodes, with multiple file systems mounted to allow cross-cluster data transfers, and they are also acting, for each cluster, as both external interfaces to non-HPC data sources and as the Cylc SSH 'remotes'.

Extra info:

  • The 'same' realm/service accounts (that run the workflows) exist across the clusters.
  • The home file system for each cluster is accessible on the boundary nodes (as are the work/share file systems)
  • $HOME is different on each cluster
  • $HOME on the boundary node may not be the same as on any of the clusters
  • Cluster home file system paths (and work/share paths) will be symbolic links, and these MUST be honoured for any persistent use (but inside a transient job script, real paths may be used, such as CYLC_DIR)
    • (We patched Rose suite-run to use the run/work/share paths provided, not the real path the remote Rose finds, to ensure our file system fail-over procedures function as expected)

@hjoliver
Member

hjoliver commented Oct 5, 2018

Just re-read this issue.

@DamianAgius - as per your initial description above, everything (apart from job log retrieval?) works properly if cylc cd's to the cluster home directory location before doing the qsub? It would be an easy change to make cylc do that, even if it is just a temporary workaround.

Testing this sort of change will be painful though...

@DamianAgius
Author

Yes, although we are working on how to set up a test environment (for integration testing, not the in-built Cylc tests, of which I have little knowledge).

We manually tested qsub'ing a very simple script to a cluster from $HOME on the boundary node.
It failed with the same error (PBS couldn't copy the output back to its idea of the home filesystem).
We then did:

$ cd /path/to/cluster/home
$ qsub script.sh

That worked - the job ran and PBS copied the output files back correctly.
I would assume that Cylc using the same method would work.

Cylc does seem to use a relative path when setting up job log paths, and then when trying to copy the job logs back to the suite server VM - is there a way of configuring Cylc to use the full path for the job output files?

I like easy changes, and am happy to test with a workaround.

@hjoliver
Member

Just talked to @DamianAgius. He envisages that the two HPC clusters (with different home paths) can continue to be accessed via two cylc remotes (that happen to resolve to the same physical "boundary node", but cylc doesn't need to know that). And, as described above, the different home paths for both clusters are visible on the boundary node.

So in cylc, we would just have to add a per-remote "home directory" configuration to be used instead of $HOME when interacting with the remote. This is probably quite easy to do, but we need to devise an easy way to simulate this kind of environment so that I can implement and test this.
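
Purely as an illustration of the idea (no such setting currently exists; the item name, host name, and path below are hypothetical), the per-remote configuration might look something like this in global.rc:

[hosts]
    [[boundary-node-cluster-a]]
        # Hypothetical item: directory to treat as the remote's home for job
        # submission and job log retrieval, instead of the boundary node's $HOME.
        home directory = /fs/cluster-a/home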

@hjoliver hjoliver changed the title Allow setting of alternate job submission directory (and for log retrieval) Support for different $HOME on cylc job remote and execution nodes. Oct 10, 2018
@hjoliver
Member

After further discussion with @DamianAgius there's one more complication here: PBS has to be told the target cluster for job query or kill to work, so batch system support will need mods. (We have a similar issue with a heterogeneous Slurm cluster here).

The PBS job ID will have the server in it, but not in a usable format.
123.pbs2 is the PBS job number; you can query it via:
qstat 123.pbs2@pbs2

You can also do:
qstat queue@pbs2 or qstat @pbs2

So it may require a per-cluster batch scheduler configuration, unless you want to extract the PBS server from the job ID.
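
A minimal sketch of the "extract the PBS server from the job ID" option, assuming job IDs of the form 123.pbs2 as above:

JOB_ID="123.pbs2"                 # as returned by qsub
PBS_SERVER="${JOB_ID#*.}"         # strip up to the first dot -> "pbs2"
qstat "${JOB_ID}@${PBS_SERVER}"   # query the job on its own server
# job poll/kill would need the same @server routing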

@hjoliver
Member

hjoliver commented Nov 5, 2018

Update: it turns out:

  • PBS thankfully does handle a heterogeneous cluster in a unified way (i.e. the previous comment does not apply).
  • cylc handles the home directory problem "out of the box", via the global.rc host run directory setting.
  • rose suite-run, however, assumes the standard run directory location on a cylc remote.

Therefore, Rose PR submitted to resolve this issue: metomi/rose#2252
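
For reference, the global.rc setting mentioned in the second bullet is set per host; a rough example (host name and paths illustrative):

[hosts]
    [[boundary-node-cluster-a]]
        # Put the remote cylc-run tree on the cluster's real home filesystem
        # rather than under the boundary node's $HOME.
        run directory = /fs/cluster-a/home/cylc-run
        work directory = /fs/cluster-a/home/cylc-run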

@hjoliver
Member

hjoliver commented Nov 23, 2018

PBS thankfully does handle a heterogeneous cluster in a unified way

(Only true of PBS 14+)

For PBS 13 (still needed at @DamianAgius's site for a bit longer) I've posted #2877
