WIP - Support non-standard remote run directories. #2252

Closed
hjoliver wants to merge 7 commits

Contributor

@hjoliver hjoliver commented Nov 5, 2018

Works with a non-default Cylc global config "run directory" setting, e.g.:

# global.rc
[hosts]
   [[server1]]
       run directory = /nfs/home1/$USER

This can be useful if, for instance, your $HOME on the cylc remote server is not the same as $HOME on the compute cluster that it fronts (but the compute home dir is still mounted on the remote). We need the suite run directory to be set up where the jobs see it.
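
To illustrate the setting above, here is a minimal sketch of how a "run directory" value containing $USER might be expanded into a per-suite path. It assumes the suite run directory is simply the configured run directory with the suite name appended. The function resolve_suite_run_dir, the suite name and the user name are hypothetical and are not part of Cylc or Rose.

```python
import posixpath
import string


def resolve_suite_run_dir(run_directory, suite_name, env):
    """Expand $VARS in a 'run directory' setting and append the suite name.

    Hypothetical helper for illustration only: not the Cylc or Rose code.
    Assumes the suite run directory is <run directory>/<suite name>.
    """
    # safe_substitute leaves unknown variables untouched instead of raising.
    expanded = string.Template(run_directory).safe_substitute(env)
    # Remote hosts are assumed to be POSIX, so join with posixpath.
    return posixpath.join(expanded, suite_name)


# Environment as it might look on the job host (assumed values).
job_host_env = {"USER": "alice"}
print(resolve_suite_run_dir("/nfs/home1/$USER", "my.suite", job_host_env))
# prints /nfs/home1/alice/my.suite
```

The point of the sketch is that the expansion has to use the environment of the host where the jobs run, so that the directory is created where the jobs can see it.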

TODO -

  • test with default and non-standard run directory on a remote host
  • handle the similar work directory setting too (for share and work location)
  • with rose root-dir symlinks
  • with rose root-dir{share/work} symlinks
  • test with --new (not working?)
  • document
  • testing?

@matthewrmshin matthewrmshin added this to the soon milestone Nov 5, 2018
@matthewrmshin
Member

You probably want to concentrate on getting the --remote=... option set up correctly when rose suite-run relaunches itself with --remote=... in the job host. (And sorry for the archaic code.)

@hjoliver
Contributor Author

hjoliver commented Nov 5, 2018

You probably want to concentrate on getting the --remote=... option...

I used new --run-dir=DIR and --work-dir=DIR options for the remote invocation, which seems to be fine because the command help is not defined by the actual options (i.e. the new options are not exposed to users). But should I be passing these values to the remote side with --remote... instead??

@hjoliver
Contributor Author

hjoliver commented Nov 5, 2018

Travis CI failure is only because pygtk.org has evidently disappeared off the internet.

Warning, treated as error:
/home/travis/build/metomi/rose/sphinx/api/rose-gtk-library.rst.rst:13:broken link: http://www.pygtk.org/ (HTTPConnectionPool(host='www.pygtk.org', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2b5643bbaad0>: Failed to establish a new connection: [Errno -2] Name or service not known',)))
make: *** [linkcheck] Error 2
make: Leaving directory `/home/travis/build/metomi/rose/sphinx'

@hjoliver
Contributor Author

hjoliver commented Nov 5, 2018

(Tested successfully on two networked VMs, and on the Azure cluster with PBS that mimics the BoM clusters; now awaiting testing at BoM...).

@matthewrmshin
Member

Yes, attempts to resolve www.pygtk.org are returning Name or service not known. (Time to say goodbye to PyGTK?)

@matthewrmshin
Member

I don't really mind either way - given that this logic will migrate to Cylc soon!

The --remote=key1=value1,key2=value2,... settings are split by the _run_remote method into the r_opts dict, which is supposed to make life easier. (It was also done like this to avoid introducing too many CLI options that are for internal use only.)
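
For readers unfamiliar with this style of option, here is a minimal sketch of splitting a key1=value1,key2=value2 string into a dict. It is not the actual _run_remote implementation; parse_remote_opts and the example keys are hypothetical.

```python
def parse_remote_opts(value):
    """Split 'key1=value1,key2=value2' into a dict.

    Hypothetical helper for illustration only: not Rose's _run_remote code.
    """
    r_opts = {}
    for item in value.split(","):
        if not item:
            # Skip empty items, e.g. from an empty string or trailing comma.
            continue
        # Split on the first '=' only, so values may contain '='.
        key, sep, val = item.partition("=")
        # A bare key with no '=' is stored with a value of None.
        r_opts[key] = val if sep else None
    return r_opts


# Example keys below are made up for the demonstration.
print(parse_remote_opts("run-dir=/nfs/home1/alice,work-dir=/scratch/alice"))
```

One limitation of this simple scheme is that a value containing a comma would be split in the wrong place, so paths with commas would need escaping or a different separator.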

@matthewrmshin
Member

Just restarted the build again.

@hjoliver
Contributor Author

hjoliver commented Nov 7, 2018

https://www.pygtk.org now points at the PyGObject site.

@hjoliver
Contributor Author

@matthewrmshin - I've just retested this along with cylc/cylc-flow#2877 (i.e. the two parts of cylc/cylc-flow#2779). It all works fine; can you advise on what more needs doing to get this merged to Rose master (specifically tests, I guess) and whether the next release is feasible?

        os.path.realpath(suite_dir_home) !=
        os.path.realpath(suite_dir_root)):
    if opts.run_dir:
        suite_dir_real = os.path.join(suite_dir_root, suite_name)

@matthewrmshin I believe that at this point suite_dir_root is the Rose config item root-dir, and suite_name is appropriately set. However, neither of these adds a 'cylc-run' prefix or suffix, so if opts.run_dir is set, suite_dir_real = <rose root-dir>/<suite-name> (no /cylc-run/ in between). That path is then symlinked to ~/cylc-run, which is exactly the behaviour I'm seeing.
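
The difference described in this comment can be sketched as follows. The helper names and the example root directory are hypothetical. The sketch only shows the two path shapes being discussed and does not claim which one the branch should produce.

```python
import posixpath


def suite_dir_without_cylc_run(suite_dir_root, suite_name):
    """Path shape reported in the comment: <root-dir>/<suite-name>."""
    return posixpath.join(suite_dir_root, suite_name)


def suite_dir_with_cylc_run(suite_dir_root, suite_name):
    """Path shape with a 'cylc-run' component between root and suite name."""
    return posixpath.join(suite_dir_root, "cylc-run", suite_name)


# Assumed example values, not taken from any real site configuration.
root = "/nfs/home1/alice"
name = "my.suite"

print(suite_dir_without_cylc_run(root, name))  # /nfs/home1/alice/my.suite
print(suite_dir_with_cylc_run(root, name))     # /nfs/home1/alice/cylc-run/my.suite
```

Whether the "cylc-run" component belongs in the configured root or should be added by the code is the open question in this thread.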

@matthewrmshin
Member

@hjoliver @DamianAgius If I remember correctly from our email exchanges, @DamianAgius has a trick in place to make this work at his site. Has it been, or can it be, integrated with this branch?

@hjoliver
Contributor Author

Sorry for the delay - this all went down while I was away on leave. @DamianAgius - I'm now a bit hazy on the details, except that it seems surprising that this worked properly on the Azure nodes that were supposed to mimic your multi-host configuration. I definitely checked those test results many times and would have noticed if the "cylc-run" path component was missing. Now the Azure nodes have been taken down so I can't retest. I wonder if it was a configuration issue - maybe I (incorrectly?) specified the "cylc-run" path in my home dir config, and you didn't?

@matthewrmshin
Member

@hjoliver Do you still want to pursue this?

@matthewrmshin matthewrmshin modified the milestones: 2019.01.1, 2019.01.2 May 3, 2019
@DamianAgius

@matthewrmshin We are running with the locally patched version of Hilary's branch. I currently have no time to spend on this, but would really appreciate the functionality in master (at some stage). My current workload should ease in the coming weeks, and I'll hopefully have some time then.

@hjoliver
Contributor Author

hjoliver commented May 5, 2019

@hjoliver Do you still want to pursue this?

As Damian (BoM) notes - yes!

@hjoliver
Contributor Author

hjoliver commented May 5, 2019

Next step is, I think, to get feedback from Damian on what was wrong with this branch in his environment (causing the missing "cylc-run" path component), which he managed to fix, when it seemingly wasn't wrong on the simulated environment we had in the cloud ... then I need to figure out how to test it again without said cloud environment ... Docker might save us there.

Doesn't sound like a major hurry to get it into master though, from what @DamianAgius says.

@hjoliver
Contributor Author

hjoliver commented May 5, 2019

(May have been incorrect use of the path config by one of us, probably me).

@hjoliver
Contributor Author

I just rediscovered this ancient PR. @DamianAgius are you still out there? Is this still something you'll need to deal with under Cylc 8 (which is going to be deployed on new hardware, I think?).

@DamianAgius

Hi @hjoliver - good to hear from you, and hello to the Cylc/Rose team(s).
I cannot confirm at this stage (but do hope!) that this 'feature' will be required as we have not yet finished configuration and testing.

@matthewrmshin matthewrmshin removed their request for review October 31, 2022 13:42
@MetRonnie MetRonnie modified the milestones: 2019.01.8, 2019.01.9 Dec 20, 2022
@oliver-sanders
Member

Closing this old PR now, if there's any followup please comment here.

@oliver-sanders oliver-sanders removed this from the 2019.01.9 milestone Mar 29, 2023