-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PBS 13 "multi-cluster" support. #2877
Conversation
I used PBS Torque in the past, and have a Docker container that I used some time ago. Still keen to spin it up and see what's the output of... Maybe we could add a test too? #!/usr/bin/env python2
# THIS FILE IS PART OF THE CYLC SUITE ENGINE.
# Copyright (C) 2008-2018 NIWA & British Crown (Met Office) & Contributors.
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
import unittest
from cylc.batch_sys_handlers.pbs_multi_cluster import *
def get_test_filter_poll_many_output():
return [
("ignore\nignore\n123.localhost ignored", ["123.localhost@localhost"]),
("\n\n#\n#\n10.samba", ["10.samba@samba"]),
(
"""
a
a
12.localhost ____ # this is ignored
09.jd-01
""",
[
"12.localhost@localhost",
"09.jd-01@jd-01",
]
),
("\n\n1.localhost", [])
]
def get_test_filter_poll_many_output_invalid():
return [
# error raised due to multiple `.` chars
("a\nb\n1.wd-00.terra", ValueError),
# could not find a `.`
("a\nb\nnot_an_id", ValueError)
]
def get_test_manip_job_id():
return [
("1.localhost", "1.localhost@localhost"),
("10000.jd-01", "10000.jd-01@jd-01")
]
def get_test_manip_job_id_invalid():
return [
("1.jd-01.terra.luna", ValueError),
("103", ValueError)
]
class TestPBSMultiCluster(unittest.TestCase):
def test_filter_poll_many_output(self):
"""Basic tests for filter_poll_many_output."""
for out, expected in get_test_filter_poll_many_output():
job_ids = PBSMulticlusterHandler.filter_poll_many_output(out)
self.assertEqual(expected, job_ids)
def test_filter_poll_many_output_invalid(self):
"""Test filter_poll_many_output with invalid values."""
for out, ex in get_test_filter_poll_many_output_invalid():
with self.assertRaises(ex):
PBSMulticlusterHandler.filter_poll_many_output(out)
def test_manip_job_id(self):
"""Basic tests for manip_job_id."""
for job_id, expected in get_test_manip_job_id():
mod_job_id = PBSMulticlusterHandler.manip_job_id(job_id)
self.assertEqual(expected, mod_job_id)
def test_manip_job_id_invalid(self):
"""Basic tests for manip_job_id with invalid values."""
for job_id, ex in get_test_manip_job_id_invalid():
with self.assertRaises(ex):
PBSMulticlusterHandler.manip_job_id(job_id)
def test_export_handler(self):
import cylc.batch_sys_handlers.pbs_multi_cluster as m
self.assertTrue(hasattr(m, 'BATCH_SYS_HANDLER')) Writing the test, I noticed that it doesn't accept host names with dots... can't recall what the output looked like? @hjoliver Q is it valid: |
That test could be something like $ PYTHONPATH=$(pwd -P)/lib pytest lib/cylc/tests/test_pbs_multi_cluster.py
=============================================================================================== test session starts ===============================================================================================
platform linux2 -- Python 2.7.15rc1, pytest-3.9.1, py-1.7.0, pluggy-0.8.0
rootdir: /home/kinow/Development/python/workspace/cylc, inifile:
plugins: cov-2.6.0
collected 5 items
lib/cylc/tests/test_pbs_multi_cluster.py ..... [100%]
============================================================================================ 5 passed in 0.03 seconds ============================================================================================= And provide 100% coverage for the new code 🎉 |
Testing PBS + Cylc with Docker is doable, but would take a couple of days at least to find the right setup. However, I could submit a PBS job with agaveapi/torque. [testuser@docker ~]$ qsub /home/testuser/torque.submit
0.docker.example.com So I think the hostname part may have multiple dots... so maybe? diff --git a/lib/cylc/batch_sys_handlers/pbs_multi_cluster.py b/lib/cylc/batch_sys_handlers/pbs_multi_cluster.py
index 747b05375..ca33f274f 100644
--- a/lib/cylc/batch_sys_handlers/pbs_multi_cluster.py
+++ b/lib/cylc/batch_sys_handlers/pbs_multi_cluster.py
@@ -38,7 +38,7 @@ class PBSMulticlusterHandler(PBSHandler):
lines = out.split('\n')
for line in lines[2:]:
job = line.split()[0]
- _, server = job.split('.')
+ _, server = job.split('.', 1)
job_ids.append(job + '@' + server)
return job_ids |
@kinow - thanks for the review! You're dead right about the hostname dots - oops, I'd better fix that. Also thanks for the unit tests, I'll put them in... |
(Well, actually I'm not sure about the hostname, I was going by the info given to me on the real system this targets, which I don't have access to, and the mimic setup on Azure, ... so it's possible that PBS Pro 13 isn't using use a fully qualified name in this context. But best to be safe I guess.) |
Unit tests added. (As I recall we're not explicitly hooking these into |
You may have to adjust the tests if you change the behaviour for the dots in the host name. I've added a test to validate that there.
Yup, for now the tests are not executed (they were not executed before as well I think). But you can run manually. The other PR to add coverage reports is taking care to call all tests, test-battery/unit. |
Yes I did that
we were doing this for the few-ish existing unit tests:
... might need to remove those now? |
Oh, you were right @hjoliver ! Had a look at the coverage branch, and hadn't removed all the tests from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. 2 comments on style.
"""For job_id's of the form "id.server", return job_id@server.""" | ||
out = out.strip() | ||
job_ids = [] | ||
lines = out.split('\n') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lines = out.splitlines()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't test this but it looks good as far as I can see.
@matthewrmshin @oliver-sanders - you'd better take another quick look, since I addressed your feedback. @kinow - note I've changed your suggested unit tests a bit: assuming |
Codecov Report
@@ Coverage Diff @@
## master #2877 +/- ##
==========================================
+ Coverage 65.22% 65.26% +0.03%
==========================================
Files 106 107 +1
Lines 15113 15133 +20
Branches 3319 3323 +4
==========================================
+ Hits 9857 9876 +19
- Misses 5256 5257 +1
Continue to review full report at Codecov.
|
(94% patch coverage (one line missed) ... to get to 100% probably requires extensive unit testing of |
(Yes, the logic in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
@sadielbartholomew please sanity check and merge as @oliver-sanders is on leave today. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sanity check passed. Great!
Close #2779
New batch system handler
lib/cylc/batch_sys_handlers/pbs_multi_cluster.py
docstring:This should be an easy review: you might not be able to test it, but you can tell that it won't affect any existing cases.
Not assigning @kinow to review as he doesn't have access to PBS. I've tested this on an Azure cluster with PBS, set up to mimic this "multi cluster" heterogeneous environment.
I don't think I can make an integration test for this - it requires a non-trivial cluster environment, and an old version of PBS Pro.
Obviously could make a small unit test for the new function, once our new unit testing machinery is in place.[unit tests added thanks to @kinow]@matthewrmshin - please assign another reviewer from your team...