-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Host select #3489
Host select #3489
Conversation
d206297
to
cf7b351
Compare
@dpmatthews you may want to take a cursory scan of this. |
Nice improvement 🎉 One user-facing change here is that you need decent Python (and psutil) chops to get the new "threshold string" right, but we can document how to do it, and make sure that Python errors are properly flagged (you've probably done the latter already, but I haven't looked yet...). |
cf7b351
to
93dd8fc
Compare
So we could actually write an upgrade to do this easily enough but as it's a site config for admin use I think a quick example and a link to the
Any non-zero exit code gets treated the same way, the |
Looks good. I assume this will replace I don't think I assume selection will always choose the highest returned value? (need to remember this when ranking by load for example) |
Yep, will rename, higher is better so for server load divide by 1, will document. The only thing I’ve not built in yet is the ability to specify your own command or script, e.g. to rank by queue length, perhaps a later job. |
Travis is consistently failing for: ./flakytests/restart/39-auto-restart-no-suitable-host.t (Wstat: 256 Tests: 0 Failed: 0) I cannot get these tests to fail locally (NIWA or MO) so will have to push up some experimental debug commits, in the mean time ready for review. |
Test now passing as of f42ff04. The issue is that I was using the host FQDN returned by host selection rather than the specified host name (e.g. |
cylc/flow/cfgspec/globalcfg.py
Outdated
'memory', 'disk-space'], | ||
'thresholds': [VDR.V_STRING], | ||
}, | ||
'thresholds': [VDR.V_STRING] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See Dave's comments?
from cylc.flow.remote import remote_cylc_cmd, run_cmd | ||
|
||
|
||
def select_suite_host(cached=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this extra layer of function necessary? It's only adding 1 arg and 1 line to the wrapped function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of this method is to extract the required configurations from the global conf so that the interface is independent of the configuration file. For example when I rename the thresholds
configuration I will only have to update the code in one place.
@@ -0,0 +1,155 @@ | |||
"""Test the cylc.flow.host_select module with remote hosts. | |||
|
|||
NOTE: These tests require a remote host to work with and are skipped |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that you should offer a pointer on how to go about providing a remote host.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uses the same system as the functional tests (see git grep 'remote host with shared fs'
).
66b82ea
to
411cc83
Compare
Here is an overview of what got changed by this pull request: Complexity increasing per file
==============================
- cylc/flow/host_select.py 15
- cylc/flow/scripts/cylc_psutil.py 4
Complexity decreasing per file
==============================
+ cylc/flow/scheduler_cli.py -1
See the complete overview on Codacy |
The coverage score appears to be wrong :(
When I run the test locally: $ pytest --cov cylc/flow/tests/test_host_select*
...
cylc/flow/host_select.py 151 11 96 0 95.55% Looking at the CodeCov results it's got all of the doctests, and maybe one of the unittests? |
I am not sure how well codecov is able to understand how many times code is called. That number 1, I think, is the number of builds received by Codecov from Travis that have coverage for that function. If you go to Codecov, then open Pulls, select the Host Select one, and look at "Build", you can click on "Download" to see the I had a look, and only one of these files had any coverage for
Maybe coverage.py and codecov don't compute the coverage in the same way? I tried running it in PyCharm but two tests failed. Will have a better look tomorrow as I had issues with Codecov a few days ago and want to investigate if our reports are missing something 👍 |
@oliver-sanders just tried locally $ pytest cylc/flow/tests/test_host_select* --cov=cylc
platform linux -- Python 3.7.3, pytest-5.3.5, py-1.8.1, pluggy-0.13.1 -- /home/kinow/Development/python/workspace/cylc-flow/venv/bin/python
cachedir: .pytest_cache
rootdir: /home/kinow/Development/python/workspace/cylc-flow, inifile: pytest.ini
plugins: cov-2.8.1
collected 12 items / 2 skipped / 10 selected
cylc/flow/tests/test_host_select.py::test_hostname_checking PASSED [ 8%]
cylc/flow/tests/test_host_select.py::test_localhost PASSED [ 16%]
cylc/flow/tests/test_host_select.py::test_unique PASSED [ 25%]
cylc/flow/tests/test_host_select.py::test_filter PASSE....
...
...
cylc/flow/host_select.py 151 11 96 0 95.55% Then Looking again at the data uploaded to Codecov for this branch, it instead contains Still doesn't explain what's wrong, but at least we know that Codecov is just reporting back the number that was uploaded. The problem may be in our Travis CI set up I guess. |
Ok, either way I don't think its the fault of this PR, thanks for looking into this. Must prioritise GH actions... |
Started running locally the same commands as Travis. Stopping after each command and checking command line output locally and on Travis unit tests. After a while, found one difference that could explain the coverage under what we expected @oliver-sanders . One test is skipped on Travis. Not sure if that's enough to justify the decrease from ~95% to ~85%, but at least we have something to blame. |
I knew about that skip (had to skip the test to get it to pass on Travis - DNS issues) but as you say, should account for such a large difference, when you look at the lines tested on codacy it's clear that it's completely missing tests all-together. |
Where are we at with this? Still looking for the reason for weird coverage results? (I'm trying to catch up on stuff...) |
|
Poke |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of minor comments, but tested and pretty much approved. Very sophisticated 🤔 (use of ast
... jeez, I have some catching up to do).
411cc83
to
6191741
Compare
Co-Authored-By: Hilary James Oliver <hilary.j.oliver@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Travis passed already, I just committed a smaller conflict resolution to this branch via the GH UI (setup.py only) so ignoring the travis re-build. Can't cancel it though, the control buttons have disappeared again 😬 |
Closes #3391, #3392, #3398, #3233
Addresses cylc/cylc-admin#65 (the
[platform groups]
part)Addresses #2199 (the
rose host-select
part)Headline:
Abstract the existing suite host selection infrastructure to allow us to use it for the new
[platform groups]
feature which is effectively the migration ofrose host-select
functionality.Overview:
rose host-select
functionality to Cylc.psutil
expressions (host-metrics: use psutil #3391).psutil
for extra configurability./proc/meminfo
any more)psutil
even works on Windows!cylc.flow.remote:run_cmd
and thestdin
option (unit-tested).Open Questions
select_host
method will only consider hosts which are visible from the current platform (because it first gets the fqdn of each host). This could be made optional either now or later if necessary.Example: Cylc Suite Host Select:
~/.cylc/flow/8.0a1
Example: Abstract Host Select Interface:
TODO:
(pending cursory approval)
Requirements check-list
CONTRIBUTING.md
and added my name as a Code Contributor.