Drop Python 3.7 #562
Conversation
Okay, so I'm not sure why the none test is failing with a config error here. When I try to reproduce it locally in the same environment, everything is fine for the config part. See:

$ pytest dask_jobqueue/tests/test_slurm.py --verbose
=================================================================================== test session starts ====================================================================================
platform linux -- Python 3.8.13, pytest-7.1.2, pluggy-1.0.0 -- /home/guillaumeeb/miniconda3/envs/dask-dev/bin/python
cachedir: .pytest_cache
rootdir: /home/guillaumeeb/dask-jobqueue
collected 7 items
dask_jobqueue/tests/test_slurm.py::test_header PASSED [ 14%]
dask_jobqueue/tests/test_slurm.py::test_job_script PASSED [ 28%]
dask_jobqueue/tests/test_slurm.py::test_basic SKIPPED (test requires env in ['slurm']) [ 42%]
dask_jobqueue/tests/test_slurm.py::test_adaptive SKIPPED (test requires env in ['slurm']) [ 57%]
dask_jobqueue/tests/test_slurm.py::test_config_name_slurm_takes_custom_config PASSED [ 71%]
dask_jobqueue/tests/test_slurm.py::test_different_interfaces_on_scheduler_and_workers SKIPPED (test requires env in ['slurm']) [ 85%]
dask_jobqueue/tests/test_slurm.py::test_worker_name_uses_cluster_name SKIPPED (test requires env in ['slurm']) [100%]
=============================================================================== 3 passed, 4 skipped in 4.36s ===============================================================================
Any insights @jacobtomlinson @jrbourbeau?
Correction: when I launch all the tests, I also see the KeyError from config. I'll try to understand why.
I've not seen that before. I know @mrocklin was talking about changing the way configs in subprojects are handled in dask/dask#9171, although that was just a documentation change. But maybe something else has changed?
Okay, so after some time of investigation, I narrowed the problem down to tests using the loop fixture. I'm not sure why we use this fixture so often or what it's supposed to do (give a clean loop for the Scheduler to run on?), but it seems it doesn't play well with the default loaded dask configuration. It looks like whenever this fixture is used, the default jobqueue config is lost from dask.config. Any hints on how to avoid that or what to change?
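For reference, here is a minimal sketch of the symptom as I understand it; the `jobqueue.slurm.queue` key and the import side effect are my assumptions about how dask-jobqueue registers its defaults, not something shown in this thread:

```python
# Minimal sketch of the symptom (assumed key name and import side effect).
import dask
import dask_jobqueue  # noqa: F401  # importing registers the jobqueue.* defaults

dask.config.get("jobqueue.slurm.queue")  # works in a fresh session
# ...but after distributed's loop/cleanup fixture resets dask.config,
# the same lookup raises KeyError: 'jobqueue'
```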
@graingert has been cleaning up a bunch of the loop handling stuff in distributed.
We reset the config as part of the loop fixture setup: https://github.com/dask/distributed/blob/236945a3f734eea2e687381e5cd86b561796c535/distributed/utils_test.py#L1883

When you make your own loop you should be able to use:

def my_loop(loop):
    with dask.config.set(...):
        yield loop
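For dask-jobqueue that might translate into something like the sketch below. The fixture name `jobqueue_loop`, the location of `jobqueue.yaml`, and the use of `dask.config.update_defaults` are my assumptions, not code from this PR:

```python
# Sketch only: wrap distributed's loop fixture and re-apply the dask-jobqueue
# defaults that get wiped when the fixture resets dask.config.
import os

import dask
import pytest
import yaml
from distributed.utils_test import loop  # noqa: F401

import dask_jobqueue


@pytest.fixture
def jobqueue_loop(loop):
    # assumed: dask-jobqueue ships its defaults in a jobqueue.yaml next to the package
    fn = os.path.join(os.path.dirname(dask_jobqueue.__file__), "jobqueue.yaml")
    with open(fn) as f:
        defaults = yaml.safe_load(f)
    # push the jobqueue defaults back on top of the freshly reset config
    dask.config.update_defaults(defaults)
    yield loop
```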
And setting the dask config within the cleanup fixture should help too:

From 70375807624e819620b4cad21041b41bb3ece69a Mon Sep 17 00:00:00 2001
From: Thomas Grainger <tagrain@gmail.com>
Date: Wed, 27 Jul 2022 10:47:36 +0100
Subject: [PATCH] set dask config within cleanup
---
conftest.py | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/conftest.py b/conftest.py
index 1b92c19..26f7aae 100644
--- a/conftest.py
+++ b/conftest.py
@@ -1,7 +1,7 @@
# content of conftest.py
# Make loop fixture available in all tests
-from distributed.utils_test import loop # noqa: F401
+from distributed.utils_test import loop, cleanup # noqa: F401
import pytest
@@ -97,18 +97,22 @@ all_envs = {
@pytest.fixture(
params=[pytest.param(v, marks=[pytest.mark.env(k)]) for (k, v) in all_envs.items()]
)
-def EnvSpecificCluster(request):
+def EnvSpecificCluster(request, cleanup):
"""Run test only with the specific cluster class set by the environment"""
if request.param == HTCondorCluster:
# HTCondor requires explicitly specifying requested disk space
- dask.config.set({"jobqueue.htcondor.disk": "1GB"})
- return request.param
+ with dask.config.set({"jobqueue.htcondor.disk": "1GB"}):
+ yield request.param
+ else:
+ yield request.param
-@pytest.fixture(params=list(all_envs.values()))
-def Cluster(request):
+@pytest.fixture(params=list(all_envs.values()))
+def Cluster(request, cleanup):
"""Run test for each cluster class when no environment is set (test should not require the actual scheduler)"""
if request.param == HTCondorCluster:
# HTCondor requires explicitly specifying requested disk space
- dask.config.set({"jobqueue.htcondor.disk": "1GB"})
- return request.param
+ with dask.config.set({"jobqueue.htcondor.disk": "1GB"}):
+ yield request.param
+ else:
+ yield request.param
--
2.34.1
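The key point of that patch is that `dask.config.set` is used as a context manager, so the HTCondor disk setting is undone when the fixture finalizes instead of leaking into later tests. Roughly (illustrative values, not from the PR):

```python
import dask

dask.config.set({"jobqueue.htcondor.disk": "1GB"})
# old behaviour: the setting persists and leaks into every later test

with dask.config.set({"jobqueue.htcondor.disk": "1GB"}):
    ...  # new behaviour: only active inside the block, previous config restored afterwards
```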
I did not try it yet, and I'm not fluent with context managers and everything, but I'm not sure how this would fix our issue here. How can I use the loop fixture and still inherit the default dask-jobqueue config with it? Is this what the code you mention is doing?
Okay, so we have some bytes counting issues here. After investigation, this originates from dask/dask#7484. As we store the memory as an integer, and we express it in our tests in GB and not GiB, we lose information by converting back and forth between string and int. See below what I get in Python:

In [23]: from dask.utils import parse_bytes, format_bytes
In [24]: format_bytes(parse_bytes('2GB'))
Out[24]: '1.86 GiB'
In [25]: parse_bytes(format_bytes(parse_bytes('2GB')))
Out[25]: 1997159792
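Expressing the sizes in binary units (powers of two) makes the round trip lossless, which is why the tests were switched to GiB-style values. A hypothetical continuation of the same session with the same helpers:

```python
In [26]: parse_bytes('2GiB')
Out[26]: 2147483648

In [27]: format_bytes(parse_bytes('2GiB'))
Out[27]: '2.00 GiB'

In [28]: parse_bytes(format_bytes(parse_bytes('2GiB')))
Out[28]: 2147483648
```

With '2GB', 2000000000 bytes gets formatted as '1.86 GiB' and parsed back as 1997159792, so assertions comparing the two integers fail by a few bytes; with '2GiB' the value survives the string/int round trip exactly.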
So I've made things work locally, even the HTCondor tests. But CI still fails. The HTCondor test fix is not really clean: one test leaves some jobs running, which prevents other tests from succeeding, so I had to manually remove these jobs from the queue at the end of the test. The problem is that if a test fails, the queue might not be cleaned properly. Adding some fixture for cleaning running HTCondor jobs might help (see the sketch below). Moreover, the scheduling cycle seems to be a bit long with the default config, so I had to use fairly high timeouts. We could probably do better, cc @riedel, @mivade or @jolange. If someone is willing to take a look, please do so. We should look into the docker compose setup for reducing the scheduling cycle.
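A rough sketch of what such a cleanup fixture could look like; `condor_rm -all` is a real HTCondor command, but the fixture name and the idea of running it unconditionally at teardown are only a suggestion, not code from this PR:

```python
# Sketch of an autouse fixture that removes any leftover HTCondor jobs after
# each test, so one failing test cannot poison the queue for the next one.
import shutil
import subprocess

import pytest


@pytest.fixture(autouse=True)
def clean_htcondor_queue():
    yield
    # best effort: only attempt the cleanup where the HTCondor CLI is available
    if shutil.which("condor_rm"):
        subprocess.run(["condor_rm", "-all"], check=False)
```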
So I was going to give up on the HTCondor tests for now and call it a day (more of a week actually), but the dirty hack I introduced by overriding the distributed fixtures is still not getting CI to pass. @graingert I think I'll really need your help in order to fix this. I don't see a simple, even if hacky, workaround for making CI work here; we'll probably need to fix #567 by continuing #565. But if you have a temporary workaround, I'll take it.
One thing I'd like to understand is: do we really need the loop fixture from distributed? I've looked at dask-kubernetes, dask-cloudprovider, dask-yarn, and didn't find any use of it there. The only external place I saw it used was in Xarray. What's the goal of this fixture for us?
There are a lot of modifications here. First is the drop of Python 3.7; some are genuine test fixes (PBS, LSF, None, HTCondor); others are attempts to fix tests or to add debugging and readability. Finally, there is one part that is just a dirty workaround with distributed fixtures to have tests passing even if the code does not handle loops correctly. I'm going to merge this anyway, because it still gives us some tests and checks on merge requests.
* Drop Python 3.7
* Fix cleanup fixture problem (see distributed#9137)
* Override cleanup distributed fixture, and reconfigure dask-jobqueue when called
* Use power of 2 for the memory checks in tests (see dask#7484)
* Apply Black
* Apply correct version of black...
* conda install Python in HTCondor Docker image
* Fix HTCondor Dockerfile
* Fix PBS issue
* Add a timeout for every wait_for_workers call
* Fix HTCondor tests, leave room for scheduling cycle to take place for HTCondor
* flake check
* debugging HTCondor tests on CI
* Flake
* reduce negotiator interval for faster job queuing, more debugging logs
* move condor command at the right place
* always run cleanup step, and print more things on HTCondor
* Clean temporary debug and other not necessary modifications
* Disable HTCondor CI for now
* Import loop_in_thread fixture from distributed
* Override method from distributed to make tests pass
…ob scripts (#560)

* Use `--nworkers` instead of deprecated `--nprocs` in the generated job scripts

  Fix #559.

* Update the requirements for dask and distributed

  This is needed to support the `--nworkers` option.

* Fix behaviour of `env_extra` for HTCondor (#563)

  * fix behaviour of `env_extra` for HTCondor

    Before, only environment variables were considered, using HTCondor's `Environment =` feature, whereas the parameter description generally says "Other commands to add". If the `Environment =` is needed, one can still use the generic `job_extra` parameter to set it.

    Fixes #393, related to #323 #556

  * adapt htcondor tests for new `env_extra` behaviour
  * adapt htcondor tests for new `env_extra` behaviour: "export" is preserved now
  * docs: made "Moab Deployments" heading the same level as the others
  * docs: added description of HTCondorCluster + env_extra
    - example in the docstring
    - description in "example deployments"
    - description in "advanced tips and tricks"
  * docs: removed the HTCondorCluster section from examples
  * formatting according to black and flake8

* Drop Python 3.7 (#562)

  * Drop Python 3.7
  * Fix cleanup fixture problem (see distributed#9137)
  * Override cleanup distributed fixture, and reconfigure dask-jobqueue when called
  * Use power of 2 for the memory checks in tests (see dask#7484)
  * Apply Black
  * Apply correct version of black...
  * conda install Python in HTCondor Docker image
  * Fix HTCondor Dockerfile
  * Fix PBS issue
  * Add a timeout for every wait_for_workers call
  * Fix HTCondor tests, leave room for scheduling cycle to take place for HTCondor
  * flake check
  * debugging HTCondor tests on CI
  * Flake
  * reduce negotiator interval for faster job queuing, more debugging logs
  * move condor command at the right place
  * always run cleanup step, and print more things on HTCondor
  * Clean temporary debug and other not necessary modifications
  * Disable HTCondor CI for now
  * Import loop_in_thread fixture from distributed
  * Override method from distributed to make tests pass

Co-authored-by: Johannes Lange <jolange@users.noreply.github.com>
Co-authored-by: Guillaume Eynard-Bontemps <g.eynard.bontemps@gmail.com>
# Overriding distributed.utils_test.reset_config() method because it reset the | ||
# config from ou tests. |
Our
Whoops, I was too eager to merge this and left a typo, thanks for the proofread! I hope this text will soon disappear anyway.
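For context, a rough sketch of what the override being discussed might look like; the body of the replacement function and the way the jobqueue defaults are captured are assumptions on my part, not the actual code in this PR:

```python
# Sketch: replace distributed.utils_test.reset_config so that distributed's
# cleanup fixture does not wipe the jobqueue defaults between tests.
import copy

import dask
import distributed.utils_test
import dask_jobqueue  # noqa: F401  # registers the jobqueue.* defaults first

# snapshot the config *after* dask_jobqueue has registered its defaults
_config_with_jobqueue_defaults = copy.deepcopy(dask.config.config)


def reset_config():
    dask.config.config.clear()
    dask.config.config.update(copy.deepcopy(_config_with_jobqueue_defaults))


distributed.utils_test.reset_config = reset_config
```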
Fixes #561.
At the same time, trying to fix CI issues if possible.