Drop Python 3.7 #562
Conversation
Okay, so I'm not sure why the none test is failing with a config error here. When I try to reproduce it locally in the same environment, everything is fine for the config part. See:

$ pytest dask_jobqueue/tests/test_slurm.py --verbose
=================================================================================== test session starts ====================================================================================
platform linux -- Python 3.8.13, pytest-7.1.2, pluggy-1.0.0 -- /home/guillaumeeb/miniconda3/envs/dask-dev/bin/python
cachedir: .pytest_cache
rootdir: /home/guillaumeeb/dask-jobqueue
collected 7 items
dask_jobqueue/tests/test_slurm.py::test_header PASSED [ 14%]
dask_jobqueue/tests/test_slurm.py::test_job_script PASSED [ 28%]
dask_jobqueue/tests/test_slurm.py::test_basic SKIPPED (test requires env in ['slurm']) [ 42%]
dask_jobqueue/tests/test_slurm.py::test_adaptive SKIPPED (test requires env in ['slurm']) [ 57%]
dask_jobqueue/tests/test_slurm.py::test_config_name_slurm_takes_custom_config PASSED [ 71%]
dask_jobqueue/tests/test_slurm.py::test_different_interfaces_on_scheduler_and_workers SKIPPED (test requires env in ['slurm']) [ 85%]
dask_jobqueue/tests/test_slurm.py::test_worker_name_uses_cluster_name SKIPPED (test requires env in ['slurm']) [100%]
=============================================================================== 3 passed, 4 skipped in 4.36s ===============================================================================
Any insights @jacobtomlinson @jrbourbeau?
Correction: when I launch all the tests, I also see the KeyError from config. I'll try to understand why.
I've not seen that before. I know @mrocklin was talking about changing the way configs in subprojects are handled in dask/dask#9171, although that was just a documentation change. But maybe something else has changed?
Okay, so after some time of investigation, I narrowed the problem down to tests using the loop fixture. I'm not sure why we use this fixture so often or what it's supposed to do (give a clean loop for the Scheduler to run on?), but it seems it doesn't play well with the default loaded dask configuration. It looks like whenever this fixture is used, the default jobqueue config is lost from dask.config. Any hints on how to avoid that or what to change?
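For reference, here is a minimal sketch of the symptom as I understand it; the `jobqueue.slurm.queue` key and the import side effect are my assumptions about how dask-jobqueue registers its defaults, not something shown in this thread:

```python
# Minimal sketch of the symptom (assumed key name and import side effect).
import dask
import dask_jobqueue  # noqa: F401  # importing registers the jobqueue.* defaults

dask.config.get("jobqueue.slurm.queue")  # works in a fresh session
# ...but after distributed's loop/cleanup fixture resets dask.config,
# the same lookup raises KeyError: 'jobqueue'
```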
@graingert has been cleaning up a bunch of the loop handling stuff in distributed.
We reset the config as part of the loop fixture setup: https://github.com/dask/distributed/blob/236945a3f734eea2e687381e5cd86b561796c535/distributed/utils_test.py#L1883

When you make your own loop you should be able to use:

def my_loop(loop):
    with dask.config.set(...):
        yield loop
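For dask-jobqueue that might translate into something like the sketch below. The fixture name `jobqueue_loop`, the location of `jobqueue.yaml`, and the use of `dask.config.update_defaults` are my assumptions, not code from this PR:

```python
# Sketch only: wrap distributed's loop fixture and re-apply the dask-jobqueue
# defaults that get wiped when the fixture resets dask.config.
import os

import dask
import pytest
import yaml
from distributed.utils_test import loop  # noqa: F401

import dask_jobqueue


@pytest.fixture
def jobqueue_loop(loop):
    # assumed: dask-jobqueue ships its defaults in a jobqueue.yaml next to the package
    fn = os.path.join(os.path.dirname(dask_jobqueue.__file__), "jobqueue.yaml")
    with open(fn) as f:
        defaults = yaml.safe_load(f)
    # push the jobqueue defaults back on top of the freshly reset config
    dask.config.update_defaults(defaults)
    yield loop
```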
And setting the dask config within the cleanup fixture should help too:

From 70375807624e819620b4cad21041b41bb3ece69a Mon Sep 17 00:00:00 2001
From: Thomas Grainger <tagrain@gmail.com>
Date: Wed, 27 Jul 2022 10:47:36 +0100
Subject: [PATCH] set dask config within cleanup
---
conftest.py | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/conftest.py b/conftest.py
index 1b92c19..26f7aae 100644
--- a/conftest.py
+++ b/conftest.py
@@ -1,7 +1,7 @@
# content of conftest.py
# Make loop fixture available in all tests
-from distributed.utils_test import loop # noqa: F401
+from distributed.utils_test import loop, cleanup # noqa: F401
import pytest
@@ -97,18 +97,22 @@ all_envs = {
@pytest.fixture(
params=[pytest.param(v, marks=[pytest.mark.env(k)]) for (k, v) in all_envs.items()]
)
-def EnvSpecificCluster(request):
+def EnvSpecificCluster(request, cleanup):
"""Run test only with the specific cluster class set by the environment"""
if request.param == HTCondorCluster:
# HTCondor requires explicitly specifying requested disk space
- dask.config.set({"jobqueue.htcondor.disk": "1GB"})
- return request.param
+ with dask.config.set({"jobqueue.htcondor.disk": "1GB"}):
+ yield request.param
+ else:
+ yield request.param
-@pytest.fixture(params=list(all_envs.values()))
-def Cluster(request):
+@pytest.fixture(params=list(all_envs.values()))
+def Cluster(request, cleanup):
"""Run test for each cluster class when no environment is set (test should not require the actual scheduler)"""
if request.param == HTCondorCluster:
# HTCondor requires explicitly specifying requested disk space
- dask.config.set({"jobqueue.htcondor.disk": "1GB"})
- return request.param
+ with dask.config.set({"jobqueue.htcondor.disk": "1GB"}):
+ yield request.param
+ else:
+ yield request.param
--
2.34.1
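The key point of that patch is that `dask.config.set` is used as a context manager, so the HTCondor disk setting is undone when the fixture finalizes instead of leaking into later tests. Roughly (illustrative values, not from the PR):

```python
import dask

dask.config.set({"jobqueue.htcondor.disk": "1GB"})
# old behaviour: the setting persists and leaks into every later test

with dask.config.set({"jobqueue.htcondor.disk": "1GB"}):
    ...  # new behaviour: only active inside the block, previous config restored afterwards
```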
I did not try it yet, and I'm not fluent with context managers and everything, but I'm not sure how this would fix our issue here. How can I use the loop fixture and still inherit the default dask-jobqueue config with it? Is this what the code you mention is doing?
Okay, so we have some bytes counting issues here. After investigation, this originates from dask/dask#7484. As we store the memory as an integer, and we express it in our tests in GB and not GiB, we lose information by converting back and forth between string and int. See below what I get in Python:

In [23]: from dask.utils import parse_bytes, format_bytes
In [24]: format_bytes(parse_bytes('2GB'))
Out[24]: '1.86 GiB'
In [25]: parse_bytes(format_bytes(parse_bytes('2GB')))
Out[25]: 1997159792
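Expressing the sizes in binary units (powers of two) makes the round trip lossless, which is why the tests were switched to GiB-style values. A hypothetical continuation of the same session with the same helpers:

```python
In [26]: parse_bytes('2GiB')
Out[26]: 2147483648

In [27]: format_bytes(parse_bytes('2GiB'))
Out[27]: '2.00 GiB'

In [28]: parse_bytes(format_bytes(parse_bytes('2GiB')))
Out[28]: 2147483648
```

With '2GB', 2000000000 bytes gets formatted as '1.86 GiB' and parsed back as 1997159792, so assertions comparing the two integers fail by a few bytes; with '2GiB' the value survives the string/int round trip exactly.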
So I've made things work locally, even the HTCondor tests. But CI still fails. The HTCondor test fix is not really clean: one test leaves some jobs running, which prevents other tests from succeeding, so I had to manually remove these jobs from the queue at the end of the test. The problem is that if a test fails, the queue might not be cleaned properly. Adding some fixture for cleaning running HTCondor jobs might help (see the sketch below). Moreover, the scheduling cycle seems to be a bit long with the default config, so I had to use fairly high timeouts. We could probably do better, cc @riedel, @mivade or @jolange. If someone is willing to take a look, please do so. We should look into the docker compose setup for reducing the scheduling cycle.
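A rough sketch of what such a cleanup fixture could look like; `condor_rm -all` is a real HTCondor command, but the fixture name and the idea of running it unconditionally at teardown are only a suggestion, not code from this PR:

```python
# Sketch of an autouse fixture that removes any leftover HTCondor jobs after
# each test, so one failing test cannot poison the queue for the next one.
import shutil
import subprocess

import pytest


@pytest.fixture(autouse=True)
def clean_htcondor_queue():
    yield
    # best effort: only attempt the cleanup where the HTCondor CLI is available
    if shutil.which("condor_rm"):
        subprocess.run(["condor_rm", "-all"], check=False)
```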
So I was going to give up on the HTCondor tests for now and call it a day (more of a week actually), but the dirty hack I introduced by overriding the distributed fixtures is still not getting CI to pass. @graingert I think I'll really need your help in order to fix this. I don't see a simple, even if hacky, workaround for making CI work here; we'll probably need to fix #567 by continuing #565. But if you have a temporary workaround, I'll take it.
One thing I'd like to understand is: do we really need the loop fixture from distributed? I've looked at dask-kubernetes, dask-cloudprovider, dask-yarn, and didn't find any use of it there. The only external place I saw it used was in Xarray. What's the goal of this fixture for us?
There are a lot of modifications here. First is the drop of Python 3.7; some are genuine test fixes (PBS, LSF, None, HTCondor); others are attempts to fix tests or to add debugging and readability. Finally, there is one part that is just a dirty workaround with distributed fixtures to have tests passing even if the code does not handle loops correctly. I'm going to merge this anyway, because it still gives us some tests and checks on merge requests.
* Drop Python 3.7
* Fix cleanup fixture problem (see distributed#9137)
* Override cleanup distributed fixture, and reconfigure dask-jobqueue when called
* Use power of 2 for the memory checks in tests (see dask#7484)
* Apply Black
* Apply correct version of black...
* conda install Python in HTCondor Docker image
* Fix HTCondor Dockerfile
* Fix PBS issue
* Add a timeout for every wait_for_workers call
* Fix HTCondor tests, leave room for scheduling cycle to take place for HTCondor
* flake check
* debugging HTCondor tests on CI
* Flake
* reduce negotiator interval for faster job queuing, more debugging logs
* move condor command at the right place
* always run cleanup step, and print more things on HTCondor
* Clean temporary debug and other not necessary modifications
* Disable HTCondor CI for now
* Import loop_in_thread fixture from distributed
* Override method from distributed to make tests pass
…ob scripts (#560)

* Use `--nworkers` instead of deprecated `--nprocs` in the generated job scripts

  Fix #559.

* Update the requirements for dask and distributed

  This is needed to support the `--nworkers` option.

* Fix behaviour of `env_extra` for HTCondor (#563)

  * fix behaviour of `env_extra` for HTCondor

    Before, only environment variables were considered, using HTCondor's `Environment =` feature, whereas the parameter description generally says "Other commands to add". If the `Environment =` is needed, one can still use the generic `job_extra` parameter to set it.

    Fixes #393, related to #323 #556

  * adapt htcondor tests for new `env_extra` behaviour
  * adapt htcondor tests for new `env_extra` behaviour: "export" is preserved now
  * docs: made "Moab Deployments" heading the same level as the others
  * docs: added description of HTCondorCluster + env_extra
    - example in the docstring
    - description in "example deployments"
    - description in "advanced tips and tricks"
  * docs: removed the HTCondorCluster section from examples
  * formatting according to black and flake8

* Drop Python 3.7 (#562)

  * Drop Python 3.7
  * Fix cleanup fixture problem (see distributed#9137)
  * Override cleanup distributed fixture, and reconfigure dask-jobqueue when called
  * Use power of 2 for the memory checks in tests (see dask#7484)
  * Apply Black
  * Apply correct version of black...
  * conda install Python in HTCondor Docker image
  * Fix HTCondor Dockerfile
  * Fix PBS issue
  * Add a timeout for every wait_for_workers call
  * Fix HTCondor tests, leave room for scheduling cycle to take place for HTCondor
  * flake check
  * debugging HTCondor tests on CI
  * Flake
  * reduce negotiator interval for faster job queuing, more debugging logs
  * move condor command at the right place
  * always run cleanup step, and print more things on HTCondor
  * Clean temporary debug and other not necessary modifications
  * Disable HTCondor CI for now
  * Import loop_in_thread fixture from distributed
  * Override method from distributed to make tests pass

Co-authored-by: Johannes Lange <jolange@users.noreply.github.com>
Co-authored-by: Guillaume Eynard-Bontemps <g.eynard.bontemps@gmail.com>
# Overriding distributed.utils_test.reset_config() method because it reset the | ||
# config from ou tests. |
Our
Whoops, I was too eager to merge this and left a typo, thanks for the proofread! I hope this text will soon disappear anyway.
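For context, a rough sketch of what the override being discussed might look like; the body of the replacement function and the way the jobqueue defaults are captured are assumptions on my part, not the actual code in this PR:

```python
# Sketch: replace distributed.utils_test.reset_config so that distributed's
# cleanup fixture does not wipe the jobqueue defaults between tests.
import copy

import dask
import distributed.utils_test
import dask_jobqueue  # noqa: F401  # registers the jobqueue.* defaults first

# snapshot the config *after* dask_jobqueue has registered its defaults
_config_with_jobqueue_defaults = copy.deepcopy(dask.config.config)


def reset_config():
    dask.config.config.clear()
    dask.config.config.update(copy.deepcopy(_config_with_jobqueue_defaults))


distributed.utils_test.reset_config = reset_config
```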
Fixes #561.
At the same time, trying to fix CI issues if possible.