use asyncio.run to run gen_cluster, gen_test and cluster #6231
Conversation
Unit Test Results: 15 files ±0, 15 suites ±0, 6h 41m 14s ⏱️ +20m 25s. For more details on these failures, see this check. Results for commit 86cef41. ± Comparison against base commit 046ab17. ♻️ This comment has been updated with latest results.
Force-pushed ad01bd9 to 9b89f8a (Compare)
Force-pushed b159135 to b3d72d8 (Compare)
Force-pushed b817c5d to a84e1b5 (Compare)
CI is very sad
@clean(**clean_kwargs)
def test_func(*outer_args, **kwargs):
    result = None
    with clean(timeout=active_rpc_timeout, **clean_kwargs) as loop:

        async def coro():
            with tempfile.TemporaryDirectory() as tmpdir:
                config2 = merge({"temporary-directory": tmpdir}, config)
                with dask.config.set(config2):
                    workers = []
                    s = False

                    for _ in range(60):
                        try:
                            s, ws = await start_cluster(
                                nthreads,
                                scheduler,
                                loop,
                                security=security,
                                Worker=Worker,
                                scheduler_kwargs=scheduler_kwargs,
                                worker_kwargs=worker_kwargs,
                            )
                        except Exception as e:
                            logger.error(
                                "Failed to start gen_cluster: "
                                f"{e.__class__.__name__}: {e}; retrying",
                                exc_info=True,
                            )
                            await asyncio.sleep(1)
                        else:
                            workers[:] = ws
                            args = [s] + workers
                            break
                    if s is False:
                        raise Exception("Could not start cluster")
                    if client:
                        c = await Client(
                            s.address,
                            loop=loop,

async def async_fn():
    result = None
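The 60-iteration startup loop above boils down to a start-or-sleep retry pattern. A minimal standalone sketch, where the hypothetical `start_flaky` stands in for `start_cluster` (names and failure counts are illustrative, not from the PR):

```python
import asyncio


async def start_flaky(attempts, fail_first=2):
    # Hypothetical starter: fails the first `fail_first` calls, then succeeds.
    attempts.append(1)
    if len(attempts) <= fail_first:
        raise RuntimeError("not ready")
    return "scheduler", ["w1", "w2"]


async def start_with_retries(retries=60):
    attempts = []
    s = False
    for _ in range(retries):
        try:
            s, ws = await start_flaky(attempts)
        except Exception:
            # Back off before retrying (the real code sleeps 1 second).
            await asyncio.sleep(0)
        else:
            break
    if s is False:
        # The `False` sentinel distinguishes "never started" from a scheduler.
        raise Exception("Could not start cluster")
    return s, ws
```

Running `asyncio.run(start_with_retries())` returns the scheduler and workers once a start attempt succeeds.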
I assume the entire block that follows is just an indentation change?
There are actually a few changes, e.g. I don't pass `loop` to `await start_cluster(` or `await Client(`.
To make review easier, you could split the indentation change and the other changes into two commits.
OR you could leave comments pointing reviewers to the code you actually modified (if it's just about the loop, that's OK).
Yep, it's just avoiding passing a `loop` kwarg, moving `clean` to a decorator, and using `asyncio.run` with `asyncio.wait_for` instead of `run_sync`.
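The `asyncio.run` + `asyncio.wait_for` combination mentioned here can be sketched in isolation. This is a minimal illustration of the pattern, not the PR's actual wrapper; `async_fn` stands in for a test body:

```python
import asyncio


async def async_fn():
    # Stand-in for the body of an async test.
    await asyncio.sleep(0.01)
    return "ok"


def test_func(timeout=30):
    # asyncio.run creates a fresh event loop, runs the coroutine to
    # completion, and closes the loop afterwards. wait_for enforces the
    # overall timeout, replacing tornado's IOLoop.run_sync(..., timeout=...).
    return asyncio.run(asyncio.wait_for(async_fn(), timeout=timeout))
```

Because each call gets its own loop, no loop state leaks between tests.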
Force-pushed 1f1853b to 77eae29 (Compare)
CI should be happier now!
Force-pushed 77eae29 to cabab46 (Compare)
I'm OK with the changes proposed. If test failures turn out to be unrelated I'm good to go.
I left a few review comments where I think more inline commentary would help but this is not a requirement.
distributed/utils_test.py
Outdated
loop_started = concurrent.futures.Future()
with concurrent.futures.ThreadPoolExecutor(
    1, thread_name_prefix="test IOLoop"
) as tpe:

    async def run():
        io_loop = IOLoop.current()
        stop_event = asyncio.Event()
        loop_started.set_result((io_loop, stop_event))
        await stop_event.wait()

    ran = tpe.submit(_run_and_close_tornado, run)
    for f in concurrent.futures.as_completed((loop_started, ran)):
        if f is loop_started:
            io_loop, stop_event = loop_started.result()
            try:
                yield io_loop
            finally:
                io_loop.add_callback(stop_event.set)

        elif f is ran:
            ran.result()
            return
This is more complicated than before but also much more robust.
I think it makes sense, but a comment justifying this complexity would be nice. Very naively, I would have implemented something more similar to the previous version than this.
@contextmanager
def clean(threads=True, instances=True, processes=True):
    asyncio.set_event_loop(None)
What exactly is this doing? The documentation doesn't state the behavior if None is passed.
This causes `asyncio.get_event_loop()` to fail, which in turn causes `IOLoop.current()` to fail. It's equivalent to the state left behind after running `asyncio.run(noop())`.
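A quick way to see this effect (a minimal sketch; `noop` is just an empty coroutine, not a name from the PR):

```python
import asyncio


async def noop():
    pass


def current_loop_is_unset():
    # After set_event_loop(None) has been called, get_event_loop()
    # raises RuntimeError instead of auto-creating a loop.
    try:
        asyncio.get_event_loop()
    except RuntimeError:
        return True
    return False


# asyncio.run() creates a fresh loop, runs the coroutine, closes the
# loop, and finally calls asyncio.set_event_loop(None) -- leaving the
# thread with no current event loop, just like an explicit
# asyncio.set_event_loop(None).
asyncio.run(noop())
```

After the `asyncio.run` call, `current_loop_is_unset()` returns `True`, which is why `IOLoop.current()` (which looks up the current asyncio loop) fails too.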
It's the side effect of calling `IOLoop.clear_current()`, where `asyncio.get_event_loop()` raises a RuntimeError:

try:
    self.old_asyncio = asyncio.get_event_loop()
except (RuntimeError, AssertionError):
    # get_event_loop() failed, so _clear_current_hook calls
    # asyncio.set_event_loop(None)
    self.old_asyncio = None  # type: ignore
self.is_current = True
asyncio.set_event_loop(self.asyncio_loop)

def _clear_current_hook(self) -> None:
    if self.is_current:
        asyncio.set_event_loop(self.old_asyncio)
Rather than making a pristine loop itself, this forces functions wrapped by `clean` to make their own pristine loop.
Force-pushed 0ed4344 to 86cef41 (Compare)
Looks like the test suite ending with exit code 1 while all tests pass is already happening: https://github.com/dask/distributed/runs/6604868888?check_suite_focus=true#step:11:1852 (from #6451)
I'm curious about this. Is there some useful signal that we can give to future developers about why a test is flaky in this way that would help them? I ask this question without having thought deeply about it.
Or, more broadly, do you have thoughts on how to address the increase in flaky tests that we're likely to see? I like that we're flushing these issues out into the open, but I'd also like to get some ideas on how we can address them. Is there any systemic approach that you see?
This is mostly about edge cases involving implicitly closed rpc objects and implicitly closed tasks. Implicitly closed tasks are now explicitly cancelled, so `await` calls will start raising `CancelledError` during the test rather than `GeneratorExit` when the task is garbage collected. In addition, some …
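The behavioral difference described here can be demonstrated in isolation: once a task is explicitly cancelled, `CancelledError` is raised at the task's next `await` point, where the task can observe it (and must re-raise). A minimal sketch, with illustrative names:

```python
import asyncio


async def worker(started, observed):
    started.set()
    try:
        # An explicit task.cancel() raises CancelledError right here,
        # at the await point -- unlike garbage collection of a pending
        # task, which would throw GeneratorExit into the coroutine.
        await asyncio.sleep(3600)
    except asyncio.CancelledError:
        observed.append(True)
        raise  # cancellation must be propagated, not swallowed


async def main():
    observed = []
    started = asyncio.Event()
    task = asyncio.create_task(worker(started, observed))
    await started.wait()
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return observed
```

`asyncio.run(main())` returns `[True]`: the worker saw the cancellation during the test run, which is exactly what lets tests (and cleanup code) handle it deterministically.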
With changes from dask/distributed#6231, the `loop` fixture now depends on the `cleanup` fixture, which must be imported explicitly.

Authors:
- Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
- GALI PREM SAGAR (https://github.com/galipremsagar)
- Mads R. B. Kristensen (https://github.com/madsbk)

URL: #924
pre-commit run --all-files