Flaky distributed/diagnostics/tests/test_task_stream.py::test_client_sync #6820

Status: Open
gjoseph92 opened this issue on Aug 3, 2022 · 1 comment
Labels: flaky test (Intermittent failures on CI.)

Comments

@gjoseph92 (Collaborator)

Rather odd:

_______________________________ test_client_sync _______________________________

client = <Client: 'tcp://127.0.0.1:55669' processes=2 threads=2, memory=28.00 GiB>

    def test_client_sync(client):
        with get_task_stream(client=client) as ts:
            sleep(0.1)  # to smooth over time differences on the scheduler
            # to smooth over time differences on the scheduler
            futures = client.map(inc, range(10))
            wait(futures)
    
>       assert len(ts.data) == 10
E       AssertionError: assert 9 == 10
E        +  where 9 = len([{'key': 'inc-aa9589d43f33371b09a8b12fb0f1f11d', 'metadata': {}, 'nbytes': 28, 'startstops': ({'action': 'compute', 'start': 1659464678.2094538, 'stop': 1659464678.2094707},), ...}, {'key': 'inc-03d935909bba38f9a49655e867cbf56a', 'metadata': {}, 'nbytes': 28, 'startstops': ({'action': 'compute', 'start': 1659464676.999707, 'stop': 1659464676.999717},), ...}, {'key': 'inc-e006d6ed622cf1bdfe77fadcd5aa8540', 'metadata': {}, 'nbytes': 28, 'startstops': ({'action': 'compute', 'start': 1659464678.2119305, 'stop': 1659464678.2119415},), ...}, {'key': 'inc-60d23875ec5497d5404aad1ce8fcd252', 'metadata': {}, 'nbytes': 28, 'startstops': ({'action': 'compute', 'start': 1659464678.2139337, 'stop': 1659464678.2139456},), ...}, {'key': 'inc-deebfb8e8b05bf230e909b88993dc421', 'metadata': {}, 'nbytes': 28, 'startstops': ({'action': 'compute', 'start': 1659464677.002106, 'stop': 1659464677.002117},), ...}, {'key': 'inc-31f04a272f144bda4ac0c5f15ba8b12d', 'metadata': {}, 'nbytes': 28, 'startstops': ({'action': 'compute', 'start': 1659464678.2161546, 'stop': 1659464678.2161655},), ...}, ...])
E        +    where [{'key': 'inc-aa9589d43f33371b09a8b12fb0f1f11d', 'metadata': {}, 'nbytes': 28, 'startstops': ({'action': 'compute', 'start': 1659464678.2094538, 'stop': 1659464678.2094707},), ...}, {'key': 'inc-03d935909bba38f9a49655e867cbf56a', 'metadata': {}, 'nbytes': 28, 'startstops': ({'action': 'compute', 'start': 1659464676.999707, 'stop': 1659464676.999717},), ...}, {'key': 'inc-e006d6ed622cf1bdfe77fadcd5aa8540', 'metadata': {}, 'nbytes': 28, 'startstops': ({'action': 'compute', 'start': 1659464678.2119305, 'stop': 1659464678.2119415},), ...}, {'key': 'inc-60d23875ec5497d5404aad1ce8fcd252', 'metadata': {}, 'nbytes': 28, 'startstops': ({'action': 'compute', 'start': 1659464678.2139337, 'stop': 1659464678.2139456},), ...}, {'key': 'inc-deebfb8e8b05bf230e909b88993dc421', 'metadata': {}, 'nbytes': 28, 'startstops': ({'action': 'compute', 'start': 1659464677.002106, 'stop': 1659464677.002117},), ...}, {'key': 'inc-31f04a272f144bda4ac0c5f15ba8b12d', 'metadata': {}, 'nbytes': 28, 'startstops': ({'action': 'compute', 'start': 1659464678.2161546, 'stop': 1659464678.2161655},), ...}, ...] = <distributed.client.get_task_stream object at 0x7fe9fe18fac0>.data

distributed/diagnostics/tests/test_task_stream.py:131: AssertionError
---------------------------- Captured stderr setup -----------------------------
2022-08-02 18:24:34,676 - distributed.http.proxy - INFO - To route to workers diagnostics web server please install jupyter-server-proxy: python -m pip install jupyter-server-proxy
2022-08-02 18:24:34,688 - distributed.scheduler - INFO - State start
2022-08-02 18:24:34,708 - distributed.scheduler - INFO - Clear task state
2022-08-02 18:24:34,710 - distributed.scheduler - INFO -   Scheduler at:     tcp://127.0.0.1:55669
2022-08-02 18:24:34,710 - distributed.scheduler - INFO -   dashboard at:            127.0.0.1:8787
2022-08-02 18:24:34,762 - distributed.worker - INFO -       Start worker at:      tcp://127.0.0.1:55670
2022-08-02 18:24:34,762 - distributed.worker - INFO -          Listening to:      tcp://127.0.0.1:55670
2022-08-02 18:24:34,762 - distributed.worker - INFO -          dashboard at:            127.0.0.1:55672
2022-08-02 18:24:34,762 - distributed.worker - INFO - Waiting to connect to:      tcp://127.0.0.1:55669
2022-08-02 18:24:34,763 - distributed.worker - INFO - -------------------------------------------------
2022-08-02 18:24:34,763 - distributed.worker - INFO -               Threads:                          1
2022-08-02 18:24:34,763 - distributed.worker - INFO -       Start worker at:      tcp://127.0.0.1:55671
2022-08-02 18:24:34,763 - distributed.worker - INFO -                Memory:                  14.00 GiB
2022-08-02 18:24:34,763 - distributed.worker - INFO -          Listening to:      tcp://127.0.0.1:55671
2022-08-02 18:24:34,763 - distributed.worker - INFO -       Local Directory: /var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/dask-worker-space/worker-oe8vmoju
2022-08-02 18:24:34,763 - distributed.worker - INFO -          dashboard at:            127.0.0.1:55673
2022-08-02 18:24:34,763 - distributed.worker - INFO - -------------------------------------------------
2022-08-02 18:24:34,763 - distributed.worker - INFO - Waiting to connect to:      tcp://127.0.0.1:55669
2022-08-02 18:24:34,763 - distributed.worker - INFO - -------------------------------------------------
2022-08-02 18:24:34,763 - distributed.worker - INFO -               Threads:                          1
2022-08-02 18:24:34,763 - distributed.worker - INFO -                Memory:                  14.00 GiB
2022-08-02 18:24:34,763 - distributed.worker - INFO -       Local Directory: /var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/dask-worker-space/worker-fp9p2_hp
2022-08-02 18:24:34,764 - distributed.worker - INFO - -------------------------------------------------
2022-08-02 18:24:35,742 - distributed.scheduler - INFO - Register worker <WorkerState 'tcp://127.0.0.1:55670', status: init, memory: 0, processing: 0>
2022-08-02 18:24:36,954 - distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:55670
2022-08-02 18:24:36,954 - distributed.core - INFO - Starting established connection
2022-08-02 18:24:36,955 - distributed.worker - INFO -         Registered to:      tcp://127.0.0.1:55669
2022-08-02 18:24:36,955 - distributed.worker - INFO - -------------------------------------------------
2022-08-02 18:24:36,957 - distributed.scheduler - INFO - Register worker <WorkerState 'tcp://127.0.0.1:55671', status: init, memory: 0, processing: 0>
2022-08-02 18:24:36,958 - distributed.core - INFO - Starting established connection
2022-08-02 18:24:36,958 - distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:55671
2022-08-02 18:24:36,958 - distributed.core - INFO - Starting established connection
2022-08-02 18:24:36,959 - distributed.worker - INFO -         Registered to:      tcp://127.0.0.1:55669
2022-08-02 18:24:36,959 - distributed.worker - INFO - -------------------------------------------------
2022-08-02 18:24:36,962 - distributed.core - INFO - Starting established connection
2022-08-02 18:24:36,990 - distributed.scheduler - INFO - Receive client connection: Client-5d63f5e4-1290-11ed-80be-005056a02c6f
2022-08-02 18:24:36,991 - distributed.core - INFO - Starting established connection
--------------------------- Captured stderr teardown ---------------------------
2022-08-02 18:24:37,150 - distributed.scheduler - INFO - Remove client Client-5d63f5e4-1290-11ed-80be-005056a02c6f
2022-08-02 18:24:37,152 - distributed.scheduler - INFO - Remove client Client-5d63f5e4-1290-11ed-80be-005056a02c6f
2022-08-02 18:24:37,154 - distributed.scheduler - INFO - Close client connection: Client-5d63f5e4-1290-11ed-80be-005056a02c6f
2022-08-02 18:24:37,163 - distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:55670
2022-08-02 18:24:37,165 - distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:55671
2022-08-02 18:24:37,166 - distributed.scheduler - INFO - Remove worker <WorkerState 'tcp://127.0.0.1:55670', status: closing, memory: 0, processing: 0>
2022-08-02 18:24:37,166 - distributed.worker - INFO - Connection to scheduler broken. Closing without reporting. ID: Worker-ed4031d8-90c3-4933-8ab6-e0b2a1dd9455 Address tcp://127.0.0.1:55670 Status: Status.closing
2022-08-02 18:24:37,166 - distributed.core - INFO - Removing comms to tcp://127.0.0.1:55670
2022-08-02 18:24:37,168 - distributed.worker - INFO - Connection to scheduler broken. Closing without reporting. ID: Worker-764f3ebe-68cb-4e4e-b26b-eba45a2ad968 Address tcp://127.0.0.1:55671 Status: Status.closing
2022-08-02 18:24:37,168 - distributed.scheduler - INFO - Remove worker <WorkerState 'tcp://127.0.0.1:55671', status: closing, memory: 0, processing: 0>
2022-08-02 18:24:37,168 - distributed.core - INFO - Removing comms to tcp://127.0.0.1:55671
2022-08-02 18:24:37,168 - distributed.scheduler - INFO - Lost all workers
2022-08-02 18:24:37,173 - distributed.scheduler - INFO - Scheduler closing...
2022-08-02 18:24:37,175 - distributed.scheduler - INFO - Scheduler closing all comms

https://github.com/dask/distributed/runs/7637502481?check_suite_focus=true#step:11:1909
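For anyone trying to reproduce this outside CI, a rough local sketch along these lines may help show how often a record goes missing. This is my own illustration, not something from the test suite: the cluster shape, the retry count, and the pure=False flag are illustrative choices.

```python
# Sketch: repeat the same task-stream capture as test_client_sync and report
# any run where fewer than 10 records come back. Assumes a local cluster;
# n_workers/threads_per_worker and the retry count are arbitrary.
from time import sleep

from distributed import Client, get_task_stream, wait


def inc(x):
    return x + 1


def capture_once(client, n=10):
    with get_task_stream(client=client) as ts:
        sleep(0.1)  # same smoothing delay as the original test
        # pure=False forces fresh keys on every attempt so tasks are recomputed
        futures = client.map(inc, range(n), pure=False)
        wait(futures)
    return ts.data  # populated once the context manager exits


if __name__ == "__main__":
    with Client(n_workers=2, threads_per_worker=1) as client:
        for attempt in range(20):
            records = capture_once(client)
            if len(records) != 10:
                print(f"attempt {attempt}: only {len(records)} task-stream records captured")
```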

gjoseph92 added the flaky test (Intermittent failures on CI.) label on Aug 3, 2022
@GenevieveBuckley (Contributor)

A bit of a wild guess, but could this be somehow related? Our friends over at napari have ended up creating some race conditions by chaining multiple dask arrays/computations together:
haesleinhuepf/napari-time-slicer#6 (comment)
