This repository has been archived by the owner on Feb 10, 2021. It is now read-only.

test_stop_single_worker failing on CI #93

Open
jakirkham opened this issue Jun 26, 2018 · 4 comments

Comments

@jakirkham
Member

Noticed that test_stop_single_worker has started failing on CI. The failure seems to be consistent. However, it was not failing on the same code a month ago, so something else has changed (possibly in Distributed) in how Dask worker directories are managed.

=================================== FAILURES ===================================
___________________________ test_stop_single_worker ____________________________
loop = <tornado.platform.asyncio.AsyncIOLoop object at 0x7f270bb33978>
    def test_stop_single_worker(loop):
        with DRMAACluster(scheduler_port=0) as cluster:
            with Client(cluster, loop=loop) as client:
                cluster.start_workers(2)
                future = client.submit(lambda x: x + 1, 1)
                assert future.result() == 2
                while len(client.ncores()) < 2:
                    sleep(0.1)
    
                a, b = cluster.workers
                local_dir = client.run(lambda dask_worker: dask_worker.local_dir,
                                       workers=[a])[a]
                assert os.path.exists(local_dir)
    
                cluster.stop_workers(a)
                start = time()
                while len(client.ncores()) != 1:
                    sleep(0.2)
                    assert time() < start + 60
>       assert not os.path.exists(local_dir)
E       AssertionError: assert not True
E        +  where True = <function exists at 0x7f27547ba950>('/dask-drmaa/dask-worker-space/worker-c7271lb9')
E        +    where <function exists at 0x7f27547ba950> = <module 'posixpath' from '/opt/anaconda/lib/python3.6/posixpath.py'>.exists
E        +      where <module 'posixpath' from '/opt/anaconda/lib/python3.6/posixpath.py'> = os.path
dask_drmaa/tests/test_core.py:128: AssertionError
=============== 1 failed, 23 passed, 1 xpassed in 120.47 seconds ===============

ref: https://travis-ci.org/dask/dask-drmaa/jobs/396654145#L2188-L2219
ref: https://travis-ci.org/dask/dask-drmaa/builds/385798967
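One hypothesis, not established anywhere in this thread, is a timing race: the scheduler stops counting the stopped worker before that worker's directory has actually been deleted. If that is the cause, the test could poll for the directory's removal the same way it already polls `client.ncores()`. A minimal sketch, assuming a hypothetical helper named `wait_for_removal` (it is not part of dask-drmaa or Distributed):

```python
import os
import time


def wait_for_removal(path, timeout=60.0, interval=0.2):
    """Poll until `path` no longer exists, or raise after `timeout` seconds.

    Hypothetical helper: it mirrors the test's existing
    `while len(client.ncores()) != 1` loop, but waits on the worker
    directory itself rather than on the scheduler's view of the workers.
    """
    deadline = time.monotonic() + timeout
    while os.path.exists(path):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{path!r} still exists after {timeout} seconds")
        time.sleep(interval)
```

In the test, `assert not os.path.exists(local_dir)` would become `wait_for_removal(local_dir)`. If the directory is genuinely never cleaned up, this only turns the immediate `AssertionError` into a `TimeoutError` after 60 seconds, so a real cleanup bug would still surface rather than be hidden.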

@jakirkham
Member Author

Any ideas as to why this failure might have cropped up, @mrocklin?

@mrocklin
Member

mrocklin commented Jun 26, 2018 via email

@jakirkham
Member Author

Now I remember that this test has historically been flaky (#60). So dask-drmaa may be doing something incorrectly that a change in Distributed is now flushing out.

@jakirkham
Member Author

Going to mark this as a known failure for now (#94).
