Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix flaky test_close_gracefully and test_lifetime #5677

Merged
merged 6 commits into from
Jan 25, 2022

Conversation

crusaderky
Copy link
Collaborator

Closes #4862

@crusaderky crusaderky self-assigned this Jan 21, 2022
@crusaderky crusaderky marked this pull request as draft January 21, 2022 12:13
futures = c.map(slowinc, range(200), delay=0.1, worker=[b.address])
await asyncio.sleep(1.5)
assert b.status not in (Status.running, Status.paused)
futures = c.map(slowinc, range(200), delay=0.1, workers=[b.address])
Copy link
Collaborator Author

@crusaderky crusaderky Jan 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix spelling error worker->workers! All tasks were failing before!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something like await next(as_completed(futures)) would protect us there and may remove the necessity for sleep below. I'm fine without it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revised

@crusaderky crusaderky added the flaky test Intermittent failures on CI. label Jan 21, 2022
crusaderky added a commit to crusaderky/distributed that referenced this pull request Jan 21, 2022
@crusaderky crusaderky marked this pull request as ready for review January 21, 2022 14:20
async with Worker(s.address) as a, Worker(s.address, lifetime="2 seconds") as b:
futures = c.map(slowinc, range(200), delay=0.1, workers=[b.address])
await asyncio.sleep(1)
assert not a.data
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After fixing the misspelling above, the test was occasionally failing this assertion with lifetime="1 seconds" and sleep(0.5)

crusaderky added a commit to crusaderky/distributed that referenced this pull request Jan 21, 2022
@crusaderky
Copy link
Collaborator Author

All test failures are unrelated; ready for review and merge

proc = {ts for ts in b.tasks.values() if ts.state == "executing"}
assert proc

assert any(ts for ts in b.tasks.values() if ts.state == "executing")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a tiny time window possible where no task is in executing state until the next ensure_computing is called

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

a.story(key),
[
(key, "put-in-memory"),
(key, "receive-from-scatter"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this a scatter event?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

legacy (pre-AMM) Scheduler.gather_on_worker, which underlies both Scheduler.replicate and Scheduler.retire_workers

futures = c.map(slowinc, range(200), delay=0.1, worker=[b.address])
await asyncio.sleep(1.5)
assert b.status not in (Status.running, Status.paused)
futures = c.map(slowinc, range(200), delay=0.1, workers=[b.address])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something like await next(as_completed(futures)) would protect us there and may remove the necessity for sleep below. I'm fine without it

@crusaderky
Copy link
Collaborator Author

All review comments have been incorporated.
All test failures are unrelated.

@fjetter fjetter merged commit 0f6a4df into dask:main Jan 25, 2022
@crusaderky crusaderky deleted the AMM/test_close_gracefully branch January 25, 2022 13:44
crusaderky added a commit to crusaderky/distributed that referenced this pull request Jan 28, 2022
commit 14bddba1482de035ca416368f918d94b115d3660
Merge: df079fc 9a66b71
Author: crusaderky <crusaderky@gmail.com>
Date:   Tue Jan 25 17:16:58 2022 +0000

    Merge branch 'main' into AMM/RetireWorker

commit df079fc
Merge: 9676934 5835ce3
Author: crusaderky <crusaderky@gmail.com>
Date:   Mon Jan 24 15:08:51 2022 +0000

    Merge branch 'AMM/staging' into AMM/RetireWorker

commit 5835ce3
Merge: ee68b02 b0b8e95
Author: crusaderky <crusaderky@gmail.com>
Date:   Mon Jan 24 15:07:27 2022 +0000

    Merge branch 'AMM/test_close_gracefully' into AMM/staging

commit ee68b02
Merge: 7d4e6ee ccad288
Author: crusaderky <crusaderky@gmail.com>
Date:   Mon Jan 24 15:06:50 2022 +0000

    Merge branch 'main' into AMM/staging

commit b0b8e95
Author: crusaderky <crusaderky@gmail.com>
Date:   Mon Jan 24 14:32:29 2022 +0000

    Code review

commit 9676934
Author: crusaderky <crusaderky@gmail.com>
Date:   Fri Jan 21 14:50:53 2022 +0000

    AMM to manage retire_workers()

commit 7d4e6ee
Author: crusaderky <crusaderky@gmail.com>
Date:   Fri Jan 21 14:39:41 2022 +0000

    Fix flaky test_close_gracefully and test_lifetime (dask#5677)

commit 7faab51
Author: crusaderky <crusaderky@gmail.com>
Date:   Fri Jan 21 14:39:41 2022 +0000

    harden test

commit aef3b71
Author: crusaderky <crusaderky@gmail.com>
Date:   Fri Jan 21 14:29:10 2022 +0000

    Increase resilience on slow CI

commit af84e40
Author: crusaderky <crusaderky@gmail.com>
Date:   Fri Jan 21 12:39:18 2022 +0000

    Dump cluster state on all test failures (dask#5674)

commit 5054c19
Author: crusaderky <crusaderky@gmail.com>
Date:   Fri Jan 21 12:38:38 2022 +0000

    Paused workers shouldn't steal tasks (dask#5665)

commit eadb35f
Author: crusaderky <crusaderky@gmail.com>
Date:   Fri Jan 21 12:37:48 2022 +0000

    transition flight to missing if no who_has (dask#5653)

commit 581aee8
Merge: 940bb45 60c0d60
Author: crusaderky <crusaderky@gmail.com>
Date:   Fri Jan 21 12:36:09 2022 +0000

    Merge branch 'main' into AMM/test_close_gracefully

commit 940bb45
Author: crusaderky <crusaderky@gmail.com>
Date:   Fri Jan 21 12:20:10 2022 +0000

    tweak comment

commit 731d132
Author: crusaderky <crusaderky@gmail.com>
Date:   Fri Jan 21 12:12:03 2022 +0000

    Fix flaky test_close_gracefully and test_lifetime
gjoseph92 pushed a commit to gjoseph92/distributed that referenced this pull request Feb 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flaky test Intermittent failures on CI.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky distributed/tests/test_worker.py::test_lifetime
2 participants