-
-
Notifications
You must be signed in to change notification settings - Fork 723
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CI job running tests with queuing on #6989
Conversation
we have to make the tasks to be stolen not look like root tasks.
There are a lot more race conditions with queuing
these should not be failing and we need to evaluate why they are
Unit Test ResultsSee test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 15 files ±0 15 suites ±0 5h 53m 29s ⏱️ - 16m 4s For more details on these failures, see this check. Results for commit 33e4995. ± Comparison against base commit 1818788. ♻️ This comment has been updated with latest results. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of this new oversaturate_only
mark, couldn't you simply have
OVERSATURATION = math.isinf(dask.config.get("distributed.scheduler.worker-saturation"))
@pytest.mark.flaky(not OVERSATURATION, reason="flaky on MacOS")
Tests that don't make sense with queueing should instead force
config={"distributed.scheduler.worker-saturation": math.inf}
(some already do)
.github/workflows/tests.yaml
Outdated
@@ -44,8 +53,8 @@ jobs: | |||
# run: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20] | |||
|
|||
env: | |||
TEST_ID: ${{ matrix.os }}-${{ matrix.python-version }}-${{ matrix.partition-label }} | |||
# TEST_ID: ${{ matrix.os }}-${{ matrix.python-version }}-${{ matrix.partition-label }}-${{ matrix.run }} | |||
TEST_ID: ${{ matrix.os }}-${{ matrix.python-version }}-${{ matrix.partition-label }}-${{ matrix.queuing }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this renaming will break some logic in our test report generation.
distributed/continuous_integration/scripts/test_report.py
Lines 146 to 152 in acf6078
df_jobs["suite_name"] = ( | |
df_jobs["OS"] | |
+ "-" | |
+ df_jobs["python_version"] | |
+ "-" | |
+ df_jobs["partition"].str.replace(" ", "") | |
) |
distributed/continuous_integration/scripts/test_report.py
Lines 353 to 357 in acf6078
html_url = jobs_df[jobs_df["suite_name"] == a["name"]].html_url.unique() | |
assert ( | |
len(html_url) == 1 | |
), f"Artifact suit name {a['name']} did not match any jobs dataframe {jobs_df['suite_name'].unique()}" | |
html_url = html_url[0] |
I introduced this in #6837
Worst case, we remove the more specific job URL again (we knew from the start this is very brittle)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this should block this PR
Pulled out from dask#6989. This minor refactor makes it easier to add other config options in the future. It also ensure that the `ws` marker is added even when `--runslow` is given.
This reverts commit 0f9034a.
GitHub didn't like setting `partition-label` in two different ways; it showed up in two different places in the job name and breaks the test report script. Instead, we can set the `TEST_ID` variable as a step in the job, which is cleaner anyway IMO.
Ready for re-review and merge. |
Thank you! |
Just to confirm, is the new CI job added here for testing purposes and is planned to be removed in the future, or is this intended as a long-term change? |
Long term, we'll want to enable scheduler-side queueing by default. |
Adds a CI job, only for Ubuntu and py3.9, running the test suite with
worker-saturation: 1.0
(instead ofinf
).Also adds a
@pytest.mark.oversaturate_only
marker, which automatically skips marked tests if queuing is enabled.A surprisingly small number of tests actually failed when queuing was turned on. I've updated the ones that did as needed. I've tried as much as possible to just make the tests agnostic to queuing. In a few cases though, I've needed to add explicit conditional logic. And in a couple cases, I've marked tests to be skipped when queuing is enabled, since they may not make sense.
Unfortunately there are three tests that are flaky under queuing, which could indicate an actual bug. They're skipped for now, but should be investigated.
Closes #6631
pre-commit run --all-files