Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI job running tests with queuing on #6989

Merged
merged 21 commits into from
Sep 5, 2022
Merged

Conversation

gjoseph92
Copy link
Collaborator

Adds a CI job, only for Ubuntu and py3.9, running the test suite with worker-saturation: 1.0 (instead of inf).

Also adds a @pytest.mark.oversaturate_only marker, which automatically skips marked tests if queuing is enabled.

A surprisingly small number of tests actually failed when queuing was turned on. I've updated the ones that did as needed. I've tried as much as possible to just make the tests agnostic to queuing. In a few cases though, I've needed to add explicit conditional logic. And in a couple cases, I've marked tests to be skipped when queuing is enabled, since they may not make sense.

Unfortunately there are three tests that are flaky under queuing, which could indicate an actual bug. They're skipped for now, but should be investigated.

Closes #6631

  • Tests added / passed
  • Passes pre-commit run --all-files

@github-actions
Copy link
Contributor

github-actions bot commented Sep 2, 2022

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

       15 files  ±0         15 suites  ±0   5h 53m 29s ⏱️ - 16m 4s
  3 082 tests ±0    2 996 ✔️  -   1    85 💤 +  1  1 ±0 
22 808 runs  +6  21 895 ✔️ +79  912 💤  - 73  1 ±0 

For more details on these failures, see this check.

Results for commit 33e4995. ± Comparison against base commit 1818788.

♻️ This comment has been updated with latest results.

Copy link
Collaborator

@crusaderky crusaderky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of this new oversaturate_only mark, couldn't you simply have

OVERSATURATION = math.isinf(dask.config.get("distributed.scheduler.worker-saturation"))
@pytest.mark.flaky(not OVERSATURATION, reason="flaky on MacOS")

Tests that don't make sense with queueing should instead force

config={"distributed.scheduler.worker-saturation": math.inf}

(some already do)

@@ -44,8 +53,8 @@ jobs:
# run: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

env:
TEST_ID: ${{ matrix.os }}-${{ matrix.python-version }}-${{ matrix.partition-label }}
# TEST_ID: ${{ matrix.os }}-${{ matrix.python-version }}-${{ matrix.partition-label }}-${{ matrix.run }}
TEST_ID: ${{ matrix.os }}-${{ matrix.python-version }}-${{ matrix.partition-label }}-${{ matrix.queuing }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this renaming will break some logic in our test report generation.

df_jobs["suite_name"] = (
df_jobs["OS"]
+ "-"
+ df_jobs["python_version"]
+ "-"
+ df_jobs["partition"].str.replace(" ", "")
)

html_url = jobs_df[jobs_df["suite_name"] == a["name"]].html_url.unique()
assert (
len(html_url) == 1
), f"Artifact suit name {a['name']} did not match any jobs dataframe {jobs_df['suite_name'].unique()}"
html_url = html_url[0]

I introduced this in #6837

Worst case, we remove the more specific job URL again (we knew from the start this is very brittle)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should block this PR

gjoseph92 added a commit to gjoseph92/distributed that referenced this pull request Sep 2, 2022
Pulled out from dask#6989. This minor refactor makes it easier to add other config options in the future. It also ensure that the `ws` marker is added even when `--runslow` is given.
GitHub didn't like setting `partition-label` in two different ways; it showed up in two different places in the job name and breaks the test report script. Instead, we can set the `TEST_ID` variable as a step in the job, which is cleaner anyway IMO.
@gjoseph92
Copy link
Collaborator Author

  • Removed the oversaturate_only mark; just using pytet.mark.skipif and config={"distributed.scheduler.worker-saturation": math.inf} as necessary

  • Fixed setting the environment variable in the test step

  • Updated test report script to handle and display results from the queue job:

    visualization

Ready for re-review and merge.

@gjoseph92
Copy link
Collaborator Author

@crusaderky crusaderky merged commit b133009 into dask:main Sep 5, 2022
@crusaderky
Copy link
Collaborator

Thank you!

@jrbourbeau
Copy link
Member

Just to confirm, is the new CI job added here for testing purposes and is planned to be removed in the future, or is this intended as a long-term change?

@crusaderky
Copy link
Collaborator

Long term, we'll want to enable scheduler-side queueing by default.
When that happens, we'll need to revisit the unit tests to have selected few tests with queueing off (like it already happens with all other config toggles) and revert this PR.
Before that, all tests that are flaky only when queueing is on need to be addressed.

gjoseph92 added a commit to gjoseph92/distributed that referenced this pull request Oct 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Root-task withholding without co-assignment
4 participants