
Add minimum dependency testing to CI #7285

Merged · 67 commits into dask:main on Jan 5, 2023

Conversation

charlesbluca (Member)

Continuing @jrbourbeau's work in #4794, this PR adds a new CI check that runs the test suite against the minimum versions of Distributed's dependencies, which should help us identify if and when dependency pins need to be updated.

  • Tests added / passed
  • Passes pre-commit run --all-files
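
To make the idea concrete: a "mindeps" environment pins each dependency to its declared lower bound, so the test suite exercises the oldest versions we claim to support. Below is a minimal sketch of that pinning step (illustrative only; the package name is a placeholder and this is not the PR's actual CI tooling):

    import re

    def to_min_pins(requirements: str) -> list[str]:
        """Turn 'pkg >= X.Y' lower bounds into exact 'pkg==X.Y' pins (sketch)."""
        pins = []
        for line in requirements.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            match = re.match(r"^([A-Za-z0-9_.\-]+)\s*>=\s*([\w.]+)", line)
            if match:
                pins.append(f"{match.group(1)}=={match.group(2)}")
            else:
                pins.append(line)  # no lower bound; leave the spec as-is
        return pins

    # e.g. to_min_pins("somepackage >= 1.2.3") -> ["somepackage==1.2.3"]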

jrbourbeau and others added 30 commits May 3, 2021 13:17
Comment on lines 67 to 70:

    except Exception:
        # FIXME is this possible?
        # FIXME occurs when psutil version doesn't have handling for given platform / kernel;
        # should we explicitly error in this case?
        monitor_disk_io = False  # pragma: nocover

charlesbluca (Member Author)

Digging into the failures on distributed/tests/test_system_monitor.py::test_disk_config, it looks like this exception gets raised when psutil doesn't support the given platform/kernel (in our case, psutil 5.6.3 didn't have support for Linux kernel 5.5+).

Given this knowledge, would it make sense to raise a warning here to let the user know that I/O monitoring will be disabled because of this?
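
For concreteness, a minimal sketch of what such a warning could look like (illustrative only, not the actual code in Distributed's system monitor):

    import warnings

    import psutil

    try:
        psutil.disk_io_counters()
        monitor_disk_io = True
    except Exception:
        # psutil can raise here when it has no disk I/O support for the
        # current platform/kernel combination (e.g. psutil 5.6.3 on Linux
        # kernel 5.5+). Fall back gracefully, but tell the user why the
        # metric is missing.
        warnings.warn(
            "Disk I/O monitoring disabled: psutil.disk_io_counters() is not "
            "supported on this platform/kernel"
        )
        monitor_disk_io = False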

Member

Nice detective work! I could definitely see how emitting a warning here could help avoid user confusion. Normally, I'd defer to you as to whether you'd like to include that new warning/test here or in a separate PR. But given that the diff here is already pretty big, and adding a new warning seems more like a (reasonable) enhancement rather than directly related to the mindeps CI build, how about we tackle it in a separate PR?

@fjetter (Member) commented Nov 24, 2022

I took the liberty of fixing merge conflicts in requirements.txt after merging #7286

@charlesbluca is there anything else left to do?

@charlesbluca (Member Author)

Thanks @fjetter!

is there anything else left to do?

Yeah, there are two remaining tests that seem to be failing consistently with mindeps that I haven't been able to debug:

  • distributed/deploy/tests/test_spec_cluster.py::test_adaptive_killed_worker
  • distributed/deploy/tests/test_spec_cluster.py::test_broken_worker

Beyond these failures, it would probably be worthwhile to dig through the files that pytest is currently skipping to see if there are any tests that can be run with minimal dependencies to increase the testing coverage, but considering the diff here is pretty large, I'm okay with pushing that off to a follow-up PR.

@fjetter (Member) commented Dec 1, 2022

Yeah, there are two remaining tests that seem to be failing consistently with mindeps that I haven't been able to debug:

I think both test cases have the same root cause. This reminds me of failures I encountered in #7323. I suggest merging that one first (still WIP) and then seeing if the issue persists. I don't think this is necessarily related to mindeps, but the dependencies may have introduced different timings, making the race conditions I'm fixing over there more likely.

Beyond these failures, it would probably be worthwhile to dig through the files that pytest is currently skipping to see if there are any tests that can be run with minimal dependencies to increase the testing coverage, but considering the diff here is pretty large, I'm okay with pushing that off to a follow-up PR.

Yes, that's definitely out of scope.

@charlesbluca (Member Author)

Looks like the current changes in #7323 resolve the failures 🎉 Once that's finished and merged, we can revisit this PR to see if there are any remaining tests to resolve.

@jrbourbeau (Member) left a comment

Thanks for all your work here @charlesbluca. I'm inclined to temporarily skip/xfail those failing tests in order to get this PR in. We can then un-skip/un-xfail once #7323 is in.

Here's another issue where a mindeps build would have been useful: #7416

jrbourbeau and others added 2 commits December 19, 2022 15:22
@mrocklin (Member) commented Jan 3, 2023

It looks like this has been going on for a while. @charlesbluca thank you for continuing to track this. Is there anything we can do to get some variant of this in quickly so that you don't have to work to keep this from going stale?

@charlesbluca (Member Author)

Is there anything we can do to get some variant of this in quickly so that you don't have to work to keep this from going stale?

I was going to go with @jrbourbeau's suggestion and just xfail the few remaining tests, which should be resolved by #7323.

Comment on lines +215 to +218:

    @pytest.mark.xfail(
        os.environ.get("MINDEPS") == "true",
        reason="Timeout errors with mindeps environment",
    )

charlesbluca (Member Author)

Since it wasn't immediately obvious which package(s) were causing this remaining failure, I opted to xfail when matrix.environment == 'mindeps' in the GHA run; this means that to run mindeps tests locally, one would need to prepend the pytest command with MINDEPS=true.
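
As a usage note, here is a sketch of how that gate reads in a test module, with CI exporting MINDEPS=true for the mindeps matrix entry and local runs invoked as MINDEPS=true pytest ... (the test name below is a placeholder, not the PR's actual code):

    import os

    import pytest

    # Evaluate the flag once so individual tests can reuse it.
    MINDEPS = os.environ.get("MINDEPS") == "true"

    @pytest.mark.xfail(MINDEPS, reason="Timeout errors with mindeps environment")
    def test_timing_sensitive_example():
        ...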

@fjetter (Member) commented Jan 4, 2023

#7323 might not get finished after all. I will take one final look at it, but it has proved incredibly difficult to converge, and a fix might require some significant refactoring that would be out of scope for that PR.

If only one or a few tests are failing here (that are not failing elsewhere), we should probably just skip them for the mindeps build. #7323 should not block this.

@charlesbluca (Member Author)

we should probably just skip them for the mindeps build

Yeah, it looks like the one remaining failure here is distributed/tests/test_nanny.py::test_nanny_timeout, which I've gone ahead and xfailed when we're running mindeps testing. The remaining CI failures are unrelated, so this should be ready for review.

@fjetter (Member) left a comment

Great work. Thank you @charlesbluca

@fjetter merged commit 3619923 into dask:main on Jan 5, 2023
@jrbourbeau (Member) left a comment

Woo! It's great to see this included -- thanks for all your work here @charlesbluca!
