Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should utils_test.py be considered public API? #6806

Open
hendrikmakait opened this issue Jul 27, 2022 · 14 comments
Open

Should utils_test.py be considered public API? #6806

hendrikmakait opened this issue Jul 27, 2022 · 14 comments
Labels
discussion Discussing a topic with no specific actions yet

Comments

@hendrikmakait
Copy link
Member

We currently expose a number of functions and fixtures in distributed.utils_test. While I would consider these implementation details of our tests, and not part of any stable user-facing API, other projects, including dask/dask rely on them for their test suites (https://github.com/dask/dask/blob/ac74939697c328bd6456aa9fad7d9520328db6e5/dask/tests/test_distributed.py#L11-L14, #5300, #6775, #6802 (review)). This creates friction between the stability requirements of downstream users and our ability to adjust the contents of utils_test to fit our internal testing needs. For example, we cannot easily move functionality around without breaking things for downstream users (#6802 (review)).

This leads me to several questions:
Should the contents of utils_test.py be considered part of the public API?

If so:

  • What guarantees are we willing to make around API stability?
    • How do we want to test/enforce those?
  • Which parts should be considered public API?
    • For example, should we only expose context managers/functions, but not fixtures?

If not:

  • How do we make this clear and discourage users from importing them?
  • How do we deal with projects that already import them?
@hendrikmakait hendrikmakait added the discussion Discussing a topic with no specific actions yet label Jul 27, 2022
@hendrikmakait
Copy link
Member Author

cc: @jrbourbeau, @graingert

@hendrikmakait
Copy link
Member Author

Apart from the question whether we should consider the contents of distributed.utils_test public API, I'd argue that the module itself suggests that it is. Compare these two alternatives:

  • distributed.utils_test sounds like something we ship to help with tests.
  • distributed.tests.utils sounds like test helpers for our own sake.

@jrbourbeau
Copy link
Member

Thanks for raising this @hendrikmakait. While I definitely don't think all the contents of utils_test.py should be public, there are some methods like gen_cluster, and possibly the cluster context manager (maybe?), that are used in several other projects and I think historically have been viewed as public. For example, xarray, streamz, dask, dask-ml all use gen_cluster (I'm sure there are more than I'm not aware of). I've personally found gen_cluster to be very convenient when writing tests for projects that use Dask clusters

This creates friction between the stability requirements of downstream users and our ability to adjust the contents of utils_test to fit our internal testing needs

Do you have a sense for how much friction there is here? To be clear, I think being more intentional about what we consider public is a good thing. I'm just wondering how big of a problem this is today

@mrocklin
Copy link
Member

I wouldn't consider them to be public to users necessarily, but certainly they're public to other dask libraries and other friends-of-dask like Xarray. From my perspective those libraries are within the bounds of Dask. If we change internals in utils_test then I would expect us to also reach out to and fix their use elsewhere, or at least collaborate with those libraries.

@hendrikmakait
Copy link
Member Author

Do you have a sense for how much friction there is here?

Honestly, I can't really quantify this. What I can say is that I have stumbled across this twice this week and when talking to other developers they were either unhappy with the situation or uncertain about answering that question.

It sounds like we might profit from moving most of the functions and fixtures elsewhere and reimporting those that others rely on back into utils_test.py, and maybe make sure that we properly document those as well as adding a module-scope docstring describing who this is intended for? This might clear things up without too much effort.

@graingert also might have some interesting thoughts on all this.

@jacobtomlinson
Copy link
Member

jacobtomlinson commented Aug 1, 2022

Given that we rely on these things in a bunch of places I would argue that today this is effectively a public API and we should treat it as such (no breaks, use deprecations, etc). However there is probably way more in there than there should be, so maybe we should try and be more intentional about what we expose going forwards.

Other libraries I've used recently like kopf explicitly have a kopf.testing submodule where you can grab test utils to use in your own tests. Perhaps we should shift more towards that and be more explicit. Then put everything else somewhere like distributed._utils_test so that it is clearly private.

@hendrikmakait
Copy link
Member Author

@jacobtomlinson: Thanks for your input and for linking kopf.testing. That's pretty much what I had in mind with my comment above.

@fjetter
Copy link
Member

fjetter commented Aug 4, 2022

+1 for a dedicated testing module with a limited public API and the deprecation of utils_test

@graingert
Copy link
Member

My usecase for these testing utilities is kind of subtle,

I like to be able to make changes to the testing infrastructure to increase strictness
I also like to be able to push that increase in strictness into friends-of-dask and see if I broke them also so I can see if there's any bits of distributed are being used that break that strictness but isn't getting tested

eg we now test that it's possible to run asyncio.run(something_in_distributed()); asyncio.run(something_in_distributed()); and that change is breaking friends-of-dask because code in distributed wasn't actually compatible with that pattern when used in a legitimate way I didn't anticipate

@graingert
Copy link
Member

As an aside it's really painful for us to add new fixture dependencies to our 'exported' fixtures because they need to be re-imported completely in a way that appears to violate flake8. Which is why I'd like this new API to use contextmanagers in place of fixtures

@fjetter
Copy link
Member

fjetter commented Aug 4, 2022

Which is why I'd like this new API to use contextmanagers in place of fixtures

I also support this. I don't consider exporting fixtures a "sane" API. It's way too difficult to control imo.

@graingert
Copy link
Member

graingert commented Aug 4, 2022

I wouldn't consider them to be public to users necessarily, but certainly they're public to other dask libraries and other friends-of-dask like Xarray. From my perspective those libraries are within the bounds of Dask. If we change internals in utils_test then I would expect us to also reach out to and fix their use elsewhere, or at least collaborate with those libraries.

This reflects my view, I don't think we should attempt all that much stability in this API but we should still encourage people to use it for their CI.

If you use it in CI, when we make a change:

  • it will probably cause your test suite to fail
  • but any failure we cause should be actionable
  • and we should come help you fix it
  • and most importantly you should always be able to copy paste an old version to unblock any PRs and issues with higher priority

@hendrikmakait
Copy link
Member Author

Agreed, @graingert. What I would like is some distinction between a smaller module where changes might break stuff for others and a larger module where things will only break distributed. This would reduce mental load and I can make a conscious decision about introducing breaking changes (and consciously breaking things is a very valid decision here). Right now, I feel like we follow the rule: "If everything is public, then nothing (really) is."

+1 on getting rid of fixtures in the public API.

@graingert
Copy link
Member

graingert commented Aug 5, 2022

We should also help friends-of-dask enable filterwarnings=error so they are notified about these deprecations

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion Discussing a topic with no specific actions yet
Projects
None yet
Development

No branches or pull requests

6 participants