Should we be testing against multiple dask schedulers? #1971
Comments
Huh, that's interesting. Yes, I suppose we should at least consider parametrizing tests over both dask's multithreaded and distributed schedulers. Though I'll note that for tests we actually set the default scheduler to dask's basic non-parallelized get, for easier debugging: xarray/xarray/tests/__init__.py Line 87 in 54468e1
For #1793, the key thing would be to ensure that we run the tests in an isolated context, without changing the default scheduler.
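To illustrate the "isolated context" idea, here is a minimal sketch (not xarray's actual test code, and the function name is hypothetical) of running the same computation under several schedulers without touching the global default, using `dask.config.set` as a context manager:

```python
import dask
import dask.array as da

def check_mean_under_each_scheduler():
    # Hypothetical sketch: dask.config.set used as a context manager
    # restores the previous scheduler on exit, so other tests are
    # unaffected by the temporary override.
    x = da.ones((100, 100), chunks=(10, 10))
    for scheduler in ["synchronous", "threads"]:
        with dask.config.set(scheduler=scheduler):
            # The result should be identical regardless of scheduler.
            assert float(x.mean().compute()) == 1.0
```

Because the override is scoped to the `with` block, a test can exercise a non-default scheduler while the suite's default (the synchronous get mentioned above) stays in place.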
I managed to dig up some more information here. I was having a test failure in
From then on we were using the distributed scheduler, and any tests that used dask resulted in an additional timeout (or similar error). Unfortunately, my attempts to produce an MCVE have come up short. If I can come up with one, I'll report upstream, but as it is, I can't really reproduce this behavior outside of my example. cc @mrocklin
FWIW, most of the logic within the dask collections (array, dataframe, delayed) is only tested with
Obviously, though, for things like writing to disk it's useful to check different schedulers.
It seems like the distributed scheduler is the advised one to use in general, so maybe some tests could be added for that one. For disk IO in particular, it would be interesting to see the difference. http://dask.pydata.org/en/latest/setup.html
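For reference, a sketch of running a computation on the distributed scheduler in-process (assuming the `distributed` package is installed; the function name is illustrative):

```python
import dask.array as da
from distributed import Client

def sum_on_distributed():
    # processes=False starts an in-process scheduler and workers, which
    # keeps test runs lightweight.  While the Client is active, it
    # becomes the default scheduler for dask collections.
    with Client(processes=False):
        x = da.ones((50, 50), chunks=(10, 10))
        return float(x.sum().compute())
```

A test written this way exercises the same code paths the distributed setup docs recommend, without needing an external cluster.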
Closing this now. The distributed integration test module seems to cover our IO use cases well enough. I don't think we need to do anything here at this time.
Almost all of our unit tests run against dask's default scheduler (usually dask.threaded). While it is true that the beauty of dask is that one can separate the scheduler from the logical implementation, there are a few idiosyncrasies to consider, particularly in xarray's backends. To that end, we have a few tests covering the integration of the distributed scheduler with xarray's backends, but the test coverage is not particularly complete.
If nothing else, I think it is worth considering tests that use the threaded, multiprocessing, and distributed schedulers for a larger subset of the backend tests (those that use dask).
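A sketch of what such parametric tests could look like with pytest (the scheduler aliases are dask's own; "processes" is the multiprocessing scheduler, and a distributed `Client` fixture could be added as a fourth case — this is not xarray's actual test suite):

```python
import dask
import dask.array as da
import pytest

@pytest.mark.parametrize("scheduler", ["synchronous", "threads", "processes"])
def test_sum_matches_expected(scheduler):
    # The same logical computation is run once per scheduler, so any
    # scheduler-specific idiosyncrasy shows up as a parametrized failure.
    x = da.arange(1000, chunks=100)
    with dask.config.set(scheduler=scheduler):
        assert int(x.sum().compute()) == 499500
```

The backend tests would replace the toy computation with an open/compute round-trip, but the parametrization pattern is the same.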
Note, I'm bringing this up because I'm seeing some failing tests in #1793 that are unrelated to my code change but do appear to be related to dask and possibly a different default scheduler (example failure).