
Optimize rolling dask #311

Merged
merged 13 commits into master from optimize-rolling-dask
Dec 5, 2019

Conversation

aulemahal
Collaborator

@aulemahal aulemahal commented Nov 26, 2019

Pull Request Checklist:

  • This PR addresses an already opened issue (for bug fixes / features)
  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)
  • HISTORY.rst has been updated

After merging to master and before closing the branch:

  • bumpversion (minor / major / patch) has been called on master branch
  • Tags have been pushed
  • What kind of change does this PR introduce?
    Creates a new util function rolling to optimize rolling sums and means over dask arrays.
    Sum and mean are implemented first, since they both rely on the same numpy.cumsum trick; other rolling operations (min, max, std) could be implemented later on.

The function does the computation lazily, which xarray's rolling() didn't. It falls back to xarray's version if the data is not in a dask array, since it relies on a dask-specific method: dask.array.Array.map_overlap().

This function is faster than xarray's, even for datasets with only one chunk along the rolled dimension, and it avoids MemoryErrors on huge datasets.

I did some Jedi tricks to get the same behaviour as xarray's version when NaNs are encountered. Without the second np.cumsum, computation time drops by 30-50%, but NaNs would then be treated as zeros. A minimal sketch of the cumsum approach is included after this checklist.

  • Does this PR introduce a breaking change?
    No

  • Other information:
    I called the function xc.utils.rolling, but unlike xr.DataArray.rolling, it actually performs the sum or mean itself. Same name, different behaviour.

@aulemahal aulemahal added the enhancement New feature or request label Nov 26, 2019
@aulemahal aulemahal self-assigned this Nov 26, 2019
@coveralls

coveralls commented Nov 26, 2019

Pull Request Test Coverage Report for Build 1318

  • 52 of 52 (100.0%) changed or added relevant lines in 6 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.2%) to 91.304%

Totals Coverage Status
Change from base Build 1316: 0.2%
Covered Lines: 6384
Relevant Lines: 6992

💛 - Coveralls

@aulemahal
Collaborator Author

So, it finally wasn't that hard. I added a function that takes care of rolling operations other than sum and mean. It acts the same way as conventional rolling: it creates a new dimension and copies* the data so that the rolling op can be applied along that new dim. But it still does this chunk-wise and lazily, so it doesn't run into a MemoryError on huge datasets.

  • I believe it returns a view of the values, not an actual copy.

Also, this can accept any function, not only the numpy ones.

@huard
Collaborator

huard commented Nov 26, 2019

Question: Could this somehow be ported to xarray?

@aulemahal
Collaborator Author

aulemahal commented Nov 27, 2019

Yes and no, as far as I understand xarray's code. The whole "rolling" module avoids any explicit call to dask, and I don't really see an elegant way to add this code. Also, xarray spends some time on overhead that I simply skip; what exactly that overhead does, and whether it is useful, I don't know.
During this exploration, I did find that you can do ds.rolling(time=5).mean(allow_lazy=True), which saves some memory at compute time if you then select a specific slice further along. However, there is still something I don't understand, because it uses 20-40% more memory than my version and is slower.
So it would be possible, but I don't see a way that would make an acceptable PR.

@tlogan2000
Collaborator

This might indirectly be a solution to #184, where I could never completely understand the snowballing memory problems. I will try to see if this helps once it is merged to master.

@tlogan2000
Collaborator

tlogan2000 commented Nov 27, 2019

Question: Could this somehow be ported to xarray?

I think we could at least create an issue on the xarray GitHub describing the problems we are experiencing with long time series and present @aulemahal's solution / workaround as being more efficient. They might decide to find a way to integrate it themselves?

@huard
Collaborator

huard commented Nov 27, 2019

I've read about similar memory issues happening when bottleneck is not installed. Could you confirm this is not the issue here?

@huard
Collaborator

huard commented Nov 27, 2019

See pydata/xarray#3332. It might be a place where you could describe what you've done.

@tlogan2000
Collaborator

Messing around with various reduction options, I see that using 'std' results in very small diffs between xarray's rolling and utils.rolling(); not sure why...

    std_nd_xr = ds_nd.pr.rolling(time=5).std()
    std_nd_xc = utils.rolling(
        ds_nd.pr, window=5, dim="time", mode="std", keep_attrs=False
    )
    std_dask = utils.rolling(ds_dask.pr, window=5, dim="time", mode="std")

    xr.testing.assert_identical(std_nd_xr, std_nd_xc)  # fails; assert_allclose works
    xr.testing.assert_allclose(std_dask, std_nd_xr)

@tlogan2000
Collaborator

tlogan2000 commented Nov 27, 2019

It might be prudent to include a chunked version of tests for functions that rely on rolling (or indirectly on runlength) in the indicator test suite?

E.g. tests on .nc files in test_precip.py, test_temperature.py, etc. currently do not chunk the test datasets before calculating indices.

Example from test_precip

class TestMaxNDay:
    # testing of wet_day and daily_pr_intensity, both are related

    nc_file = os.path.join(TESTS_DATA, "NRCANdaily", "nrcan_canada_daily_pr_1990.nc")

    def test_3d_data_with_nans(self):
        # test with 3d data
        pr = xr.open_dataset(self.nc_file).pr
        
        ##### potential added line
        pr_dask = xr.open_dataset(self.nc_file, chunks=dict(time=10)).pr

        prMM = xr.open_dataset(self.nc_file).pr
        prMM.values *= 86400.0
        prMM.attrs["units"] = "mm/day"
        # put a nan somewhere
        prMM.values[10, 1, 0] = np.nan
        pr.values[10, 1, 0] = np.nan
        wind = 3
        out1 = atmos.max_n_day_precipitation_amount(pr, window=wind, freq="MS")
        out2 = atmos.max_n_day_precipitation_amount(prMM, window=wind, freq="MS")
       
        ##### potential added line
        out3 = atmos.max_n_day_precipitation_amount(pr_dask, window=wind, freq="MS")
       
        # ... continued assertion tests ...

Maybe this is overkill, as we already test rolling generally in test_utils.py... Opinions?

@huard
Collaborator

huard commented Nov 27, 2019

I think you're right, but I would like this to be parametrized. We don't want to duplicate all tests; we need to use pytest fixtures to parametrize the test input so that each test runs with both chunked and un-chunked datasets. This goes back to #52 ... I suggest we raise the priority on this.

@aulemahal
Collaborator Author

Messing around with various reduction options, I see that using 'std' results in very small diffs between xarray's rolling and utils.rolling(); not sure why...

@tlogan2000 I wasn't able to replicate your problem... Are you sure ds_nd in your example is not a dask array? If it is, then rolling() defaults to my method, which gives slightly different results.

@huard I confirm bottleneck is installed in my environment. Also, AFAIK bottleneck is not used by xarray when rolling on a dask array.

I'll modify the tests with parametrization in mind.

@aulemahal
Collaborator Author

So, I realized (thanks @tlogan2000) that my solution was float-dependent, which isn't good. I fixed it, but it makes me wonder whether other edge cases are not covered in my code, so I decided to make the function private. That way, if xarray ever fixes the problem, or we find a more convenient solution, removing it won't be a breaking change.
For now, I believe the current solution is better than, or similar to, doing:

    r = da.rolling(**{dim: win})  # `dim` and `win`: the rolled dimension name and window size
    a = r.construct('roll').reduce(np.func, dim='roll', allow_lazy=True)  # np.func: the reduction to apply
    a = a.where(r._counts() >= win)  # mask NaNs and incomplete windows at the edges

where the last line takes care of NaN values and edges. Going through construct -> reduce saves some overhead and uses less memory than calling rolling().reduce() directly.

These changes were really useful for my own script generating indices for the Ouranos website, but I would understand if this isn't appropriate for xclim, as it's more of an xarray problem.

@aulemahal
Collaborator Author

Oh! And I added tests following recommendations from #52.

Collaborator

@tlogan2000 tlogan2000 left a comment


I have done quite a bit of testing with this now and I think it seems to work quite well for dealing with the memory issues we were having.

@aulemahal I did some preliminary benchmarking comparing this to xr.rolling() and xr.rolling(allow_lazy=True), and it seems to be more efficient. However, I think it might be worthwhile to do a more thorough benchmarking of all xclim functions requiring .rolling() in order to get a fuller picture of the benefits in terms of speed / memory use. If the overall benefit is marginal compared to xr.rolling(allow_lazy=True), then we will have to consider whether it is worthwhile to maintain a custom utility long term.

@tlogan2000
Collaborator

I see the py3.5 build is failing, but we will be dropping 3.5 shortly anyhow. Black and flake8 failed as well, but those should be an easy fix.

@aulemahal
Collaborator Author

So, I fixed that black thing.
The failing build was due to an implementation bug in Python 3.5 (https://bugs.python.org/issue9232); I fixed that too.

@aulemahal
Collaborator Author

I suggest we merge this for now.
We should definitely complete #316 before v1.0, but this is our best shot for the moment.

@aulemahal aulemahal merged commit eff131e into master Dec 5, 2019
@aulemahal aulemahal deleted the optimize-rolling-dask branch December 5, 2019 22:48
aulemahal added a commit that referenced this pull request Dec 6, 2019
@aulemahal aulemahal restored the optimize-rolling-dask branch January 6, 2020 21:54
@aulemahal aulemahal deleted the optimize-rolling-dask branch January 6, 2020 21:55
aulemahal added a commit that referenced this pull request Jan 6, 2020
@aulemahal aulemahal mentioned this pull request Jan 6, 2020
huard added a commit that referenced this pull request Jan 7, 2020