
[WIP] Fix problem with wrong chunksizes when using rolling_window on dask.array #2532

Closed

Conversation

@cchwala (Contributor) commented Oct 31, 2018

Short summary

The two rolling-window functions for dask.array, `rolling_window` and `dask_rolling_wrapper`, will be fixed to preserve dask.array chunksizes.

Long summary

The specific initial problem with chunksizes and interpolate_na() in #2514 is caused by the padding done in `rolling_window`:

```python
if pad_size > 0:
    if pad_size < depth[axis]:
        # overlapping requires each chunk larger than depth. If pad_size is
        # smaller than the depth, we enlarge this and truncate it later.
        drop_size = depth[axis] - pad_size
        pad_size = depth[axis]
    shape = list(a.shape)
    shape[axis] = pad_size
    chunks = list(a.chunks)
    chunks[axis] = (pad_size, )
    fill_array = da.full(shape, fill_value, dtype=a.dtype, chunks=chunks)
    a = da.concatenate([fill_array, a], axis=axis)
```

which prepends a small fill array, carrying its own small chunk, to the initial array.
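
To illustrate, here is a minimal sketch (not code from the PR, with made-up sizes) of how concatenating a small fill array onto a regularly chunked array produces an extra small chunk:

```python
import dask.array as da

a = da.ones(100, chunks=25)             # chunks: ((25, 25, 25, 25),)
fill = da.full((5,), 0.0, chunks=(5,))  # small pad array, one 5-element chunk

padded = da.concatenate([fill, a])
print(padded.chunks)  # ((5, 25, 25, 25, 25),) -- an extra small chunk appears
```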

There is another related problem, where DataArray.rolling() changes the size and distribution of dask.array chunks. It stems from this code:

```python
def dask_rolling_wrapper(moving_func, a, window, min_count=None, axis=-1):
```

For some (historical) reason there are these two rolling-window functions for dask. Both need to be fixed to preserve the chunksizes of a dask.array in all cases.
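
To make the symptom concrete, here is a hypothetical reproducer in the spirit of the linked issues; the exact chunk output depends on the xarray/dask versions current at the time of this PR:

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(np.arange(100.0), dims="x").chunk({"x": 25})
print(arr.data.chunks)     # ((25, 25, 25, 25),)

rolled = arr.rolling(x=5, min_periods=1).mean()
print(rolled.data.chunks)  # reported to come back with altered chunk sizes
```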

@pep8speaks commented:

Hello @cchwala! Thanks for submitting the PR.

@fujiisoup (Member) left a comment:

Hi, @cchwala
Thanks for sending a PR.

Some tests are failing, though.

I am not very sure how rechunking the whole array affects the performance. Do you have any idea?

```python
rechunk_chunks = list(a.chunks)
rechunk_chunks[axis] = rechunk_chunks[axis] + (pad_size,)
a = da.concatenate([fill_array, a], axis=axis).rechunk(
    {axis: rechunk_chunks[axis]})
```

@fujiisoup (Member) commented on this code:

I think this rechunks the whole of `a`. Doesn't that affect the performance?

@cchwala (Contributor, Author) replied:

That was also my concern, but I have not had time to check yet. I will do so to answer this. I did not find a way to "merge" two chunks, which would probably be the best option with regard to performance.
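
For illustration only (my own sketch, not code from this PR): one conceivable way to merge just the pad chunk into its neighbour is to rechunk with a specification that differs from the input chunks only at the boundary, assuming dask then only needs extra tasks where chunk boundaries actually change:

```python
import dask.array as da

a = da.ones(100, chunks=25)
pad = da.zeros(5, chunks=5)

padded = da.concatenate([pad, a])  # chunks: ((5, 25, 25, 25, 25),)

# Fold the pad into the first data chunk; all other chunks stay untouched.
merged = (pad.chunks[0][0] + a.chunks[0][0],) + a.chunks[0][1:]
padded = padded.rechunk({0: merged})
print(padded.chunks)               # ((30, 25, 25, 25),)
```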

@cchwala (Contributor, Author) commented Nov 1, 2018

Yes, tests are still failing. The PR is WIP. I just wanted to open it now so that the discussion happens here instead of in the issues.

I will work on fixing the code to pass all current tests. I will also check how the rechunking affects performance.
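
A rough way to gauge that overhead (again my own sketch, not from the PR) is to compare the task-graph sizes with and without the rechunk step:

```python
import dask.array as da

a = da.ones(1_000_000, chunks=10_000)  # 100 chunks
pad = da.zeros(50, chunks=50)

plain = da.concatenate([pad, a])
merged = (pad.chunks[0][0] + a.chunks[0][0],) + a.chunks[0][1:]
rechunked = plain.rechunk({0: merged})

print(len(plain.__dask_graph__()))      # number of tasks without the rechunk
print(len(rechunked.__dask_graph__()))  # number of tasks with the rechunk step
```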


Successfully merging this pull request may close these issues:

- DataArray.rolling() does not preserve chunksizes in some cases
- interpolate_na with limit argument changes size of chunks