Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compliance with Dask version 2021.03 ? #27

Closed
knutfrode opened this issue Mar 10, 2021 · 15 comments · Fixed by dask/dask#7391
Closed

Compliance with Dask version 2021.03 ? #27

knutfrode opened this issue Mar 10, 2021 · 15 comments · Fixed by dask/dask#7391
Labels
bug Something isn't working

Comments

@knutfrode
Copy link

knutfrode commented Mar 10, 2021

First of all thank you for making this useful tool.
It has worked well for me until Dask version 2021.03, after which I am getting error as below when calling histogram. I apologize that this might be not be sufficient information to reproduce.

../../opendrift/models/basemodel.py:3707: in get_density_xarray
    h = histogram(self.ds.lon, self.ds.lat, bins=[lonbin, latbin],
../../../../miniconda3/envs/opendrift/lib/python3.9/site-packages/xhistogram/xarray.py:133: in histogram
    h_data = _histogram(*args_data, weights=weights_data, bins=bins, axis=axis,
../../../../miniconda3/envs/opendrift/lib/python3.9/site-packages/xhistogram/core.py:261: in histogram
    bin_counts = _histogram_2d_vectorized(*all_args_reshaped, bins=bins,
../../../../miniconda3/envs/opendrift/lib/python3.9/site-packages/xhistogram/core.py:133: in _histogram_2d_vectorized
    bin_counts = _dispatch_bincount(bin_indices, weights, N, hist_shapes,
../../../../miniconda3/envs/opendrift/lib/python3.9/site-packages/xhistogram/core.py:81: in _dispatch_bincount
    return _bincount_2d(bin_indices, weights, N, hist_shapes)
../../../../miniconda3/envs/opendrift/lib/python3.9/site-packages/xhistogram/core.py:32: in _bincount_2d
    return bc_offset.reshape(final_shape)
../../../../miniconda3/envs/opendrift/lib/python3.9/site-packages/dask/array/core.py:1919: in reshape
    return reshape(self, shape, merge_chunks=merge_chunks)
        if reduce(mul, shape, 1) != x.size:
>           raise ValueError("total size of new array must be unchanged")
E           ValueError: total size of new array must be unchanged

../../../../miniconda3/envs/opendrift/lib/python3.9/site-packages/dask/array/reshape.py:206: ValueError
@knutfrode knutfrode changed the title Compliance with Xarray 0.17 ? Compliance with Dask version 2021.03 ? Mar 10, 2021
@dougiesquire
Copy link
Contributor

Noting that @aaronspring encountered what appears to be a similar problem using dask==2021.03.0 with xhistogram inside xskillscore (xarray-contrib/xskillscore#280)

@rabernat rabernat added the bug Something isn't working label Mar 11, 2021
@rabernat
Copy link
Contributor

Ok, thanks for reporting this. Need to get a fix in asap. Any ideas?

Ideally we should be testing against xarray and dask master branches.

willirath added a commit to geomar-od-lagrange/parcels-container that referenced this issue Mar 11, 2021
Needed until xgcm/xhistogram#27 is resolved.
@raybellwaves
Copy link
Contributor

Took a quick look at one of the failing tests in xskillscore xarray-contrib/xskillscore#280 (comment)

Failing at dask/array/reshape.py:206. Worth getting at with a minimal example.

@raybellwaves
Copy link
Contributor

raybellwaves commented Mar 17, 2021

@rabernat
Copy link
Contributor

The core issue is that dask's bincount function, which we import here

bincount = _dask_or_eager_func('bincount')

is now returning dask arrays with shape (). I'm looking into this. Is there a dask PR someone can point to where this behavior was changed?

@raybellwaves
Copy link
Contributor

maybe dask/dask#7391 will help?

@rabernat
Copy link
Contributor

I just reviewed that PR and I'm 99% sure that it will fix the problem.

I consider this a regression in dask, so we will not be making a workaround in xhistogram. We have two choices here:

  • Do nothing and wait until it's fixed + released in dask.
  • Pin xhistogram to an earlier version of dask and do a bugfix release now.

What do those affected prefer?

@dougiesquire
Copy link
Contributor

I'm fine with doing nothing if others are. dask releases are more or less monthly and I think we've already pinned an earlier version of dask in the last xskillscore release.

@jrbourbeau
Copy link
Contributor

dask/dask#7391 was just merged into Dask's main branch and I confirmed that tests here pass with the changes in that PR

@aaronspring
Copy link
Contributor

aaronspring commented Apr 20, 2021

I find a followup issue with dask==2021.04: xarray-contrib/xskillscore#288

Its a new error message:

skillscore/tests/test_probabilistic.py:831: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
xskillscore/core/probabilistic.py:1209: in roc
    dim=dim,
xskillscore/core/contingency.py:122: in __init__
    self._table = self._get_contingency_table(dim)
xskillscore/core/contingency.py:167: in _get_contingency_table
    bin_dim_suffix="_bin",
xskillscore/core/utils.py:166: in histogram
    return xhist(*args, bins=bins, **kwargs)
/usr/share/miniconda/envs/xskillscore-minimum-tests/lib/python3.7/site-packages/xhistogram/xarray.py:146: in histogram
    block_size=block_size
/usr/share/miniconda/envs/xskillscore-minimum-tests/lib/python3.7/site-packages/xhistogram/core.py:271: in histogram
    block_size=block_size,
/usr/share/miniconda/envs/xskillscore-minimum-tests/lib/python3.7/site-packages/xhistogram/core.py:131: in _histogram_2d_vectorized
    bin_indices = ravel_multi_index(each_bin_indices, hist_shapes)
/usr/share/miniconda/envs/xskillscore-minimum-tests/lib/python3.7/site-packages/xhistogram/duck_array_ops.py:24: in f
    return getattr(module, name)(*args, **kwargs)
<__array_function__ internals>:6: in ravel_multi_index
    ???
/usr/share/miniconda/envs/xskillscore-minimum-tests/lib/python3.7/site-packages/dask/array/core.py:1525: in __array_function__
    return da_func(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

multi_index = [dask.array<digitize, shape=(1, 200), dtype=int64, chunksize=(1, 200), chunktype=numpy.ndarray>, dask.array<digitize, shape=(1, 200), dtype=int64, chunksize=(1, 200), chunktype=numpy.ndarray>]
dims = [4, 4], mode = 'raise', order = 'C'

    @wraps(np.ravel_multi_index)
    def ravel_multi_index(multi_index, dims, mode="raise", order="C"):
>       return multi_index.map_blocks(
            _ravel_multi_index_kernel,
            dtype=np.intp,
            chunks=(multi_index.shape[-1],),
            drop_axis=0,
            func_kwargs=dict(dims=dims, mode=mode, order=order),
        )
E       AttributeError: 'list' object has no attribute 'map_blocks'

xhist master doesnt solve it:
xarray-contrib/xskillscore#289

@raybellwaves
Copy link
Contributor

xgcm tests seem ok on upstream: https://github.com/xgcm/xhistogram/actions/workflows/upstream.yml
Is there something in xskillscore that isn't tested in xgcm?

@aaronspring
Copy link
Contributor

the xhist wrapper for datasets by @dougiesquire

@dougiesquire
Copy link
Contributor

I think I've found the issue here. It's an issue in dask that was introduced into xhistogram in the latest version.

The dask version of ravel_multi_index (introduced into xhistogram in 0.1.3) does not actually replicate the API of the equivalent numpy function. Specifically, the numpy implementation can receive a tuple of integer arrays, whereas the dask version cannot (despite saying it can). This functionality is required in xhistogram when there are multiple args.

That is, the following works:

import numpy as np
import dask.array as dsa

arr = np.array([[3,6,6],[4,5,1]]) # Example from numpy docs
da = dsa.from_array(arr)

dsa.ravel_multi_index(da, (7,7))

but the following fails with the error 'tuple' object has no attribute 'map_blocks':

dsa.ravel_multi_index((da, da), (7,7))

This is a bug in dask and I'll open an issue about it.

xhistogram doesn't currently have tests with multiple dask args, which is why this didn't get flagged earlier (we should add them!), but xskillscore does try to use xhistogram in this way (and thus fails).

@raybellwaves, @aaronspring one solution for now in xskillscore is to pin xhistogram==0.1.2.

@dougiesquire
Copy link
Contributor

Issue opened with dask here: dask/dask#7580

@dougiesquire
Copy link
Contributor

Issue opened with dask here: dask/dask#7580

A fix for this has been implemented in the latest release of dask (2021.06.1). In the meantime, however, the xhistogram refactor to use dask.array.blockwise (#49) has removed our dependence on the dask.array functions referenced in this issue, so we actually no longer need this fix.

Closing the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants