Recursive tokenization #3515
Conversation
xref pydata/sparse#300
```python
# Test DataArray and Variable
da_a = DataArray(a)
da_b = DataArray(b)
assert dask.base.tokenize(da_a) != dask.base.tokenize(da_b)
```
Just so I'm clear, these are two different (but equal) xarray objects, so we don't want dask to think that they are the same?
They aren't equal: `da_b[5000] == 2`, while `da_a[5000] == 1`.
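For context, a minimal sketch of the kind of setup this implies; the construction of `a` and `b` is an assumption, only the inequality at index 5000 comes from the discussion:

```python
import dask.base
import numpy as np
from xarray import DataArray

# Hypothetical arrays matching the discussion: identical everywhere
# except at index 5000 (da_a[5000] == 1, da_b[5000] == 2).
a = np.ones(10000)
b = a.copy()
b[5000] = 2

da_a = DataArray(a)
da_b = DataArray(b)

# Equal metadata but different data must yield different tokens;
# otherwise dask could wrongly reuse cached results across the two.
assert dask.base.tokenize(da_a) != dask.base.tokenize(da_b)
```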
```diff
@@ -856,6 +856,10 @@ def test_dask_token():
     import dask

+    s = sparse.COO.from_numpy(np.array([0, 0, 1, 2]))
```
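For context, a hedged sketch of how the sparse-backed object from this diff is presumably exercised; everything beyond the `s = ...` line is an assumption about the surrounding test:

```python
import dask.base
import numpy as np
import sparse
from xarray import DataArray

s = sparse.COO.from_numpy(np.array([0, 0, 1, 2]))
a = DataArray(s)

# COO.__array__ raises rather than silently densifying, so this call
# would fail if tokenize() internally touched a.values; succeeding
# here shows tokenization never coerces the backend to numpy.
token = dask.base.tokenize(a)
```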
`xfail` this instead?
Yeah, that was an option, but it would have completely disabled the test, and the test is really about not accidentally invoking `self.values` (which in turn invokes `self._variable._data.__array__()`) on NEP18-compatible backends. It is not really about testing sparse specifically; it just conveniently relies on the fact that `COO.__array__` raises an exception. A more correct, but also more laborious, test could have created a NEP18-compatible dummy class on the fly (see the sketch below).
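For illustration, a minimal sketch of such a dummy class, assuming only the NEP 18 protocol; the name `NEP18Stub` and its behaviour are hypothetical, not the test that was actually written:

```python
import numpy as np

class NEP18Stub:
    """Hypothetical duck array: NEP18-compatible, but any attempt to
    coerce it to a plain numpy array (via __array__) raises."""

    def __init__(self, shape, dtype=float):
        self.shape = shape
        self.dtype = np.dtype(dtype)
        self.ndim = len(shape)

    def __array__(self, dtype=None):
        raise RuntimeError("implicit conversion to numpy is forbidden")

    def __array_function__(self, func, types, args, kwargs):
        return NotImplemented

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        return NotImplemented
```

Wrapping such an object in a `DataArray` and tokenizing it would then fail loudly on any accidental `__array__` call, without depending on sparse at all.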
* upstream/master:
  Allow appending datetime & boolean variables to zarr stores (pydata#3504)
  warn if dim is passed to rolling operations. (pydata#3513)
  Deprecate allow_lazy (pydata#3435)
  Recursive tokenization (pydata#3515)
  format indexing.rst code with black (pydata#3511)
* upstream/master:
  Added fill_value for unstack (pydata#3541)
  Add DatasetGroupBy.quantile (pydata#3527)
  ensure rename does not change index type (pydata#3532)
  Leave empty slot when not using accessors
  interpolate_na: Add max_gap support. (pydata#3302)
  units & deprecation merge (pydata#3530)
  Fix set_index when an existing dimension becomes a level (pydata#3520)
  add Variable._replace (pydata#3528)
  Tests for module-level functions with units (pydata#3493)
  Harmonize `FillValue` and `missing_value` during encoding and decoding steps (pydata#3502)
  FUNDING.yml (pydata#3523)
  Allow appending datetime & boolean variables to zarr stores (pydata#3504)
  warn if dim is passed to rolling operations. (pydata#3513)
  Deprecate allow_lazy (pydata#3435)
  Recursive tokenization (pydata#3515)
* upstream/master: (22 commits)
  Added fill_value for unstack (pydata#3541)
  Add DatasetGroupBy.quantile (pydata#3527)
  ensure rename does not change index type (pydata#3532)
  Leave empty slot when not using accessors
  interpolate_na: Add max_gap support. (pydata#3302)
  units & deprecation merge (pydata#3530)
  Fix set_index when an existing dimension becomes a level (pydata#3520)
  add Variable._replace (pydata#3528)
  Tests for module-level functions with units (pydata#3493)
  Harmonize `FillValue` and `missing_value` during encoding and decoding steps (pydata#3502)
  FUNDING.yml (pydata#3523)
  Allow appending datetime & boolean variables to zarr stores (pydata#3504)
  warn if dim is passed to rolling operations. (pydata#3513)
  Deprecate allow_lazy (pydata#3435)
  Recursive tokenization (pydata#3515)
  format indexing.rst code with black (pydata#3511)
  add missing pint integration tests (pydata#3508)
  DOC: update bottleneck repo url (pydata#3507)
  add drop_sel, drop_vars, map to api.rst (pydata#3506)
  remove syntax warning (pydata#3505)
  ...
After misreading the dask documentation (https://docs.dask.org/en/latest/custom-collections.html#deterministic-hashing), I was under the impression that the output of `__dask_tokenize__` would be recursively parsed, as happens for `__getstate__` or `__reduce__`. That's not the case: the output of `__dask_tokenize__` is just fed into `str()`, so it has to be made explicitly recursive!
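As a concrete illustration of "explicitly recursive" (a sketch under assumptions: `Wrapped` is a made-up container, while `dask.base.normalize_token` is the real dispatcher dask exposes for this):

```python
import numpy as np
from dask.base import normalize_token, tokenize

class Wrapped:
    # Hypothetical container, used only to illustrate the point.
    def __init__(self, data):
        self.data = data

    def __dask_tokenize__(self):
        # The return value is not tokenized recursively; it is just
        # str()-ed. Normalizing the payload explicitly makes the token
        # reflect the actual array contents rather than its repr.
        return normalize_token((type(self).__name__, self.data))

a = Wrapped(np.array([1, 2, 3]))
b = Wrapped(np.array([1, 2, 4]))
assert tokenize(a) != tokenize(b)  # contents differ -> tokens differ
```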