Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data.func and friends: migrate to Dask #300

Merged

Conversation

sadielbartholomew
Copy link
Member

@sadielbartholomew sadielbartholomew commented Jan 24, 2022

Convert the Data.func method towards #182, in doing so daskifying several other element-wise operation methods that use it to perform the underlying operation, as follows (though note the below context since in short most of these can and will be then converted to use Dask built-ins rather than calling func):

  • the trigonometric and hyperbolic methods and their inverses (12 in total: sin, arcsin, sinh, arcsin etc.);
  • ceil;
  • exp;
  • floor;
  • rint;
  • round;
  • trunc;

and daskifying log except in the case of setting a base that isn't e, 2 or 10 which requires which requires __itruediv__ to be daskified to work.

Context and plan/proposal for follow-on work

As well as daskifying func this PR effectively daskifies the other methods listed above in doing so because they use func to operate in the code at present, hence I have removed the test-skipping decorators from the unit tests for, and added the daskified marker to, each of the methods (except log where a comment has been made to note the nearly-daskified state), however such methods are more efficiently daskified by conversion to using the existing Dask built-in equivalents instead of calling func this way.

So the plan is to use follow-on PRs to convert from use of func to the appropriate built-in in each method. It turns out there are Dask built-ins for every one listed above available for us to use directly.

A small complication is in the case of the trig. and hyperbolic methods with a restricted domain (4 of the 12, e.g. arcsin which is undefined outside of [-1, 1]), since we support a preserve_invalid keyword which preserves any NaN or inf like values rather than masking them as important for the use context of cf-python. In order to still support that for methods we think should have it as a keyword flag, we should use the daskified func rather than the Dask built-in, as it already implements that keyword. Overall, therefore, my proposal is as follows, to be applied in a series of new PRs:

For data methods that apply a trivial operation in an element-by-element manner (see list above), use the following:

  1. If there is a Dask built-in equivalent method, use that directly, except where there is a valid reason we can't, notably as an example for methods where there is a restricted domain such that the output may have new masked elements, where we want to have a preserve_invalid option that NumPy does not support in any trivial way.
  2. Otherwise use func to apply the operation.

@sadielbartholomew sadielbartholomew changed the title Data.func: convert old algorithm to Dask array form as first step Data.func: migrate to Dask Jan 25, 2022
@sadielbartholomew sadielbartholomew marked this pull request as draft January 25, 2022 00:02
@sadielbartholomew sadielbartholomew self-assigned this Jan 25, 2022
@sadielbartholomew sadielbartholomew added dask Relating to the use of Dask question General question labels Jan 25, 2022
@sadielbartholomew sadielbartholomew changed the title Data.func: migrate to Dask Data.func and friends: migrate to Dask Jan 26, 2022
@sadielbartholomew sadielbartholomew marked this pull request as ready for review January 26, 2022 01:51
Copy link
Collaborator

@davidhassell davidhassell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Sadie, not much to say here, as it all looks good. I agree with you analysis/context/plan/proposal (#300 (comment)). I can confirm that the tests all pass, and have noted the TODOs in them to be sorted out when other functionality comes on line.

Thanks - please merge when you're ready.

cf/data/data.py Outdated Show resolved Hide resolved
cf/data/data.py Show resolved Hide resolved
@sadielbartholomew
Copy link
Member Author

Thanks for the feedback, @davidhassell. I've deprecated the 'out' keyword as suggested, and also added a minimal unit test for Data.func as an explicit test was missing (and I realised I had not added the 'daskified' marker decorator to the trig. and hyperbolic methods, so did that too in a third commit). None of those updates require a re-review, as you implied in your review comment. I'll just check the CI jobs pass and then merge - re-triggering those via open-close...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dask Relating to the use of Dask question General question
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants