Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate Data.transpose from LAMA to Dask #247

Merged

Conversation

sadielbartholomew
Copy link
Member

@sadielbartholomew sadielbartholomew commented Aug 23, 2021

Towards #182, 'daskify' the Data.transpose method. This one is fairly trivial (I believe) since there is a Dask intrinsic method to transpose.

Note I had to remove the loop and context management over chunksize in self.chunk_sizes for the corresponding test, but otherwise I have not touched that test, which now passes, as indicated by the removal of the unittest.skipIf decorator on the test which now passes locally and the CI will indicate if it passes across the job OS and version matrix.

cf/data/data.py Outdated Show resolved Hide resolved
@sadielbartholomew
Copy link
Member Author

Of relevance to this PR and to keep an eye on going forward: data-apis/array-api#228

@davidhassell davidhassell added the dask Relating to the use of Dask label Sep 30, 2021
@sadielbartholomew
Copy link
Member Author

HI @davidhassell, just 'bumping' this in case it fell off your radar. It would be good to get all outstanding lama-to-dask branch PRs merged (as for #254, I''ll update it with my notes summarising our discussion about it and work on it further very shortly).

cf/data/data.py Outdated
@@ -12142,21 +12142,14 @@ def transpose(self, axes=None, inplace=False, i=False):
)
# --- End: if
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we ditch all of this checking, as dask duplicates it, and replace it with a try ... except ValueError ... around the actual dask transpose operation?

In [17]: from dask import array as da

In [18]: x = da.from_array([[2, 1]])

In [19]: x.transpose([2, 0])
    ...
ValueError: ('Unknown dimension', {2})

In [20]: x.transpose([1, 0, -1])
    ...
ValueError: axes don't match array

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good thinking, I didn't consider that. I imagine so, let me try it out and see...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK now I am implementing it, I see we use iaxes for the new d._axes value calculation, so we do need (want) the if/else (as it also short circuits for a trivial case which is good generally) but can cut all but the first line from the else (this might be what you were implying to being with). Unless, that is, there is a clever way to avoid calculating iaxes or the equivalent...

I'm about to push a new commit but if you have thoughts or a new idea relating to this, just let me know and we can tweak it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah - I see. Given our need to also permute the _axes attribute, I can't see a way other than keeping the existing code. No problem

@@ -859,6 +859,7 @@ def test_Data_squeeze_insert_dimension(self):
def test_Data___getitem__(self):
if self.test_only and inspect.stack()[0][3] not in self.test_only:
return
# TODODASK - why is there no testing here?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know! We'll certainly get some in when __getitem__ is daskified.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... which has now happened (#257)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good spot. I'll remove this line. (Given the time since I put up this PR I probably should have done a quick scan to check for anything that could have been updated before tagging you, sorry 🙂)

cf/data/data.py Outdated Show resolved Hide resolved
cf/data/data.py Outdated Show resolved Hide resolved
sadielbartholomew added a commit to sadielbartholomew/cf-python that referenced this pull request Oct 7, 2021
Co-authored-by: David Hassell <davidhassell@users.noreply.github.com>
@sadielbartholomew
Copy link
Member Author

As well as adding a few new commits to address the feedback, I have rebased onto lama-to-dask so that the reset_mask_hardness is known and recognised, and therefore test_Data passes (at least locally) rather than failing due to not knowing of that logic yet.

@sadielbartholomew
Copy link
Member Author

... hence the need to force-push.

@sadielbartholomew
Copy link
Member Author

sadielbartholomew commented Oct 7, 2021

Trying via open-close to re-trigger the CI jobs to indicate that the relevant test module passes...

@sadielbartholomew
Copy link
Member Author

Thanks for your feedback @davidhassell, I believe I have addressed everything now so let me know if any adjustments are required or desired.

@sadielbartholomew
Copy link
Member Author

Hi @davidhassell, since you are about to go on leave thought I might bump this in case you have a chance to confirm, is it ready to be merged? Next week I will return somewhat to cf-python daskifying work, FYI, though naturally mostly focusing on ES-DOC machine and performance.

cf/data/data.py Outdated Show resolved Hide resolved
Co-authored-by: David Hassell <davidhassell@users.noreply.github.com>
@davidhassell
Copy link
Collaborator

Thanks for the nudge. I've made a minor style suggestion, but otherwise good to go!

@sadielbartholomew
Copy link
Member Author

Thanks for all of the carefully-considered feedback @davidhassell, I have corrected the typo you pointed out in a final commit so will now merge.

@sadielbartholomew sadielbartholomew merged commit c3164e7 into NCAS-CMS:lama-to-dask Oct 22, 2021
@sadielbartholomew sadielbartholomew deleted the lama-to-dask-4 branch October 22, 2021 14:02
@davidhassell davidhassell added this to the 3.14.0 milestone Nov 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dask Relating to the use of Dask
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants