Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dask: Data.__init__ with no arguments #551

Merged
merged 3 commits into from
Jan 12, 2023

Conversation

davidhassell
Copy link
Collaborator

(Re-) Fixes #378. See that issue for details.

@davidhassell davidhassell added the dask Relating to the use of Dask label Jan 5, 2023
@davidhassell davidhassell added this to the 3.14.0 milestone Jan 5, 2023
@davidhassell davidhassell linked an issue Jan 5, 2023 that may be closed by this pull request
Copy link
Member

@sadielbartholomew sadielbartholomew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code changes themselves are fine and good, and with this PR merged creation of a plain cf.Data() works and is tested to do so, but I am concerned about side-effects of the creation of such an object. For example from seeing the list of methods available on a case via a __dir__() call, and trying out some, I hit trivial errors such as those illustrated below. Those are often the kinds of errors you might expect to happen, conceptually, on an array-less instance, or following from making:

the "dask" key in the _custom dictionary is inintialised to None, thereby allowing basic inspection of the instance.

Ultimately I am not really sure whether allowing this (Data.__init__ with no arguments) is a good idea after all, because really if we allow it we should go through all of the possible methods for Data and ensure they run or (probably for most) error gracefully with that case, and that could be pretty time-consuming. But with the alternative I implemented in #491 we have the complication of the docstring rewriting testing, so either way there is work to be done!

In light of that, what do you think we should do from here? Could we add some fancy logic, perhaps, to stop methods being called on a Data instance created with no data? Or (less preferred, in my eyes) warn users upon initialisation of data-less Data that many or most methods won't work properly?

Method call examples

>>> d = cf.Data()
>>> d.ndim
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sadie/cf-python/cf/data/data.py", line 4357, in ndim
    return dx.ndim
AttributeError: 'NoneType' object has no attribute 'ndim'
>>> d.ndim()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sadie/cf-python/cf/data/data.py", line 4357, in ndim
    return dx.ndim
AttributeError: 'NoneType' object has no attribute 'ndim'
>>> d.size()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sadie/cf-python/cf/data/data.py", line 4427, in size
    size = dx.size
AttributeError: 'NoneType' object has no attribute 'size'
>>> d.max()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sadie/cfdm/cfdm/decorators.py", line 42, in inplace_wrapper
    self.INPLACE_ENABLED_PLACEHOLDER = self.copy()
  File "/home/sadie/cfdm/cfdm/data/data.py", line 1520, in copy
    return super().copy(array=array)
  File "/home/sadie/cfdm/cfdm/core/data/data.py", line 322, in copy
    return type(self)(source=self, copy=True, _use_array=array)
  File "/home/sadie/cf-python/cf/data/data.py", line 361, in __init__
    self._set_dask(array, copy=copy, conform=False)
  File "/home/sadie/cf-python/cf/data/data.py", line 1282, in _set_dask
    array = array.copy()
AttributeError: 'NoneType' object has no attribute 'copy'
>>> d.get_filenames()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sadie/cf-python/cf/data/data.py", line 5474, in get_filenames
    dsk = collections_to_dsk((self.to_dask_array(),), optimize_graph=True)
  File "/home/sadie/anaconda3/envs/cf-env/lib/python3.8/site-packages/dask/base.py", line 368, in collections_to_dsk
    dsk, keys = _extract_graph_and_keys(val)
  File "/home/sadie/anaconda3/envs/cf-env/lib/python3.8/site-packages/dask/base.py", line 394, in _extract_graph_and_keys
    graphs.append(v.__dask_graph__())
AttributeError: 'NoneType' object has no attribute '__dask_graph__'
>>> d.is_masked()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sadie/cf-python/cf/data/data.py", line 4281, in is_masked
    out_ind = tuple(range(dx.ndim))
AttributeError: 'NoneType' object has no attribute 'ndim'

@davidhassell
Copy link
Collaborator Author

Hi Sadie, OK, how about 59a6e3d, which produces:

>>> import cf
>>> d = cf.Data()
>>> d.ndim
Traceback (most recent call last):
    ...
ValueError: Data object has no data
>>> d.get_filenames()
Traceback (most recent call last):
    ...
ValueError: Data object has no data
>>> 

@sadielbartholomew
Copy link
Member

OK, how about 59a6e3d

Ah, perfect! Achieves what I hoped for as best solution, namely:

go through all of the possible methods for Data and ensure they run or (probably for most) error gracefully with that case

without any of the nuisance I anticipated! Great idea.

I'm just re-running the test suite and then I can approve. Thanks.

Copy link
Member

@sadielbartholomew sadielbartholomew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good now (feedback addressed). Ideally we can add one or two self.assertRaises(ValueError) checks to the test to ensure the behaviour implemented by 59a6e3d is coming through to the methods (which I have tested manually), but regardless please merge when ready.

@davidhassell
Copy link
Collaborator Author

Thanks, @sadielbartholomew - I've add a test (8e343bb) ... and merging.

@davidhassell davidhassell merged commit ab6b8cb into NCAS-CMS:lama-to-dask Jan 12, 2023
@davidhassell davidhassell deleted the dask-data-no-args branch February 6, 2023 09:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dask Relating to the use of Dask
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LAMA to Dask: managing default(-argument) Data instance
2 participants