Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change constructs access API to facilitate performance improvements #202

Closed
davidhassell opened this issue Apr 28, 2021 · 0 comments · Fixed by #210
Closed

Change constructs access API to facilitate performance improvements #202

davidhassell opened this issue Apr 28, 2021 · 0 comments · Fixed by #210
Assignees
Labels
enhancement New feature or request performance Relating to speed and memory performance
Milestone

Comments

@davidhassell
Copy link
Collaborator

davidhassell commented Apr 28, 2021

Current situation

Access to named constructs is currently either by the "filter_by_*" Constructs methods, or by Field attributes:

>>> f = cf.example_field(0)
>>> f.dimension_coordinates
<Constructs: dimension_coordinate(3)>
>>> f.constructs.filter_by_type('dimension_coordinate')
<Constructs: dimension_coordinate(3)>

The reason for the "dimension_coordinates" alias being an attribute rather than method is that it was expected that it would often be followed by more options:

>>> f.dimension_coordinates('time')
<Constructs: dimension_coordinate(1)>
>>> f.dimension_coordinates.filter_by_property(units='degrees_east')
<Constructs: dimension_coordinate(1)>

This is fine, but comes with the slight confusion that the result of the attribute is a callable Constructs instance, which obfuscates the help:

>>> help(f.dimension_coordinates)
Help on Constructs in module cf.constructs object:

class Constructs(cfdm.constructs.Constructs)
 |  Constructs(auxiliary_coordinate=None, dimension_coordinate=None, domain_ancillary=None, field_ancillary=None, cell_measure=None, coordinate_reference=None, domain_axis=None, cell_method=None, source=None, copy=True, _use_data=True, _view=False, _ignore=())
 |
 |  A container for metadata constructs.
 ...  

Performance

It turns out that the code for accessing constructs is very slow, and must be improved.

A key part of this improvement relies on being able to work with python dictionaries rather than dictionary-like Constructs objects.

To facilitate this, whilst retaining the intuitive nature of the API the least intrusive change is to make what were attributes properties: i.e. f.dimension_coordinates becomes f.dimension_coordinates(). This then allows keyword parameters that can change the behaviour when speed is an issue.
Code breaking

This change will break any code that uses bare construct access attributes:

>>> # These won't work any more
>>> f.auxiliary_coordinates  
>>> f.coordinate_references
>>> f.coordinates  
>>> f.cell_measures
>>> f.dimension_coordinates
>>> f.domain_ancillaries
>>> f.domain_axes
>>> f.cell_methods
>>> f.field_ancillaries
>>> # These will work as before
>>> f.auxiliary_coordinates()
>>> f.coordinate_references()
>>> f.coordinates()
>>> f.cell_measures()
>>> f.dimension_coordinates()
>>> f.domain_ancillaries()
>>> f.domain_axes()
>>> f.cell_methods()
>>> f.field_ancillaries()
>>> # These will also work as before
>>> f.auxiliary_coordinates(x)
>>> f.coordinate_references(x)
>>> f.coordinates(x)
>>> f.cell_measures(x)
>>> f.dimension_coordinates(x)
>>> f.domain_ancillaries(x)
>>> f.domain_axes(x)
>>> f.cell_methods(x)
>>> f.field_ancillaries(x)

This is the only backwards incompatible change to the API. All other changes will not break existing code.

With the new API, reading a file is, in one reproducible test, ~3 times faster:

>>> import cf, timeit
>>> f = cf.example_field(1)
>>> cf.write(f, 'tmp.nc')
>>> cf.__version__
3.8.0
>>> sum(timeit.repeat("cf.read('tmp.nc')", globals=globals(), repeat=100, number=1))/100
0.22618969043996912
>>> import cf, timeit
>>> cf.__version__
3.9.0b1
>>> sum(timeit.repeat("cf.read('tmp.nc')", globals=globals(), repeat=100, number=1))/100
0.07341453492001165

Note that much of the improvement comes from unrelated changes (such as removing unnecessary __repr__ calls and unnecessary deep copies).

See NCAS-CMS/cfdm#130 for further details. In particular, the cfdm version of the above timing test gives a ~10 times speed up - the reasons for the lesser improvement in cf-python will need investigating

Edit: The less speed-up is understandable and due to extra checks being carried out in, e.g., set_construct. This is not to say that things can't be sped up more, but that is for another issue

PR to follow ...

@davidhassell davidhassell added the enhancement New feature or request label Apr 28, 2021
@davidhassell davidhassell self-assigned this Apr 28, 2021
@davidhassell davidhassell added this to the 3.9.0 milestone Apr 28, 2021
@davidhassell davidhassell added the performance Relating to speed and memory performance label Apr 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request performance Relating to speed and memory performance
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant