Mergeback of FEATURE_chunk_control branch (#5588)
* Merge chunk control code into latest iris (#5565)

* Dask chunking control for netcdf loading.

* renamed loader

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix indentation error, perhaps also docstring error

* fixed result error in loader, and set tests to treat as big files

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* trial and error, solve non iterable tuple 1.0

* trial and error, solve non iterable tuple 2.0 (used if var is none: instead of if var: )

* commented out docstring

* fixed mock 'no name' failure

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed precommit issues

* corrected docstrings as per review comments

* Removed unnecessary line

Co-authored-by: Martin Yeo <40734014+trexfeathers@users.noreply.github.com>

---------

Co-authored-by: Patrick Peglar <patrick.peglar@metoffice.gov.uk>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Martin Yeo <40734014+trexfeathers@users.noreply.github.com>

* Chunk control modes (#5575)

* added modes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added as_dask mode

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleaned up enum and as_dask, as per review comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* corrected  to  in final place

* unindented lazy_param assignment one indent

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Corrected required type of dimension_chunksizes. (#5581)

* Chunk Control Tests (#5583)

* converted tests to pytest, added neg_one, and incomplete from_file and as_dask tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added from_file test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added mocking tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* trial and error with mocks and patches, may or may not work

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* converted Mock to patch in as_dask test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* review comment changes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* pre commit fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* review comments, and added test in test__get_cf_var_data()

* added in another test

* added tests and fixed review comments

* added AuxCoord test

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Chunk control minor fixes (#5593)

* Disallow chunks=None in optimum_chunksize.

* Clearer docstrings.

* Corrected docstring.

* Chunk Control documentation (#5597)

* init PR, skeleton TP

* whoops, missed the TP.

* fixed doctests in rst file

* correct triple chevron to elipses

* updated set doctest to better show functionality

* removed in-progress doctest code

* Review comments, part 1

* Review comments, part 2

* changed numpy docs dict

* wait, this way is better

* fixed linkcheck failures (maybe)

* fixed :meth:

* fixed a couple doc bits

* hopefully fixed doctests

* newest review comments

* fixed rendering, and wording in docstring

* fixed docstring numpyness

* What's New Entry (#5601)

* written whatsnew entry

* added ref

* moved label to before title

---------

Co-authored-by: Elias <110238618+ESadek-MO@users.noreply.github.com>
Co-authored-by: Patrick Peglar <patrick.peglar@metoffice.gov.uk>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
4 people authored Nov 23, 2023
1 parent 54582d9 commit 507c34c
Showing 8 changed files with 768 additions and 80 deletions.
1 change: 1 addition & 0 deletions docs/src/techpapers/index.rst
@@ -11,3 +11,4 @@ Extra information on specific technical issues.

um_files_loading.rst
missing_data_handling.rst
netcdf_io.rst
140 changes: 140 additions & 0 deletions docs/src/techpapers/netcdf_io.rst
@@ -0,0 +1,140 @@
.. testsetup:: chunk_control

    import iris
    from iris.fileformats.netcdf.loader import CHUNK_CONTROL

    from pathlib import Path
    import dask
    import shutil
    import tempfile

    tmp_dir = Path(tempfile.mkdtemp())
    tmp_filepath = tmp_dir / "tmp.nc"

    cube = iris.load(iris.sample_data_path("E1_north_america.nc"))[0]
    iris.save(cube, tmp_filepath, chunksizes=(120, 37, 49))
    old_dask = dask.config.get("array.chunk-size")
    dask.config.set({'array.chunk-size': '500KiB'})


.. testcleanup:: chunk_control

    dask.config.set({'array.chunk-size': old_dask})
    shutil.rmtree(tmp_dir)

.. _netcdf_io:

=============================
NetCDF I/O Handling in Iris
=============================

This document provides a basic account of how Iris loads and saves NetCDF files.

.. admonition:: Under Construction

    This document is still a work in progress, so it might include blank or
    unfinished sections. Watch this space!


Chunk Control
--------------

Default Chunking
^^^^^^^^^^^^^^^^

By default, Iris optimises chunking on load, choosing a chunksize for your
data without any user input. The chunksize is calculated from a number of
factors, including:

- File Variable Chunking
- Full Variable Shape
- Dask Default Chunksize
- Dimension Order: Earlier (outer) dimensions are split in preference to later (inner) dimensions.

.. doctest:: chunk_control

    >>> cube = iris.load_cube(tmp_filepath)
    >>>
    >>> print(cube.shape)
    (240, 37, 49)
    >>> print(cube.core_data().chunksize)
    (60, 37, 49)
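
The chunk shape reported above is consistent with a simple size check against the Dask limit. The following is a rough sketch of that arithmetic, not Iris's actual algorithm. It assumes 4-byte (float32) values, which this document does not state, and uses the 500 KiB limit from the test setup. The helper name ``chunk_nbytes`` is hypothetical.

```python
from math import prod

# Assumption: each array element is a 4-byte float32 value.
ITEMSIZE = 4
# Dask limit configured in the test setup: '500KiB' = 500 * 1024 bytes.
LIMIT_BYTES = 500 * 1024


def chunk_nbytes(chunk_shape, itemsize=ITEMSIZE):
    """Return the in-memory size of one chunk, in bytes."""
    return prod(chunk_shape) * itemsize


file_chunk = (120, 37, 49)    # chunking stored in the file
loaded_chunk = (60, 37, 49)   # chunking reported after a default load

# The file chunk is larger than the limit; half of it along the
# outer (time) dimension fits.
print(chunk_nbytes(file_chunk), chunk_nbytes(file_chunk) <= LIMIT_BYTES)
print(chunk_nbytes(loaded_chunk), chunk_nbytes(loaded_chunk) <= LIMIT_BYTES)
```

Under these assumptions the file chunk needs 870240 bytes, which exceeds the 512000-byte limit, while the loaded chunk needs 435120 bytes and fits.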

For more user control, functionality was updated in :pull:`5588`, with the
creation of the :data:`iris.fileformats.netcdf.loader.CHUNK_CONTROL` object.

Custom Chunking: Set
^^^^^^^^^^^^^^^^^^^^

There are three context managers within :data:`~iris.fileformats.netcdf.loader.CHUNK_CONTROL`. The most basic is
:meth:`~iris.fileformats.netcdf.loader.ChunkControl.set`. This allows you to specify the chunksize for each dimension
and, optionally, to restrict the change to a specific ``var_name``.

Using ``-1`` in place of a chunksize will ensure the chunksize stays the same
as the shape, i.e. no optimisation occurs on that dimension.

.. doctest:: chunk_control

    >>> with CHUNK_CONTROL.set("air_temperature", time=180, latitude=-1, longitude=25):
    ...     cube = iris.load_cube(tmp_filepath)
    >>>
    >>> print(cube.core_data().chunksize)
    (180, 37, 25)
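
To make the meaning of ``-1`` concrete, here is a minimal sketch of how per-dimension requests could map onto a chunk shape. The function ``resolve_requested_chunks`` is a hypothetical helper written for illustration. It is not part of Iris and does not reproduce Iris's internal logic.

```python
def resolve_requested_chunks(shape, dim_names, requested):
    """Hypothetical helper: map per-dimension requests onto a chunk shape.

    A request of -1 means "use the full length of that dimension".
    Dimensions with no request come back as None, i.e. left undecided
    (in Iris these would fall back to the default optimisation).
    """
    resolved = []
    for length, name in zip(shape, dim_names):
        size = requested.get(name)
        if size == -1:
            size = length
        resolved.append(size)
    return tuple(resolved)


shape = (240, 37, 49)
dim_names = ("time", "latitude", "longitude")
requested = {"time": 180, "latitude": -1, "longitude": 25}

print(resolve_requested_chunks(shape, dim_names, requested))
# (180, 37, 25)
```

The latitude entry becomes 37, the full length of that dimension, matching the chunksize printed by the doctest.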

Note that ``var_name`` is optional, and that you don't need to specify every dimension. If you
specify only one dimension, the rest will be optimised using Iris' default behaviour.

.. doctest:: chunk_control

    >>> with CHUNK_CONTROL.set(longitude=25):
    ...     cube = iris.load_cube(tmp_filepath)
    >>>
    >>> print(cube.core_data().chunksize)
    (120, 37, 25)
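
As with context managers in general, the requested chunking is expected to apply only to loads performed inside the ``with`` block, with the previous settings restored on exit. The sketch below illustrates that pattern with a toy class. ``ToyChunkControl`` and its attribute are hypothetical stand-ins, not the Iris implementation.

```python
from contextlib import contextmanager


class ToyChunkControl:
    """Hypothetical stand-in for illustration; not the Iris implementation."""

    def __init__(self):
        # No per-dimension requests are active by default.
        self.dim_chunksizes = {}

    @contextmanager
    def set(self, **dim_chunksizes):
        # Remember the previous settings, apply the new ones, and
        # always restore the old ones when the block exits.
        old = self.dim_chunksizes
        self.dim_chunksizes = dict(dim_chunksizes)
        try:
            yield
        finally:
            self.dim_chunksizes = old


control = ToyChunkControl()
with control.set(longitude=25):
    inside = dict(control.dim_chunksizes)
after = dict(control.dim_chunksizes)

print(inside, after)
```

The ``try``/``finally`` ensures the earlier settings come back even if the load inside the block raises an exception.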

Custom Chunking: From File
^^^^^^^^^^^^^^^^^^^^^^^^^^

The second context manager is :meth:`~iris.fileformats.netcdf.loader.ChunkControl.from_file`.
This takes chunksizes as defined in the NetCDF file. Any dimensions without specified chunks
will default to Iris optimisation.

.. doctest:: chunk_control

    >>> with CHUNK_CONTROL.from_file():
    ...     cube = iris.load_cube(tmp_filepath)
    >>>
    >>> print(cube.core_data().chunksize)
    (120, 37, 49)

Custom Chunking: As Dask
^^^^^^^^^^^^^^^^^^^^^^^^

The final context manager, :meth:`~iris.fileformats.netcdf.loader.ChunkControl.as_dask`, bypasses
Iris' optimisation altogether, and will take its chunksizes from Dask's behaviour.

.. doctest:: chunk_control

    >>> with CHUNK_CONTROL.as_dask():
    ...     cube = iris.load_cube(tmp_filepath)
    >>>
    >>> print(cube.core_data().chunksize)
    (70, 37, 49)
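
The value of 70 along the time dimension is consistent with fitting as many whole time steps as possible under the configured Dask limit. This is only a back-of-the-envelope check, assuming 4-byte (float32) values and the 500 KiB limit from the test setup. It does not describe Dask's actual chunking algorithm.

```python
# Assumption: each array element is a 4-byte float32 value.
ITEMSIZE = 4
# Dask limit configured in the test setup: '500KiB' = 500 * 1024 bytes.
LIMIT_BYTES = 500 * 1024

shape = (240, 37, 49)  # (time, latitude, longitude)

# Bytes needed for one time step, i.e. a full latitude-longitude slice.
inner_bytes = shape[1] * shape[2] * ITEMSIZE

# Largest number of whole time steps that fit under the limit.
max_time_steps = LIMIT_BYTES // inner_bytes

print(inner_bytes, max_time_steps)
```

Under these assumptions one time step needs 7252 bytes, so 70 steps fit within 512000 bytes and 71 do not.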


Split Attributes
-----------------

TBC


Deferred Saving
----------------

TBC


Guess Axis
-----------

TBC
8 changes: 8 additions & 0 deletions docs/src/whatsnew/latest.rst
@@ -59,6 +59,10 @@ This document explains the changes made to Iris for this release
intervention preventing :func:`~iris.util.guess_coord_axis` from acting on a
coordinate. (:pull:`5551`)

#. `@pp-mo`_, `@trexfeathers`_ and `@ESadek-MO`_ added more control over
NetCDF chunking with the use of the :data:`iris.fileformats.netcdf.loader.CHUNK_CONTROL`
context manager. (:pull:`5588`)


🐛 Bugs Fixed
=============
@@ -118,6 +122,10 @@ This document explains the changes made to Iris for this release
#. `@ESadek-MO`_ added a phrasebook for synonymous terms used in similar
packages. (:pull:`5564`)

#. `@ESadek-MO`_ and `@trexfeathers`_ created a technical paper for NetCDF
saving and loading, :ref:`netcdf_io`, with a section on chunking and placeholders
for further topics. (:pull:`5588`)


💼 Internal
===========