NetCDF thread safety take two (#5095)
* Unpin netcdf4.

* Temporarily enable GHA on this branch.

* Temporarily enable GHA on this branch.

* Temporarily enable GHA on this branch.

* Experiment to disable wheel CI on forks.

* Disable segfaulting routines.

* More temporary changes to get CI passing.

* More temporary changes to get CI passing.

* Finessed segfault skipping.

* Bring in changes from #5061.

* Re-instate test_load_laea_grid.

* Adaptations to get the tests passing.

* Use typing.Mapping instead.

* Get doctests passing.

* CF only resolve non-url filenames.

* Confirm thread safety fixes.

* Remove dummy assignment.

* Restored plot_nemo What's New entry.

* _add_aux_factories temporarily release global lock.

* Remove per-file locking.

* Remove remaining test workarounds.

* Remove remaining comments.

* Correct use of CFReader context manager.

* Refactor for easier future maintenance.

* Rename netcdf _thread_safe, add header.

* Full use of ThreadSafeAggregators.

* Full use of ThreadSafeAggregators.

* Remove remaining imports of NetCDF4.

* Test to ensure netCDF4 is via _thread_safe module.

* More refined netcdf._thread_safe classes.

* _thread_safe docstrings.

* Restore original NetCDF code where possible.

* Revert changes to 2.3.rst.

* Update lockfiles.

* Additions to _thread_safe.py.

* Remove temporary CI shims.

* New locking strategy for NetCDFDataProxy.

* NetCDFDataProxy simpler use of netCDF4 lock.

* Update lock files.

* Go back to using a Threading Lock.

* Remove superfluous pass commands in test_cf.py.

* Rename _thread_safe to _thread_safe_nc.

* Rename thread safe classes to be 'Wrappers'.

* Better contained getattr and setattr pattern.

* Explicitly name netCDF4 module in _thread_safe_nc docstring.

* Better docstring for _ThreadSafeWrapper.

* Better comment about THREAD_SAFE_FLAG.

* list() wrapping within _GLOBAL_NETCDF4_LOCK, to account for generators.

* More accurate thread_safe docstrings in netcdf.saver.

* Split netcdf integration tests into multiple modules.

* Tests for non-thread-safe NetCDF behaviour.

* Docstring accuracy.

* Correct use of dask config set (context manager).

* Update dependencies.

* Review - don't need the first-class import of iris.tests.

* Better name for the loading test.

* Better selection of data to load.

* What's New entry.

* Improve tests.

* Update lock files.

* Increase chunking on test_save.

---------

Co-authored-by: Patrick Peglar <patrick.peglar@metoffice.gov.uk>
trexfeathers and pp-mo authored Feb 20, 2023
1 parent ca42c30 commit 11da71b
Showing 48 changed files with 1,799 additions and 1,179 deletions.
1 change: 1 addition & 0 deletions docs/src/common_links.inc
@@ -41,6 +41,7 @@
 .. _issues on GitHub: https://github.com/SciTools/iris/issues?q=is%3Aopen+is%3Aissue+sort%3Areactions-%2B1-desc
 .. _python-stratify: https://github.com/SciTools/python-stratify
 .. _iris-esmf-regrid: https://github.com/SciTools-incubator/iris-esmf-regrid
+.. _netCDF4: https://github.com/Unidata/netcdf4-python
 
 
 .. comment
8 changes: 6 additions & 2 deletions docs/src/userguide/glossary.rst
@@ -1,3 +1,5 @@
+.. include:: ../common_links.inc
+
 .. _glossary:
 
 Glossary
@@ -125,7 +127,7 @@ Glossary
 of formats.
 
 | **Related:** :term:`CartoPy` **|** :term:`NumPy`
-| **More information:** `Matplotlib <https://scitools.org.uk/cartopy/docs/latest/>`_
+| **More information:** `matplotlib`_
 |
 Metadata
@@ -143,9 +145,11 @@ Glossary
 When Iris loads this format, it also especially recognises and interprets data
 encoded according to the :term:`CF Conventions`.
 
+__ `NetCDF4`_
+
 | **Related:** :term:`Fields File (FF) Format`
 **|** :term:`GRIB Format` **|** :term:`Post Processing (PP) Format`
-| **More information:** `NetCDF-4 Python Git <https://github.com/Unidata/netcdf4-python>`_
+| **More information:** `NetCDF-4 Python Git`__
 |
 NumPy
8 changes: 6 additions & 2 deletions docs/src/whatsnew/2.1.rst
@@ -1,3 +1,5 @@
+.. include:: ../common_links.inc
+
 v2.1 (06 Jun 2018)
 ******************
 
@@ -67,7 +69,7 @@ Incompatible Changes
 as an alternative.
 
 * This release of Iris contains a number of updated metadata translations.
-See this
+See this
 `changelist <https://github.com/SciTools/iris/commit/69597eb3d8501ff16ee3d56aef1f7b8f1c2bb316#diff-1680206bdc5cfaa83e14428f5ba0f848>`_
 for further information.
 
@@ -84,14 +86,16 @@ Internal
 calendar.
 
 * Iris updated its time-handling functionality from the
-`netcdf4-python <http://unidata.github.io/netcdf4-python/>`_
+`netcdf4-python`__
 ``netcdftime`` implementation to the standalone module
 `cftime <https://github.com/Unidata/cftime>`_.
 cftime is entirely compatible with netcdftime, but some issues may
 occur where users are constructing their own datetime objects.
 In this situation, simply replacing ``netcdftime.datetime`` with
 ``cftime.datetime`` should be sufficient.
 
+__ `netCDF4`_
+
 * Iris now requires version 2 of Matplotlib, and ``>=1.14`` of NumPy.
 Full requirements can be seen in the `requirements <https://github.com/SciTools/iris/>`_
 directory of the Iris' the source.
3 changes: 2 additions & 1 deletion docs/src/whatsnew/latest.rst
@@ -40,7 +40,8 @@ This document explains the changes made to Iris for this release
 🐛 Bugs Fixed
 =============
 
-#. N/A
+#. `@trexfeathers`_ and `@pp-mo`_ made Iris' use of the `netCDF4`_ library
+thread-safe. (:pull:`5095`)
 
 
 💣 Incompatible Changes
3 changes: 2 additions & 1 deletion lib/iris/experimental/ugrid/load.py
@@ -209,7 +209,8 @@ def load_meshes(uris, var_name=None):
 
     result = {}
     for source in valid_sources:
-        meshes_dict = _meshes_from_cf(CFUGridReader(source))
+        with CFUGridReader(source) as cf_reader:
+            meshes_dict = _meshes_from_cf(cf_reader)
         meshes = list(meshes_dict.values())
         if var_name is not None:
             meshes = list(filter(lambda m: m.var_name == var_name, meshes))
26 changes: 23 additions & 3 deletions lib/iris/fileformats/cf.py
@@ -20,10 +20,10 @@
 import re
 import warnings
 
-import netCDF4
 import numpy as np
 import numpy.ma as ma
 
+from iris.fileformats.netcdf import _thread_safe_nc
 import iris.util
 
 #
@@ -1050,7 +1050,9 @@ def __init__(self, filename, warn=False, monotonic=False):
         #: Collection of CF-netCDF variables associated with this netCDF file
         self.cf_group = self.CFGroup()
 
-        self._dataset = netCDF4.Dataset(self._filename, mode="r")
+        self._dataset = _thread_safe_nc.DatasetWrapper(
+            self._filename, mode="r"
+        )
 
         # Issue load optimisation warning.
         if warn and self._dataset.file_format in [
@@ -1068,6 +1070,19 @@ def __init__(self, filename, warn=False, monotonic=False):
         self._build_cf_groups()
         self._reset()
 
+    def __enter__(self):
+        # Enable use as a context manager
+        # N.B. this **guarantees** closure of the file, when the context is exited.
+        # Note: ideally, the class would not do so much work in the __init__ call, and
+        # would do all that here, after acquiring necessary permissions/locks.
+        # But for legacy reasons, we can't do that. So **effectively**, the context
+        # (in terms of access control) already started, when we created the object.
+        return self
+
+    def __exit__(self, exc_type, exc_value, traceback):
+        # When used as a context-manager, **always** close the file on exit.
+        self._close()
+
     @property
     def filename(self):
         """The file that the CFReader is reading."""
@@ -1294,10 +1309,15 @@ def _reset(self):
         for nc_var_name in self._dataset.variables.keys():
             self.cf_group[nc_var_name].cf_attrs_reset()
 
-    def __del__(self):
+    def _close(self):
         # Explicitly close dataset to prevent file remaining open.
         if self._dataset is not None:
             self._dataset.close()
             self._dataset = None
 
+    def __del__(self):
+        # Be sure to close dataset when CFReader is destroyed / garbage-collected.
+        self._close()
+
 
 
 def _getncattr(dataset, attr, default=None):
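The closing pattern added to `CFReader` (file opened eagerly in `__init__`, `__enter__` returning `self`, and one idempotent `_close()` shared by `__exit__` and `__del__`) can be sketched without netCDF4 as below, together with a loop shaped like the `load_meshes` change. `Reader` and `_FakeHandle` are hypothetical stand-ins. Setting `_dataset = None` before opening is an assumption added here so that `__del__` stays safe if opening fails; the diff does not show whether `CFReader` does this.

```python
class _FakeHandle:
    """Hypothetical stand-in for an open dataset such as a DatasetWrapper."""

    def __init__(self, filename):
        self.filename = filename
        self.closed = False

    def close(self):
        self.closed = True


class Reader:
    """Sketch of CFReader's pattern: open eagerly, close exactly once."""

    def __init__(self, filename):
        # Assumption for this sketch: set _dataset first so that __del__
        # is safe even if opening the file raises.
        self._dataset = None
        self._dataset = _FakeHandle(filename)

    def __enter__(self):
        # The file is already open; the context only guarantees closure.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._close()
        # Returning None (falsy) lets any exception propagate to the caller.

    def _close(self):
        # Idempotent, so __exit__ and __del__ can both call it safely.
        if self._dataset is not None:
            self._dataset.close()
            self._dataset = None

    def __del__(self):
        # Fallback for readers that were never used as context managers.
        self._close()


if __name__ == "__main__":
    # Mirrors the load_meshes change: each file is closed before the next opens.
    for source in ["first.nc", "second.nc"]:
        with Reader(source) as reader:
            print(reader._dataset.filename)
```

With the `with` block, closure happens deterministically at block exit, including when an exception is raised inside it, rather than whenever the interpreter happens to finalise the object; `__del__` remains only as a safety net.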