What makes the NAME loader faster than the NetCDF loader? #5053
Just to update... We had a theory that "breaking" the coordinate metadata could speed this up, and it really did.

Pre-processing: rename each variable's `coordinates` attribute. netCDF4 example:

```python
import netCDF4 as nc

with nc.Dataset(filename, "a") as file:
    for var in file.variables.values():
        if "coordinates" in var.ncattrs():
            var.renameAttribute("coordinates", "fake_coordinates")
    file.sync()
```

Post-processing: identify the "cubes" that were supposed to be aux coords, and add them to the relevant cubes. This can be done by overriding the usual netcdf loaders. Incomplete example:

```python
from iris.coords import AuxCoord
from iris.fileformats import netcdf


def load_cubes(filenames, callback=None, constraints=None):
    # Identify coords
    cubes = []
    unknown = []
    coord_names = set()
    for cube in netcdf.load_cubes(filenames, callback, constraints):
        fake_coords = cube.attributes.get("fake_coordinates", "").split()
        if fake_coords:
            # Definitely a cube
            cubes.append(cube)
            coord_names.update(fake_coords)
        else:
            # Might be a cube or a coord
            unknown.append(cube)

    # Create the coords
    coords = {}
    for cube in unknown:
        if cube.name() in coord_names:
            coord = AuxCoord(cube.data)
            coord.metadata = cube.metadata
            coords[coord.var_name] = coord
        else:
            cubes.append(cube)

    # Add coords to cubes
    for cube in cubes:
        fake_coords = cube.attributes.pop("fake_coordinates", "").split()
        for name in fake_coords:
            # TODO: Assumes scalar...
            cube.add_aux_coord(coords[name])
        yield cube
```

This doesn't cover everything, such as bounds or non-scalar aux coords, but it did bring the load time down to ~3 seconds. The re-adding coordinates part is a tiny proportion of that, so I don't imagine a full solution would take much longer either. However... in all
#5069 shows a really basic test where we bypass lazy array creation in `_get_cf_var_data`, if the array is "small". Like this...

Of course, #3172 would be a solution, but sadly that still looks a long way off. This sounds like a possible win as-is?? How do we choose a size threshold?
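The shape of that idea can be sketched generically: compute the variable's byte size from its shape and item size, and only build the (more expensive) lazy wrapper above some cutoff. Everything here is an assumption for illustration, not Iris's actual `_get_cf_var_data` signature, and the threshold value is a placeholder that would need benchmarking.

```python
import math

# Assumed cutoff for illustration only; the right value needs benchmarking.
EAGER_THRESHOLD_BYTES = 8192


def get_var_data(shape, itemsize, make_real, make_lazy):
    """Return real (eager) data for small variables, lazy data otherwise.

    make_real / make_lazy are callables standing in for "read the array
    now" vs. "build a deferred/lazy array" (hypothetical names).
    """
    nbytes = math.prod(shape) * itemsize
    if nbytes <= EAGER_THRESHOLD_BYTES:
        # Tiny, coord-like variables: cheaper to just read them eagerly.
        return make_real()
    return make_lazy()
```

For example, a 2x2 float64 variable (32 bytes) would be read eagerly, while a 4096-element float64 variable (32 KiB) would stay lazy.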
Closed by #5229
📰 Custom Issue
@bsherratt has provided an interesting case. A NAME III `.txt` file describing:

The file takes ~20s to load. If Iris is used to convert the file to NetCDF, that NetCDF file takes ~105s to load. Both loaders are unreasonably slow in this example, since the grid is shared between all phenomena, yet the `Coord` instances are created on a `Cube`-by-`Cube` basis (see #3172 for discussion on sharing). But why is NetCDF even slower?

File available on request.