Simple support for selective netcdf loading. #4175

Closed

Member

@pp-mo pp-mo commented Jun 3, 2021

🚀 Pull Request

Description

An initial heads-up on a really simple way of speeding up netcdf loads from files with many variables, as in #4134.

This is actually much simpler than I imagined, but ...

Blockers to completion

  • probably needs discussion
  • not sure how to test this

Background...

Following #4135, ESMValTool devs are reporting that iris loading is still too slow when you want only one of a whole lot of diagnostics.

This example follows what we did for PP files in similar circumstances.
It's notable that, in creating an iris.fileformats.cf.CFReader, we are still doing a whole-file analysis, which includes the unwanted data-variables.
I actually don't think you can avoid that, as only context will distinguish a CF data-variable from an aux-coord.
However, the cost of this is not huge. I am finding <1sec for the testfile mentioned in #4134 (~250mB, 300 variables of content float[1,100]).
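
To illustrate why the whole-file pass seems hard to avoid, here is a minimal pure-Python sketch. It is not the CFReader code: variables are modelled as plain dicts of attributes, only the CF `coordinates` attribute is considered (real files also reference other variables through attributes such as `bounds`, `ancillary_variables`, `cell_measures` and `grid_mapping`), and the names `find_data_variables` and `select_by_long_name` are invented for this example. The idea is that a variable is only known to be an aux-coord once every other variable has been scanned, whereas a name filter can then be applied before any expensive per-variable work.

```python
def find_data_variables(variables):
    """Return names of variables that no other variable lists as a coordinate.

    `variables` maps variable name -> dict of attributes. This is a toy
    stand-in for a netCDF file (hypothetical structure, not an Iris API).
    """
    referenced = set()
    # Whole-file pass: every variable must be scanned, because any one of
    # them may name another as an auxiliary coordinate.
    for attrs in variables.values():
        referenced.update(attrs.get("coordinates", "").split())
    return [name for name in variables if name not in referenced]


def select_by_long_name(variables, long_name):
    """Filter data variables by long_name before any costly cube building."""
    return [
        name
        for name in find_data_variables(variables)
        if variables[name].get("long_name") == long_name
    ]


toy_file = {
    "tas": {"long_name": "Air Surface Temperature", "coordinates": "lat2d lon2d"},
    "pr": {"long_name": "Precipitation", "coordinates": "lat2d lon2d"},
    "lat2d": {"long_name": "latitude"},
    "lon2d": {"long_name": "longitude"},
}

print(find_data_variables(toy_file))  # ['tas', 'pr']
print(select_by_long_name(toy_file, "Air Surface Temperature"))  # ['tas']
```

Taken in isolation, `lat2d` looks like any other variable; it is only the `coordinates` attribute on `tas` and `pr` that marks it as an aux-coord, which is the "context" referred to above.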

Some sample timings:

Using testfiles with many identical (small) variables, loading 1 of N named variables:

n-vars : timing without fix // with fix
    1  :   0.04 // 0.01
   10  :   0.14 // 0.02
   30  :   0.45 // 0.03
  100  :   1.70 // 0.07
  300  :   8.19 // 0.40
 (314) :  44.17 // 0.61

Case (314) is based on the testfile mentioned in #4134.
(I suspect it may be slower than the '300' case because the variables' data is larger? WIP)
The timings used code like:

import iris
from iris import NameConstraint

cube = iris.load_cube(
    'Iris_multivar_data_file.nc',
    NameConstraint(long_name='Air Surface Temperature'))
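
For anyone wanting to reproduce timings of this kind, a rough harness might look like the following. This is a sketch under stated assumptions, not the script used for the figures above: the helper names (`best_time`, `write_multivar_file`, `time_single_load`), the variable names and the `long_name` values are all made up for illustration, and the generated files only approximate the "many identical small variables" testfiles (float variables of shape (1, 100)).

```python
import time


def best_time(func, repeats=3):
    """Return the shortest wall-clock time, in seconds, over `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


def write_multivar_file(path, n_vars):
    """Write n_vars identical small float variables of shape (1, 100)."""
    # Imported here so that best_time stays usable without netCDF4 installed.
    import netCDF4
    import numpy as np

    with netCDF4.Dataset(path, "w") as ds:
        ds.createDimension("y", 1)
        ds.createDimension("x", 100)
        for i in range(n_vars):
            var = ds.createVariable(f"var_{i}", "f4", ("y", "x"))
            var.long_name = f"variable {i}"  # hypothetical naming scheme
            var[:] = np.zeros((1, 100), dtype="f4")


def time_single_load(path, long_name, repeats=3):
    """Time loading one named variable from `path` with Iris."""
    import iris
    from iris import NameConstraint

    constraint = NameConstraint(long_name=long_name)
    return best_time(lambda: iris.load_cube(path, constraint), repeats)
```

A run over the file sizes in the table could then call `write_multivar_file(path, n)` followed by `time_single_load(path, "variable 0")` for each n in (1, 10, 30, 100, 300), once on the main branch and once with the fix applied.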

Consult Iris pull request check list

@pp-mo pp-mo closed this Jun 3, 2021
Member Author

pp-mo commented Jun 3, 2021

Wrong branch..

@pp-mo pp-mo deleted the nc_ugrid_selective_loading branch March 18, 2022 14:52