Simple support for selective netcdf loading. #4175
Closed
🚀 Pull Request
Description
An initial heads-up on a really simple way of speeding up netcdf loads from files with many variables, as in #4134.
This is actually much simpler than I imagined, but ..
Blockers to completion
Background...
Following #4135, ESMValTool devs are reporting that Iris loading is still too slow when you want only one of a whole lot of diagnostics.
This example follows what we did for PP files in similar circumstances.
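A minimal sketch of the filtering idea, assuming the load constraint can be reduced to a plain set of requested netCDF variable names (a simplification: real Iris name constraints can also match standard or long names). `make_var_filter` and `select_data_vars` are hypothetical helpers for illustration, not Iris API. The point is that the expensive per-variable cube building only runs on names the filter accepts.

```python
from typing import Callable, Iterable, List, Optional


def make_var_filter(
    requested_names: Optional[Iterable[str]],
) -> Optional[Callable[[str], bool]]:
    """Return a predicate accepting only the requested data-variable names.

    Returns None when nothing specific was requested, meaning "load everything".
    """
    if requested_names is None:
        return None
    wanted = frozenset(requested_names)
    return lambda var_name: var_name in wanted


def select_data_vars(
    data_var_names: Iterable[str],
    requested_names: Optional[Iterable[str]] = None,
) -> List[str]:
    """Keep only the data-variables that are worth turning into cubes."""
    keep = make_var_filter(requested_names)
    if keep is None:
        # No simple name constraint: fall back to building every cube.
        return list(data_var_names)
    # Cube building (the costly step) would then run only on these names.
    return [name for name in data_var_names if keep(name)]


all_vars = ["tas", "pr", "psl", "uas", "vas"]
print(select_data_vars(all_vars, ["pr"]))
print(select_data_vars(all_vars))
```

Returning `None` rather than an accept-all predicate keeps the "no constraint" case visibly distinct, so unconstrained loads behave exactly as before.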
It's notable that, in creating an `iris.fileformats.cf.CFReader`, we still perform a whole-file analysis, which includes the unwanted data-variables. I don't actually think you can avoid that, since only context distinguishes a CF data-variable from an aux-coord.
However, the cost of this is not huge: I am finding <1 sec for the testfile mentioned in #4134 (~250 MB, 300 variables of content float[1,100]).
Some sample timings, using testfiles with many identical (small) variables and loading 1 of N named variables:

| n-vars | without fix (s) | with fix (s) |
| --- | --- | --- |
| 1 | 0.04 | 0.01 |
| 10 | 0.14 | 0.02 |
| 30 | 0.45 | 0.03 |
| 100 | 1.70 | 0.07 |
| 300 | 8.19 | 0.40 |
| (314) | 44.17 | 0.61 |
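The speedup in each row is just the ratio of the two timings. A small sketch computing it from the figures copied out of the table (nothing here is new data), which shows the gain growing from about 4x for a single variable to about 72x for the (314) case:

```python
# (n-vars label, time without fix, time with fix), copied from the table
TIMINGS = [
    ("1", 0.04, 0.01),
    ("10", 0.14, 0.02),
    ("30", 0.45, 0.03),
    ("100", 1.70, 0.07),
    ("300", 8.19, 0.40),
    ("(314)", 44.17, 0.61),
]


def speedups(rows):
    """Map each n-vars label to (time without fix) / (time with fix), 1 d.p."""
    return {label: round(without / with_fix, 1) for label, without, with_fix in rows}


for label, ratio in speedups(TIMINGS).items():
    print(f"{label:>5} vars: {ratio}x faster")
```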
Case (314) is based on the testfile mentioned in #4134.
(I suspect it may be slower than the '300' case because its variables' data is larger? WIP.)
The timings were taken with code like:
Consult Iris pull request check list