Simple support for selective netcdf loading. #4176
I've stared at this for a while now, and it makes sense. My main question is: what does this look like beyond MVP? There are obviously a lot of limitations and hard-coding in this implementation; how would you make it more generalisable? I.e. multiple constraints, non- You can ignore my question if this is as far as we're expecting to take this functionality in the foreseeable future. |
Well, first we need to check that this is addressing the immediate needs of the ESMValTool devs, otherwise it's not worth it. A simple list of constraints acts with an "OR"-like behaviour, so that is mostly a non-starter for a speedup solution. Apart from that, I think in this case we can work out how to do the equivalent of a Constraint(name=string), allowing that we decode a possible STASH attribute on the variable. |
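To make the idea of a name-based match concrete, here is a minimal sketch of testing one data variable against a requested name string. It is not the Iris implementation: `variable_matches_name` is a hypothetical helper, the variable is modelled as a plain attribute dict, and the attribute keys checked for a STASH code are an assumption.

```python
def variable_matches_name(var_name, attributes, wanted):
    """Return True if `wanted` equals any name this variable could be known by.

    `var_name` is the netCDF variable name and `attributes` is a plain dict
    of its attributes (a hypothetical stand-in for a real variable object).
    """
    candidates = [
        attributes.get("standard_name"),
        attributes.get("long_name"),
        var_name,
    ]
    # Assumption: a STASH code, when present, is stored as a string attribute
    # such as "m01s00i024" under one of these keys.
    for key in ("STASH", "um_stash_source"):
        if key in attributes:
            candidates.append(str(attributes[key]))
    # Ignore names that are simply absent on this variable.
    return wanted in [c for c in candidates if c is not None]
```

A variable is accepted if any one of its possible names equals the requested string, which is why the STASH attribute has to be decoded before the comparison.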
Yup, makes sense not to invest time thinking about the future yet. In that case do you think it would be worth including a Indeed, I'd quite like if we could move |
Ok, I don't think there's any point doing that directly, as it isn't a public thing. My feeling is that this is (should be) an implementation shortcut guaranteed to have no effect on correctness, i.e. the final result is the same in all cases. |
Thanks for inputs @trexfeathers @bjlittle |
Some further practical discussion at ESMValGroup/ESMValCore#1180. In my reply, I tried to explain how the existing solution here is only a quick hack. As-is, it only supports a constraint expression which is a single NameConstraint. Some possibly-useful further cases:
But the approach does not generalise fully -- i.e. some types of expression which "could" be translated (+ therefore speeded up) will remain 'opaque' to this type of analysis and it won't be possible to do everything.
Likewise, in any case where we are given a list of constraints, we need to be able to 'translate' everything on the list, or we can't select on data-variable at all. |
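The all-or-nothing rule for a list of constraints can be sketched as follows. This is only an illustration of the reasoning above, not code from this PR: `NameOnly`, `Opaque`, `names_to_load` and `select_variables` are hypothetical names, and real Iris constraints are richer than these stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Set


@dataclass
class NameOnly:
    """Hypothetical constraint that selects purely on a variable name."""
    name: str


@dataclass
class Opaque:
    """Hypothetical constraint whose test cannot be analysed before loading."""
    func: Callable[[object], bool]


def names_to_load(constraints: Iterable[object]) -> Optional[Set[str]]:
    """Return the set of wanted names, or None if pre-selection is impossible."""
    constraints = list(constraints)
    if not constraints:
        return None  # no constraints at all: everything must be loaded
    wanted = set()
    for constraint in constraints:
        if not isinstance(constraint, NameOnly):
            # One untranslatable entry could match any variable, so the
            # whole list has to fall back to loading everything.
            return None
        wanted.add(constraint.name)
    return wanted


def select_variables(all_names: Iterable[str], constraints) -> List[str]:
    """Pick the variables worth loading, falling back to all of them."""
    wanted = names_to_load(constraints)
    if wanted is None:
        return list(all_names)
    return [name for name in all_names if name in wanted]
```

Because a list of constraints behaves like an "OR", a single opaque entry defeats the shortcut: that entry might accept a variable which the name-based entries would have skipped.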
For us, the main use case is to load one variable from a multi-variable file. I think that is really all we need. |
Status report: ESMValTool people have confirmed that they only need the simplest case of this (one named var from one file)
(2) and (3) can be dropped if there is any difficulty or hurry. I (@pp-mo) can tackle this in ~2 weeks, when I return from holiday. |
I have taken this on whilst @pp-mo is on holiday. Changes I have made:
|
Rebased to resolve merge conflict now that #4206 has gone in |
@lbdreyer Awesome! Thanks for helping to nudge this PR across the line 🍻
Just a couple of comments to service, then we're good to go 🚀
for more information, see https://pre-commit.ci
@bjlittle Thanks for your review! I believe I have now resolved the review actions you raised, so could you have a final check? |
* Replace interim 3.0.x whatsnews with single v3.1.rst, with concrete version and date. * Mark as unreleased, fix links. * Rearrange entries tagged 'pre-3.1', and recategorise #4176 change. * Moved bugfix notes from v3.1 to v3.0 whatsnew page. * Fixed broken pull-request refs. * Fix link to PyKE.
🚀 Pull Request
Description
An initial heads-up on a really simple way of speeding up netcdf loads ...
... with files of many variables, as in #4134
This is actually much simpler than I imagined, but ..
Blockers to completion
Background...
Following #4135, ESMValTool devs are reporting that iris loading is still too slow, where you want only one from a whole lot of diagnostics.
This example follows what we did for UM files in similar circumstances.
It's notable that, in creating an iris.fileformats.cf.CFReader, we are still doing a whole-file analysis that includes the unwanted data-variables. I actually don't think you can avoid that, as only context will distinguish a CF data-variable from an aux-coord.
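A minimal sketch of why the whole file has to be scanned: under the CF conventions, a variable is a data variable only if no other variable refers to it (as a coordinate, bounds, and so on). This is not how CFReader is implemented; `find_data_variables` and the dict layout are hypothetical, and only a handful of CF reference attributes are considered.

```python
def find_data_variables(variables):
    """Return names of variables that nothing else in the file refers to.

    `variables` maps a variable name to {"dims": tuple, "attrs": dict},
    a hypothetical in-memory summary of a netCDF file.
    """
    reference_keys = (
        "coordinates",
        "bounds",
        "ancillary_variables",
        "cell_measures",
        "grid_mapping",
    )
    referenced = set()
    for info in variables.values():
        for key in reference_keys:
            for token in str(info["attrs"].get(key, "")).split():
                # Skip the "area:" style labels used by cell_measures.
                if not token.endswith(":"):
                    referenced.add(token)

    data_vars = []
    for name, info in variables.items():
        # A variable named after its single dimension is a dimension coordinate.
        is_dim_coord = info["dims"] == (name,)
        if name not in referenced and not is_dim_coord:
            data_vars.append(name)
    return data_vars
```

For example, a scalar `height` variable looks exactly like a data variable until the `coordinates` attribute of some other variable is read, so every variable's attributes must be inspected before any one of them can be classified.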
However, the cost of this is not huge. I am finding <1sec for the testfile mentioned in #4134
( which is: ~250 MB, ~300 variables of content float[1 * 79 * 143 * 144] )
Some sample timings:
Using testfiles with many identical (small) variables:
n-vars : timings without // with fix
1 : 0.04 // 0.01 [loading 1 of N named variables]
10 : 0.14 // 0.02
30 : 0.45 // 0.03
100 : 1.70 // 0.07
300 : 8.19 // 0.40
(314) : 45.36 // 0.59
case (314) is based on the testfile mentioned in #4134
( I suspect it may be slower than the '300' because the variables' data is larger?? WIP )
with the code like
Consult Iris pull request check list