Load is VERY slow for a NetCDF multi-variable file #4134
Comments
@senesis Thanks for taking the time to report this, much appreciated. Could you just confirm the version of
Yes it is v3.0.1
@senesis Great thanks. And which version of Python? v3.8?
@senesis We have a patch in the pipeline that will go towards significantly alleviating this issue. We're going to target this for the forthcoming See the v3.0.x Release Discussion for further details.
@senesis great issue -- your account is excellent + should make the problem reproducible! I was already looking into issues with slow NetCDF loads, specifically with lots of variables (i.e. we also found something similar). In the meantime though, I wonder if you could put up your 'load_time_histmth' notebook so I can test with that, maybe in a Gist?
Stop press: see #4135
I am not familiar with Gist. The notebook is available here
Great, I am looking forward to getting 3.0.2 (and to ESMValTool using it)
@senesis GitHub gists are a fantastic way to easily share snippets of code and notebooks with your peers. Check out the GitHub documentation for further details 👍
Well I tried the notebook, but I'm not sure if it delivers any more info really, as I am just using it with the same 'Iris_multivar_data_file.nc' file you mentioned above, which may not be the same.
I actually used the same file in my notebook run.
Closed by #4158
I just realised, I'm not sure if you people are aware of the potential impact of #4572? I believe we did discuss the issues raised by this in #3333.
I believe that the detail of what seems like a useful API and feature-set really needs some trials, to examine specific practical cases.
From the API doc, I do not understand how this feature could be used when the use case is 'just speed up loading a single variable from a multi-variable file, whatever the set of dimensions and their sizes'
Apologies, I think you are right -- it doesn't have much relevance to this case after all. I think I got my wires crossed here -- I was looking for an ESMValTool-related loading issue I thought I remembered, where chunking definitely was an issue. But this isn't it!
No, sorry.
📰 Custom Issue
When loading a single variable from a fairly small NetCDF file that contains 300 variables, the load time is very long: around 100 seconds (versus less than 0.1 s for a similar single-variable file).
This is a bottleneck when trying to use Iris (through ESMValTool) to handle the native data format of some climate models.
The attached notebook load_time_histmth.pdf demonstrates the issue and includes a profile, which shows that by far the most time-consuming function is NetCDFDataProxy.__getitem__
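For readers who cannot open the PDF, a profile of this kind could be collected roughly as follows. This is only a sketch, not the notebook's actual code: `profile_call` and `load_one_variable` are hypothetical helper names, and the file path and variable name in the closing comments are placeholders. The Iris import is kept inside the helper so that the profiling wrapper itself needs only the standard library.

```python
import cProfile
import io
import pstats
import time


def profile_call(func, *args, top=5, **kwargs):
    """Run func(*args, **kwargs) under cProfile.

    Returns (result, elapsed_seconds, report_text); report_text lists the
    `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    start = time.perf_counter()
    result = profiler.runcall(func, *args, **kwargs)
    elapsed = time.perf_counter() - start

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, elapsed, buffer.getvalue()


def load_one_variable(path, var_name):
    """Hypothetical helper: load one variable from a NetCDF file with Iris."""
    import iris  # imported lazily; requires Iris to be installed

    # Select the cube whose NetCDF variable name matches var_name.
    constraint = iris.Constraint(cube_func=lambda cube: cube.var_name == var_name)
    return iris.load_cube(path, constraint)


# Example call (placeholder path and variable name, not taken from the issue):
# cube, seconds, report = profile_call(load_one_variable, "multivar_file.nc", "tas")
# print(f"load took {seconds:.2f} s")
# print(report)
```

Sorting by cumulative time makes it easy to see which call dominates the load, which is how a single function such as the one named above would stand out in the report.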
The data file is available here
System info is :