Very slow opening a GRIB file with 100.000 messages #142
@matteodefelice no, it is not expected, but the file is rather unusual, with a tiny spatial extent (21, 9) and a huge number of time steps (113957). Every time step is in a different GRIB message, and apparently ecCodes takes quite some time decoding them:
Then cfgrib performs a few operations per message via a slow C interface. So yes, cfgrib is not optimised to handle files with a huge number of fields, even if they are small. Supporting this kind of use case will need some work.
Thanks for the quick reply. Actually, after converting the data to NetCDF everything was rather quick. Good to know, thanks!
I'll leave the issue open for reference and to keep track of the enhancement request!
I just tried that file with various timings on my Linux machine, with the file stored on my local disk (/tmp), using ecCodes 2.17.0 (GNU/7.3.0). Note: grib_ls by default shows the shortName key. I also tried a very simple Python script which loops through each message, printing the message number (count) and step. Perhaps you have very slow disk access.
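The script itself is not shown in the thread. A minimal sketch of such a loop, assuming the `eccodes` Python bindings are installed, might look like the following; the function name `iter_count_and_step` is made up for illustration, and the file path is whatever GRIB file you pass on the command line.

```python
def iter_count_and_step(path):
    """Yield (count, step) for every GRIB message in the file at `path`."""
    # Assumption: the ecCodes Python bindings are available as `eccodes`.
    # Imported lazily so the function can be defined without them installed.
    import eccodes

    with open(path, "rb") as f:
        while True:
            # Returns a message handle, or None at end of file.
            gid = eccodes.codes_grib_new_from_file(f)
            if gid is None:
                break
            try:
                # "count" is the message number in the file, "step" the forecast step.
                yield eccodes.codes_get(gid, "count"), eccodes.codes_get(gid, "step")
            finally:
                # Always free the handle, even if reading a key fails.
                eccodes.codes_release(gid)


if __name__ == "__main__":
    import sys

    for count, step in iter_count_and_step(sys.argv[1]):
        print(count, step)
```

Timing this loop against the time cfgrib needs to open the same file gives a rough idea of how much of the cost comes from ecCodes itself rather than from the layers above it.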
@shahramn the default install of cfgrib (still) uses the internal bindings in ABI mode, so every call to an ecCodes function incurs the CFFI overhead. So most probably the performance issue with many messages is more on the cfgrib side than on ecCodes.
@Plantain an undocumented feature is to set the
Ciao @alexamici, are you still using the internal bindings in 0.9.9, or is it something we might expect for 1.0?
@matteodefelice in fact Unfortunately, just today I tried your example again and it still takes several minutes to open. But at least cfgrib now opens it :) I'll do a new release soon.
You are right, it opens it, but once converted to NetCDF it loads in less than a second. I know that most users work with large grids rather than long time-series-like data; however, users should at least be warned about this limitation. Don't you think?
@aurghs and I are on a performance optimisation spree, and we have identified and fixed a number of bottlenecks. At the moment I'm tackling the issues with your file, which is in fact the worst of the worst cases, as I found a couple of code sections that scale quadratically in the number of values in a dimension. One fix is already in
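To illustrate what "scales quadratically in the number of values in a dimension" can mean in practice, here is a generic sketch. It is not cfgrib's actual code, and both function names are invented for this example. It collects the distinct coordinate values seen across messages, once with a list membership test and once with a hash-based lookup.

```python
def unique_values_quadratic(values):
    """Collect distinct values in first-seen order using a list scan.

    Each `in` test walks the whole list, so with n distinct values
    (e.g. 113957 time steps) the total work grows like n * n.
    """
    unique = []
    for v in values:
        if v not in unique:  # linear scan on every message
            unique.append(v)
    return unique


def unique_values_linear(values):
    """Same result, but dict keys give constant-time lookups, so work grows like n."""
    return list(dict.fromkeys(values))


if __name__ == "__main__":
    import timeit

    steps = list(range(20_000))  # one distinct "time step" per message
    for fn in (unique_values_quadratic, unique_values_linear):
        seconds = timeit.timeit(lambda: fn(steps), number=1)
        print(f"{fn.__name__}: {seconds:.3f} s")
```

With a grid-heavy file the dimension has few values and the difference is invisible; with one message per time step, as in the reported file, the list-scan variant dominates the runtime.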
@matteodefelice and that was faster than I expected. I'll do a proper release soon.
This is actually fixed in @matteodefelice thanks for reporting the issue with the very best sample file, and sorry that it took a bit. Also, note that this work was sponsored by the CDS.
I have downloaded an ERA5 hourly file of ~68 MB (you can download it here).
If I use the following command to open it:
Python is not able to open it: after 10 minutes it is still loading, and then I kill the process.
This happens on Jupyter with Python 3.7.6, xarray 0.15.1 and cfgrib 0.9.8.2.
Is this behaviour expected?