Help with slow open performance with netCDF4 Files #489
Can you do an ncdump -s -h of the netCDF-4 file, and show the metadata?
I can. Here is the file, the one I uncompressed (MERR.nc4 in the above script).
Since I do not have the necessary authorization, can you publicly post an actual
I'm currently working on a different issue, but I can easily believe that there is an inefficiency somewhere. Using the
Interesting evaluation! % ./read1000 test.nc4
Opening a file with large metadata is a known problem because, unlike HDF5,
After a quick look, it appears that attributes are stored in a singly linked list.
I agree. 4.5.1 would be better; 4.5.0 is overdue.
Well, it sounds like I'm a little late to the party, but I've put a compressed version of the file on Dropbox.
Ah. This makes sense. Although for a bit there, HDF5 opened really fast because I forgot to check the error status (in Fortran). Turns out it'll open very fast if it's just erroring out. But I've confirmed it works. If only I could figure out how to cast the HDF5 file pointer to nf90...best of both worlds!
But it sounds like @DennisHeimbigner might have figured it out. When I get into work tomorrow, I might construct a series of files with varying variables. Luckily I can create a stupid-long file with our climate model and then just remove variables with NCO/CDO. I'd like to know if I can reproduce a linear trend.
Wrote a quick and dirty tester and...yup: The R2 of the fit didn't export with the image, but it was 0.9871. (https://plot.ly/~Fortran/333)
Thanks for the verification.
The virtual attribute _IsNetcdf4 was added in 4.4.1. I wondered if this was the performance problem, because it requires scanning all the file top metadata. I compared Netcdf-C 4.4.0/HDF5 1.8.16 to current versions, 4.4.1.1 with 1.10.1. These timing results were actually run interleaved between versions, to compensate for possible network biases. I grouped by version and copy/pasted into this summary.
This proves that _IsNetcdf4 is not the primary culprit for slow opens, because the attribute was completely absent from netcdf-c 4.4.0. The main impact is presumably just as Dennis described. However, there seems to be a smaller time increase between versions here. This might be _IsNetcdf4 or something else.
If memory serves, _IsNetcdf4 is computed only when asked for.
That makes sense. Not knowing the internals, I thought it was possible that nc_open might be pre-staging _IsNetcdf4 for some reason.
@mathomp4, your experiment focused on only nc_open. How much of your model I/O time is consumed in nc_open vs. other file I/O? There are some known serious inefficiencies when the netcdf-4 chunk layout conflicts with read and write access patterns, or when chunk caches are too small. My general experience with netcdf-4 is that performance rivals netcdf-3 when the file structure is reasonably matched with the I/O access pattern. Also, in my current model project, 20 to 200 milliseconds is pretty reasonable for a file open that occurs only three times in a 40-minute compute cycle. Your model's file open pattern might be much different, of course.
@Dave-Allured Well, this was sort of an odd test. We were looking at I/O and noticed this, so we sort of focused on the open. Now, in most cases, the open probably isn't as important as the read, so we are looking at that too. But we were wondering about open/read speeds when NC4 files were unlimited time vs. fixed time, compressed vs. uncompressed, etc. But the open was dominating! We've also encountered what we think is a bug in Intel MPI when reading compressed NC4 files in our model. It was very weird. We found that if we uncompressed, the code worked. Now that we have solved the open mystery, we'll probably look at the read more carefully. Chunking, compression, etc. But all that said, in one way of running our model we have to open approximately 500 files at start up and ~100 every hour. In that version, which was run at very high resolution, I/O (mainly I in this case) was ~25% of the entire model run. Thus, we are looking at I/O strategies now.
@mathomp4, all things considered, it seems that NC3 format would be a good fit to improve I/O for this particular model. File opens will be much faster, as demonstrated here. I think data reads and writes will be similar to NC4, possibly faster for some access patterns. The main thing you give up with NC3 is NC4 data compression. It seems like speed is your main issue, not storage space. Watch out for NC3 file structure issues which could bog down data writes. You should pre-define all variables and attributes in a new NC3 file, before writing any large variables. Or else utilize the NC3 padding capability. Otherwise, adding or expanding any variable or attribute can trigger an internal re-write of the whole file if you are not careful. NC4 is thankfully free of this constraint because of dynamic storage management. Please excuse me if I am explaining stuff you already know. If you start now with netcdf-3 format, you should be able to easily upgrade to netcdf-4 if nc_open is improved in a future release.
Update: I have been experimenting with the netcdf4 code. So, what can be done? Converting netcdf to fully use lazy metadata read is known
I have (mostly) split the HDF5 code from the libsrc4 data model code. I now see how lazy reads can work for (almost?) all HDF5 objects. I don't think it will be as hard as I previously estimated. I deduce from @WardF's recent activity that my HDF4 dispatch PR is soon to be merged, so I can present the HDF5/libsrc4 split after that and we can discuss it. After that, it will be an easy step to add lazy reads. Also easy would be to use an array for atts, as we now do for vars, instead of linked lists.
After some experimentation, my current speculation is that the only thing |
Really? If I have a file with 10000 atts it's slow, but with 10000 vars it's not? That seems surprising.
I am basing my comment off of the example that others gave me.
Also, based on that limited sample, thousands of variables may not be common.
OK, but I do agree with you that lazy read of atts would be a good thing.
FYI -- I do have many files that have thousands of variables or more for complicated models.
What would be great would be to send me the ncdump -h -s of some of the very worst offenders. (Or attach them here.) Pick your three or four slowest opening files, and give me their metadata. Then I can create some samples and have some real examples to work with.
I created some programs to:
The problem is that the hashmap keeps a copy of the key string. I have considered
I am open to other ideas.
For your tests, can you tell me the largest number of variables within a group?
@edhartnett, can you make your test program and data files available?
@DennisHeimbigner |
I don't think I trust your gprof output. Gprof is a sampling-based profiler, and it looks like there aren't enough samples to give you a good indication of what is happening. I would try using either valgrind with callgrind, or perhaps Intel VTune or some other profiler that actually counts the true time in a function instead of periodic samples. I also don't see any of the utf8 functions in your profile, and they tend to show up when I profile my usages. Perhaps the test program is using the library in a way that bypasses the utf8 conversions; if so, that would explain it.
@gsjaardema the utf8 functions don't show in the profile table because I am only reading, so no utf8 conversions need to be done. 500K variables - well, you are one for the record books. I will try the valgrind profiler next. @wkliao everything is available in the latest release of HPC NetCDF: https://github.com/NOAA-GSD/netcdf-c/releases. The timing program is nc_test4/tst_attsperf. I did not try classic format files; I will do that for the next round of testing. @DennisHeimbigner let me rerun tests, as Greg suggests, before you mess with the hashes. Another improvement that is in easy reach (but that I have not yet done) is to only create the hash for attributes when the attributes are actually read. That is, instead of creating a hash table of atts for every var, wait to see if the atts are really going to be read for that var. But first I will rerun the profiling, as Greg suggests, to ensure we're targeting the correct code for performance improvements.
@edhartnett I'll need a bit more time. I grabbed it this morning, but it looks like 4.6.2.4 of HPC NetCDF triggers the same ocprint build error on my MacBook as does 4.6.1 of the "standard" NetCDF. Time to buckle down and try to figure out what's going on with this... ETA: Followup email I just posted to the list. netCDF's complexity might be beating me... |
Try building with --disable-dap. |
Also @mathomp4 is there a github issue for this ocprint error? If not, there should be. |
@edhartnett Well, technically I build with dap on because the ultimate use for this netCDF (the GEOS model and its users) has a need for OpenDAP-enabled netCDF. But for testing... And I did have a GitHub issue, but as I read the README.md, it seemed to say that this was more of a mailing-list issue. But I can add one. ETA: Or, should I re-open the issue I had before? It's about a month old now, so maybe a new issue is better to get it fresh?
Reported as #1095. |
As an aside, it seems to me that for Greg's example of a single flat group with |
I think the gprof input I would like to see is the profile for a single read |
@DennisHeimbigner I didn't fully explain the lazy vars. Initially I thought that I would have to learn all about all the vars in a group if the user asked for one. But that is not how the implementation works out. Instead, netcdf-4 still iterates through all of the vars in the file on open, but it does not learn all about the vars. (For example, it does not learn their type, or fill value, and some other stuff.) This allows the file to open about twice as fast. When you ask about a specific variable (for example, what is its type?) then the library will learn all about that variable, but does not have to learn about all the other variables in the group at that time. So Greg's file should open about twice as fast with lazy vars.
I see. Sounds good.
@mathomp4 can you try your original file tests with the latest master? Fast global att reads and lazy att reads are working in master, so that might address many of your performance issues. I am still working on lazy var reads, but they will not be in the upcoming 4.6.2 release.
@edhartnett I'll give it a try as soon as I can. I'll be at SC18 the next week, so it might be a while (unless the tutorial/workshop/BoF gets really boring!).
I think with PR #1262 I have reached the end of the effort to speed up file opens. I can't really think of anything else to do to make them faster. ;-) I will be doing some graphs of the final results; I will post them here before closing this issue. Those who have been suffering from slow opens are urged to try master after #1262 is merged.
Here's another chart showing the improvement in file open times. The file used is a real-world file from the WRF chemistry module. It contains over 360 variables. The chart shows the time needed to open and close the file for different netCDF versions, with current master + lazy var reads, and with current master + both lazy var reads and use of the hidden COORDINATES att. As we see, there is not much difference between versions until 4.6.2. That version included lazy attribute reads and fast global att reads, and is faster. Doing lazy read of some var data (PR #1260) improves the open time. Adding use of the hidden coordinates atts (PR #1262) results in a large speedup. The end result is that the time to open this real-world file is reduced to less than 1/3rd of what it was in 4.6.0/4.6.1.
@edhartnett Is this the "Coordinates" attribute listed in the netCDF docs here: https://www.unidata.ucar.edu/software/netcdf/docs/attribute_conventions.html I looked at #1262 and I wasn't too sure how I'd enable this in my code. Or is there an example writing a file with COORDINATES in the netCDF source code?
No, the coordinates attribute is a hidden attribute (called "_Netcdf4Coordinates"). You can see it with h5dump, but not with ncdump, because netCDF hides it. It is not the same as the "Coordinates" attribute you mention. Once #1262 is merged, there is no need to turn this speedup on; it will be available for all newly created netCDF-4 files. You don't have to write the _Netcdf4Coordinates attribute, the library will do it automatically. The presence of _Netcdf4Coordinates will also be detected automatically when reading a file, and it will be used to speed file opens. If it is not present, then files are opened in the usual (slower) way.
👍 I like speedups/fixes that I don't have to do anything to get! Nice!
BTW, where does the merr sample file come from? That is, what real-world model or application?
@edhartnett Looking back in time, it seems to be output from MERRA-2, a reanalysis generated by GMAO, where I work. Looking at the metadata, it's the tavg1_2d_lnd_Nx aka M2T1NXLND collection on the date 2010-01-05: https://disc.gsfc.nasa.gov/datasets/M2T1NXLND_V5.12.4/summary It's 2D land-surface time-averaged output. It's one of the outputs with the larger number of variables: https://gmao.gsfc.nasa.gov/pubs/docs/Bosilovich785.pdf which is probably why I looked at it. Though it might have been that I had it at hand when this question/issue came up!
Environment Information
Summary of Issue
I work on a climate model and I/O is slowly becoming a major bottleneck. By default we try to use nc4 files as our main format (a few Fortran binaries remain, but we are trying to rid ourselves of those). We recently did an experiment to see if our slow performance might be due to the fact that many files we write are written with an unlimited time dimension, so when we read them, they are unlimited. So we fixed the time dimension: no change. We also compress, chunk, etc. the files, but still no big changes. Finally, a colleague suggested converting to netCDF Classic, and, hooboy. Muuuuuuch faster.
Eventually I went whole hog. I took our NC4 file, decompressed it, converted it using
nccopy -k ncX MERR.nc4 MERR.ncX
to nc3, nc5, nc6, nc7 and hacked together a simple program to test it out, which you can see here: https://github.com/mathomp4/netcdfopentester/blob/master/testCNetcdf.c
All it does is open and close a file 1000 times (might be overkill). If I run this on my test files, I get the following output:
I added what I think is the correct HDF5 code to open and close a file, but I mainly program netCDF in Fortran and Python, so even this program is a huge success for me! (Also: why it's ugly.) It seems like a netCDF open and an HDF5 open differ by about two orders of magnitude in speed. Is there a way to tune netCDF to open like HDF5? Or do you pay for HDF5's quick opening later?
Or, assuming I might have got the HDF5 wrong, why does netCDF Classic open so much faster as well?
Note: the versions of the libraries I'm using are netCDF-C 4.4.1.1 and HDF5 1.8.19. I compiled with Intel 17.0.4 compilers (backed up by GCC 5.3.0). The file is one from the MERRA2 data that can be retrieved (some basic instructions are in the README of the boring repo mentioned above).