
Slicing very slow in 4.7.4 (reported) #1757

Open · WardF opened this issue Jun 11, 2020 · 11 comments

@WardF (Member) commented Jun 11, 2020

Mirroring an issue from the python interface, which hooks into the underlying C library.

In a nutshell, array slicing in 4.7.4 has been reported as being orders of magnitude slower than in 4.7.3. I'm creating an issue here in the C library to make it easier to track.

WardF added this to the 4.8.0 milestone Jun 11, 2020
WardF self-assigned this Jun 11, 2020
@jswhit commented Jun 14, 2020

I have confirmed that the simple python test program in Unidata/netcdf4-python#1018 runs two orders of magnitude slower (using netcdf4-python master and hdf5 1.10.6) when netcdf-c 4.7.4 is linked, as opposed to netcdf-c 4.7.3. It should be fairly easy to create a simple C program to demonstrate this.

@jswhit commented Jun 14, 2020

simple C program to test (using this file):

#include <stdio.h>
#include <netcdf.h>

#define FILE_NAME "woa18_A5B7_s16_04.nc"
#define NDIMS 4
#define NLAT 100
#define NLON 100
#define NLVL 102
#define VAR_NAME "s_an"
#define ERR(e) {printf("Error: %s\n", nc_strerror(e)); return 2;}

int
main()
{
   int ncid, varid, lat, lon, retval;
   size_t start[NDIMS], count[NDIMS];
   float data_out[NLVL];

   if ((retval = nc_open(FILE_NAME, NC_NOWRITE, &ncid)))
      ERR(retval);
   if ((retval = nc_inq_varid(ncid, VAR_NAME, &varid)))
      ERR(retval);

   /* Each read pulls one full column of NLVL levels at a single
      (lat, lon) point from the 4D variable. */
   count[0] = 1;
   count[1] = NLVL;
   count[2] = 1;
   count[3] = 1;
   start[0] = 0;
   start[1] = 0;
   start[2] = 0;
   start[3] = 0;

   /* Slice the variable 10,000 times, once per (lat, lon) pair. */
   for (lat = 0; lat < NLAT; lat++)
   {
      for (lon = 0; lon < NLON; lon++)
      {
         start[2] = lat;
         start[3] = lon;
         if ((retval = nc_get_vara_float(ncid, varid, start,
                                         count, &data_out[0])))
            ERR(retval);
      }
   }
   if ((retval = nc_close(ncid)))
      ERR(retval);
   return 0;
}

with netcdf-c 4.7.3:

% time ./a.out
1.119u 0.096s 0:01.81 66.2%	0+0k 0+0io 875pf+0w

with netcdf-c 4.7.4, I killed it after waiting 10 minutes.

@DennisHeimbigner (Collaborator) commented
gprof output of non-zero percent time:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
 45.45      0.05     0.05    10000     5.00     8.96  NC4_get_vars
 27.27      0.08     0.03    20252     1.48     1.48  nclistget
  9.09      0.09     0.01    10000     1.00     1.00  check_for_vara
  9.09      0.10     0.01                             main
  9.09      0.11     0.01                             nc4_find_nc4_grp
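
(For anyone reproducing this: a flat profile like the one above can be generated by building the test program with profiling enabled and running gprof on the result. The file names and library flags below are illustrative, not necessarily what was used here.)

% gcc -pg -o a.out test.c -lnetcdf
% ./a.out
% gprof ./a.out gmon.out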

@WardF (Member, Author) commented Jun 15, 2020

Thanks @jswhit, taking a look at this now to see if I can sort out what happened, and when.

@WardF (Member, Author) commented Jun 15, 2020

I've got a test script set up that should give me an avenue for using git bisect to figure out where the change occurred.

@DennisHeimbigner (Collaborator) commented
Good. I did a diff comparison of 4.7.3 and 4.7.4, but could not see any obvious problem, so bisect is our only hope.

@edwardhartnett (Contributor) commented
I believe that you (Dennis) completely replaced the vara/vars code. Wasn't that between the 4.7.3 and 4.7.4 releases? Or was it already in 4.7.3?

@WardF (Member, Author) commented Jun 15, 2020

Ok, I've found the problem. Commit a4465da9 changed the default chunk-cache sizes defined in CMakeLists.txt for the CMake build. I presume that if I rerun the tests against the autotools-based build with its defaults, we'd see the same behavior (I'll check this).

The variables being set are specifically CHUNK_CACHE_SIZE and CHUNK_CACHE_NELEMS. Before this commit their values were 4194304 and 1009, respectively.

The test/result matrix is as follows:

Status  CHUNK_CACHE_SIZE  CHUNK_CACHE_NELEMS
Fast    4194304           1009
Slow    16777216          4133
Slow    16777216          1009
Fast    4194304           4133

Clearly, the increased CHUNK_CACHE_SIZE is causing the slowdown. I vaguely remember the discussion around this PR, and I can go remind myself; @edwardhartnett, any thoughts on this?

@jswhit, in the meantime: if you build the netCDF-C library (v4.7.4) with chunk_cache_size and chunk_cache_nelems set to the v4.7.3 values at configure time, this issue should go away. Applications that cannot rebuild the library can also adjust the cache at run time, as sketched below.
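
A minimal sketch of the run-time route, using the netCDF-C chunk-cache API with the pre-change values from the matrix above (the 0.75 preemption is assumed to match the library default, and the file/variable names are taken from the test program):

#include <stdio.h>
#include <netcdf.h>
#define ERR(e) {printf("Error: %s\n", nc_strerror(e)); return 2;}
int
main()
{
   int ncid, varid, retval;
   /* Set the process-wide default chunk cache; this applies to
      files opened after this call. */
   if ((retval = nc_set_chunk_cache(4194304, 1009, 0.75)))
      ERR(retval);
   if ((retval = nc_open("woa18_A5B7_s16_04.nc", NC_NOWRITE, &ncid)))
      ERR(retval);
   if ((retval = nc_inq_varid(ncid, "s_an", &varid)))
      ERR(retval);
   /* Or tune the cache for a single open variable only. */
   if ((retval = nc_set_var_chunk_cache(ncid, varid, 4194304, 1009, 0.75)))
      ERR(retval);
   if ((retval = nc_close(ncid)))
      ERR(retval);
   return 0;
}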

@edwardhartnett (Contributor) commented
Well crap! Who would have thought?

I did do some performance testing before the change, so perhaps the larger cache speeds things up in some cases but slows them down in others.

OK, first we can check this by having @jswhit build the library with these two values set at configure time. They can be set with --with-chunk-cache-size= and --with-chunk-cache-nelems=. So try the original two numbers (configure line below) and see if this solves the problem.
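
Something along these lines, with the pre-change values:

% ./configure --with-chunk-cache-size=4194304 --with-chunk-cache-nelems=1009
% make && make install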

I will ask the HDF5 support group about this, and perhaps they can provide some guidance.

Also I kind of wonder whether using even smaller numbers might improve performance even more?

@DennisHeimbigner (Collaborator) commented
Wow, this is really counter-intuitive. Normally one expects larger caches to be faster. Does anyone have any insight into why the larger cache is slower?

@WardF (Member, Author) commented Jun 22, 2020

We're now seeing multiple reports. I expect we will revert the related changes to restore the old behavior while we figure out why this is occurring and how best to take a second run at it.

WardF modified the milestones: 4.8.0, 4.8.2 Aug 30, 2021
WardF modified the milestones: 4.8.2, 4.9.1 Jun 15, 2022
WardF modified the milestones: 4.9.1, 4.9.2 Feb 13, 2023
WardF modified the milestones: 4.9.2, 4.9.3 May 16, 2023