RuntimeWarning: invalid value encountered in equal #391
Comments
Can you post the file somewhere we can grab it?
Also, what is the warning message?
Never mind - I see the warning message is in the subject line.
Yes, there are NaNs - they are essentially used to depict missing data (land pixels). I have put the file on dropbox at https://www.dropbox.com/s/i51d1xzp6ng2bb3/sst_geo-polar-blended_night_5km_2015001.hdf?dl=0 (note that it is large - about 230MB). I'll try the byte-swapping.
The data is marked in the file as big endian, but it is apparently little endian. Byteswapping seems to fix it.
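As a minimal numpy-only sketch of the byte-swapping effect described here (illustrative values, not the reporter's file): int16 data stored big-endian, misread in the wrong byte order, then recovered with `byteswap()`:

```python
import numpy as np

# Illustrative sketch: 0..24 as int16 in big-endian order, as on disk.
true_values = np.arange(25, dtype='>i2')

# Reinterpreting the same bytes little-endian reproduces the bad values:
# the big-endian bytes of 1 (0x00 0x01) read little-endian give 256.
misread = true_values.view('<i2')

# byteswap() reverses the bytes of each element, recovering the data.
recovered = misread.byteswap()

print(misread.reshape(5, 5)[0])    # first row: 0, 256, 512, 768, 1024
print(recovered.reshape(5, 5)[0])  # first row: 0, 1, 2, 3, 4
```

The `0, 256, 512, ...` pattern is the signature of int16 data read with the wrong byte order.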
Thanks! I've just about come to the same conclusion looking at the byteswapped data and dumping some charts to look over regions. Is this check a recent change in the HDF4 C lib or netCDF4? I'm curious as to why my code has read these files previously with no problem. But I am rapt that the issue has been identified!
👏 Thanks again. I'll look into the dataset and contact the authors. |
Up until recently, netcdf4-python was ignoring the endian-ness attribute and just treating data as if it were the native endian-ness of whatever machine you read the data on. Issue #346 fixed this. |
I've been doing a bit more investigation on this. I believe the providers of the file referred to above use IDL, and a colleague provided some IDL which I have extended a bit into:

```idl
pro createHDF
  f = HDF_SD_START('hdf_test.hdf', /create)
  ds = HDF_SD_CREATE(f, 'data', [5,5], /dfnt_int16)
  arr = INDGEN(5,5)
  HDF_SD_ADDDATA, ds, arr
  HDF_SD_ENDACCESS, ds
  HDF_SD_END, f
end
```

Interestingly, these calls seem to map fairly directly to the HDF C library, and looking at the IDL documentation for HDF_SD_CREATE, there is no option to write little-endian. I even tried using
ncdump reports:

```
netcdf hdf_test {
dimensions:
	fakeDim0 = 5 ;
	fakeDim1 = 5 ;
variables:
	short data(fakeDim0, fakeDim1) ;
		data:_Storage = "contiguous" ;
		data:_Endianness = "big" ;

// global attributes:
		:_Format = "netCDF-4" ;
data:

 data =
  0, 1, 2, 3, 4,
  5, 6, 7, 8, 9,
  10, 11, 12, 13, 14,
  15, 16, 17, 18, 19,
  20, 21, 22, 23, 24 ;
}
```

To read the data in C, I wrote:

```c
#include <stdio.h>
#include "mfhdf.h"

#define DIM1 5
#define DIM0 5
#define FILENAME "hdf_test.hdf"

int main() {
    int32 sd_id, sds_id, istat, sd_index;
    int32 dims[2], start[2], edges[2], rank;
    int16 array_data[DIM0][DIM1];
    intn i, j;

    // status = Hgetlibversion(&major_v, &minor_v, &release, string);

    start[0] = 0;
    start[1] = 0;
    edges[0] = DIM1;
    edges[1] = DIM0;

    sd_id = SDstart(FILENAME, DFACC_READ);
    if (sd_id != FAIL)
        printf("Opened %s\n", FILENAME);

    sd_index = 0;
    sds_id = SDselect(sd_id, sd_index);
    istat = SDreaddata(sds_id, start, NULL, edges, (VOIDP)array_data);

    printf("\nData: \n");
    for (j = 0; j < DIM0; j++) {
        for (i = 0; i < DIM1; i++)
            printf(" %d ", array_data[i][j]);
        printf("\n");
    }
    printf("\n");

    /* Terminate access to the array. */
    istat = SDendaccess(sds_id);

    /* Terminate access to the SD interface and close the file. */
    istat = SDend(sd_id);
    if (istat != FAIL)
        printf("closed\n");
    return 0;
}
```

which outputs

```
Opened hdf_test.hdf

Data:
 0  5  10  15  20
 1  6  11  16  21
 2  7  12  17  22
 3  8  13  18  23
 4  9  14  19  24

closed
```

I then tried this in python with the following:

```python
import Cython
import numpy as np
import netCDF4

if __name__ == '__main__':
    print 'Cython version ' + Cython.__version__ + ' installed'
    print 'netCDF4 version ' + netCDF4.__version__ + ' installed'

    file = 'hdf_test.hdf'
    ds = netCDF4.Dataset(file)
    data = ds.variables['data'][:]
    print 'RAW'
    print data
    print 'Byte-swapped'
    print data.byteswap()
    ds.close()
```

and got:

```
Cython version 0.22 installed
netCDF4 version 1.1.8 installed
RAW
[[   0  256  512  768 1024]
 [1280 1536 1792 2048 2304]
 [2560 2816 3072 3328 3584]
 [3840 4096 4352 4608 4864]
 [5120 5376 5632 5888 6144]]
Byte-swapped
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]
```

So unfortunately this means that the netcdf4-python output is at odds with what the C code and ncdump report. If anyone wants to further clarify or point out any errors in the above code, I'd be keen to hear.
Tim - that's very interesting. Looks like there may be some hidden byteswapping going on somewhere in the python interface.
I presume you ran that C program on a little endian machine? If so, I guess that means the HDF4 library is doing the byte swapping for you, since the data is big endian in the file. The python interface is assuming the byte-swapping has not been done, and is merrily swapping the byte order again. I wonder if this is a difference between HDF4 and HDF5 - in HDF5 the data is not automatically byte-swapped if the native type is different from the type in the file. Apparently HDF4 does automatically swap the bytes. I suppose I could add logic in the python code to not byteswap if the underlying data format is HDF4.
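The double-swap described here can be seen with a single two-byte value (a hypothetical illustration using the standard library, not the library's actual code, assuming a little-endian machine):

```python
import struct

# The value 1 stored big-endian on disk occupies the bytes 0x00 0x01.
on_disk = struct.pack('>h', 1)

# The HDF4 lib converts to native order on read, so a correct read yields 1.
correct = struct.unpack('>h', on_disk)[0]

# A second, unnecessary swap reverses the bytes again: 1 becomes 256 -
# exactly the 0, 256, 512, ... pattern reported above.
double_swapped = struct.unpack('<h', on_disk)[0]

print(correct, double_swapped)  # 1 256
```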
Yes. I ran all code on a Mac laptop with OS X 10.9, so little-endian. I have put together some further C code that produces both a big-endian and a little-endian HDF4 file. The HDF utils report both with data as the same 5x5 array, so byte swapping occurs as the data is written to the big-endian file. This is consistent with what the HDF4 User Guide states. Unfortunately ncdump and the Python library choke on the little-endian HDF4 file. I'll gather all details and post in a separate issue.
I think this (and the new issue you just created) are related to a bug in the netCDF C library. I've created a ticket here: |
A couple of bugs were recently fixed in the netcdf-c lib that should take care of this. Closing now... |
I think there is still a byte-swap occurring that doesn't need to. Using the latest netcdf-c:

```
$ ncdump hdf_bigendian.hdf
netcdf hdf_bigendian {
dimensions:
	fakeDim0 = 5 ;
	fakeDim1 = 5 ;
variables:
	short data(fakeDim0, fakeDim1) ;
data:

 data =
  0, 1, 2, 3, 4,
  5, 6, 7, 8, 9,
  10, 11, 12, 13, 14,
  15, 16, 17, 18, 19,
  20, 21, 22, 23, 24 ;
}
```

However with the current netcdf4-python:

```
$ python
Python 2.7.9 (default, Jan  7 2015, 11:50:42)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.56)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from netCDF4 import Dataset
>>> ds = Dataset('hdf_bigendian.hdf')
>>> foo = ds.variables['data'][:]
>>> foo
array([[   0,  256,  512,  768, 1024],
       [1280, 1536, 1792, 2048, 2304],
       [2560, 2816, 3072, 3328, 3584],
       [3840, 4096, 4352, 4608, 4864],
       [5120, 5376, 5632, 5888, 6144]], dtype=int16)
```

The example hdf4 file I am using is here
The fact that ncdump is printing the byte-swapped data indicates that the byte-swapping is being done on the fly in the hdf4 library. If you create a big-endian variable in a netcdf file, ncdump does not print byte-swapped data, which confirms that the netcdf lib does not automatically do the swapping. I could check to see if the underlying data format is not hdf4 before attempting to byte-swap - but lacking any more information on what we should expect from the C libs I'm not sure that would be the right solution. |
pull request #426 is a hack to workaround this. Basically, it assumes the HDF4 lib always returns the data in the native endian format. Is this always true? |
http://www.hdfgroup.org/training/HDFtraining/UsersGuide/Fundmtls.fm3.html seems to indicate the byte-swapping to native endian is always done in HDF4. |
Yes, that's my understanding - that if the data endianness is different to the native os endianness, then the HDF4 lib will byte-swap for the user. HDF5 I believe never byte-swaps - it returns the raw data. |
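The guard described here can be sketched as follows (hypothetical helper logic, not the actual contents of pull request #426): skip the extra byte-swap whenever the underlying disk format is HDF4, since the HDF4 C lib has already converted the data to native order.

```python
# Hypothetical helper, for illustration only - not netcdf4-python's
# real internals. Decide whether the Python layer still needs to
# byte-swap data returned by the C libraries.
def needs_byteswap(disk_format, var_endian, native_endian):
    if disk_format == 'HDF4':
        # HDF4 always hands back native-endian data (per the User Guide).
        return False
    # Otherwise swap only when the variable's declared endianness
    # differs from the machine's native order.
    return var_endian not in ('native', native_endian)

print(needs_byteswap('HDF4', 'big', 'little'))    # False
print(needs_byteswap('HDF5', 'big', 'little'))    # True
print(needs_byteswap('HDF5', 'little', 'little')) # False
```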
OK, will wait a day or two and merge pull request #426 then. |
fix for issue #391 (big endian HDF4 data read incorrectly on little endian machine)
I've built a new NetCDF4 module from the current `master` branch on OS X 10.9.5. Opening an HDF4 file I see the above RuntimeWarning - the first time I've seen this, having used NetCDF4 pretty often on files from this dataset. The code that produces the warning is pretty simple:
My first reaction was possibly a corrupted file, but tools like `hdfls` and `hdp` seem to have no problem investigating the file. Interestingly, `ds.variables['sst_analysis'].set_auto_maskandscale(False)` stops the RuntimeWarning but I still get screwy data. As the data is sea surface temps in Celsius, I should get between -2C and 35C, but instead I get positive/negative values close to zero.

On building from master, all tests passed and I built with:
HDF5 1.8.14
Netcdf 4.3.3.1
Numpy 1.9.1
Cython 0.22
Clang 6.0 (clang-600.0.57)
I'd appreciate any thoughts/input on this.
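For context, the automatic mask-and-scale step that `set_auto_maskandscale(False)` disables can be sketched in plain numpy (the `scale_factor`, `add_offset`, and `_FillValue` values below are invented for illustration - the real ones come from the variable's attributes):

```python
import numpy as np

# Invented attribute values, for illustration only.
raw = np.array([-32768, 100, 200], dtype=np.int16)
fill_value = -32768
scale_factor, add_offset = 0.01, 25.0

# Auto mask-and-scale: mask fill values, then apply
# value = raw * scale_factor + add_offset (the CF convention).
masked = np.ma.masked_equal(raw, fill_value)
sst = masked * scale_factor + add_offset  # notionally degrees Celsius

print(sst)
```

If the raw integers arrive with their bytes swapped, this scaling produces exactly the kind of near-zero garbage temperatures described above.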