Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add built-in support for complex numbers #2650

Closed
ZedThree opened this issue Mar 6, 2023 · 5 comments
Closed

Add built-in support for complex numbers #2650

ZedThree opened this issue Mar 6, 2023 · 5 comments
Assignees
Milestone

Comments

@ZedThree
Copy link
Contributor

ZedThree commented Mar 6, 2023

Some HDF5 wrappers, like h5py and HDF5.jl, write complex numbers using a transient datatype stored in the variable itself. This was discussed some years ago in cf-convention/cf-conventions#267, but that seems to be more about enums specifically.

Complex numbers are used in lots of domains, and while there is a lack of true consensus on how to store them (see cf-convention/discuss#369 for example), h5py and HDF5.jl both do the same thing: an HDF5 compound type with r and i members. Having complex numbers also Just Work in netCDF would be of great help to lots of scientists, and enable greater compatibility between files created by different tools.

Here's a concrete example of a file created using h5py:

import numpy as np
import h5py
with h5py.File("test.h5", "w") as f:
    f["x"] = np.ones(1, dtype=complex)
$ h5dump test.h5
HDF5 "test.h5" {
GROUP "/" {
   DATASET "x" {
      DATATYPE  H5T_COMPOUND {
         H5T_IEEE_F64LE "r";
         H5T_IEEE_F64LE "i";
      }
      DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
      DATA {
      (0): {
            1,
            0
         }
      }
   }
}
}

The example in the Python netCDF4 docs on how to create compound types uses complex numbers, and reading that file back in with h5py or h5netcdf, the conversion to complex is automatic.

C (in C99), C++, and Fortran all natively support complex numbers (and also all use the same representation as an array of two elements), so having built-in support for them in netCDF would be very useful.

This is probably the most relevant comment for implementation from the previous discussion: #267 (comment)

So, I'm proposing two things:

  1. add support for transient datatypes stored only in the variable
  2. add NC_COMPLEX built-in external type
@edwardhartnett
Copy link
Contributor

edwardhartnett commented Mar 6, 2023

Support for transient data types is a worthy goal but should be done first, as a separate issue, before tackling the complex number challenge, which seems far greater.

I think adding support to netcdf-c for complex data types is a great idea for the following reasons:

  • C and Fortran both support complex types natively.
  • Complex numbers have many uses in science and engineering, including in the current netCDF user base. For example we store complex numbers in some NOAA models.
  • Reading/writing complex arrays as a defined type will be much faster than reading the real/imaginary parts as two different variables, so this will speed up science codes.

However, there's a lot of work here. There is more than one complex type! C90 defines three:

The C programming language, as of C99, supports complex number math with the three built-in types double _Complex, float _Complex, and long double _Complex (see _Complex). When the header <complex. h> is included, the three complex number types are also accessible as double complex, float complex, long double complex.

For Fortran we have a slightly different situation:

Complex numbers in Fortran are stored in rectangular coordinates as a pair of real numbers (real part first, followed by the imaginary part). The default kind of complex numbers is always the same as the default kind of the real data type. So a complex variable of kind 4 will contain two reals of kind 4.

Unfortunately the long double type in C is not very standard: https://en.wikipedia.org/wiki/Long_double.

So if we just support complex float and complex double, that will probably handle most user needs. Call them NC_COMPLEX_FLOAT and NC_COMPLEX_DOUBLE.

The API would then need to add new put/get_var functions for all the different flavors of access we support, var, vara, var1, vars. (Not varm, which is deprecated.) However, all these functions call one master get/put function which does the work. Those function will need to be modified to handle two new complex types.

The type conversion function in netcdf-4 must be modified to allow conversion between the two complex types. However, conversion to/from any other type would generate an error.

The good news is that the dispatch table would not have to change. That make life easier and proves the value of the dispatch table.

Finally once that is all done and tested, corresponding wrapper functions need to be added in netcdf-fortran.

So a lot of work, but a very worthy goal which would be a great addition to netCDF.

@ZedThree
Copy link
Contributor Author

ZedThree commented Mar 6, 2023

Yes, you're entirely right about the different sized types, I'd glossed over that 🙂
NC_COMPLEX_FLOAT/DOUBLE should be sufficient, I don't think I've ever seen anyone use long double or quad-precision in the wild. Half-precision is becoming quite popular, but I think they're still mostly confined to compiler extensions at the minute.

Fortran single and double precision kinds map onto float and double for every compiler I've used, and there are specific c_float/double kinds that are guaranteed to map onto them.

Should I make a separate issue for transient types?

@edwardhartnett
Copy link
Contributor

I would suggest a separate issue for transient types, and that it be handled first, before complex types are introduced.

@WardF
Copy link
Member

WardF commented Mar 6, 2023

I can see how adding built-in support for complex numbers would be helpful. In addition to the comprehensive outline provided by @edwardhartnett (thanks Ed!), there are a few higher-level logistical concerns that spring to mind:

  1. NetCDF Java: We need to make sure that any functionality that goes into the core library has corresponding functionality added to other implementations in a reasonable time frame. This is easier for us to commit to with the fortran and c++ interface libraries, but the netcdf-java team would need to be involved, to ensure that this is a feature they're able to implement.

  2. Data Model concerns: adding a new supported data type would mean either modifying the existing netCDF4 data model, probably introducing a versioning mechanism at the same to help avoid confusion about 'which netCDF4 data model' a particular netCDF data set was using. This would require some thought to avoid introducing issues/confusion, moving forward.

  3. Tying this to libhdf5. As we continue our move towards introducing additional storage options (as we did with ncZarr), adding functionality that is not tied to a dependency will ideally make it easier to avoid maintaining multiple implementations in order to accommodate different formats. We've already tied some ncZarr functionality to libhdf5, but we still need to remain mindful of the implications moving forward, as we will certainly be introducing additional storage options. Given that we already have a mechanism by which we can represent complex numbers, without modifying the data model or impacting netCDF Java, we'll need to consider our approach.

These aren't necessarily deal breakers, although they mean we won't be adding this in the short term. They are just the sort of big picture issues we have to consider before adding the functionality. In the meantime, a possible solution may exist by adding a function in netcdf4-python. This function would use existing functionality to represent complex numbers, via a simplified API. The resultant data files would be readable by the C, Java, etc libraries, although interpreting the values would require intentional effort on the part of the reader. It might be worth having a discussion over at at the netcdf4-python project page, as well as on the Unidata netcdf mailing list, to get a sense of how big a priority this would be for the community.

@DennisHeimbigner
Copy link
Collaborator

To extend Ward's remarks. It is easy to add a complex type to a netcdf-4 file by creating a
user-defined type something like this:

compound NC_Complex_Float { float r; float i;}

@Unidata Unidata locked and limited conversation to collaborators Mar 6, 2023
@WardF WardF converted this issue into discussion #2652 Mar 6, 2023

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants