complex number support #558

Merged · 3 commits · Aug 21, 2019
10 changes: 7 additions & 3 deletions doc/hdf5.md
@@ -102,7 +102,7 @@ write(g, "mydataset", rand(3,5))
Passing parameters
------------------------

It is often required to pass parameters to specific routines, which are collected
in so-called property lists in HDF5. There are different property lists for
different tasks, e.g. for the access/creation of files, datasets, groups.
In this high level framework multiple parameters can be simply applied by
@@ -114,7 +114,7 @@ g["A", "chunk", (5,5)] = A # add chunks

B=h5read(fn,"mygroup/B", # two parameters
"fapl_mpio", (ccomm,cinfo), # if parameter requires multiple args use tuples
  "dxpl_mpio", HDF5.H5FD_MPIO_COLLECTIVE )
```

This will automatically create the correct property lists, add the properties,
@@ -198,7 +198,11 @@ This is in contrast to standard HDF5 datasets, where closing the file prevents f
Supported data types
--------------------

`HDF5.jl` knows how to store values of the following types: signed and unsigned integers of 8, 16, 32, and 64 bits, `Float32` and `Float64`; `Complex` versions of these numeric types; `Array`s of these numeric types (including the complex versions); `ASCIIString` and `UTF8String`; and `Array`s of these two string types.
`Array`s of strings are supported using HDF5's variable-length-strings facility.
By default, `Complex` numbers are stored as compound types with `r` and `i` fields, following the `h5py` convention.
When reading data, compound types with matching field names are loaded as the corresponding `Complex` Julia type.
The field names are configurable with `HDF5.set_complex_field_names(real::AbstractString, imag::AbstractString)`, and complex support can be enabled or disabled entirely with `HDF5.enable_complex_support()` and `HDF5.disable_complex_support()`.
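
A minimal sketch of the resulting round trip (the file name and data are illustrative; assumes complex support is enabled, which is the default):

```julia
using HDF5

fn = tempname()

# Writing: complex values are stored as an HDF5 compound type with
# fields "r" and "i", following the h5py convention.
h5open(fn, "w") do f
    f["z"] = 1.0 + 2.0im
    f["A"] = rand(ComplexF64, 3, 5)
end

# Reading: a compound type whose field names match is loaded back
# as the corresponding Complex type.
z = h5read(fn, "z")   # a ComplexF64
A = h5read(fn, "A")   # a 3×5 Array{ComplexF64}
```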

This module also supports HDF5's VLEN, OPAQUE, and REFERENCE types, which can be used to encode more complex types. In general, you need to specify how you want to combine these more advanced facilities to represent more complex data types. For many of the data types in Julia, the JLD module implements support. You can likewise define your own file format if, for example, you need to interact with some external program that has explicit formatting requirements.

54 changes: 43 additions & 11 deletions src/HDF5.jl
@@ -323,6 +323,13 @@ cset(::Type{ASCIIChar}) = H5T_CSET_ASCII

hdf5_type_id(::Type{C}) where {C<:CharType} = H5T_C_S1

# global configuration for complex support
const COMPLEX_SUPPORT = Ref(true)
const COMPLEX_FIELD_NAMES = Ref(("r", "i"))
enable_complex_support() = COMPLEX_SUPPORT[] = true
disable_complex_support() = COMPLEX_SUPPORT[] = false
set_complex_field_names(real::AbstractString, imag::AbstractString) = COMPLEX_FIELD_NAMES[] = ((real, imag))

Member:

do we really need these global setter functions?

What's the rationale for being able to enable/disable complex support? Same with set_complex_field_names?

Member:

In any case, a better way to deal with such global constants is to use `Ref`s with appropriate setters.

Contributor Author:

I implemented it this way following what is done by h5py and for compatibility with existing practice/workflows. Since there is no actual HDF5 standard for complex numbers, allowing the field names to be set dynamically lets you interface more easily with existing conventions. For example, I think that Octave uses real/imag, and I have also seen re/im. Also, it's conceivable that somebody is storing a compound type with these field names that doesn't represent a complex number. I'm not sure if this flexibility is necessary/worth it, but I decided to follow the h5py precedent in allowing it.

@benchislett:

Not sure why they would want to do that, but even if they did couldn't they just change the field names?

Member:

@kleinhenz that makes sense

Can we please update all the global constants to use Refs (it's precompile-friendly, as opposed to globals):

```julia
const COMPLEX_SUPPORT = Ref(true)
const COMPLEX_FIELD_NAMES = Ref(("r", "i"))

enable_complex_support() = COMPLEX_SUPPORT[] = true
disable_complex_support() = COMPLEX_SUPPORT[] = false
set_complex_field_names(real::AbstractString, imag::AbstractString) = COMPLEX_FIELD_NAMES[] = ((real, imag))
```

And update everywhere else to use `COMPLEX_SUPPORT[]` instead of `complex_support`.

Contributor:

I can see the rationale of being able to disable this feature. It allows one to read the complex numbers in their "raw" representation, which otherwise would not be possible. It also does not really hurt to make this optional.

Member @musm (Aug 21, 2019):

Looking at h5py, they don't offer a separate way to enable/disable complex support.

I kinda agree with @benchislett that even if they would want to do that they could just change the field names.

I'm not 100% convinced which approach is best. I'm leaning towards no option to enable/disable support, since that seems a little overboard. OTOH we can always remove these non-user-facing functions in the future if deemed overkill.

Contributor Author:

I think you are correct that h5py doesn't provide a mechanism for completely disabling complex support. I don't have a strong preference on this either. I lean towards including it just to be the least disruptive to existing workflows but I could see it the other way also.

One point in favor: changing the field names will prevent you from reading data as complex when you don't want to, but it can't prevent you from accidentally writing complex data in an unwanted format where previously it would have thrown an exception. So I think including the enable/disable option is somewhat safer for people with a preexisting way of handling complex data.

Member:

Ok that's convincing enough 👍
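
To make the settled behavior concrete, a hedged sketch (the file `external.h5` and its layout are hypothetical): reading a file written by another tool with, e.g., Octave-style field names, and opting out of complex handling entirely to get the raw compound representation.

```julia
using HDF5

# Match a convention that names the fields "real"/"imag" instead of
# the default "r"/"i".
HDF5.set_complex_field_names("real", "imag")
z = h5read("external.h5", "z")    # loaded as a Complex value if the fields match

# Disable complex handling completely: the same data now comes back
# as HDF5.HDF5Compound{2}, and writing Complex values throws an error.
HDF5.disable_complex_support()
raw = h5read("external.h5", "z")

# Restore the defaults.
HDF5.enable_complex_support()
HDF5.set_complex_field_names("r", "i")
```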

## HDF5 uses a plain integer to refer to each file, group, or
## dataset. These are wrapped into special types in order to allow
## method dispatch.
@@ -1175,6 +1182,16 @@ datatype(dset::HDF5Attribute) = HDF5Datatype(h5a_get_type(checkvalid(dset).id),
datatype(x::HDF5Scalar) = HDF5Datatype(hdf5_type_id(typeof(x)), false)
datatype(::Type{T}) where {T<:HDF5Scalar} = HDF5Datatype(hdf5_type_id(T), false)
datatype(A::AbstractArray{T}) where {T<:HDF5Scalar} = HDF5Datatype(hdf5_type_id(T), false)
function datatype(::Type{Complex{T}}) where {T<:HDF5Scalar}
COMPLEX_SUPPORT[] || error("complex support disabled. call HDF5.enable_complex_support() to enable")
dtype = h5t_create(H5T_COMPOUND, 2*sizeof(T))
h5t_insert(dtype, COMPLEX_FIELD_NAMES[][1], 0, hdf5_type_id(T))
h5t_insert(dtype, COMPLEX_FIELD_NAMES[][2], sizeof(T), hdf5_type_id(T))
return HDF5Datatype(dtype)
end
datatype(x::Complex{T}) where {T<:HDF5Scalar} = datatype(typeof(x))
datatype(A::AbstractArray{Complex{T}}) where {T<:HDF5Scalar} = datatype(eltype(A))

function datatype(str::String)
type_id = h5t_copy(hdf5_type_id(typeof(str)))
h5t_set_size(type_id, max(sizeof(str), 1))
@@ -1203,7 +1220,8 @@ dataspace(dset::HDF5Dataset) = HDF5Dataspace(h5d_get_space(checkvalid(dset).id))
dataspace(attr::HDF5Attribute) = HDF5Dataspace(h5a_get_space(checkvalid(attr).id))

# Create a dataspace from in-memory types
dataspace(x::Union{T, Complex{T}}) where {T<:HDF5Scalar} = HDF5Dataspace(h5s_create(H5S_SCALAR))

function _dataspace(sz::Tuple{Vararg{Int}}, max_dims::Union{Dims, Tuple{}}=())
dims = Vector{Hsize}(undef,length(sz))
any_zero = false
@@ -1301,18 +1319,20 @@ function read(obj::DatasetOrAttribute)
read(obj, T)
end
# Read scalars
function read(obj::DatasetOrAttribute, ::Type{T}) where {T<:Union{HDF5Scalar, Complex{<:HDF5Scalar}}}
x = read(obj, Array{T})
x[1]
end
# Read array of scalars
function read(obj::DatasetOrAttribute, ::Type{Array{T}}) where {T<:Union{HDF5Scalar, Complex{<:HDF5Scalar}}}
if isnull(obj)
return T[]
end
dims = size(obj)
data = Array{T}(undef,dims)
dtype = datatype(data)
readarray(obj, dtype.id, data)
close(dtype)
data
end
# Empty arrays
@@ -1700,7 +1720,7 @@ for (privatesym, fsym, ptype) in
obj, dtype
end
# Scalar types
($fsym)(parent::$ptype, name::String, data::Union{T, AbstractArray{T}}, plists...) where {T<:Union{ScalarOrString, Complex{<:HDF5Scalar}}} =
($privatesym)(parent, name, data, plists...)
# VLEN types
($fsym)(parent::$ptype, name::String, data::HDF5Vlen{T}, plists...) where {T<:Union{HDF5Scalar,CharType}} =
@@ -1723,7 +1743,7 @@ for (privatesym, fsym, ptype, crsym) in
end
end
# Scalar types
($fsym)(parent::$ptype, name::String, data::Union{T, AbstractArray{T}}, plists...) where {T<:Union{ScalarOrString, Complex{<:HDF5Scalar}}} =
($privatesym)(parent, name, data, plists...)
# VLEN types
($fsym)(parent::$ptype, name::String, data::HDF5Vlen{T}, plists...) where {T<:Union{HDF5Scalar,CharType}} =
@@ -1732,7 +1752,7 @@ end
end
# Write to already-created objects
# Scalars
function write(obj::DatasetOrAttribute, x::Union{T, Array{T}}) where {T<:Union{ScalarOrString, Complex{<:HDF5Scalar}}}
dtype = datatype(x)
try
writearray(obj, dtype.id, x)
@@ -1750,7 +1770,7 @@ function write(obj::DatasetOrAttribute, data::HDF5Vlen{T}) where {T<:Union{HDF5S
end
end
# For plain files and groups, let "write(obj, name, val)" mean "d_write"
write(parent::Union{HDF5File, HDF5Group}, name::String, data::Union{T, AbstractArray{T}}, plists...) where {T<:Union{ScalarOrString, Complex{<:HDF5Scalar}}} =
d_write(parent, name, data, plists...)
# For datasets, "write(dset, name, val)" means "a_write"
write(parent::HDF5Dataset, name::String, data::Union{T, AbstractArray{T}}, plists...) where {T<:ScalarOrString} = a_write(parent, name, data, plists...)
@@ -1993,7 +2013,7 @@ function hdf5_to_julia_eltype(objtype)
T = HDF5Vlen{hdf5_to_julia_eltype(HDF5Datatype(super_id))}
elseif class_id == H5T_COMPOUND
N = Int(h5t_get_nmembers(objtype.id))
# check if should be interpreted as complex
if COMPLEX_SUPPORT[] && N == 2
membernames = ntuple(N) do i
h5t_get_member_name(objtype.id, i-1)
end
membertypes = ntuple(N) do i
hdf5_to_julia_eltype(HDF5Datatype(h5t_get_member_type(objtype.id, i-1)))
end
iscomplex = (membernames == COMPLEX_FIELD_NAMES[]) && (membertypes[1] == membertypes[2]) && (membertypes[1] <: HDF5.HDF5Scalar)
T = iscomplex ? Complex{membertypes[1]} : HDF5Compound{N}
else
T = HDF5Compound{N}
end
elseif class_id == H5T_ARRAY
T = hdf5array(objtype)
else
@@ -2007,7 +2039,7 @@ end
# These supply default values where possible
# See also the "special handling" section below
h5a_write(attr_id::Hid, mem_type_id::Hid, buf::String) = h5a_write(attr_id, mem_type_id, unsafe_wrap(Vector{UInt8}, pointer(buf), ncodeunits(buf)))
function h5a_write(attr_id::Hid, mem_type_id::Hid, x::T) where {T<:Union{HDF5Scalar, Complex{<:HDF5Scalar}}}
tmp = Ref{T}(x)
h5a_write(attr_id, mem_type_id, tmp)
end
Expand All @@ -2034,7 +2066,7 @@ end
function h5d_write(dataset_id::Hid, memtype_id::Hid, str::String, xfer::Hid=H5P_DEFAULT)
ccall((:H5Dwrite, libhdf5), Herr, (Hid, Hid, Hid, Hid, Hid, Cstring), dataset_id, memtype_id, H5S_ALL, H5S_ALL, xfer, str)
end
function h5d_write(dataset_id::Hid, memtype_id::Hid, x::T, xfer::Hid=H5P_DEFAULT) where {T<:Union{HDF5Scalar, Complex{<:HDF5Scalar}}}
tmp = Ref{T}(x)
h5d_write(dataset_id, memtype_id, H5S_ALL, H5S_ALL, xfer, tmp)
end
48 changes: 48 additions & 0 deletions test/plain.jl
@@ -402,6 +402,54 @@ end

end # testset plain

@testset "complex" begin
HDF5.enable_complex_support()

fn = tempname()
f = h5open(fn, "w")

f["ComplexF64"] = 1.0 + 2.0im
attrs(f["ComplexF64"])["ComplexInt64"] = 1im

Acmplx = rand(ComplexF64, 3, 5)
write(f, "Acmplx64", convert(Matrix{ComplexF64}, Acmplx))
write(f, "Acmplx32", convert(Matrix{ComplexF32}, Acmplx))

HDF5.disable_complex_support()
@test_throws ErrorException f["_ComplexF64"] = 1.0 + 2.0im
@test_throws ErrorException write(f, "_Acmplx64", convert(Matrix{ComplexF64}, Acmplx))
@test_throws ErrorException write(f, "_Acmplx32", convert(Matrix{ComplexF32}, Acmplx))
HDF5.enable_complex_support()

close(f)

fr = h5open(fn)
z = read(fr, "ComplexF64")
@test z == 1.0 + 2.0im && isa(z, ComplexF64)
z_attrs = attrs(fr["ComplexF64"])
@test read(z_attrs["ComplexInt64"]) == 1im

Acmplx32 = read(fr, "Acmplx32")
@test convert(Matrix{ComplexF32}, Acmplx) == Acmplx32
@test eltype(Acmplx32) == ComplexF32
Acmplx64 = read(fr, "Acmplx64")
@test convert(Matrix{ComplexF64}, Acmplx) == Acmplx64
@test eltype(Acmplx64) == ComplexF64

HDF5.disable_complex_support()
z = read(fr, "ComplexF64")
@test isa(z, HDF5.HDF5Compound{2})

Acmplx32 = read(fr, "Acmplx32")
@test eltype(Acmplx32) == HDF5.HDF5Compound{2}
Acmplx64 = read(fr, "Acmplx64")
@test eltype(Acmplx64) == HDF5.HDF5Compound{2}

close(fr)

HDF5.enable_complex_support()
end

# test strings with null and undefined references
@testset "undefined and null" begin
fn = tempname()